I have created a Firebase database with the following structure:
There are 3 main collections:
Users, Chats and another one which is not important for now.
I initially planned to store documents of type User in the Users collection. The type User has the following properties:
uid -> string
... (some others which are not important)
displayName -> string
chatsList -> a SUBCOLLECTION of chat references, containing
chat ID
otherUserID -> the ID of the other user chatting with you
contacts -> a SUBCOLLECTION of user references, containing
user ID
user displayName
The type Chat looks like this (note that a chat is only ever between two users):
uid -> string
user1 -> ID of one of the users in this chat
user2 -> ID of the other user
messages -> a SUBCOLLECTION of messages made of:
uid -> message ID
senderID -> user ID of who sent the message
text -> content of the message
timestamp -> when the message was sent
So, as a logged-in user, I know my ID, and if I want my chat with the user whose ID is 123, I can query the collection "users/:myID/chatsList" for the chat reference that satisfies: "otherUserID", "==", "123"
Then I take the related chat ID and look up the actual chat in the Chats collection.
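The two-step lookup described above can be modelled with plain Python dictionaries standing in for the collections. This is a minimal sketch, not the Firestore SDK. The nested dicts, the `chatID` field name, the `find_chat_with` helper and the sample IDs are all hypothetical.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the Firestore collections.
# users/{uid}/chatsList/{docId} -> {"chatID": ..., "otherUserID": ...}
users = {
    "me": {
        "displayName": "Me",
        "chatsList": {
            "ref1": {"chatID": "chatA", "otherUserID": "123"},
            "ref2": {"chatID": "chatB", "otherUserID": "456"},
        },
    },
}

# chats/{chatId} -> chat fields only; the messages subcollection is not part of
# the chat document, so reading a chat does not load any messages.
chats = {
    "chatA": {"user1": "me", "user2": "123"},
    "chatB": {"user1": "me", "user2": "456"},
}


def find_chat_with(my_id: str, other_id: str) -> Optional[dict]:
    """Step 1: find the chatsList entry where otherUserID == other_id.
    Step 2: read the chat document by its ID."""
    refs = users[my_id]["chatsList"].values()
    match = next((r for r in refs if r["otherUserID"] == other_id), None)
    if match is None:
        return None
    return chats[match["chatID"]]


print(find_chat_with("me", "123"))  # {'user1': 'me', 'user2': '123'}
```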
I am not sure, but I understand that queries are shallow: if I query the Chats collection, for example, the messages subcollection is not considered at all, so it should be faster?
Now the question is: do you think this is a good structure or is it better to not have those subcollections and use something like an array instead?
If I query for example the Chats collection I am not considering the subcollection Messages at all, so it should be faster?
Firestore queries are equally fast whether you are fetching a document from a collection of 10 documents or 10 thousand, because query performance depends on the size of the result set, not the size of the collection. The only thing that can make a query slow is fetching many documents at once, since you will be downloading a lot of data.
Is it better to not have those subcollections and use something like an array instead?
This totally depends on your use case. Firestore documents have a maximum size of 1 MiB (1,048,576 bytes), so if there is no limit on how many messages a chat can have, it's better to use a subcollection.
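To get a feel for that ceiling, here is a back-of-the-envelope estimate. The 200-byte average per message is purely an assumption, since real size depends on text length and field names. The estimate also ignores the document name, the other fields and Firestore's per-document overhead, so the true limit would be somewhat lower.

```python
# Firestore's maximum document size: 1 MiB.
MAX_DOC_BYTES = 1_048_576


def max_messages_per_doc(avg_message_bytes: int) -> int:
    """Rough upper bound on how many messages fit in one document's array,
    ignoring the document name, other fields and storage overhead."""
    return MAX_DOC_BYTES // avg_message_bytes


print(max_messages_per_doc(200))  # 5242
```

Under that assumption, a single busy chat would run into the limit after a few thousand messages. An unbounded history is therefore a better fit for a subcollection.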
Using a sub-collection can make it easier to:
Fetch/Update/Delete a single message by ID
Paginate the messages (when you fetch a document you always get all the data in it, so an array of messages stored in one document cannot be paginated)
If you need either of the features above, it might be better to use a sub-collection.
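With a subcollection, messages are usually paginated with a query cursor. The query orders by timestamp, limits the page size, and starts after the last message of the previous page. The sketch below imitates that cursor logic on an in-memory list. It is not the Firestore API, and the `Message` type and `page_after` helper are hypothetical. Ties on timestamp are broken by message ID so that pages never overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    uid: str
    sender_id: str
    text: str
    timestamp: int  # e.g. epoch seconds


def page_after(messages: list[Message], cursor: Optional[Message], limit: int) -> list[Message]:
    """Return up to `limit` messages, newest first, strictly after `cursor`
    (order by timestamp descending, start after the cursor, then limit)."""
    ordered = sorted(messages, key=lambda m: (m.timestamp, m.uid), reverse=True)
    if cursor is not None:
        ordered = [m for m in ordered if (m.timestamp, m.uid) < (cursor.timestamp, cursor.uid)]
    return ordered[:limit]


msgs = [Message(f"m{i}", "u1", f"hello {i}", i) for i in range(5)]
first = page_after(msgs, None, 2)
second = page_after(msgs, first[-1], 2)  # the last message of page 1 is the cursor
print([m.uid for m in first], [m.uid for m in second])  # ['m4', 'm3'] ['m2', 'm1']
```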
Do check out Firebase Realtime Database, which sounds like a better option for chat applications, in terms of pricing as well.
Also check out: How to shard data Realtime Database for chat app?
Related
I am currently exploring MongoDB.
I built a notes web app and for now the DB has 2 collections: notes and users.
The user can create, read and update his notes.
I want to create a page called /my-notes that will display all the notes that belong to the connected user.
My question is:
Should the notes model have an ownerId field, or the other way around: should the user model have a noteIds field of type list?
Points I found relevant for the decision making:
noteIds approach:
There is no need to query the notes for the desired ownerId (if we have a lot of notes, we will need indexes and a search across the whole notes collection). We just need to find the user by user ID and then get all the notes by their IDs.
In this case there are 2 calls to the DB.
The data is ordered by insertion order in the document's noteIds field.
ownerId approach:
We do need to find the notes by their ownerId field across the notes collection, which might be more computationally intensive.
We can paginate / sort the data as we want - more control over the data.
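The two approaches can be compared side by side with in-memory dictionaries standing in for the collections. This is a hypothetical sketch, not MongoDB driver code, and the `createdAt` field and sample data are assumptions. One caveat applies to the noteIds approach in real MongoDB: a `{"_id": {"$in": ids}}` query does not guarantee that results come back in array order. To keep the noteIds order, you would have to re-sort on the client.

```python
# Hypothetical in-memory stand-ins for the notes and users collections.
notes = {
    "n1": {"_id": "n1", "ownerId": "u1", "text": "first", "createdAt": 1},
    "n2": {"_id": "n2", "ownerId": "u2", "text": "other user", "createdAt": 2},
    "n3": {"_id": "n3", "ownerId": "u1", "text": "second", "createdAt": 3},
}
users = {"u1": {"_id": "u1", "noteIds": ["n1", "n3"]}}


def notes_via_note_ids(user_id: str) -> list[dict]:
    """noteIds approach: call 1 reads the user, call 2 reads the notes by ID.
    Results are re-ordered by noteIds because $in does not preserve order."""
    ids = users[user_id]["noteIds"]                  # call 1
    found = {i: notes[i] for i in ids if i in notes}  # call 2 (simulated $in)
    return [found[i] for i in ids if i in found]


def notes_via_owner_id(user_id: str, skip: int = 0, limit: int = 10) -> list[dict]:
    """ownerId approach: one filtered query that can be sorted and paginated.
    In MongoDB, an index on ownerId avoids scanning the whole collection."""
    mine = [n for n in notes.values() if n["ownerId"] == user_id]
    mine.sort(key=lambda n: n["createdAt"], reverse=True)  # newest first
    return mine[skip:skip + limit]


print([n["_id"] for n in notes_via_note_ids("u1")])  # ['n1', 'n3']
print([n["_id"] for n in notes_via_owner_id("u1")])  # ['n3', 'n1']
```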
Are there any more points you can think of?
As far as I can conclude, this is a trade-off between less computationally intensive DB calls and more control over the data.
What are the "best practices"?
Thanks,
A similar use case is explained in the documentation. If there is no limit on the number of notes a user can have, it might be better to store a userId reference field in each note document.
As you've figured out already, pagination would be easier with the second approach. Also, when updating notes, you can simply call updateOne({ _id: "note_id", userId: 1 }) instead of checking the user's document to see whether the note actually belongs to the user.
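The same ownership-scoped update can be written in Python. Below is a sketch in the style of pymongo, whose `Collection.update_one(filter, update)` returns a result with a `matched_count` attribute. The `update_note_text` helper and the `text` field are hypothetical. The `FakeNotes` class is only a tiny in-memory stand-in so the sketch runs without a MongoDB server, and it understands nothing beyond exact-match filters and a single `$set`.

```python
from types import SimpleNamespace


def update_note_text(notes, note_id, user_id, new_text) -> bool:
    """Update a note only if it belongs to user_id.

    Because the owner is part of the filter, no separate read of the user's
    document is needed: a note owned by someone else simply does not match.
    """
    result = notes.update_one(
        {"_id": note_id, "userId": user_id},
        {"$set": {"text": new_text}},
    )
    return result.matched_count == 1


class FakeNotes:
    """Minimal in-memory stand-in for a collection (exact-match filters, $set only)."""

    def __init__(self, docs):
        self.docs = docs

    def update_one(self, flt, update):
        for doc in self.docs:
            if all(doc.get(k) == v for k, v in flt.items()):
                doc.update(update["$set"])
                return SimpleNamespace(matched_count=1)
        return SimpleNamespace(matched_count=0)


notes = FakeNotes([{"_id": "n1", "userId": "u1", "text": "old"}])
print(update_note_text(notes, "n1", "u2", "hacked"))  # False: u2 is not the owner
print(update_note_text(notes, "n1", "u1", "new"))     # True
```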
I'm developing a python GraphQL API server using FastAPI and Strawberry, and I'm implementing this feature:
I have two entities, User and Order, which have a one to many association (a User can have many Orders, an Order can only have one User), and I want to get a list of users, each with the list of their orders.
If I were to implement this with a simple REST endpoint, I could make a first query to get the users and then a second query to fetch the orders, sorting out, in the python code, which user each order belongs to based on the foreign key value.
With Strawberry GraphQL, though, it seems I cannot avoid making one query per user: even with the data loader pattern, I would still need to know the order IDs beforehand, and this makes the response time much slower.
Is there a solution to this problem? Is my approach completely wrong?
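The data loader function itself can do the grouping. It receives all the requested user IDs in a single call, so it can fetch every matching order with one query and group the orders by user_id.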
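from collections.abc import Iterable

async def load_orders_by_user(keys: list[int]) -> Iterable[list[Order]]:
    # One query for all users: SELECT * FROM orders WHERE user_id IN (keys)
    # fetch_orders_by_user_ids is a hypothetical async DB helper.
    orders = await fetch_orders_by_user_ids(keys)
    groups = {key: [] for key in keys}  # dicts keep insertion order, i.e. key order
    for order in orders:
        groups[order.user_id].append(order)
    # DataLoader expects exactly one result per key, in the same order as keys
    return list(groups.values())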
Variants on that idea:
the loader could also load the users at the same time
the loader could group order ids only and use a different data loader for orders
I am using mongodb as the database for a project I've been working on and in the database I have a "user" collection and an "account" collection. Every user has one account and every account has a "user" field that is the _id of the corresponding user. The reason I separated these into two collections is because I thought it made sense to keep the user's sensitive data (password, email, legal name, etc.) separate from the account data (things like interests, followers, username, etc.). Also the account collection has a lot of fields so it just seemed easier to not over-saturate the "user" collection with data.
So, my question is - Now that I essentially have 2 collections pointing to the same user, should I use the "user._id" to query both users and accounts? Since each account has a unique "user" field, is there a reason to query those accounts with their own _id property? It seems odd to keep track of two different _id's on the frontend and conditionally send either the user._id or account._id.
The two main drawbacks I have found when using the user._id to query both users and accounts are:
When querying account data, I have to almost always make sure I send the "user" field so I have that id on the front end.
If in the future, I wanted to add the ability for users to create multiple accounts, I would have to change the code to now fetch account data using the "account._id".
Hopefully that all makes sense, and maybe it doesn't even make sense for me to separate those collections. Thank you to anyone who can help!
I am trying to design the Firestore database in an efficient way.
The main issue I am having is how I should define "groups". Let's say a user is invited to a group chat, so the client needs to retrieve the data for that group chat. Should I have a "groups" collection and then find the correct group document? Or should I have a "groups" property in the user document that holds an ID referencing the group to retrieve?
In SQL, having a reference in a user's groups table would be the obvious answer, but I am not sure about Firestore. I don't want to look through the entire collection of groups just to find the group the user was newly invited to. Any tips? Also, my front end is in React and I am considering using the onSnapshot method to subscribe to the collection (that seems to be the best way to get real-time updates).
What I believe is best for you is this:
First, create a collection, say groups, where each document's ID is a unique group ID.
Then, inside each group document, you can store group-related info (such as the group type) and add a subcollection that holds all the chats for that group.
Hope it helps. Feel free to ask if you have any doubts.
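The suggested layout can be pictured with nested dictionaries. A groups collection is keyed by group ID, each group document holds the group info, and a chats subcollection sits under each group. This is an in-memory sketch, not the Firestore SDK, and the field names and the `open_group`/`chats_path` helpers are hypothetical. It also assumes the client already knows the group ID (for example, from the invitation), so the group is read directly by ID instead of searching the whole collection.

```python
from typing import Optional

# groups/{groupId} holds group info; "chats" stands in for the
# groups/{groupId}/chats subcollection (in Firestore it is not a field).
groups = {
    "g1": {
        "info": {"name": "Hiking", "type": "public"},
        "chats": {
            "c1": {"senderID": "u1", "text": "Hi all", "timestamp": 1},
            "c2": {"senderID": "u2", "text": "Hello", "timestamp": 2},
        },
    },
}


def chats_path(group_id: str) -> str:
    """Path of the chats subcollection for one group."""
    return f"groups/{group_id}/chats"


def open_group(group_id: str) -> Optional[dict]:
    """Direct lookup by document ID: no need to search through every group."""
    group = groups.get(group_id)
    if group is None:
        return None
    chats = sorted(group["chats"].values(), key=lambda c: c["timestamp"])
    return {"info": group["info"], "chats": chats}


print(chats_path("g1"))                                # groups/g1/chats
print([c["text"] for c in open_group("g1")["chats"]])  # ['Hi all', 'Hello']
```

In the React client, the per-group chats subcollection is a natural target for a real-time listener, rather than the whole groups collection.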
I am looking for a way to create an efficient model that satisfies the requirements below. I have tried using gcloud-node but noticed it has limitations with read consistency, references, etc. I would prefer to write this in Node.js, but would be open to writing it in Java or Python if that would improve my model. I am building around the new pricing model, which comes into effect July 1st.
My application is a closed email system. In essence, users register on the site, these users can make friends, and then they can send emails to each other.
Components of the app:
Users - Unlimited amount of users can join.
Friends - A User can have 200 confirmed friends and 100 pending friend requests. When a friend list is retrieved, it should show the names of the friends. (I will also need the IDs of the friends so I can use them on the client side to create emails.)
Emails - Users can send emails to their friends and they can receive emails from their friends. The user can then view all their sent emails independently(sentbox) and all their received emails independently(inbox).
They can also view the emails sent between themselves and a friend, ordered newest first. The emails should show the sender's and receiver's names. Once an email is read, it needs to be marked as read.
My model looks something like this, but as you can see there are inefficiencies.
Datastore Kinds:
USER
-email (id) //The email doesn't need to be the id, but I need to be able to retrieve users by their email
-hash_password
-name
-account_status
-created_date
FRIEND
-id (auto-generated)
-friend1
-friend2
-status
EMAIL
-id (auto-generated)
-from
-to
-mutual_id
-message
-created_date
-has_seen
Procedures of the application:
Register - Get operation to see if a user with this email exists. If it does not, insert the user with that key.
Login - Get operation to fetch the user by email. If the user exists, retrieve the hash_password from the entity and compare it to the user's input.
Send friend request - Friend data will be written twice for every relationship. Then using the index on friend1 and index on status I will query all the friends for a user and filter only those which are 'pending'. I will then count these friends and see if they are over X. Again I will do this for the other user. If they are both not over the pending limit, I will insert the friend request. This needs to run in a transaction.
Accept a friend request - Friend data will be written twice for every relationship. Then, using the index on friend1 and the index on status, I will query all the friends for a user and filter only those which are pending. I will then count these friends and see if they are over X. Again, I will do this for the other user. If neither is over the pending limit, I will change both entities' status to accepted in a transaction.
Show confirmed friends - Friend data will be written twice for every relationship. Then, using the index on friend1 and the index on status, I will query all the friends for a user and filter only those which are accepted. Not sure how I will show the friends' names (e.g. if a user changes their name, this needs to be reflected in all friend relationships and emails!).
Show pending friends - Friend data will be written twice for every relationship. Then, using the index on friend1 and the index on status, I will query all the friends for a user and filter only those which are pending. Not sure how I will show the friends' names (e.g. if a user changes their name, this needs to be reflected in all friend relationships and emails!).
View sent emails - Using the index on the from property, I would query all the emails sent by a user, 5 at a time, ordered by created_date (newest first). (Again, if a user changes their name, this needs to be reflected in all friend relationships and emails!)
View received emails - Using the index on the to property, I would query all the emails received by a user, 5 at a time, ordered by created_date (newest first). When an email is seen, its has_seen property will be updated to true. (Again, if a user changes their name, this needs to be reflected in all friend relationships and emails!)
View emails between 2 users - Using the index on mutual_id, which is built as [lower_lexicographic_email]:[higher_lexicographic_email], I would query the mutual emails, ordered by newest, 5 at a time. (Again, if a user changes their name, this needs to be reflected in all friend relationships and emails!)
Create email - Using the friend1 and status index, I will confirm the users are friends. If they are, I will insert an email.
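Two of the procedures above can be sketched in plain Python: building the mutual_id key, and the pending-limit check that runs before a friend request is written twice. This is an in-memory sketch, not Datastore client code, and the `Friend` dataclass and the helper names are hypothetical. In Datastore, the count queries and the two writes would have to run inside a single transaction for the limit check to be reliable. The limit of 100 comes from the stated requirements and plays the role of X.

```python
from dataclasses import dataclass

PENDING_LIMIT = 100  # "100 pending friend requests" from the requirements (the X above)


def mutual_id(email_a: str, email_b: str) -> str:
    """[lower_lexicographic_email]:[higher_lexicographic_email], so both users
    map to the same key regardless of who sent the email."""
    low, high = sorted((email_a, email_b))
    return f"{low}:{high}"


@dataclass
class Friend:
    friend1: str
    friend2: str
    status: str  # "pending" or "accepted"


def pending_count(friends: list[Friend], user: str) -> int:
    """Stand-in for a query on FRIEND with friend1 == user and status == 'pending'."""
    return sum(1 for f in friends if f.friend1 == user and f.status == "pending")


def send_friend_request(friends: list[Friend], sender: str, receiver: str) -> bool:
    """Write the relationship twice (once per direction) only if neither user
    is already at the pending limit."""
    if pending_count(friends, sender) >= PENDING_LIMIT:
        return False
    if pending_count(friends, receiver) >= PENDING_LIMIT:
        return False
    friends.append(Friend(sender, receiver, "pending"))
    friends.append(Friend(receiver, sender, "pending"))
    return True


print(mutual_id("zoe@example.com", "adam@example.com"))  # adam@example.com:zoe@example.com
friends: list[Friend] = []
print(send_friend_request(friends, "a@example.com", "b@example.com"), len(friends))  # True 2
```

Writing each relationship in both directions is what lets a single friend1 + status query answer "who are this user's (pending) friends" without a second lookup.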