Firebase database structure

I'm just starting to experiment with Firebase. It's a real head bender when you're used to relational databases!
I'm trying to design an app that will allow users to search for meals by barcode or name and retrieve the number of calories. Additionally, I need to be able to store the meals eaten by a user, and finally retrieve the food eaten by a user each day, week or month.
I was thinking each meal would have a unique ID (e.g. M1234 for Pizza), then I'd have 2 lookup sections - one by barcode and one by name, so that should hopefully cover the search functionality.
Each user would have the meals eaten stored in the eaten 'table' (what is the correct term for 'table' in a Firebase database?) by date, just referencing the meal by ID.
This is how I've designed the database.
{
  // Here are the users.
  "users": {
    "mchen": {
      "name": "Mary Chen",
      "email": "mary@chen.com"
    },
    ...
  },
  // Here are the meals eaten by date.
  "eaten": {
    "mchen": {
      // Index Mary's meals in her profile: /eaten/mchen/meals/20161217 should return 'M1234' (pizza) and 'M8765' (chips).
      "meals": {
        "20161217": {
          "M1234": true,
          "M8765": true
        },
        "20161218": {
          "M2222": true,
          "M8765": true
        }
      }
    },
    ...
  },
  // Here are the meals with calorie information.
  "meals": {
    "M1234": {
      "name": "Pizza",
      "calories": 400
    },
    "M2222": {
      "name": "Curry",
      "calories": 250
    },
    "M8765": {
      "name": "Chips",
      "calories": 100
    }
  },
  // Here is the barcode lookup.
  "barcode-lookup": {
    "12345678": {
      "id": "M1234"
    },
    "87654321": {
      "id": "M2222"
    },
    "11223344": {
      "id": "M8765"
    }
  },
  // Here is the name lookup.
  "name-lookup": {
    "Chips": {
      "id": "M8765"
    },
    "Pizza": {
      "id": "M1234"
    },
    "Curry": {
      "id": "M2222"
    }
  }
}
Does it seem reasonable or are there any obvious flaws?

You will want to leverage .childByAutoId() and let Firebase create the parent key names. It's best practice to disassociate your child data from the parent node, and letting Firebase create 'random' keys for the parents makes that work.
Along with that, it's customary to create a /users node in which the parent node for each user is the uid that Firebase generated when the user was first created.
Your original structure has a barcode lookup and a name lookup, which I have integrated into the following structure to reduce complexity.
users
  uid_0
    name: "Mary Chen"
    email: "mary@chen.com"
  uid_1
    name: "Larry David"
    email: "ldavid@david.com"
and then the dining
dining
  -Yuiia09skjspo
    dining_timestamp: "20161207113010"
    -Y79joa90ksss: true
    -Yjs9990kokod: true
    user: uid_0
    uid_timestamp: "uid_0_20161207113010"
  -Yi9sjmsospkos
    dining_timestamp: "20161207173000"
    -Y79joa90ksss: true
    -Yjs9990kokod: true
    user: uid_1
    uid_timestamp: "uid_1_20161207173000"
and the meals the user can choose from
meal
  -Y79joa90ksss
    name: "Pizza"
    calories: "400"
    barcode: "008481816164"
  -Yjs9990kokod
    name: "Burger"
    calories: "520"
    barcode: "991994411815"
As you can see, the dining node contains a dining event for each user (so all of the dining events are in one node).
This enables you to query for all kinds of things:
All dining for all users by date or range of dates.
All dining that contains a certain meal.
All meals by a user.
->The cool one<- All dining for a specific user within a date range.
The one omission is a search for dining that contains two meals; however, the solution to that is also in this answer.
All in all, your structure is sound - just needs a little tweaking.
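To make "the cool one" concrete, here is a minimal sketch of why the composite uid_timestamp value allows a single range query per user. It runs against an in-memory copy of the dining node rather than Firebase itself; the third event and the helper name diningForUserInRange are made up for illustration. With the Realtime Database SDK, the same idea would presumably be an orderByChild("uid_timestamp") query bounded with startAt and endAt.

```javascript
// In-memory stand-in for the /dining node (the third entry is hypothetical).
const dining = {
  "-Yuiia09skjspo": { user: "uid_0", uid_timestamp: "uid_0_20161207113010" },
  "-Yi9sjmsospkos": { user: "uid_1", uid_timestamp: "uid_1_20161207173000" },
  "-Yk2hypothetical": { user: "uid_0", uid_timestamp: "uid_0_20161215190000" },
};

// Hypothetical helper: keys of one user's dining events with a timestamp in
// [from, to], found purely by string comparison on the composite value.
function diningForUserInRange(node, uid, from, to) {
  const start = `${uid}_${from}`;
  const end = `${uid}_${to}`;
  return Object.entries(node)
    .filter(([, event]) => event.uid_timestamp >= start && event.uid_timestamp <= end)
    .map(([key]) => key);
}

const result = diningForUserInRange(dining, "uid_0", "20161201000000", "20161210235959");
console.log(result); // [ '-Yuiia09skjspo' ]
```

Because the uid comes first in the composite string, events for other users sort outside the range, and the fixed-width timestamp keeps string order equal to chronological order.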

The structure looks fine (though I would let Firebase generate the ids). The only thing that won't work the way you're expecting is searching. Based on your data, if I searched for pizza, you couldn't write a query that would return the Pizza entry. My suggestion would be either to use Algolia (or something similar) for searching, or to add another key holding the name in lowercase so that a query can match it. The only issue with rolling your own is that you won't be able to search for things like izz and have Pizza turn up. See my answer to "Firebase - How can I filter similarly to equalTo() but instead check if it contains the value?" for how to do a search.
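As a rough illustration of the lowercased-key idea, the sketch below adds an assumed name_lower field to the meals from the question and runs a prefix search as a string range, in memory. The field name and the searchByPrefix helper are hypothetical. The range [term, term + "\uf8ff"] is the usual way to express "starts with" when only ordered range queries are available.

```javascript
// Meals from the question, plus a hypothetical lowercased search key.
const meals = {
  M1234: { name: "Pizza", calories: 400 },
  M2222: { name: "Curry", calories: 250 },
  M8765: { name: "Chips", calories: 100 },
};
for (const meal of Object.values(meals)) {
  meal.name_lower = meal.name.toLowerCase(); // assumed field name
}

// Prefix match expressed as a range. "\uf8ff" is a very high code point,
// so the range covers every string that starts with the search term.
function searchByPrefix(node, term) {
  const start = term.toLowerCase();
  const end = start + "\uf8ff";
  return Object.keys(node).filter(
    (id) => node[id].name_lower >= start && node[id].name_lower <= end
  );
}

console.log(searchByPrefix(meals, "PIZ")); // [ 'M1234' ]
console.log(searchByPrefix(meals, "izz")); // []
console.log(searchByPrefix(meals, "c"));   // [ 'M2222', 'M8765' ]
```

The empty result for izz shows the limitation mentioned above: a range can only match from the start of the string, never the middle.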

Related

DynamoDB Update Expression For list in a Map

I have the following DynamoDB schema:
{
  "Id": {"N": "789"},
  "ProductCategory": {"S": "Home Improvement"},
  "productReviews": {
    "M": {
      "FiveStar": {
        "L": [
          { "S": "Best product ever!" }
        ]
      },
      "FourStar": {
        "L": [
          { "S": "Good product" },
          { "S": "Another Review" }
        ]
      }
    }
  }
}
So basically I have a map, productReviews, whose keys are "FourStar", "FiveStar", "TwoStar" etc. and whose values are lists of reviews.
I want to add new reviews to this table, i.e. if a five-star review comes in, I will append it to the 'FiveStar' list in productReviews. If a key does not exist in the map, I would like to just add the key and value.
Is this possible in DynamoDB, or do I have to merge the list on my own and then update it on each write?
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.UpdateExpressions.html#Expressions.UpdateExpressions.SET
If you want this model, you'll need to merge and update on your own.
It might be simpler to make each star value its own item (give them all the same PK with an SK indicative of the star value) so you can update them directly as a single item with a string list to append to.
Depending on your access patterns, you might even want each review as its own item (still under the same PK). For example, if the item gets a thousand reviews coming in over time, you don't want to repeatedly update an ever growing item if you can just blindly add a small item each time.
On retrieval it'll still be efficient with a Query for the PK.
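A minimal sketch of the "each review as its own item" layout follows. It only builds the plain request objects and does not call any SDK. The attribute names PK, SK and reviewText, the key prefixes, the table name and the timestamp are all assumptions chosen for illustration.

```javascript
// Hypothetical key layout: all reviews of a product share one partition key (PK);
// the sort key (SK) encodes the star rating plus a timestamp.
function buildReviewItem(productId, stars, text, isoTime) {
  return {
    PK: { S: `PRODUCT#${productId}` },
    SK: { S: `REVIEW#${stars}#${isoTime}` },
    reviewText: { S: text },
  };
}

// Parameters in the shape of a Query request: every review of one product,
// optionally narrowed to one star rating with begins_with on the sort key.
function buildReviewQuery(tableName, productId, stars) {
  const params = {
    TableName: tableName,
    KeyConditionExpression: "PK = :pk",
    ExpressionAttributeValues: { ":pk": { S: `PRODUCT#${productId}` } },
  };
  if (stars !== undefined) {
    params.KeyConditionExpression += " AND begins_with(SK, :sk)";
    params.ExpressionAttributeValues[":sk"] = { S: `REVIEW#${stars}#` };
  }
  return params;
}

const item = buildReviewItem("789", 5, "Best product ever!", "2024-01-01T00:00:00Z");
const query = buildReviewQuery("Products", "789", 5);
console.log(item.SK.S, query.KeyConditionExpression);
```

Each new review is then a blind put of a small item, and nothing existing is rewritten. If two reviews with the same rating could arrive at the same instant, the sort key would need an extra unique suffix.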

How to properly design a database, a single object vs multiple with relations

I am working on a project that will rely fairly heavily on a database. I've done several projects by now, but most of them barely had any database needs that required significant thought. That time has come. I don't know much about databases, and I am not looking for someone to do the work for me; I am just looking for pointers on what to study or read in order to figure out how to handle the situation.
The app will keep track of the items in stock (the inventory) in a company, related to the products that it manufactures. I will have to be able to pull various kinds of information, whether it's all invoices, all invoices related to a given product, all items related to a given product, all items for all products, and so on.
The basic structure that I have for now looks like this:
const inventory = {
  products: [
    {
      title: "",
      production: {
        finalized: 0,
        inProduction: 0,
      },
      items: [
        {
          title: "",
          price: {
            per: 0,
            total: 0,
          },
          quantity: {
            current: 0,
            type: "",
          },
          operations: {
            added: [
              {
                date: new Date(),
                quantity: 0,
                invoice: {
                  number: "",
                  supplier: "",
                },
              },
            ],
            removed: [
              {
                date: new Date(),
                quantity: 0,
                expenseId: "",
              },
            ],
          },
        },
      ],
    },
  ],
};
It is not complex at all, an array of products, and information about these products. Quantities, when a new quantity was added, when a quantity was taken, and so on.
The question is whether I should build this with just one model for the whole inventory, which I am guessing will work but will probably not be optimal.
Another approach that comes to mind is to make separate models for the items, invoices, products and so on, and then have some logic that relates them within the inventory to get the structure above. Performance-wise that will probably be better, because if I want to pull all invoices, I can query just the invoices model instead of loading the whole inventory and looping through it to find them.
What other options do I have?
I forgot to add that I will be using Mongoose, Express, React and Node.
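To make the second approach concrete, here is a small in-memory sketch in which each array stands in for its own model and records refer to each other by id. The field names (productId, itemId) and the sample data are invented for illustration; with Mongoose these would presumably be separate schemas holding ObjectId references.

```javascript
// Hypothetical normalized layout: one array per model, linked by ids.
const products = [
  { id: "p1", title: "Chair" },
  { id: "p2", title: "Table" },
];
const items = [
  { id: "i1", productId: "p1", title: "Wood" },
  { id: "i2", productId: "p1", title: "Screws" },
  { id: "i3", productId: "p2", title: "Glass" },
];
const invoices = [
  { number: "INV-1", itemId: "i1", supplier: "Acme", quantity: 10 },
  { number: "INV-2", itemId: "i2", supplier: "Bolt Co", quantity: 200 },
  { number: "INV-3", itemId: "i1", supplier: "Acme", quantity: 5 },
  { number: "INV-4", itemId: "i3", supplier: "Glassworks", quantity: 2 },
];

// All invoices: read one collection, with no walk through products and items.
const allInvoiceNumbers = invoices.map((inv) => inv.number);

// Invoices for one product: resolve its item ids, then filter the invoices.
function invoicesForProduct(productId) {
  const itemIds = new Set(
    items.filter((it) => it.productId === productId).map((it) => it.id)
  );
  return invoices.filter((inv) => itemIds.has(inv.itemId));
}

console.log(allInvoiceNumbers);
console.log(invoicesForProduct("p1").map((inv) => inv.number));
```

The nested structure shown in the question can still be assembled from these pieces when a screen needs it, while simple listings touch only one collection.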

Is a bucket pattern in MongoDb the best way to handle large unbounded arrays?

I'm implementing social features to a MERN stack app (follow/unfollow users), and trying to come up with a good MongoDB solution for avoiding issues with potentially large unbounded arrays of followers. Specifically I'm hoping to avoid:
MongoDB having to move a large follower array on disk and rebuild indexes as it grows larger
hitting the 16 MB BSON document limit if a user ever reaches a very large number of followers (> 1 million)
slow performance when querying/returning followers to display via pagination, or when calculating/displaying follower count
From everything I've researched, it seems like using a bucket pattern approach is the best solution. Two good articles I found on this:
https://www.mongodb.com/blog/post/paging-with-the-bucket-pattern--part-1
https://www.mongodb.com/blog/post/paging-with-the-bucket-pattern--part-2
I've started to approach it like this...
Follower model:
const mongoose = require('mongoose');
const Schema = mongoose.Schema;
const FollowerSchema = new Schema({
  user: {
    type: Schema.Types.ObjectId,
    ref: 'user',
  },
  // creating an array of followers
  followers: [
    {
      user: {
        type: Schema.Types.ObjectId,
        ref: 'user',
      },
      datefol: {
        type: Date,
        default: Date.now,
      },
    },
  ],
  count: {
    type: Number,
  },
  createdate: {
    type: Date,
    default: Date.now,
    required: true,
  },
});
// Declare the model explicitly instead of assigning to an implicit global.
const Follower = mongoose.model('follower', FollowerSchema);
module.exports = Follower;
Upsert in Node.js api to add a follower to an array bucket (each bucket will contain 100 followers):
const follow = await Follower.updateOne(
  { user: req.params.id, count: { $lt: 100 } },
  {
    $push: {
      followers: {
        user: req.user.id,
        datefol: Date.now(),
      },
    },
    $inc: { count: 1 },
    $setOnInsert: { user: req.params.id, createdate: Date.now() },
  },
  { upsert: true }
);
Basically, every time a follower is added, this adds them to the first bucket found that contains fewer than 100 followers (tracked by the count).
Is this the best approach for handling potentially large arrays? My concerns are:
if someone unfollows a user and the app runs a $pull to remove the follower from the array in one of the buckets... multiple buckets could then contain less than 100 followers. New followers will no longer be added to the most recent bucket so later when querying and trying to return followers based on most recent by bucket createdate... some of the newest followers might be in an older bucket and not returned correctly. The articles above mention some expressive update instructions introduced in MongoDb 4.2 that solve this problem, but it's not really clear to me how.
if I corrected for that by returning all follower buckets for a user and sorting by follow date... it seems like that could become very slow if someone had tons of followers
if I want to be able to paginate and return 100 followers per page, starting with the latest, how would that work with this approach? Should I add a pagenumber entry to the model and somehow have it be incremented each time a bucket is created (first bucket contains pagenumber 1, next pagenumber 2 etc), then on the front end if a user jumps to follower page 500 a query runs to pull bucket 500?
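The first concern above can be reproduced with a small in-memory simulation of the upsert. This is only a sketch: the bucket size is shrunk to 2, the page field is hypothetical, and "first bucket found" is assumed to mean the oldest bucket with free space, which MongoDB does not strictly guarantee.

```javascript
const BUCKET_SIZE = 2; // 100 in the question; kept tiny so the effect is visible

// Mimics the upsert: push into the first bucket with count < BUCKET_SIZE,
// or create a new bucket when none qualifies.
function follow(buckets, followerId) {
  let bucket = buckets.find((b) => b.count < BUCKET_SIZE);
  if (!bucket) {
    bucket = { page: buckets.length + 1, followers: [], count: 0 };
    buckets.push(bucket);
  }
  bucket.followers.push(followerId);
  bucket.count += 1;
}

// Mimics a $pull combined with a decrement of count.
function unfollow(buckets, followerId) {
  for (const b of buckets) {
    const i = b.followers.indexOf(followerId);
    if (i !== -1) {
      b.followers.splice(i, 1);
      b.count -= 1;
      return;
    }
  }
}

const buckets = [];
["a", "b", "c", "d"].forEach((id) => follow(buckets, id)); // page 1: a, b; page 2: c, d
unfollow(buckets, "a");
follow(buckets, "e"); // the newest follower lands in the oldest bucket
console.log(buckets.map((b) => b.followers)); // [ [ 'b', 'e' ], [ 'c', 'd' ] ]
```

After the unfollow, bucket order no longer matches follow order, so sorting buckets by createdate alone would show e before c and d.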
The bucket pattern is not a perfect match for the case you describe.
The pattern that best fits your needs is the outlier pattern: https://www.mongodb.com/blog/post/building-with-patterns-the-outlier-pattern
Your case is practically the same as the example in that article.
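As a rough sketch of how the outlier pattern might map onto followers: typical users keep all followers embedded in one document, and only the rare user who crosses a threshold gets a flag plus overflow documents. The threshold, the hasExtras flag name and the document shapes below are assumptions for illustration, simulated in memory.

```javascript
const THRESHOLD = 3; // illustrative; a real cutoff would be far larger

// Hypothetical shapes: one main document per user, plus overflow documents
// created only for users who exceed THRESHOLD followers.
function addFollower(store, userId, followerId) {
  let main = store.main[userId];
  if (!main) {
    main = { user: userId, followers: [], hasExtras: false };
    store.main[userId] = main;
  }
  if (main.followers.length < THRESHOLD) {
    main.followers.push(followerId);
    return;
  }
  main.hasExtras = true; // tells readers to also fetch the overflow documents
  store.overflow.push({ user: userId, follower: followerId });
}

function getFollowers(store, userId) {
  const main = store.main[userId];
  if (!main) return [];
  if (!main.hasExtras) return [...main.followers]; // common case: one read
  const extra = store.overflow
    .filter((doc) => doc.user === userId)
    .map((doc) => doc.follower);
  return [...main.followers, ...extra];
}

const store = { main: {}, overflow: [] };
["f1", "f2", "f3", "f4", "f5"].forEach((f) => addFollower(store, "u1", f));
addFollower(store, "u2", "f9");
console.log(getFollowers(store, "u1"), store.main.u2.hasExtras);
```

The application code pays the extra lookup only for outliers, which is the trade-off the article describes.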

Where to store a reference to other data models (in mongoDB) for best performance

In my project I have users and circles. Circles can have multiple users and a user can be in multiple circles. Lastly there are events. Each event can have multiple users in one circle. Later, events will get a lot of content, so there will be a lot of stuff to load (images, comments, etc.).
I was thinking that these would be good data models:
User = {
  _id: "uuid",
  name: "string",
  password: "string",
  circles: [Circle._id],
}
Event = {
  _id: "uuid",
  name: "string",
  location: "string",
  circles: Circle._id,
  participants: [User._id],
}
Circle = {
  _id: "uuid",
  name: "string"
}
Once the user logs in, he/she selects one of his/her circles, and the users and events in that circle are displayed.
With an API built on these data models, I think that getting the users and events of one circle means the database has to search through all users and events and check whether they are in that circle. With a lot of users and events, this might not be the most efficient way.
So I was thinking of putting the user and events into arrays of the circle like this:
User = {
  _id: "uuid",
  name: "string",
  password: "string",
}
Event = {
  _id: "uuid",
  name: "string",
  location: "string",
  participants: [User._id],
}
Circle = {
  _id: "uuid",
  name: "string",
  users: [User._id],
  events: [Event._id]
}
Now, when the user selects the circle, the circle loads slower, because the users and events have to be loaded first. But I was thinking, that searching for users and events would now be faster. Is this the correct approach/thinking? Would it make sense to keep a reference to the specific circle ids in the User and Event data model?
If you want to use MongoDB to its full strength, I strongly recommend denormalising your data.
If you normalize your data, you might have to use $lookup to join multiple collections. Even if you save some disk space, you will end up with relatively heavier computation.
Assuming that an application generally sees about 90% reads and 10% writes, it makes sense to model your data in a read-friendly way. Hence, denormalize your data heavily until it is really necessary to create references to another collection. Optimizations can be achieved later by indexing and caching, but give the schema below a thought.
User = {
  _id: "uuid",
  name: "string",
  password: "string",
  circles: ["circle1", "circle2"],
  events: ["event1", "event2"]
}
Event = {
  _id: "uuid",
  name: "string",
  location: "string"
}
Circle = {
  _id: "uuid",
  name: "string"
}
Try to know your queries beforehand, storing most of your data in the User collection. The circles and events fields in the User collection can also be arrays of objects [{},{}] if there are more properties to be stored.
I am certain that the more collections you join, the more complicated your queries will get and the more computation they will need.
I won't recommend storing user ids in the circle or event collections, as users may grow over time and you don't want to end up with a document that has one field storing thousands of array elements. By contrast, a user can be part of hundreds of circles and events, and if we store this data in the User collection, it becomes quite easy to query and manage.
Long story short: do not treat a NoSQL database as a relational database. It will never fit. Model your database with your future queries in mind. Denormalize your data heavily to make your reads simpler, i.e. avoid references.
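A short in-memory sketch of the reads this schema is meant to serve: the sample users and the helper names are made up. In MongoDB, the first helper would correspond to a query on the array field, such as { circles: circleId }, which an index on circles can serve.

```javascript
// Users carry their circle and event ids, as in the schema above.
const users = [
  { _id: "u1", name: "Ana", circles: ["circle1", "circle2"], events: ["event1"] },
  { _id: "u2", name: "Ben", circles: ["circle2"], events: ["event1", "event2"] },
  { _id: "u3", name: "Cy", circles: ["circle1"], events: [] },
];

// Members of a circle: match users whose circles array contains the id.
function usersInCircle(circleId) {
  return users.filter((u) => u.circles.includes(circleId)).map((u) => u._id);
}

// Participants of an event, derived the same way from the events array.
function participantsOf(eventId) {
  return users.filter((u) => u.events.includes(eventId)).map((u) => u._id);
}

console.log(usersInCircle("circle2")); // [ 'u1', 'u2' ]
console.log(participantsOf("event2")); // [ 'u2' ]
```

Both lookups read a single collection, which is the point of keeping the references on the user side.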

Structuring user data by email address or by user ID

I want to have the users in the database structured in a way that makes it easier for a human to read and manage, using the user's email address as the property name instead of the user ID:
Users:
"Users": {
  "emailaddress@domain.com": {
    "id": "DK66qu2dfUHt4ASfy36sdfYHS9fh",
    "name": "A Display Name",
    "groups": {
      "moderators": true,
      "users": true
    }
  },
  {...}
}
So that if I have a list of users in a group, they can be read as a list of emails and not a list of user IDs.
Groups such as:
"Groups": {
  "moderators": {
    "name": "moderator",
    "members": {
      "emailaddress@domain.com": true,
      "emailaddress2@domain.com": true
    }
  }
}
instead of:
"Groups": {
  "moderators": {
    "name": "moderator",
    "members": {
      "DK66qu2dfUHt4ASfy36sdfYHS9fh": true,
      "K2fkHYQDFOge3Hw7SjRaGP3N2sdo": true
    }
  }
}
However, using rules to verify a property of the user (such as their group) would require me to maintain two lists of users: one like the list above, and another that is essentially a table of key-value pairs of IDs and email addresses, so I can get the user's email address from their uid.
Pseudo-code rule: Users[UsersKeyVal[auth.uid]].groups.moderators == true
With Firebase, what would be considered the most acceptable practice? What are the pros and cons of both?
Please do not store user data under their email address! This will be BIG TROUBLE later.
Your users node should follow the 'standard' Firebase design pattern
users
  uid_0
    name:
    gender:
    etc
  uid_1
    name:
    gender:
    etc
The bottom line is that, in general, it's best to disassociate the dynamic data stored in the node from the key of the node.
Why?
Suppose you build a complex structure with all kinds of links and references to frank@mycoolcompany.com, and then @mycoolcompany.com gets acquired by @mynotsocoolcompany.com. You will then have to go in and rebuild every reference to Frank's email in the entire database. Ugh.
Then what if there are 100 or 1000 users @mycoolcompany.com? Ouch.
If you disassociate the data, as in the structure suggested above, you just change the email address within the node and everything else... just works!
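A minimal in-memory sketch of that point: users are keyed by uid, groups refer to uids, and an email change writes exactly one value. The memberEmails helper is hypothetical and is one way to keep the human-readable list the question asks for, by resolving uids to emails at read time.

```javascript
// Users keyed by uid; group membership refers to those uids.
const db = {
  users: {
    uid_0: { name: "Frank", email: "frank@mycoolcompany.com" },
    uid_1: { name: "Mary", email: "mary@mycoolcompany.com" },
  },
  groups: {
    moderators: { name: "moderator", members: { uid_0: true, uid_1: true } },
  },
};

// Changing an email writes one child value; the key and every reference stay put.
function changeEmail(database, uid, newEmail) {
  database.users[uid].email = newEmail;
}

// Hypothetical helper: resolve member uids to emails when displaying a group.
function memberEmails(database, groupId) {
  return Object.keys(database.groups[groupId].members).map(
    (uid) => database.users[uid].email
  );
}

changeEmail(db, "uid_0", "frank@mynotsocoolcompany.com");
console.log(memberEmails(db, "moderators"));
```

The groups node is untouched by the change, and a security rule can check membership directly with the uid from auth without a second lookup table.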
Please read this answer on Stack Overflow, written by a Firebaser, which addresses your question: Firebase data structure and url
In my opinion there is no problem with your data structure.
According to the docs:
This is a necessary redundancy for two-way relationships. It allows you to quickly and efficiently fetch your members memberships
Also, using the UID generated by Firebase or your own custom ID (here, the email) doesn't change the way Firebase works. You just have to make sure your emails are unique.

Resources