Mongodb schema best storage of Achievement system [closed] - database

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 10 months ago.
I'm going to create an achievement system in MongoDB, but I'm not sure how I should format/store it in the database.
Since users should have progress on each achievement (a progress value stored per achievement), I'm really confused about the best way to do this without running into performance issues.
What should I do? What I had in mind was maybe something like this:
Should I store each achievement as its own document in an Achievement collection, with a users array within that document containing objects with a user ID and achievement progress?
Would I then get a performance issue when there are 1000+ achievements that are being checked fairly often?
Or should I do something else?
Example schema for the option above:
{
name:{
type:String,
default:'Achievement name'
},
users:[
{
userid:{
type:String,
default:' users id here'
},
progress:{
type:Number,
default:0
}
}
]
}

Even though the question is specifically about the database design, I will give a solution for the tracking/awarding logic as well to establish more accurate context for the db design.
I would store the achievement progress separately from the already awarded achievements for cleaner tracking and discovery.
The whole logic is event based and has multiple layers of event handling. This gives you TONS of flexibility on how you track your data and also gives you a pretty good mechanism to track history. Basically, you can look at it as a form of logging.
Of course, your system design and contracts are highly dependent on the information you're gonna be tracking and its complexity. A simple progress field may not suffice for every case (you might want to track something more complex than a simple number between X and Y). There is also the case of tracking data which updates quite frequently (such as distance travelled in games). You didn't give any context on the topic of your achievement system, so we're gonna stick with a generic solution. These are just a couple of things you should take note of, as they will affect the design.
Okay, so, let's start from the top and track the entire flow for a tracked piece of data and its eventual achievement progress. Let's say we're tracking consecutive days of user login and we're gonna award the user an achievement when they reach [10].
Note that everything below is just pseudo-code.
So, let's say today is [8th of July, 2017]. For now, our User entity looks like this:
User: {
id: 7,
trackingData: {
lastLogin: 7 of July, 2017 (should be full DateTime object, but using this for brevity),
consecutiveDays: 9
},
achievementProgress: [
{
achievementID: 10,
progress: 9
}
],
achievements: []
}
And our achievements collection contains the following entity:
Achievement: {
id: 10,
name: '10 Consecutive Days',
rewardValue: 10
}
The user tries to log in (or visits the site). The application handler takes note of that and, after handling the login logic, fires an event of type ACTION:
ACTION_EVENT = {
type: ACTION,
name: USER_LOGIN,
payload: {
userID: 7,
date: 8 of July, 2017 (should be full DateTime object, but using this for brevity)
}
}
We have an ActionHandler which listens for events of type ACTION:
ActionHandler.handleEvent(actionEvent) {
subscribersMap = Map<eventName, handlers>;
subscribersMap[actionEvent.name].forEach(subscriber => subscriber.execute(actionEvent.payload));
}
subscribersMap gives us a collection of handlers that should respond to each specific action (this should resolve to USER_LOGIN for us). In our case we can have 1 or 2 handlers that concern themselves with updating the lastLogin and consecutiveDays tracking properties in the user entity. The handlers in our case will update the tracking information and fire new events further down the line.
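A minimal runnable sketch of the dispatch step described above, in plain JavaScript. The names createActionHandler and subscribe, and the in-memory Map, are assumptions made for illustration; they do not come from any library.

```javascript
// Hypothetical in-memory dispatcher: maps an event name to its handlers.
function createActionHandler() {
  const subscribersMap = new Map(); // eventName -> array of handler functions

  return {
    subscribe(eventName, handler) {
      const handlers = subscribersMap.get(eventName) || [];
      handlers.push(handler);
      subscribersMap.set(eventName, handlers);
    },
    handleEvent(actionEvent) {
      // Events without subscribers are ignored instead of throwing.
      const handlers = subscribersMap.get(actionEvent.name) || [];
      handlers.forEach((handler) => handler(actionEvent.payload));
      return handlers.length; // number of handlers that ran
    },
  };
}

const actionHandler = createActionHandler();
const seenLogins = [];
actionHandler.subscribe("USER_LOGIN", (payload) => seenLogins.push(payload.userID));

actionHandler.handleEvent({
  type: "ACTION",
  name: "USER_LOGIN",
  payload: { userID: 7, date: new Date("2017-07-08T09:00:00Z") },
});
```

Returning the handler count is only a convenience for checking that an event was actually picked up.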
Once again, for brevity, we're gonna incorporate both into one:
updateLoginHandler: function(payload) {
user = db.getUser(payload.userID);
let eventType;
let eventValue;
// More than one day since the last login: the streak is broken.
if (payload.date - user.trackingData.lastLogin > 1 day) {
user.trackingData.consecutiveDays = 1;
eventType = 'PROGRESS_RESET';
eventValue = 1;
}
else {
const newValue = user.trackingData.consecutiveDays + 1;
user.trackingData.consecutiveDays = newValue;
eventType = 'PROGRESS_INCREASE';
eventValue = newValue;
}
user.trackingData.lastLogin = payload.date;
/* DISPATCH NEW EVENT OF TYPE ACHIEVEMENT_PROGRESS */
AchievementProgressHandler.dispatch({
type: ACHIEVEMENT_PROGRESS,
name: eventType,
payload: {
userID: payload.userID,
achievementID: 10,
value: eventValue
}
});
}
Here, PROGRESS_RESET has the same contract as PROGRESS_INCREASE but a different semantic meaning, and I would keep them separate for history/tracking purposes. If you wish, you can combine them into a single PROGRESS_UPDATE event.
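For reference, the streak decision can be written as a small runnable function. This is a sketch under assumptions: the names calendarDaysBetween and applyLogin are hypothetical, days are compared in UTC, and, unlike the pseudo-code's elapsed-time check, a second login on the same calendar day leaves the streak untouched and dispatches nothing.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole calendar days between two dates, compared in UTC (an assumption;
// a real system would likely use the user's time zone).
function calendarDaysBetween(earlier, later) {
  const a = Date.UTC(earlier.getUTCFullYear(), earlier.getUTCMonth(), earlier.getUTCDate());
  const b = Date.UTC(later.getUTCFullYear(), later.getUTCMonth(), later.getUTCDate());
  return Math.round((b - a) / MS_PER_DAY);
}

// Returns new tracking data plus the name of the event to dispatch (or null).
function applyLogin(trackingData, loginDate) {
  const gap = calendarDaysBetween(trackingData.lastLogin, loginDate);
  if (gap <= 0) {
    // Same day (or an out-of-order timestamp): nothing changes.
    return { trackingData: { ...trackingData }, eventName: null };
  }
  if (gap === 1) {
    return {
      trackingData: { lastLogin: loginDate, consecutiveDays: trackingData.consecutiveDays + 1 },
      eventName: "PROGRESS_INCREASE",
    };
  }
  // Two or more days apart: the streak starts over.
  return {
    trackingData: { lastLogin: loginDate, consecutiveDays: 1 },
    eventName: "PROGRESS_RESET",
  };
}

const before = { lastLogin: new Date("2017-07-07T20:00:00Z"), consecutiveDays: 9 };
const nextDay = applyLogin(before, new Date("2017-07-08T08:00:00Z"));
const afterGap = applyLogin(before, new Date("2017-07-10T08:00:00Z"));
const sameDay = applyLogin(before, new Date("2017-07-07T23:00:00Z"));
```

The function returns a new object instead of mutating its input, which keeps the "what changed" decision separate from the database write.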
Basically, we update the tracked fields that depend on the lastLogin date and fire a new ACHIEVEMENT_PROGRESS event, which should be handled by a separate handler following the same pattern (AchievementProgressHandler). In our case:
ACHIEVEMENT_PROGRESS_EVENT = {
type: ACHIEVEMENT_PROGRESS,
name: PROGRESS_INCREASE,
payload: {
userID: 7,
achievementID: 10,
value: 10
}
}
Then, in AchievementProgressHandler we follow the same pattern:
AchievementProgressHandler: function(event) {
achievementCheckers = Map<achievementID, achievementChecker>;
/* update user.achievementProgress code */
switch (event.name) {
case 'PROGRESS_INCREASE':
achievementCheckers[event.payload.achievementID].execute(event.payload);
break;
case 'PROGRESS_RESET':
...
}
}
achievementCheckers contains a checker function for each specific achievement that decides if the achievement has reached its desired value (a progress of 100%) and should be awarded. This enables us to handle all kinds of complex cases. If you only track a simple X out of Y scenario, you can share one function between all achievements.
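A small sketch of such a checker registry, assuming plain in-memory data. The shared thresholdChecker, the shouldAward helper, and achievement 11 with its distance/time payload are all hypothetical and exist only to show a simple and a more complex checker side by side.

```javascript
// Shared checker for the simple "X out of Y" case.
function thresholdChecker(achievement) {
  return (payload) => payload.value >= achievement.rewardValue;
}

const achievements = [{ id: 10, name: "10 Consecutive Days", rewardValue: 10 }];

// achievementID -> checker function, standing in for achievementCheckers.
const achievementCheckers = new Map(
  achievements.map((achievement) => [achievement.id, thresholdChecker(achievement)])
);

// Hypothetical achievement 11 tracks a compound value, so it gets its own checker.
achievementCheckers.set(
  11,
  (payload) => payload.value.distanceKm >= 42 && payload.value.hours <= 5
);

// True when the achievement is known and its checker says it is complete.
function shouldAward(payload) {
  const checker = achievementCheckers.get(payload.achievementID);
  return checker ? checker(payload) : false;
}
```

For example, shouldAward({ achievementID: 10, value: 9 }) is false, and the same call with value: 10 is true.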
The checker basically does this:
achievementChecker: function(payload) {
achievementAwardHandler;
achievement = db.getAchievement(payload.achievementID);
if (payload.value >= achievement.rewardValue) {
achievementAwardHandler.dispatch({
type: ACHIEVEMENT_AWARD,
name: ACHIEVEMENT_AWARD,
payload: {
userID: payload.userID,
achievementID: payload.achievementID,
awardedAt: [current date]
}
});
/* Here you can clear the entry from user.achievementProgress as you no longer need it. You can also move this inside the achievementAwardHandler. */
}
}
We once again dispatch an event and use an event handler - achievementAwardHandler. You can skip the event creation step and award the achievement to the user directly but we keep it consistent with the whole history logging flow.
An added benefit here is that you can use the handler to defer the achievement awarding to a specific later time, effectively batching awards for multiple users, which serves a couple of purposes, including better performance.
Basically, this pseudo-code handles the flow from [a user action] to [achievement rewarding] with all intermediate steps included. It's not set in stone and you can modify it as you like, but all in all it gives you a clean separation of concerns and cleaner entities, it's performant, and it lets you add complex checks and handlers which are easy to reason about while at the same time providing a great history log of the user's overall progress.
Regarding the DB schema entities, I would suggest the following:
User: {
id: any,
trackingData: {},
achievementProgress: {} || [],
achievements: []
}
Where:
trackingData is an object that contains everything you're willing to track about the user. The beauty is that properties here are independent from achievement data. You can track whatever and eventually use it for achievement purposes.
achievementProgress: a map of <key: achievementID, value: data> or an array containing the current progress for each achievement.
achievements: an array of awarded achievements.
and Achievement:
Achievement: {
id: any,
name: any,
rewardValue: any (or any other field/fields. You have complete freedom to introduce any kind of tracking with the approach above),
users?: [
{
userID: any,
awardedAt: date
}
]
}
users is a collection of users who have been rewarded the given achievement. This is optional and is here only if you have the use for it and query for this data frequently.
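As a rough sketch of how the map-style achievementProgress could be written to, the helpers below only build the filter and update documents you would pass to updateOne in the MongoDB Node.js driver; they do not connect to a database. The function names, and the choice to look users up by the pseudo-code's id field, are assumptions for illustration.

```javascript
// Increment one achievement's progress inside the achievementProgress map.
// $inc creates the nested field if it does not exist yet.
function buildProgressIncrement(userId, achievementId, amount) {
  return {
    filter: { id: userId },
    update: { $inc: { [`achievementProgress.${achievementId}.progress`]: amount } },
  };
}

// Move an achievement from "in progress" to "awarded".
// The $ne condition keeps the same achievement from being pushed twice.
function buildAward(userId, achievementId, awardedAt) {
  return {
    filter: { id: userId, "achievements.achievementID": { $ne: achievementId } },
    update: {
      $push: { achievements: { achievementID: achievementId, awardedAt } },
      $unset: { [`achievementProgress.${achievementId}`]: "" },
    },
  };
}

const increment = buildProgressIncrement(7, 10, 1);
const award = buildAward(7, 10, new Date("2017-07-08T09:00:00Z"));
// With a driver collection: await users.updateOne(award.filter, award.update);
```

Because each user document only holds that user's own progress and awards, its size grows with the number of achievements rather than with the number of users.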

What you might be looking for is a badge-style implementation, just like Stack Overflow rewards its users with badges for specific achievements.
Method 1: You can have flags in the user profile for each badge. Since you're doing it in a NoSQL database, you just have to set a flag for each badge.
const badgeSchema = new mongoose.Schema({
badgeName: {
type: String,
required: true,
},
badgeDescription: {
type: String,
required: true,
}
});
const userSchema = new mongoose.Schema({
userName: {
type: String,
required: true,
},
badges: {
type: [Object],
required: true,
}
});
If your application architecture is event-based, you can trigger awarding badges to users from events. That operation just inserts a badge object with its progress into the user's badges array.
{
badgeId: ObjectId("602797c8242d59d42715ba2c"),
progress: 10
}
The update operation finds the matching entry in the badges array and updates its progress percentage.
When displaying user achievements in the user interface, you can just loop over the badges array to show the badges this user has achieved and their progress on each.
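One way the find-and-update step for Method 1 could look, sketched as plain objects for updateOne using MongoDB's positional $ operator. The helper names are hypothetical and the IDs are plain strings here, where real code would use ObjectId values.

```javascript
// Update the progress of a badge the user already has.
// The array field must appear in the filter for the positional $ to resolve.
function buildBadgeProgressUpdate(userId, badgeId, progress) {
  return {
    filter: { _id: userId, "badges.badgeId": badgeId },
    update: { $set: { "badges.$.progress": progress } },
  };
}

// Add a badge entry only if the user does not have one for this badge yet.
function buildBadgeInsert(userId, badgeId) {
  return {
    filter: { _id: userId, "badges.badgeId": { $ne: badgeId } },
    update: { $push: { badges: { badgeId, progress: 0 } } },
  };
}

const progressUpdate = buildBadgeProgressUpdate("user1", "602797c8242d59d42715ba2c", 10);
const badgeInsert = buildBadgeInsert("user1", "602797c8242d59d42715ba2c");
```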
Method 2: Have a separate MongoDB collection for the badge-to-user mapping. Whenever a user achieves a badge, you insert a record in that collection. Each record maps one user _id to one badge _id, along with a progress value. As the collection grows large, you will need indexes to query the user and badge mapping efficiently.
You will have to analyze which approach is best for your specific use case.
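For Method 2, a sketch of the index and write shape, again as plain objects. The field names userId and badgeId and the helper name are assumptions; the objects are meant for createIndex(keys, options) and updateOne(filter, update, options) in the MongoDB Node.js driver.

```javascript
// One document per (userId, badgeId) pair. The compound unique index keeps
// lookups by user fast and prevents duplicate pairs.
const userBadgeIndex = {
  keys: { userId: 1, badgeId: 1 },
  options: { unique: true },
};

// Upsert: create the mapping on first progress, update it afterwards.
function buildUserBadgeUpsert(userId, badgeId, progress) {
  return {
    filter: { userId, badgeId },
    update: { $set: { progress } },
    options: { upsert: true },
  };
}

const upsert = buildUserBadgeUpsert("user1", "badge1", 40);
// With a driver collection:
//   await userBadges.createIndex(userBadgeIndex.keys, userBadgeIndex.options);
//   await userBadges.updateOne(upsert.filter, upsert.update, upsert.options);
```

Putting userId first in the index means a query for all badges of one user can use the same index.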

MongoDB is flexible enough to allow teams to develop applications quickly and evolve their model with little friction as the application needs it. In cases where you need a robust model from day one, there is a flexible methodology that can guide you through the process of modeling your data.
The methodology is composed of:
Workload: This stage is about gathering as much information as possible to understand your data. This will allow you to formulate assumptions about your data size and the operations that will be performed against it (reads and writes), and to quantify and qualify those operations.
You can get this by:
Scenarios
Prototype
Production Logs & Stats (if you are migrating).
Relationships: Identify the relationship between the different entities in your data, quantify those relationships and apply embedding or linking. In general you should prefer embedding by default, but remember that arrays should not grow without bound (6 Rules of Thumb for MongoDB Schema Design: Part 3).
Patterns: Apply schema design patterns. Take a look at Building with Patterns: A Summary, it presents a matrix that highlights the pattern that could be useful for a given use case.
Finally, the goal of this methodology is to help you create a model that can scale and perform well under stress.

If you design the achievement schema like this:
{
name: {
type: String,
default: "Achievement name",
},
userid: {
type: String,
default: " users id here",
},
progress: {
type: Number,
default: 0,
},
}
When an achievement is gained, you just add another entry.
For retrieving achievements, map-reduce is a good candidate. You can run map-reduce jobs on the database on a less regular basis, using them for offline computation of the data that you want.
Based on the documentation, you can follow the flow shown in its map-reduce diagram.
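To make the map and reduce phases concrete, here is an in-memory sketch over documents shaped like the flat schema above. It does not call MongoDB; the sample entries and the function name are made up for illustration.

```javascript
// One document per (user, achievement), as in the flat schema.
const entries = [
  { name: "10 Consecutive Days", userid: "u1", progress: 10 },
  { name: "First Post", userid: "u1", progress: 1 },
  { name: "First Post", userid: "u2", progress: 1 },
];

function countAchievementsPerUser(docs) {
  // Map phase: emit a (key, value) pair for every document.
  const emitted = docs.map((doc) => ({ key: doc.userid, value: 1 }));

  // Reduce phase: combine all values that share a key.
  const totals = new Map();
  for (const { key, value } of emitted) {
    totals.set(key, (totals.get(key) || 0) + value);
  }
  return totals;
}

const perUser = countAchievementsPerUser(entries);
```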

Related

Setting state before render with ref to firestore

I am working through a simple logical problem, but I cannot seem to have things work smoothly. Let me share my most convincing code experiment and then I'll share some thoughts.
useEffect(() => {
firebase
.firestore()
.collection("someCollection")
.orderBy("date", "desc")
.onSnapshot(docs => {
let documents = []
if (canGetUpdatesFromFirestore.current) {
docs.forEach((doc) => {
documents.push(doc.data())
})
if(documents.length > 3) {
documents.splice(4, 0, {questionPostId: 0})
documents.splice(5, 0, {questionPostId: 1})
}
setAllQuestions(documents)
setUsers(documents)
}
})
if (searchValue.length > 2) {
canGetUpdatesFromFirestore.current = false;
functions.searchForSearchVal(searchValue, "Sexuality")
.then((result) => {
setAllQuestions(result);
})
} else {
canGetUpdatesFromFirestore.current = true
}
}, [searchValue])
function setUsers(docs){
let arrFinal = []
let copyOfAllQuestions = ""
for(let i = 0; i< docs.length; i++) {
console.log("HERE")
if (docs[i].postedBy) {
docs[i].ref.get().then(userFire => {
copyOfAllQuestions = {
...allQuestions,
...{hasPremium: userFire.data().hasPremium}
}
})
arrFinal.push(copyOfAllQuestions)
}
}
setAllQuestions(arrFinal)
}
Let me share some of my current state and what I am trying to accomplish.
I have a component that displays allQuestions. Each question has a ref to its user document in Firestore. For each question I need to check whether that user hasPremium. How should I go about doing that the correct way?
The problem currently is that I can get the data from my Users collection through the ref, but I have to refresh my state in order for it all to show.
Could someone help me get on the right path / think correctly on this one please?
One approach that I put forward is to embrace data denormalization. That is, rather than putting references to other documents (Users) inside of the Questions document, put all the relevant user information directly into the Questions document.
This is antithetical to SQL database approaches, but that's okay because Firestore is "NoSQL". Embrace the anti-SQL-idity!!
Essentially, in your Question document you want to copy in whatever information is required in your app when working with a Question, and avoid doing "joins" by fetching other documents. You don't need to copy in all of the User document into a Question document - just the elements needed when your app is working with a Question.
For example, maybe in the question all you need is:
question: {
name: ...,
type: ...,
lastUpdated: ...,
postedBy: {
email: ...,
displayName: ...,
avatarUrl: ...,
hasPremium: true,
}
}
With data duplicated, you often need a mechanism to keep duplicate data up-to-date from its "source". So you might consider a Cloud Function trigger for onUpdate() of User documents, and when a relevant value is modified (email, displayName, avatarUrl, and/or hasPremium) then you would loop through all questions that are postedBy that user and update accordingly.
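The "which values changed" part of such a trigger can be isolated into a pure function, sketched below. The function and constant names are hypothetical; the dotted keys follow Firestore's field-path notation for update(), and the sketch assumes none of the copied fields is ever removed from the User document.

```javascript
// Fields copied from a User document into question.postedBy.
const DENORMALIZED_USER_FIELDS = ["email", "displayName", "avatarUrl", "hasPremium"];

// Compares the user data before and after a write and returns an update
// object containing only the copied fields that actually changed.
function changedDenormalizedFields(before, after) {
  const changes = {};
  for (const field of DENORMALIZED_USER_FIELDS) {
    if (before[field] !== after[field]) {
      changes[`postedBy.${field}`] = after[field];
    }
  }
  return changes;
}

const userBefore = { email: "a@example.com", displayName: "Al", avatarUrl: "x.png", hasPremium: false, bio: "old" };
const userAfter = { email: "a@example.com", displayName: "Al", avatarUrl: "x.png", hasPremium: true, bio: "new" };
const questionUpdate = changedDenormalizedFields(userBefore, userAfter);
// Inside the trigger, an empty result means no question needs to be touched.
```

Here only hasPremium differs among the copied fields, so the bio change alone would not cause any question writes.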
The rules-of-thumb here are:
all data needed for one screen/function in your app goes into a SINGLE document
NoSQL document stores are used where reads are frequent and writes are infrequent
NoSQL data stores (typically) do not have "joins" - so don't design your app to require them (which is what your code above is doing: joining Question and Users)
often you don't care about updating ALL instances of duplicated data (e.g. if a user updates their displayName today, should you update a Question they posted 3 years ago? -- different apps/business needs will give different answers)

How to model this NoSQL data structure in Firestore (Review my first approach)

I am a fairly new web developer and would need your help with a project I am currently working on. I have worked in the past on a very simple realtime database example and have little to no experience in Firestore or NoSQL in general.
I want to create a system which allows end-users to get an email once a week that contains a list of special offers from bars the end-user has subscribed to. The offers change each day of the week. Bar owners can fill out a form in a vue.js web application every week with their weekly special offers.
Every Monday morning a cron job has to look up which end user has subscribed to which bars and then aggregate the data and send it via email.
The question is how would you structure the data so that I can easily compose the email and send it via a cloud function?
My approach would be to have three main collections: RestaurantOwner, EndUser, SpecialOfferings
Please see the graphic for an example process:
BarOwner and EndUser are pretty straight forward. However, the difficult part is how to structure the SpecialOffers in order to be queried the right way.
My idea would be to structure it based on the calendar week and link it to the uid from the barOwner:
specialOffers: {
2019_CW27: {
barUID001: {
mon: {
title: 'Banana Daiquir',
price: 4.99,
},
tue: {
title: 'After Five',
price: 2.99,
},
wed: {
title: 'Cool Colada',
price: 6.99
},
thu: {
title: 'Crantini',
price: 5.99
},
fri: {
title: 'French Martini',
price: 4.99
}
},
barUID002: {
mon: {
title: 'Gin & Tonic',
price: 8.99,
},
tue: {
title: 'Cratini',
price: 4.99,
},
wed: {
title: 'French Martini',
price: 4.99
},
thu: {
title: 'After Five',
price: 3.99
},
fri: {
title: 'Cool Colada',
price: 6.99
}
}
},
2019_CW28: {
barUID01: {~~~},
barUID02: {~~~}
}
}
The disadvantage of this approach is that it creates a deeply nested object when you imagine that there are 52 calendar weeks and, e.g., 100 signed-up bars with 5 special offers per week each, and I am not sure if I am able to query it the way I need to.
Is this approach reasonable or what would you do differently?
Thank you so much for your help! I highly appreciate it.
I'm assuming the following scenarios:
1) The bar owners make modifications to their offers very often.
2) The bar owners should be the only ones allowed to modify each bar's offers.
If you have these two scenarios, I would recommend a sub-collections approach here.
When to use sub-collections:
1) When there are a lot of fields in a document. Cloud Firestore has a 20,000 field limit per document. (This matters if the number of bars can push a document past 20,000 fields.)
2) When updating the parent document is a common operation. Firestore only lets you update a single document at a rate of 1 write/second. (This matters if the SpecialOffers information of each bar is modified very often. If two bar owners modify their offers at the same time, only 1 write succeeds and the second write has to wait until the first is completed. This can delay updating offers, particularly at the end of a week when almost all the bars update their offers.)
3) When you want to limit the access to particular fields of a document. (This matters if you want to restrict access to a bar's offers to the barOwner alone. You can restrict access to each document in the Bars sub-collection according to its owner using Firestore Security Rules.)
So I would recommend a sub-collection Bars under the main collection SpecialOffers. This way the design becomes scalable, and you can add restaurants and supermarkets as other similar sub-collections in the future without heavily altering your design.
Another advantage is that sub-collections are basically collections, and they don't have a limit on the number of documents they can hold. So even if the number of registered bars goes above 20,000, which is the field limit for a Firestore document, your sub-collection won't have a problem, whereas a single document would run out of fields to save the offers for a new bar.
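One possible reading of this layout, sketched as path helpers: a document per calendar week in specialOffers, with a bars sub-collection holding one document per bar. The exact layout and the helper names are assumptions; Firestore document paths alternate collection and document segments, so a valid document path has an even number of segments.

```javascript
// Path of the document holding one bar's offers for one calendar week.
function barOffersPath(week, barUid) {
  return ["specialOffers", week, "bars", barUid].join("/");
}

// Paths the Monday cron job would read for one subscriber.
function pathsForSubscriber(week, subscribedBarUids) {
  return subscribedBarUids.map((barUid) => barOffersPath(week, barUid));
}

const digestPaths = pathsForSubscriber("2019_CW27", ["barUID001", "barUID002"]);
// Each path could then be passed to firestore.doc(path).get().
```

With this shape each bar owner writes only to their own document, so owners no longer contend on a single large document.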
Ultimately the choice depends on your use cases.
Hope this helps.

Where to store a reference to other data models (in mongoDB) for best performance

In my project I have users and circles. Circles can have multiple users and a user can be in multiple circles. Lastly there are events. Each event can have multiple users in one circle. Later, events will get a lot of content, so there will be a lot of stuff to load (images, comments, etc.).
I was thinking that these would be good data models:
User = {
_id: "uuid",
name: "string",
password: "string",
circles: [Circle._id],
}
Event = {
_id: "uuid",
name: "string",
location: "string",
circles:Circle._id,
participants: [User._id],
}
Circle = {
_id: "uuid",
name: "string"
}
Once the user logs in, he/she selects one of his circles, users and events in that circle will be displayed.
With these data models, I think an API that gets the users and events from one circle means the database has to search through all users and events and check whether they are in that circle. With a lot of users and events, I think this might not be the most efficient way?
So I was thinking of putting the user and events into arrays of the circle like this:
User = {
_id: "uuid",
name: "string",
password: "string",
}
Event = {
_id: "uuid",
name: "string",
location: "string",
participants: [User._id],
}
Circle = {
_id: "uuid",
name: "string",
users:[User._id],
events:[Event._id]
}
Now, when the user selects the circle, the circle loads slower, because the users and events have to be loaded first. But I was thinking, that searching for users and events would now be faster. Is this the correct approach/thinking? Would it make sense to keep a reference to the specific circle ids in the User and Event data model?
If you want to use mongoDb to its full strength, I strongly recommend denormalising your data.
If you normalize your data, you might have to use $lookup to join multiple collections. Even if you save disk space, you will end up with relatively heavier computation.
Assuming that an application generally has 90% of hits as reads and 10% as writes, it makes sense to model your data in a read-friendly way. Hence, denormalize your data heavily until it's really necessary to create references to other collections. Optimizations can be achieved later by indexing and caching, but give the schema below a thought.
User = {
_id: "uuid",
name: "string",
password: "string",
circles: ["circle1","circle2"],
events : ["event1","event2"]
}
Event = {
_id: "uuid",
name: "string",
location: "string"
}
Circle = {
_id: "uuid",
name: "string"
}
Try to know your queries beforehand, keeping most of your data in the User collection. The circles and events fields in the User collection can also be arrays of objects [{},{}] if there are more properties to be stored.
I am certain that the more collections you join, the more complicated your queries will get and the more computation they will need.
I wouldn't recommend storing user IDs in the Circle or Event collections, as users may grow over time and you don't want to end up with a collection that has a document with one field storing thousands of array elements. By contrast, a user can be part of hundreds of circles and events, and if we store this data in the User collection, it becomes quite easy to query and manage.
Long story short: do not treat a NoSQL db as a relational db. It will never fit. Model your database keeping your future queries in mind. Denormalize your data heavily to make your reads simpler, i.e. avoid references.
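To illustrate the "users in a circle" read against this schema, here is an in-memory stand-in. In MongoDB the equivalent filter would be { circles: circleId }, because a query on an array field matches when any element equals the value. The sample users and the function name are made up.

```javascript
const users = [
  { _id: "u1", name: "Ann", circles: ["circle1", "circle2"], events: ["event1"] },
  { _id: "u2", name: "Bob", circles: ["circle2"], events: ["event2"] },
  { _id: "u3", name: "Cy", circles: [], events: [] },
];

// In-memory equivalent of db.users.find({ circles: circleId }).
function usersInCircle(allUsers, circleId) {
  return allUsers.filter((user) => user.circles.includes(circleId));
}

const circle2Members = usersInCircle(users, "circle2").map((user) => user.name);
```

In the real collection, an index on circles (a multikey index, since the field is an array) is what would keep this query from scanning every user.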

Get items which are part of the same group in Firebase

I have a simple data structure of users and events. I am wanting to find all users who are attending the same events that the logged in user is. Users can attend multiple events.
My data is setup as follows
{
events:{
123:{
name: 'event1',
users:{
9876: true,
7564: true
}
}
},
users:{
9876:{
name: 'John',
events:{
123: true
}
},
7564:{
name: 'Peter',
events:{
123: true
}
}
}
}
I have the following code to achieve this, I was just wondering if I am on the right path and if my data structure is correct for this type of query (Firebaseref is an Angular factory)
FirebaseRef.child("users/" + authData.uid + "/events").orderByChild('displayName').once("value", function (snap) {
snap.forEach(function (event) {
FirebaseRef.child("events/" + event.key() + "/users").once("value", function (userSnap) {
userSnap.forEach(function (user) {
FirebaseRef.child("users/" + user.key()).once("value", function (realUserSnap) {
if (realUserSnap.key() != authData.uid) {
//This is a user who attends the same event
}
});
});
});
});
});
I would probably change that outermost query from a once('value' to an on('child_added' (and its other child_* siblings). The main advantage is that you're monitoring/synchronizing the data, instead of retrieving it just once. An added advantage is that it will remove the need for your first forEach.
Aside from that, this looks pretty common. The inner calls need to be once's, because you only want them to execute once. Most people get nervous because of the number of on calls that will happen. But Firebase's data retrieval has little overhead after the initial websocket connection has been set up, so this typically performs pretty well.
There are lots of "should", "typically" and "may" in this answer, since the only way to be certain is for you to actually:
verify that the code functionally does what your application requires
measure the performance of the code in the conditions that you expect your users to encounter
If you do run into higher-than-expected latency, you could consider denormalizing the data a bit further. For example: you could keep the user's name in each event, where you now store true. With that, you wouldn't need to look up each user's name.
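A sketch of what that denormalized shape buys you, using an in-memory copy of the data instead of Firebase calls. Names are stored under each event in place of true, so finding co-attendees takes two levels of lookups instead of three. The coAttendees function is hypothetical.

```javascript
const data = {
  events: {
    123: { name: "event1", users: { 9876: "John", 7564: "Peter" } },
  },
  users: {
    9876: { name: "John", events: { 123: true } },
    7564: { name: "Peter", events: { 123: true } },
  },
};

// Collects everyone who shares at least one event with the given user.
function coAttendees(db, uid) {
  const found = new Map(); // userId -> name, de-duplicated across events
  const myEvents = Object.keys(db.users[uid].events || {});
  for (const eventId of myEvents) {
    for (const [otherId, name] of Object.entries(db.events[eventId].users)) {
      if (otherId !== String(uid)) found.set(otherId, name);
    }
  }
  return found;
}

const johnsCompany = coAttendees(data, "9876");
```

The trade-off is that a name change now has to be written to every event the user attends.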

Data Modeling Best Practices in Firebase/AngularFire

I'm developing an application in Firebase for the first time and was curious how I should model the data between two objects, a user and a post. I come from more of a relational db background and was curious not only how this would be done in nonrelational DBs but specifically how to set up a relationship between two objects in Firebase.
For example, my application has many Users, and each user creates many Posts.
User {
firstName: String,
lastname: String,
userName: String
}
Post {
title: String,
content: String,
date: Date,
writtenBy: [User object?]
}
How should I structure these two objects in Firebase so that a Post belongs to a User, but all Posts can be queried for regardless of User, and both User and Post objects can be edited without disrupting the other object's data and/or relationship?
And how should I create new "relational" objects via firebase:
sync.$set({userA: {
firstname: "Billy",
lastName: "Bob",
userName: "BillyBob",
Posts: {
// .....
}
}
});
Thanks!
Firebase is built with performance in mind. This is the reason you have to design data structures differently, normalization is your enemy in most cases. Every object in Firebase can be accessed by URL, and you should always keep this in mind.
There are still many ways of designing the data structures; it depends on what queries you want to execute. If one of the queries is to display all messages (I believe a number of latest messages would be the most common use case), but at the same time you want to be able to show messages per user, then one of the possible data structures could look like this:
User {
userId(assigned by Firebase automatically) {
firstName: String,
lastname: String,
userName: String
}
}
Post {
User {
userId(matching userId in the User object) {
postId(assigned by Firebase for every new post automatically) {
title: String,
content: String,
date: Date,
writtenBy: String, userName or userId (this is not really needed, but may keep it for easier data access)
}
}
}
}
Then you can change any user data without triggering data change events in Posts, like in your example, (which would be extremely heavy if you have large number of messages).
You can get all messages independently of user:
var postListRef = new Firebase(URL);
var lastPostQuery = postListRef.child("Post").limit(500);
You can also use startAt() and endAt() queries: https://www.firebase.com/docs/web/api/query/limit.html
As a drawback - you have to unpack every message in the for loop if you need to show only messages, but I would expect you would show user info as well, so it should be ok.
If you want to listen for just one user messages, it's very simple and fast:
var postListRef = new Firebase(URL);
var lastPostQuery = postListRef.child("Post/User").child(userId);
And Angular/AngularFire has great support for this kind of data structures.
I am also new to Firebase, but I would recommend the following structure.
Users: {
userID: {
firstName: String,
lastname: String,
userName: String,
posts: {
postID1:true,
postID2:true
}
}
}
Posts: {
postID1:{
title: String,
content: String,
date: Date,
writtenBy: userID
}
}
It allows you to get the latest posts without having to go through any users. Plus, you can get all the posts made by any user.
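With this structure, creating a post touches two places, which can be written together with one multi-location update. The helper below only builds the update object (keys are paths relative to the database root); the function name and sample values are assumptions, and the date is stored as a millisecond timestamp because the database holds JSON values.

```javascript
// Builds a fan-out update: the post itself plus the index entry under its author.
function buildNewPostUpdate(userID, postID, post) {
  return {
    [`Posts/${postID}`]: { ...post, writtenBy: userID },
    [`Users/${userID}/posts/${postID}`]: true,
  };
}

const newPostUpdate = buildNewPostUpdate("userA", "post1", {
  title: "Hello",
  content: "First post",
  date: Date.UTC(2017, 6, 8),
});
// Passing this object to update() on the root reference writes both paths together.
```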
