How to create the following data structure in a NoSQL environment - database

Intro
I have a Firestore database similar to a social media database, with three collections: Users, Events, and EventUpdates. My goal is to create a feed of the eventUpdates created by me and my friends, so I have to extend my database with friendship connections. I'm struggling with three problems, and hopefully somebody here can push me in the right direction to solve them.
Problem/Question 1:
I added username and user image to the EventUpdate model so it's easier to query. I've heard that denormalising is the way to go in a NoSQL database. But if a user updates his user image, I have to update all eventUpdates created by that user. That sounds like something you don't want to do. Is there a better way to do this?
Problem/Question 2:
How can I create a data structure that is optimised for performing the following query: get eventUpdates from me and my friends ordered by date.
Problem/Question 3:
How do I store likes? I can keep a counter in an eventUpdate, but this becomes a problem when I denormalise eventUpdates (see the current solution under EDIT below).
Data structure example:
{
  "users": {
    "1": { "name": "Jack", "imageUrl": "http://lorempixel.nl" }
  },
  "events": {
    "A": {
      "name": "BeerFestival",
      "date": "2018/09/05",
      "creatorId": "1"
    }
  },
  "eventUpdates": {
    "1": {
      "timestamp": "13243543",
      "creatorId": "1",
      "creatorName": "Jack",
      "creatorImageUrl": "http://lorempixel.nl",
      "eventId": "A",
      "message": "Lorem ipsum"
    }
  }
}
EDIT
OK, after some trial and error I ended up with the following structure. It seems to work, but my problem with this solution is that I need to make a lot of write calls to update a single eventUpdate because of all the copies in each feed (1000 followers means 1000 copies). And it looks like I need to do that a lot.
For example, I would like to add a like button to each event update. Every like would trigger an update on all EventUpdate copies. To me it looks like Firebase is not suited for my project, and I'm thinking of replacing it with a SQL DB. Or can anyone here change my mind with a better solution?
{
  "users": {
    "user1": {
      "name": "Jack",
      "imageUrl": "http://lorempixel.nl",
      "followers": ["user1"]
    }
  },
  "feeds": {
    "user1": {
      "eventUpdates": {
        "1": {
          "timestamp": "13243543",
          "creatorId": "1",
          "eventId": "A",
          "message": "Lorem ipsum"
        }
      },
      "following": {
        "user1": {
          "name": "Jack",
          "imageUrl": "http://lorempixel.nl",
          "followers": ["user1"]
        }
      }
    }
  },
  "events": {
    "A": {
      "name": "BeerFestival",
      "date": "2018/09/05",
      "creatorId": "1"
    }
  }
}

I added username and user image to the EventUpdate model so it's easier to query. I've heard denormalise is the way to go in a NoSQL database.
That's right, denormalization is a common practice when it comes to Firebase. If you are new to NoSQL databases, I recommend you watch the video Denormalization is normal with the Firebase Database for a better understanding. It covers the Firebase Realtime Database, but the same rules apply to Cloud Firestore.
But if a user updates his user image, I have to update all eventUpdates created by that user. Sounds like something you don't want to do. But is there a better way to do this?
Yes, that's also correct. You need to update every place where that image exists. Since you tagged the question google-cloud-firestore, I recommend you see my answer from this post, because with many write operations Firestore can get a little costly. Please also see the Firestore pricing plans.
Regarding Firestore, instead of holding an entire object you can hold only a reference to the picture. In that case there is nothing you need to update. It's always a trade-off between these two techniques, and unfortunately there is no middle ground: you either hold objects or only references to objects. For that, please see my answer from this post.
How can I create a data structure that is optimised for performing the following query: get eventUpdates from me and my friends ordered by date.
As I see it, your schema looks more like a Firebase Realtime Database schema than a Cloud Firestore one. To answer your question: yes, you can. In Firestore, you can create a collection named eventUpdates that holds eventUpdate objects, and to order it by timestamp you need a query like this:
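import com.google.firebase.firestore.CollectionReference;
import com.google.firebase.firestore.FirebaseFirestore;
import com.google.firebase.firestore.Query;
FirebaseFirestore rootRef = FirebaseFirestore.getInstance();
CollectionReference eventUpdatesRef = rootRef.collection("eventUpdates");
Query query = eventUpdatesRef.orderBy("timestamp", Query.Direction.ASCENDING);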
But please note that the timestamp field should be of type Date and not long. Please also take a look at my answer from this post, to see how you can add a date property in a Cloud Firestore database.
How to store likes? I can keep a counter in an eventUpdate. But this becomes a problem when I denormalise eventUpdates (see current solution underneath EDIT)
You can simply add likes, but I recommend you see the last part of my answer from this post. You might consider keeping that count in the Firebase Realtime Database rather than in Cloud Firestore. Both databases work very well together.
This structure seems to work, but my problem with this solution is that I need to make a lot of write calls to update a single eventUpdate because of all the copies in each feed (1000 followers means 1000 copies). And it looks like I need to do that a lot.
You might also take a look at my answer from this post.
For me it looks like Firebase is not suited for my project and I'm thinking of replacing it with a SQL DB, or can anyone here change my mind with a better solution?
I don't think so. There are many apps out there that use exactly the same mechanism as yours and work very well.

If you want your feed items to stay in sync with the real user data (for example, a new profile image when the user changes it), you can simply store the user ID in the eventUpdate document. This way you don't have to keep them in sync manually. Every time you display an item in the feed you fetch the user data, and you can easily query many eventUpdates on the userId and created_at fields (assuming you have them).
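The idea of storing only the user ID can be sketched with plain in-memory objects. This is a minimal illustration, not Firestore code: the names `users`, `eventUpdates`, `buildFeed`, and the `createdAt` field are assumptions, and in a real app both data sets would come from Firestore reads.

```javascript
// Hypothetical in-memory stand-ins for the users and eventUpdates collections.
const users = {
  user1: { name: "Jack", imageUrl: "http://lorempixel.nl" },
  user2: { name: "Jill", imageUrl: "http://lorempixel.nl/jill" },
};

const eventUpdates = [
  { id: "1", creatorId: "user1", eventId: "A", message: "Lorem ipsum", createdAt: 100 },
  { id: "2", creatorId: "user2", eventId: "A", message: "Dolor sit", createdAt: 300 },
  { id: "3", creatorId: "user3", eventId: "A", message: "Not a friend", createdAt: 200 },
];

// Build a feed for the given author IDs (me plus my friends), newest first.
// Each update stores only creatorId; name and image are looked up at read time,
// so a profile change never requires rewriting old updates.
function buildFeed(updates, usersById, authorIds) {
  const authors = new Set(authorIds);
  return updates
    .filter((u) => authors.has(u.creatorId))
    .sort((a, b) => b.createdAt - a.createdAt)
    .map((u) => ({ ...u, creator: usersById[u.creatorId] }));
}

const feed = buildFeed(eventUpdates, users, ["user1", "user2"]);
console.log(feed.map((item) => `${item.creator.name}: ${item.message}`));
```

The cost moves from writes to reads: every feed render needs the user documents as well, so caching them on the client is probably worthwhile.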
To implement likes in your feed the solution depends on a bunch of things like traffic volume.
The simplest way is to update a likes field with a transaction, but Firestore limits sustained writes to a single document to about one per second. Plus, a transaction can easily fail if more than 5 transactions are trying to update the same document at once.
To implement a more solid likes system take a look at this page from the official Firebase docs.
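One common technique for getting around the per-document write limit is a sharded (distributed) counter; whether that is exactly what the linked page describes is an assumption here. The sketch below simulates the idea in memory only, with hypothetical helper names, and makes no Firestore calls.

```javascript
// In-memory stand-in for a subcollection of shard documents,
// each holding a partial like count.
function createShardedCounter(numShards) {
  return { shards: new Array(numShards).fill(0) };
}

// A like increments one randomly chosen shard, so concurrent writes
// are spread over several documents instead of hitting a single one.
function incrementLikes(counter, pickShard = Math.random) {
  const index = Math.floor(pickShard() * counter.shards.length);
  counter.shards[index] += 1;
}

// Reading the total means summing all shards.
function totalLikes(counter) {
  return counter.shards.reduce((sum, n) => sum + n, 0);
}

const likes = createShardedCounter(10);
for (let i = 0; i < 25; i++) incrementLikes(likes);
console.log(totalLikes(likes)); // 25
```

More shards allow a higher write rate but make each read of the total more expensive, so the shard count is a tuning decision.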

Firestore has a different approach to the NoSQL world. Once you know the data you will use (as you already do), there are some very important decisions to make about how that data is structured. It depends a lot on how the data grows, what kinds of queries you will need, and how often you will run them. In some cases you can create a root collection that aggregates data, which can make queries easier.
There is a great video from Firebase Channel that might help. Check it out!
How to Structure Your Data | Get to Know Cloud Firestore #5
https://www.youtube.com/watch?v=haMOUb3KVSo
[UPDATED] December 26th
Other videos that might help you model and query your data:
How to Connect Firebase Users to their Data - 3 Methods
https://www.youtube.com/watch?v=jm66TSlVtcc
How to NOT get a 30K Firebase Bill
https://www.youtube.com/watch?v=Lb-Pnytoi-8
Model Relational Data in Firestore NoSQL
https://www.youtube.com/watch?v=jm66TSlVtcc

Related

When using React Query how do I make sure data is normalized across queries of a dataset?

I'm writing a sort of project management app that'll have a large number of "tasks", each with a lot of properties, that'll be rendered throughout the app. I am looking at using React Query to prefetch, cache locally, and update this data from the server.
One key architectural thing I want to get right is that when I query or mutate Tasks[123], I affect a single underlying object in the state and don't get stuck with duplicate data everywhere. At first glance React Query seems to be perfect for this job if the Query Keys are set up right. However, in their examples they don't seem to do this (or I'm failing to understand).
In their Basic Example they fetch some Posts on start and query using queryClient.getQueryData(["post", post.id]). As far as I can tell this is causing the data to be duplicated if I look at the provided ReactQueryDevtools window in the example.
Am I correct in thinking the example should be rewritten to use something like queryClient.getQueryData(["posts", {id: post.id} ])?
That is indeed the way I am setting up my query keys, so that I can do queryClient.invalidateQueries(['posts']) and it invalidates all of them. But sometimes you need finer-grained control. If that's the case, I'd do:
["posts", "list", { filter: "all" }]
["posts", "list", { filter: "published" }]
["posts", "detail", 1]
["posts", "detail", 2]
that way, I can still tackle everything with ["posts"], all lists with ["posts", "list"], all details with ["posts", "detail"] and a specific detail with ["posts", "detail", id] etc.
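To make the prefix idea concrete, here is a simplified model of how a key prefix selects cached queries. The `matchesPrefix` and `select` helpers are hypothetical and only approximate the behaviour; React Query's own matcher is more elaborate (for example, it also matches objects partially).

```javascript
// Simplified model: a query key matches when it starts with
// every element of the given prefix, compared structurally.
function matchesPrefix(queryKey, prefix) {
  return (
    prefix.length <= queryKey.length &&
    prefix.every((part, i) => JSON.stringify(part) === JSON.stringify(queryKey[i]))
  );
}

const cachedKeys = [
  ["posts", "list", { filter: "all" }],
  ["posts", "list", { filter: "published" }],
  ["posts", "detail", 1],
  ["posts", "detail", 2],
];

const select = (prefix) => cachedKeys.filter((key) => matchesPrefix(key, prefix));

console.log(select(["posts"]).length);              // 4
console.log(select(["posts", "list"]).length);      // 2
console.log(select(["posts", "detail", 2]).length); // 1
```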
It is also good practice to have a queryKeyFactory to create those keys, something like:
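// The derived keys are functions so they are built lazily: an object
// literal cannot refer to itself while it is still being initialised.
const postKeys = {
  prefix: "posts",
  lists: () => [postKeys.prefix, "list"],
  list: (filter) => [...postKeys.lists(), { filter }],
  details: () => [postKeys.prefix, "detail"],
  detail: (id) => [...postKeys.details(), id]
}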
Of course, I'm talking about "at scale" here. None of this is really needed for a todo app :)

How to perform Update in DynamoDB without sql-like UpdateExpression?

I have a very generic implementation of a Database class in my application.
I'd like to feed two things to the AWS DynamoDB Update: id and data {...} to be updated.
However, I only see the following SQL-like method in the docs:
{
  TableName: "Music",
  Key: {
    "Artist": "No One You Know",
    "SongTitle": "Call Me Today"
  },
  UpdateExpression: "SET RecordLabel = :label",
  ExpressionAttributeValues: {
    ":label": "Global Records"
  }
}
Is there really no way whatsoever to do this more MongoDB-style, without having to actually write queries? I would be looking more into something like this:
{
  TableName: "Music",
  Key: {
    "id": "123-random-id"
  },
  Item: { "label": "Global Records" }
}
Unfortunately while this doesn't fail, it doesn't update anything.
I'm not familiar enough with MongoDB to compare its query syntax to DynamoDB.
However, if I understand your question correctly, you are asking if there isn't a simpler way to work with the DynamoDB API. In my experience, the answer to that question is No. Well, not using the AWS SDK directly.
There are several efforts in the community to build tooling that makes interaction with DynamoDB feel more natural. DynamoDB Toolbox is one such effort that has worked well for me, although it does not shield you from understanding the syntax of DynamoDB expressions.
For example, with DynamoDB Toolbox, your example code snippet could be written as:
let item = {
id: "123-random-id",
label: "Global Records",
}
await MyEntity.update(item)
You didn't explain what you hoped Item: { "label": "Global Records" } would do:
If you want to replace the entire item, you can do that with a PutItem.
However, if you only want to replace the label attribute and leave the item's other existing attributes untouched, then you really do need UpdateItem, as you used. You can use the UpdateExpression syntax as you did (which isn't too hard, honestly), but there is also an older syntax, AttributeUpdates, which you may like better because you just define the attributes you want to replace, without preparing an "expression". The AttributeUpdates syntax is older and considered "legacy" but still works, and many applications use it, so I doubt it will go away any time soon.
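If you stay with UpdateExpression, a generic Database class can generate the expression from a plain id and data object. The helper below is a minimal sketch: `buildUpdateParams` is a hypothetical name, and it assumes plain JavaScript values as in the question's snippets (DocumentClient-style) and that `data` contains no key attributes.

```javascript
// Turn a plain object such as { label: "Global Records" } into UpdateItem parameters.
// The #k0 / :v0 placeholders avoid clashes with DynamoDB reserved words.
function buildUpdateParams(tableName, key, data) {
  const names = {};
  const values = {};
  const assignments = Object.entries(data).map(([attr, value], i) => {
    names[`#k${i}`] = attr;
    values[`:v${i}`] = value;
    return `#k${i} = :v${i}`;
  });
  if (assignments.length === 0) {
    throw new Error("data must contain at least one attribute");
  }
  return {
    TableName: tableName,
    Key: key,
    UpdateExpression: `SET ${assignments.join(", ")}`,
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };
}

const params = buildUpdateParams("Music", { id: "123-random-id" }, { label: "Global Records" });
console.log(params.UpdateExpression); // SET #k0 = :v0
```

The resulting object has the same shape as the documented example, so it could be passed to whichever update call the Database class already uses.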

Firebase - Targeting Specific Firestore Document Field with Cloud Functions

When using Cloud Functions with the Firebase Realtime Database you are able to target a specific field with a cloud function. For example, given this JSON structure, I could target when user1's email field changed with a ('/user/{userId}/email').onUpdate cloud function.
{
  "user": {
    "user1": {
      "name": "John Doe",
      "phone": "555-5555",
      "email": "john@gmail.com"
    }
  }
}
Now with Firestore it seems that I can't target a specific field and can only target documents. If this is the case a Firestore cloud function to target the email field would have to look like this, ('/user/{userId}').onUpdate, and fire every time any document in the user collection changed. This would result in a lot of wasted cloud functions firing. Is this how Firestore works and is there an elegant work around to it?
You are correct, due to the different data models Cloud Firestore only allows you to trigger Cloud Functions on document level events rather than field level.
One method is to store the email in a separate document (e.g. in a subcollection called Email), so that updating the email is the only change that fires the function. This does require reading an extra document each time you need the email, though.
Another, similar method is to keep the email in the same document but also write it into the subcollection as a second write to trigger the function. Use the email as the doc ID and include a timestamp field in the document to make it easy to clean up old documents (select the oldest email doc to delete, maybe even in the function itself).

How to develop form interfaces for schema-less databases?

If you are using an SQL database, it's very straightforward to develop a user interface for CRUD operations. Since the schema is defined, it's obvious how many inputs you need in a form, etc.
But when using a schema-less NoSQL approach for storage, how do you build interfaces since you don't know exactly what to expect from the types of data being stored? For example if you had a database of cars:
var cars = [
  { model: "BMW", color: "Red", manufactured: 2016 },
  { model: "Mercedes", type: "Coupe", color: "Black", manufactured: "1-1-2017" }
];
If you needed to create a user interface so you could access and edit this data, you have no clue how many inputs you need since there is no schema. How do UI developers solve this problem?
Do you have a bunch of if statements to test if every possible attribute exists in the record before showing the proper inputs?
// pseudo code
if ($car.hasKey("model") ) {
// Show the "Model" input form element
}
if ($car.hasKey("type") ) {
// Show the "Type" input form element
}
if ($car.hasKey("color") ) {
// Show the "Color" input form element
}
if ($car.hasKey("manufactured") ) {
// Show the "Manufactured" input form element
}
If you needed to create a user interface so you could access and edit this data, you have no clue how many inputs you need since there is no schema. How do UI developers solve this problem?
You solve this by reasoning from feature requirements. Emphatically, you do not try to generate forms (automatically or otherwise) from schemas: that is a recipe for poor UX even if you do have a comprehensive, canonical and unequivocal schema to hand.
Instead: you know what your 'use cases' are (you ask users) and then you build exactly those.
So the question becomes:
1. What do you do when your data item/instance does not contain a particular object/field/key which you did expect?
2. What do you do when your instance contains fields which you do not know what to do with?
The answer for #1 is pretty straightforward, and basically just the same as dealing with schema changes: assume/present sane defaults or handle null values gracefully. That is: permit your users to add such fields later where they make sense and do not choke on objects that lack them.
The answer for #2 is more complicated and it is going to depend heavily on the application and its environment (for example: is it the sole consumer of the data, or are there other consumers to consider as well). One option could be normalisation: you prune such extraneous fields on saving, so objects become normalised over time as they are updated by the users. An alternative could be preservation: you keep any fields you do not know as-is, and you take pains to preserve them through every layer of your application.
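Both answers can be sketched in a few lines, using the preservation option for #2. Everything here is hypothetical: the `knownFields` description, the helper names, and the extra `vin` field are assumptions chosen to illustrate the idea with the cars example.

```javascript
// Hypothetical description of the fields the car form knows how to edit.
const knownFields = {
  model: { label: "Model", defaultValue: "" },
  type: { label: "Type", defaultValue: "" },
  color: { label: "Color", defaultValue: "" },
  manufactured: { label: "Manufactured", defaultValue: null },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// #1: fill in a default for every expected field the record lacks.
// #2: set aside fields the form does not know about, untouched.
function toFormModel(record, fields) {
  const values = {};
  for (const [name, spec] of Object.entries(fields)) {
    values[name] = has(record, name) ? record[name] : spec.defaultValue;
  }
  const unknown = {};
  for (const [name, value] of Object.entries(record)) {
    if (!has(fields, name)) unknown[name] = value;
  }
  return { values, unknown };
}

// Preservation: write the unknown fields back alongside the edited values.
function toRecord(formModel) {
  return { ...formModel.unknown, ...formModel.values };
}

const form = toFormModel({ model: "BMW", color: "Red", manufactured: 2016, vin: "X1" }, knownFields);
form.values.color = "Blue";
console.log(toRecord(form));
```

Note that saving also writes the defaults for missing fields (here an empty `type`), so records drift toward a common shape as users edit them. Dropping `unknown` in `toRecord` would give the pruning variant instead.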

How to order my chat groups by last update time when my data is denormalized (Firebase)?

I'm building a chat app with Firebase (and AngularJS) and I have a data structure that is similar to the one on this Firebase documentation page. This structure is good for not having to retrieve huge amounts of unneeded data but I don't seem to understand a very simple thing.
With my data looking like below, when a user connects to my app:
How do I retrieve their 10 most recently updated groups and keep this list updated as new messages are posted in groups?
// An index to track Ada's memberships
{
"users": {
"alovelace": {
"name": "Ada Lovelace",
// Index Ada's groups in her profile
"groups": {
// the value here doesn't matter, just that the key exists
"techpioneers": true,
"womentechmakers": true
}
},
...
},
"groups": {
"techpioneers": {
"name": "Historical Tech Pioneers",
"members": {
"alovelace": true,
"ghopper": true,
"eclarke": true
},
"lastUpdateTime": [SOME TIMESTAMP HERE]
},
...
}
}
More information if you care to read
As you can see, I've added "lastUpdateTime": [SOME TIMESTAMP HERE] to the code above because it's how I do it for my app. I can't figure out what should be the "refresh process" for a user group list.
If my user has 100 groups, should I retrieve a list of the group IDs and get the actual groups one by one to be able to only keep the 10 most recent (I'm pretty sure this is not the way to go)?
Also, whenever someone posts a message in a group, it will update the lastUpdateTime in Firebase but how do I keep the user group list synchronized to this?
I've tried some very ugly combinations of child events, orderBys as well as entire chains of functions executing whenever something fires but it doesn't work and seems extremely complicated, for nothing. The whole idea of flattening the data is to keep the queries/fetching to a minimum and I feel that what I have done so far is way too heavy.
To show the list of the 10 most recently updated groups:
ref.child("groups").orderByChild("lastUpdateTime").limitToLast(10)
If you use this approach, please flatten your data further, since the query will now end up retrieving the members of each group, which is not needed for displaying a list of groups.
If you want a list of the groups the user is subscribed to, ordered by last update, you have a few options:
store the last update timestamp for each user's subscriptions
load the user's groups and re-order them client-side
store the last update timestamp for each user's subscriptions
Store the timestamp the group was last updated for each user subscribed to the group:
"usersgroups": {
  "alovelace": {
    // the value is the timestamp the group was last updated
    "techpioneers": 14952982198532978,
    "womentechmakers": 14852982198532979
  },
  ...
}
You'll note that I split the group memberships from the user profiles here, since you shouldn't nest such loosely related data.
Then you can get the list of the user's group in the correct order with:
ref.child("usersgroups/alovelace").orderByValue()
The main problem with this approach is that you'll need to update the timestamp of a group for all members for every post. So writes become a lot more expensive.
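The per-post fan-out can be sketched as building one update map with an entry per member. `buildFanOutUpdate` is a hypothetical helper and the timestamp is a made-up value; the paths follow the usersgroups structure above.

```javascript
// Build a multi-path update: the group's own timestamp plus
// one entry per member under usersgroups, all set to the same value.
function buildFanOutUpdate(groupId, memberIds, timestamp) {
  const updates = { [`groups/${groupId}/lastUpdateTime`]: timestamp };
  for (const memberId of memberIds) {
    updates[`usersgroups/${memberId}/${groupId}`] = timestamp;
  }
  return updates;
}

const updates = buildFanOutUpdate(
  "techpioneers",
  ["alovelace", "ghopper", "eclarke"],
  1495298219853
);
console.log(Object.keys(updates).length); // 4
```

With the Realtime Database SDK, a map like this could presumably be passed to a single `ref.update(updates)` call so that all paths change together, but the number of paths still grows with the member count.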
load the user's groups and re-order them client-side
This may sound like it'll be slower, but it actually won't be too bad. Since you're only loading the groups the user is a member of, the number won't be too high. And Firebase pipelines the requests, so performance is a lot better than you may expect. See Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly
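Once the user's groups are loaded, the client-side re-ordering itself is small. A minimal sketch, assuming each loaded group object carries a numeric `lastUpdateTime`; the `mostRecentGroups` helper, the sample values, and the `compilers` group are made up for illustration.

```javascript
// Given group objects already loaded for a user,
// keep the `limit` most recently updated ones, newest first.
function mostRecentGroups(groups, limit = 10) {
  return [...groups]
    .sort((a, b) => b.lastUpdateTime - a.lastUpdateTime)
    .slice(0, limit);
}

const loaded = [
  { id: "techpioneers", lastUpdateTime: 300 },
  { id: "womentechmakers", lastUpdateTime: 500 },
  { id: "compilers", lastUpdateTime: 100 },
];

console.log(mostRecentGroups(loaded, 2).map((g) => g.id));
// [ 'womentechmakers', 'techpioneers' ]
```

Re-running this whenever a listener reports a changed `lastUpdateTime` keeps the displayed list current without any extra writes.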
