Best way of structuring documents in Cloud Firestore (NoSQL databases)?

I am trying to implement a Cloud Firestore database, but I am new to NoSQL databases.
I want to know what's the best way of arranging these sets into collections/documents:
I have restaurants which have different foods and reservations. What would be the best approach to structuring this data in Firestore?
Is this the right approach:
Restaurant1 (Collection)
----> Foods (document)
----> Reservations (document)

I think storing Foods and Reservations as top-level collections will ultimately give you more flexibility later on.
It's easy enough to take a restaurantID and stick it in each document in those collections, so I don't personally think you should nest them in the Restaurants collection. It's a personal preference that comes from working with a lot of nested collections.
I think the optimal structure is:
Restaurants (collection)
--- Name: Chipotle
--- ID: restaurant1
--- Foods: [{ Name: foodItem1 }, { Name: foodItem2 }]
Foods (collection)
--- Name: foodItem1
--- Ingredients: abc
--- Nutrition Facts: xyz
Reservations (collection)
--- User: user1
--- Restaurant: { id: restaurant1, name: Chipotle }
--- Time: 3pm
Users (collection)
--- ID: user1
What you'll notice is that there is some redundant info. This is good: if you request all reservations, you get the restaurant's name and ID along with them, which is stuff you would likely want. You will find that you'll want to store data multiple times, and this feels to me like a good structure for it.
With this structure you can very easily query for:
All reservations by user X, or all foods meeting nutrition limits of Y
This avoids a collectionGroup query across every restaurant's reservations sub-collection. You won't always want to query your reservations by restaurant; maybe you want to pull them by user, by time, etc.
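To make the redundancy concrete, here is a minimal plain-Java sketch of what such a reservation document could look like as a nested map, with an in-memory filter standing in for a "reservations by user" query. It does not call Firestore at all; the class name ReservationDocs and its helper methods are made up for illustration, and only the field names come from the structure above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: documents are modelled as plain maps, no Firestore SDK involved.
class ReservationDocs {

    // Builds a reservation document that embeds a copy of the restaurant's id and name.
    static Map<String, Object> reservation(String user, String restaurantId,
                                           String restaurantName, String time) {
        Map<String, Object> restaurant = new HashMap<>();
        restaurant.put("id", restaurantId);
        restaurant.put("name", restaurantName);

        Map<String, Object> doc = new HashMap<>();
        doc.put("User", user);
        doc.put("Restaurant", restaurant);
        doc.put("Time", time);
        return doc;
    }

    // In-memory stand-in for an equality filter on the "User" field.
    static List<Map<String, Object>> byUser(List<Map<String, Object>> reservations, String user) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : reservations) {
            if (user.equals(doc.get("User"))) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> all = new ArrayList<>();
        all.add(reservation("user1", "restaurant1", "Chipotle", "3pm"));
        all.add(reservation("user2", "restaurant1", "Chipotle", "4pm"));

        // The restaurant name is already inside each reservation, so no second lookup is needed.
        for (Map<String, Object> doc : byUser(all, "user1")) {
            Map<?, ?> restaurant = (Map<?, ?>) doc.get("Restaurant");
            System.out.println(doc.get("Time") + " at " + restaurant.get("name"));
        }
    }
}
```

The trade-off is the usual one for duplicated data: if a restaurant is renamed, every reservation that copied the old name has to be updated or accepted as historical.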

According to your comment:
the restaurant´s management should be able to see his reservation together with other reservations from other clients, in a list. And each client should be able to see his history of reservations as well.
I'll try to provide you a schema that can help you get that data very easily.
Firestore-root
   |
   --- users (collection)
   |     |
   |     --- uid (document)
   |     |     |
   |     |     --- type: "manager"
   |     |
   |     --- uid (document)
   |           |
   |           --- type: "client"
   |
   --- reservations (collection)
         |
         --- reservationIdOne (document)
         |     |
         |     --- reservedBy: "uid"
         |     |
         |     --- date: September 21, 2019 at 1:15:02 PM UTC+3
         |
         --- reservationIdTwo (document)
               |
               --- reservedBy: "uid"
               |
               --- date: September 21, 2019 at 1:18:42 PM UTC+3
Using this schema, you can simply query the database to get all users or only specific users (managers or clients). You can also get all reservations by reading the reservations collection. If you want to get only the reservations of a single type (manager or client), you should use a query that might look like this:
db.collection("reservations").whereEqualTo("type", "manager");
Note that this filter only matches if each reservation document also stores a type field, which the schema above does not show; to get the reservations of one user, filter on reservedBy instead. As you can see, I have also added a date property so you can easily sort them descending (last reservation first).

It really depends on the use-case you are trying to solve, as you should be optimizing for specific queries on those models.
I recommend watching these videos to get a better idea:
Series of What is a NoSQL Database? How is Cloud Firestore structured?
Model Relational Data in Firestore NoSQL
Firestore Data Modeling - Five Cool Techniques

Related

Firestore historization and React Admin

I was looking to structure my Firestore data model as follows as suggested by this answer: Firestore: Version history of documents
Firestore-root
   |
   --- offers (collection)
         |
         --- offerHistoryId (document)
         |     |
         |     --- date: //last edited timestamp
         |     |
         |     --- //Offer details
         |
         --- offerHistoryId (document)
               |
               --- date: //last edited timestamp
               |
               --- //Offer details
I wanted to know if it is possible to implement the creation/querying of such a database structure using this react-admin data provider for Firestore: https://github.com/benwinding/react-admin-firebase
Let me know if I need to provide any more details, thanks in advance!

Firebase Database Structure For Better Queries

I am working on an app that helps the user search for their desired medication in nearby pharmacies then shows a list of the pharmacies that have the drug in stock with prices.
I come from a SQL background and have been having a hard time deciding how to structure Firestore databases for better queries. I think normally you would go through the list of pharmacies and their databases, but is there a way in Firestore to have a single database for all the drugs, with maybe a field that holds the primary keys of the pharmacies that have each drug in stock?

There are two ways in which you can solve this. The first one would be to create a sub-collection under each pharmacy to hold all available drugs:
Firestore-root
   |
   --- pharmacies (collection)
         |
         --- $pharmacyId (document)
               |
               --- drugs (sub-collection)
                     |
                     --- $drugId
                           |
                           --- name: "Aspirin"
                           |
                           --- inStock: true
To get all pharmacies that have, for example, Aspirin in stock, a collection group query is needed. In Android, the query should look like this:
db.collectionGroup("drugs").whereEqualTo("name", "Aspirin").whereEqualTo("inStock", true);
The second option that you have is to create a single top-level collection of drugs:
Firestore-root
   |
   --- drugs (collection)
         |
         --- $drugId (document)
               |
               --- name: "Aspirin"
               |
               --- inStock: true
               |
               --- pharmacyId: $pharmacyId
And create a simple query that looks like this:
db.collection("drugs").whereEqualTo("name", "Aspirin").whereEqualTo("inStock", true);
So you can choose to work with one or the other, according to the use-case of your app.

How to structure products/orders in a firestore model?

So I'm creating a vue.js app and using Firebase as a back-end service. I have now gained some knowledge of how one can store data in Firestore, but I'm asking which way would be the best for my case. At a certain point in the app, users can complete orders, where each order has certain products. Every order will have an average of, let's say, 5/6 products, which are objects. Which of the following ways to structure this data would you suggest?
orderedProducts as a top-level collection: here every document in this collection would have a reference to the order. Since I only need the products related to one order, I think this would be a bad choice. The relation is somewhat gone?
subcollection: here products can be a subcollection of orders. Nice hierarchy / structure but increases document reads compared to embedded array.
array: products is embedded as an array inside an order document. This is what I have now and is the easiest approach to create.
I thought that, since the number of products per order will be rather small, I would just use an array of products inside my order document. But of course I might be wrong, or missing some important stuff.
I would appreciate any help/pointers on this real-life example on how to structure these products per order.
Thanks!

A possible schema for your app's use-case might be:
Firestore-root
   |
   --- merchants (collection)
   |     |
   |     --- merchantId (document)
   |           |
   |           --- //merchant details
   |
   --- orders (collection)
         |
         --- orderId (document)
               |
               --- merchantId: "LongMerchantId"
               |
               --- products (array)
                     |
                     --- 0
                     |   |
                     |   --- productName: "Bacon"
                     |   |
                     |   --- productPrice: 5
                     |
                     --- 1
                         |
                         --- productName: "Eggs"
                         |
                         --- productPrice: 12
Using this structure you can easily query for all orders of a single merchant:
db.collection("orders").whereEqualTo("merchantId", "LongMerchantId");
To get all the products of a particular order, simply use the following reference:
DocumentReference orderIdRef = db.collection("orders").document("orderId");
Now you can attach a listener to this reference, get the document and use the products list. You didn't specify the programming language that you are using, so I gave you the examples in Android. It's very simple to translate them into the language you are using. For Android, to get a list of custom objects, please check the following article:
How to map an array of objects from Cloud Firestore to a List of objects?
One more thing to note is that the solution above will work only if you are sure that the products in an order will fit within the maximum document size of 1 MiB (1,048,576 bytes). If you are not sure, you should use a sub-collection instead of an array. If you want to be 100% safe, you can always check against the maximum quota, using the FirestoreDocument-Android library.

How to deal with Variable data over time in associations

In linked models (let's say a drink transaction, a waiter, and a restaurant), when you want to display data, you look for information in your linked content:
Where was that beer bought?
Fetch Drink transaction => Fetch its Waiter => Fetch this waiter's Restaurant: this is where the beer was purchased
So at time T, when I display all transactions, I fetch my data following associations, thus I can display this:
| TransactionID | Waiter | Restaurant      |
| ------------- | ------ | --------------- |
| 1             | Julius | Caesar's palace |
| 2             | Cleo   | Moe's tavern    |
Let's say now that my waiter is moved to another restaurant.
If I refresh this table, the result will be
| TransactionID | Waiter | Restaurant   |
| ------------- | ------ | ------------ |
| 1             | Julius | Moe's tavern |
| 2             | Cleo   | Moe's tavern |
But we know that transaction n°1 was made in Caesar's palace!
Solution 1
Don't modify the waiter Julius, but clone it.
Upside: I keep an association between models, and can still filter on every field of every associated model.
Downside: Every modification of every model duplicates content, which can add up to a LOT as time passes.
Solution 2
Keep a copy of the current state of your associated models when you create the transaction.
Upside: I don't duplicate the contents.
Downside: You can no longer use fields of your content to display, sort or filter them, as your original and real data is inside, let's say, a JSON field. So, if you use MySQL, you have to filter your data by making plain-text search queries in that field.
What is your solution?
[EDIT]
The problem goes further, as it's not only an issue when an association changes: a simple modification of an associated model causes a problem too.
What I mean:
What's the amount of this order?
Fetch Drink transaction => Fetch its product => Fetch this product's Price => Multiply by order quantity: this is the total amount of the order
So at time T, when I display all transactions, I fetch my data following associations, thus I can display this:
| TransactionID | Qty | ProductId |
| ------------- | --- | --------- |
| 1             | 2   | 1         |
| ProductID | Title | Price |
| --------- | ----- | ----- |
| 1         | Beer  | 3     |
==> Amount of order n°1: 6.
Let's say now that the beer costs 2,5.
If I refresh this table, the result will be
| TransactionID | Qty | ProductId |
| ------------- | --- | --------- |
| 1             | 2   | 1         |
| ProductID | Title | Price |
| --------- | ----- | ----- |
| 1         | Beer  | 2,5   |
==> Amount of order n°1: 5.
So, once again, the 2 solutions are available: do I clone the beer product when its price is changed? Do I save a copy of the beer in my order when the order is made? Do you have any third solution?
I can't just add an "amount" attribute on my orders : yes it can solve that problem (partially) but it's not a scalable solution as many other attributes will be in the same situation and I can't multiply attributes like this.

Event Sourcing
This is a good use case for Event Sourcing. Martin Fowler wrote a very good article about it, and I advise you to read it.
"there are times when we don't just want to see where we are, we also want to know how we got there."
The idea is to never overwrite data but instead create immutable transactions for everything you want to keep a history of. In your case you'll have WaiterRelocationEvents and PriceChangeEvents. You can recreate the state at any given time by applying every event in order.
If you don't use Event Sourcing, you lose information. Often it's acceptable to forget historic information, but sometimes it's not.
Lambda Architecture
As you don't want to recalculate everything on every single request, it's advisable to implement a Lambda Architecture. That architecture is often explained with BigData technology and frameworks, but you could implement it with Plain Old Java and CronJobs.
It consists of three parts: Batch Layer, Serving Layer and Speed Layer.
The Batch Layer regularly calculates an aggregated version of the data. For example, you calculate the monthly income once per day, so the current month's income changes every night until the month is over.
But now you want to know the income in real time. Therefore you add a Speed Layer, which applies all events of the current date immediately. Now if a request for the current month's income arrives, you add up the last result of the Batch Layer and the result of the Speed Layer.
The Serving Layer allows more advanced queries by combining multiple batch results and the Speed Layer results into one query. For example, you can calculate the year's income by summing the monthly incomes.
But as said before, only use the Lambda approach if you need the data often and fast, because it adds extra complexity. Calculations which are rarely needed should be run on the fly. For example: which waiter generates the most income on Saturday evenings?
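As a rough illustration of how a batch result and a speed result can be added up, here is a minimal in-memory sketch in plain Java. The class IncomeView and its methods are hypothetical names chosen for this example, amounts are kept in cents, and the "nightly job" is just a method call; a real setup would persist the events and schedule the batch run.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sketch; class and method names are illustrative.
class IncomeView {

    // One immutable income event.
    static final class Sale {
        final LocalDate date;
        final long cents;

        Sale(LocalDate date, long cents) {
            this.date = date;
            this.cents = cents;
        }
    }

    private final List<Sale> events = new ArrayList<>(); // append-only event log
    private long batchTotalCents = 0;                    // last Batch Layer result
    private LocalDate batchCutoff = null;                // events before this date are in the batch result

    void record(LocalDate date, long cents) {
        events.add(new Sale(date, cents));
    }

    // Batch Layer: e.g. a nightly job that aggregates every event dated before `today`.
    void runBatch(LocalDate today) {
        long sum = 0;
        for (Sale s : events) {
            if (s.date.isBefore(today)) {
                sum += s.cents;
            }
        }
        batchTotalCents = sum;
        batchCutoff = today;
    }

    // Speed Layer: sums only the events the last batch run has not covered.
    long speedTotalCents() {
        long sum = 0;
        for (Sale s : events) {
            if (batchCutoff == null || !s.date.isBefore(batchCutoff)) {
                sum += s.cents;
            }
        }
        return sum;
    }

    // A request adds up the last batch result and the Speed Layer result.
    long currentIncomeCents() {
        return batchTotalCents + speedTotalCents();
    }

    public static void main(String[] args) {
        IncomeView view = new IncomeView();
        view.record(LocalDate.of(2016, 6, 13), 600);
        view.runBatch(LocalDate.of(2016, 6, 14));      // nightly run
        view.record(LocalDate.of(2016, 6, 14), 300);   // arrives during the day
        System.out.println(view.currentIncomeCents()); // 600 from batch + 300 from speed
    }
}
```

The sketch assumes events arrive with dates on or after the last batch cutoff; late events with older dates would only show up after the next batch run.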
Example
Restaurants:
| Timestamp | Id | Name |
| ---------- | -- | --------------- |
| 2016-01-01 | 1 | Caesar's palace |
| 2016-11-01 | 2 | Moe's tavern |
Waiters:
| Timestamp | Id | Name | FirstRestaurant |
| ---------- | -- | -------- | --------------- |
| 2016-01-01 | 11 | Julius | 1 |
| 2016-11-01 | 12 | Cleo | 2 |
WaiterRelocationEvents:
| Timestamp | WaiterId | RestaurantId |
| ---------- | -------- | ------------ |
| 2016-06-01 | 11 | 2 |
Products:
| Timestamp | Id | Name | FirstPrice |
| ---------- | -- | -------- | ---------- |
| 2016-01-01 | 21 | Beer | 3.00 |
PriceChangeEvent:
| Timestamp | ProductId | NewPrice |
| ---------- | --------- | -------- |
| 2016-11-01 | 21 | 2.50 |
Orders:
| Timestamp | Id | ProductId | Quantity | WaiterId |
| ---------- | -- | --------- | -------- | -------- |
| 2016-06-14 | 31 | 21 | 2 | 11 |
Now let's get all information about order 31.
1. get order 31
2. get price of product 21 at 2016-06-14
   - get last PriceChangeEvent before the date or use FirstPrice if none exists
3. calculate total price by multiplying retrieved price with quantity
4. get waiter 11
5. get waiter's restaurant at 2016-06-14
   - get last WaiterRelocationEvent before the date or use FirstRestaurant if none exists
6. get restaurant name by retrieved restaurant id of the waiter
As you can see it becomes complicated, therefore you should only keep history of useful data.
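The lookup steps above can be sketched in plain Java with the sample data held in memory. This is only a sketch under a few assumptions: the names OrderHistoryLookup, ChangeEvent and valueAt are hypothetical, prices are stored in cents, and an event dated on the order date itself is treated as already applied (the steps only say "before the date").

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory sketch of the lookup; all names are illustrative.
class OrderHistoryLookup {

    // An immutable event: from `date` on, the entity `subjectId` has `newValue`.
    static final class ChangeEvent {
        final LocalDate date;
        final int subjectId;
        final long newValue;

        ChangeEvent(LocalDate date, int subjectId, long newValue) {
            this.date = date;
            this.subjectId = subjectId;
            this.newValue = newValue;
        }
    }

    // PriceChangeEvent rows (price in cents) and WaiterRelocationEvents rows from the tables above.
    static final List<ChangeEvent> PRICE_CHANGES = Arrays.asList(
            new ChangeEvent(LocalDate.of(2016, 11, 1), 21, 250));
    static final List<ChangeEvent> RELOCATIONS = Arrays.asList(
            new ChangeEvent(LocalDate.of(2016, 6, 1), 11, 2));

    // Value of the latest event for `subjectId` dated on or before `at`,
    // or `initialValue` (FirstPrice / FirstRestaurant) when no such event exists.
    static long valueAt(List<ChangeEvent> events, int subjectId, LocalDate at, long initialValue) {
        long value = initialValue;
        LocalDate latest = null;
        for (ChangeEvent e : events) {
            boolean applies = e.subjectId == subjectId && !e.date.isAfter(at);
            if (applies && (latest == null || e.date.isAfter(latest))) {
                latest = e.date;
                value = e.newValue;
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Order 31: 2 units of product 21, served by waiter 11 on 2016-06-14.
        LocalDate orderDate = LocalDate.of(2016, 6, 14);
        long unitPrice = valueAt(PRICE_CHANGES, 21, orderDate, 300); // FirstPrice = 3.00
        long restaurantId = valueAt(RELOCATIONS, 11, orderDate, 1);  // FirstRestaurant = 1
        System.out.println("total (cents): " + unitPrice * 2);
        System.out.println("restaurant id: " + restaurantId);
    }
}
```

With the sample tables, order 31 resolves to the first price of 3.00 (the price change only happens in November) and to restaurant 2 (the relocation on 2016-06-01 precedes the order).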
I wouldn't involve the relocation events in the calculation. They could be stored, but I would store the restaurant id and the waiter id in the order directly.
The price history, on the other hand, could be interesting for checking whether orders went down after a price change. Here you could use the Lambda Architecture to calculate a full order with prices from the raw order and the price history.
Summary
Decide which data you want to keep a history of.
Implement Event Sourcing for that data.
Use the Lambda Architecture to speed up commonly used queries.

I like the question as it raises something very straightforward and also something more subtle.
The common principle in both cases is that ‘History must not change’, meaning if we run a query over a specified past date range today the results are the same as when we run that same query at any point in the future.
Waiters Case
When a waiter changes restaurants we must not change the history of sales. If waiter Julius sells a drink yesterday in restaurant 1 then he switches to sell more drinks today in restaurant 2 we must retain those details.
Thus we want to be able to answer queries such as ‘how many drinks has Julius sold in restaurant 1’ and ‘how many drinks has Julius sold in all restaurants’.
To achieve this you have to abstract away from Julius as a waiter by bringing in a concept of staff. Julius is a member of staff. Staff work as waiters. When working in restaurant 1 Julius is waiter A and when he works in another restaurant he is waiter B, but always the same member of staff – Julius. With an entity ‘Staff’ the queries can be answered easily.
Upside:
No loss of historic data or excessive duplications.
Downside: A new entity, Staff, must be managed. But the waiter table content is reduced, so the net overhead of data storage is low.
In summary - abstract data subject to change into a new entity and refer back to it from transactions.
Value of Order Case
The extended use case regarding ‘what is the value of this order’ is more involved. I work in cross-currency transactions where value for the observer (user) in the price list changes from day to day as currency fluctuations occur.
But there are good reasons to lock the order value in place. For example invoice processing systems have tolerance for a small difference between their expected invoice value and that of the submitted invoice, but any large difference can lead to late payment whilst invoice handlers check the issue. Also, if customers run reports on their historic purchases then the values of those orders must remain consistent despite fluctuations in currency rates over time.
The solution is to save into the order line:
the value of product in the customers currency,
or the rate between customer and supplier currency,
but ideally do both to avoid rounding errors.
What this does is provide a statement that ‘on the date that this order was placed line 1 cost $44.56 at exchange rate 1.1 $/£’. Having this data locked in allows you to invoice to the customers expectation and provide consistent spend reports over time.
Upside: Consistent historic data. Fast database performance as no look-ups required against historic rate tables.
Downside: Some data duplication. However, weighed against the overhead of storing and indexing historic rate tables, this is possibly an upside.
Regarding adding 'amount' to your order table - you have to do this if you want to achieve a consistent data history. If you only work in one currency then amount is the only additional storage concern. And by adding this one attribute you have protected history. Your other alternative is to store a historic cost table for drinks, so you know that in January beer was $1, in February it was $1.10 etc., and then store the cost-table key in the transaction so that you can look up the cost if anyone asks about a historic order. But the overhead of storing the key PLUS the indexes needed to make this practicable will outweigh the storage cost of cloning 'amount' onto the order record.
In summary - clone cost data that will change over time.
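A minimal sketch of this "clone the cost onto the order line" idea, in plain Java with an in-memory price list. The names PriceSnapshotDemo, OrderLine and placeOrder are hypothetical, prices are in cents, and a single currency is assumed so no exchange rate is stored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and a single currency is assumed.
class PriceSnapshotDemo {

    // Mutable price list: productId -> current unit price in cents.
    static final Map<Integer, Long> PRICE_LIST = new HashMap<>();

    // An order line keeps its own copy of the unit price.
    static final class OrderLine {
        final int productId;
        final int quantity;
        final long unitPriceCents; // copied at order time, never updated afterwards

        OrderLine(int productId, int quantity, long unitPriceCents) {
            this.productId = productId;
            this.quantity = quantity;
            this.unitPriceCents = unitPriceCents;
        }

        long amountCents() {
            return unitPriceCents * quantity;
        }
    }

    // Copies the current price into the new order line (the product must be in the price list).
    static OrderLine placeOrder(int productId, int quantity) {
        return new OrderLine(productId, quantity, PRICE_LIST.get(productId));
    }

    public static void main(String[] args) {
        PRICE_LIST.put(1, 300L);              // beer costs 3.00
        OrderLine line = placeOrder(1, 2);    // order 2 beers
        PRICE_LIST.put(1, 250L);              // price later drops to 2.50
        System.out.println(line.amountCents()); // still the amount at order time
    }
}
```

Reports over past orders then read only the order lines, so a later price change cannot alter them.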

Modelling a voting poll in a graph database

I've modelled a voting poll for an RDBMS. The structure is a bit more complicated than a conventional voting poll, since users can choose either to vote for an option on the poll or to pass on their vote to another user for a given poll.
My structure looks something like this:
Polls
id | title
----------
1 | Who should be president
Options
id | poll_id | title
--------------------
1 | 1 | Obama
2 | 1 | Bush
Vote
id | poll_id | user_id | vote_type | vote_id
--------------------------------------------
1 | 1 | 1 | option | 1
2 | 1 | 2 | user | 1
In this case, option 1 would receive 2 votes since user 2 gave his vote to user 1 who votes for option 1.
I realize that the data I am going to store is going to be fairly complicated to query in an RDBMS if I want to visualise how the votes move between users. However, I don't have much experience with graph databases and would like some hints as to how to go about modelling this.

It's always preferable, when making a DB model, to start with an information design model, and then transform this into a DB model.
In an information design model for your problem, options would be components of polls (so the UML class diagram would have a composition between Option and Poll), and votes would be relationships/links between users and options (so the UML class diagram would have a many-to-many association between Option and User, the instances of which are the votes). In addition, there is a ternary association User-delegates-his-vote-in-Poll-to-User, the instances of which are the delegations.
From this, I get the following DB model:
Poll( id, question)
Option( poll_id, option_sequence_no, possible_vote)
Vote( user_id, poll_id, option_sequence_no, nmr_of_votes)
Delegation( user_id, poll_id, delegate_id)
Of course, we have to add a constraint that the number of votes by a user in a poll is the number of delegations to that user plus 1.
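To see how votes and delegations combine for a single poll, here is a small in-memory sketch in plain Java. It is not a database model: DelegatedPoll and its methods are hypothetical names, and it additionally assumes that delegations may be chained (A delegates to B, who delegates to C), which goes beyond the "delegations plus 1" constraint stated above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch for one poll; names are illustrative.
class DelegatedPoll {

    private final Map<Integer, Integer> directVotes = new HashMap<>(); // userId -> optionId
    private final Map<Integer, Integer> delegations = new HashMap<>(); // userId -> delegateId

    void vote(int userId, int optionId) {
        directVotes.put(userId, optionId);
    }

    void delegate(int userId, int delegateId) {
        delegations.put(userId, delegateId);
    }

    // Follows the delegation chain starting at userId. Returns the option finally
    // voted for, or null if the chain ends without a vote or runs into a cycle.
    Integer resolve(int userId) {
        Set<Integer> seen = new HashSet<>();
        Integer current = userId;
        while (current != null && seen.add(current)) {
            Integer option = directVotes.get(current);
            if (option != null) {
                return option;
            }
            current = delegations.get(current);
        }
        return null;
    }

    // Counts one vote per participating user for the option their chain resolves to.
    Map<Integer, Integer> tally() {
        Set<Integer> participants = new HashSet<>(directVotes.keySet());
        participants.addAll(delegations.keySet());
        Map<Integer, Integer> counts = new HashMap<>();
        for (Integer user : participants) {
            Integer option = resolve(user);
            if (option != null) {
                counts.merge(option, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        DelegatedPoll poll = new DelegatedPoll();
        poll.vote(1, 1);     // user 1 votes for option 1
        poll.delegate(2, 1); // user 2 passes their vote to user 1
        System.out.println(poll.tally()); // option 1 receives both votes
    }
}
```

With the sample data from the question (user 1 votes for option 1, user 2 delegates to user 1), option 1 ends up with 2 votes, matching the count described there.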
