I'm creating a prototype group list application. I want the following objects:
User
List
Item
Comment
I think that I should structure this as follows:
http://myapp.firebase.io/user/
http://myapp.firebase.io/user/uid/lists/
http://myapp.firebase.io/list/
http://myapp.firebase.io/item/listid/
http://myapp.firebase.io/comment/itemid
where http://myapp.firebase.io/user/uid/lists/ points to list unique ids, http://myapp.firebase.io/item/listid/ points to all item objects for a given list, and http://myapp.firebase.io/comment/itemid points to all comments for a given item.
Does this structure make sense? The reason I did it this way, instead of nesting further (e.g. http://myapp.firebase.io/list/listid/item/ for items and http://myapp.firebase.io/list/listid/item/itemid/comment for comments), is that the documentation says that whenever you fetch an object you fetch all of its children. Sometimes (perhaps even most of the time) I want to fetch a list's items, but not each item's comments. I might only want to do that when a user clicks on the item.
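For concreteness, here is roughly the tree I have in mind (all ids and values are made up):

```json
{
  "user": {
    "uid1": { "name": "Jane", "lists": { "listid1": true } }
  },
  "list": {
    "listid1": { "title": "Groceries", "owner": "uid1" }
  },
  "item": {
    "listid1": { "itemid1": { "text": "Milk" } }
  },
  "comment": {
    "itemid1": { "commentid1": { "author": "uid1", "text": "2% please" } }
  }
}
```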
The top-level structure seems fine and does not violate Firebase's recommendation to limit nesting of data. But there are many other places where you might still make mistakes (which is one of the reasons this question is a bit too broad for Stack Overflow, but I'll try to answer some of it anyway).
I'd separate out the user's lists into a separate top-level node:
/userlists/$uid/$listid
That way the /users/$uid nodes would just contain the user's profile information and you could cheaply show a list of users. You might even consider splitting the most visible aspect of the user profile into another top-level node, to make the showing of such a list even cheaper.
/usernames/$uid
You'll be duplicating data in this case. But storage is (relatively) cheap, and optimizing for the more common reading of data is one of the reasons NoSQL databases can scale so well.
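To make that concrete, here is a rough sketch of those top-level nodes (all keys and fields are invented for illustration):

```json
{
  "users": {
    "uid1": { "name": "Jane", "email": "jane@example.com", "bio": "..." }
  },
  "usernames": {
    "uid1": "Jane"
  },
  "userlists": {
    "uid1": { "listid1": true, "listid2": true }
  }
}
```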
As you may notice, I focus on showing a list of user names, retrieving the lists for a user and accessing the profile for a specific user. These are use-cases and we're modeling the data to fit them.
In a NoSQL database you should model your data for how your app accesses it. I highly recommend reading this article on NoSQL data modeling.
After that, write out your list of use-cases and see how you can most easily access the data for it. Liberally denormalize and occasionally duplicate the data, to fit the use-cases. Use multi-location updates to keep denormalized and duplicated data in sync with its main entity.
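For example, here is a minimal sketch of such a multi-location update with the (namespaced) JavaScript SDK, keeping the duplicated name in /usernames in sync when a user is renamed (paths follow the invented structure above):

```js
// One atomic write: either both locations update or neither does.
var updates = {};
updates['/users/uid1/name'] = 'Janet';
updates['/usernames/uid1'] = 'Janet';
firebase.database().ref().update(updates);
```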
Related
I’m building what can be treated as a slideshow app with CouchDB/PouchDB: each “slide” is its own Couch document, and slides can be reordered or deleted, and new slides can be added in between existing slides or at the beginning or end of the slideshow. A slideshow could grow from one to ≲10,000 slides, so I am sensitive to space- and time-efficiency.
I made the slide creation/editing functionality first, completely underestimating how tricky it is to keep track of slide ordering. This is hard because the order of each slide-document is completely independent of the slide-doc itself, i.e., it’s not something I can sort by time or some number contained in the document. I see numerous questions on StackOverflow about how to keep track of ordering in relational databases:
Efficient way to store reorderable items in a database
What would be the best way to store records order in SQL
How can I reorder rows in sql database
Storing item positions (for ordering) in a database efficiently
How to keep ordering of records in a database table
Linked List in SQL
but all of these involve one of the following:
1. using a floating-point secondary key for reordering/creation/deletion, with periodic normalization of the indexes (i.e., imagine two documents with order-indexes 1.0 and 2.0; a third document inserted between them gets key 1.5, a fourth gets 1.25, …, until ~31 docs have been inserted into the same gap and you hit floating-point accuracy problems; see the sketch after this list);
2. a linked-list approach, where each slide-document has a previous and next field containing the primary keys of the documents on either side of it;
3. the very straightforward approach of updating all documents on every reordering/insertion/deletion.
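To make approach 1's failure mode concrete, here is a toy sketch of repeatedly inserting into the same gap by taking floating-point midpoints (the exact count before exhaustion depends on the key range and float width):

```js
// Keep inserting a "document" between lo and hi until no distinct key remains.
let lo = 1.0, hi = 2.0, n = 0;
while (true) {
  const mid = (lo + hi) / 2; // order-key for a doc inserted between lo and hi
  if (mid === lo || mid === hi) break; // the gap is exhausted
  hi = mid; // the next insertion lands in the same, ever-narrower gap
  n += 1;
}
console.log(`ran out of distinct keys after ${n} insertions`);
```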
None of these are appropriate for CouchDB: #1 incurs a huge amount of incidental complexity in SQL or CouchDB. #2 is unreliable due to lack of atomic transactions (CouchDB might update the previous document with its new next but another client might have updated the new next document meanwhile, so updating the new next document will fail with 409, and your linked list is left in an inconsistent state). For the same reason, #3 is completely unworkable.
One CouchDB-oriented approach I’m evaluating would create a document that just contains the ordering of the slides: it might contain a primary-key-to-order-number hash object as well as an array that converts order-number-to-primary-key, and just update this object when slides are reordered/inserted/deleted. The downside to this is that Couch will keep a copy of this potentially large document for every order change (reorder/insert/delete)—CouchDB doesn’t support compacting just a single document, and I don’t want to run compaction on my entire database since I love preserving the history of each slide-document. Another downside is that after thousands of slides, each change to ordering involves transmitting the entire object (hundreds of kilobytes) from PouchDB/client to Couch.
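For illustration, such an ordering document might look like this (field names invented):

```json
{
  "_id": "ordering",
  "byIndex": ["slideA", "slideB", "slideC"],
  "byId": { "slideA": 0, "slideB": 1, "slideC": 2 }
}
```

Here byIndex is the order-number-to-primary-key array and byId is the primary-key-to-order-number hash.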
A tweak to this approach would be to make a second database just to hold this ordering document and turn on auto-compaction on it. It’ll be more work to keep track of two database connections, and I’ll eventually have to put a lot of data down the wire, but I’ll have a robust way to order documents in CouchDB.
So my questions are: how do CouchDB people usually store the order of documents? And can more experienced CouchDB people see any flaws in my approach outlined above?
Thanks to a tip by @LynHeadley, I wound up writing a library that can subdivide the lexicographical interval between strings: Mudder.js. This lets me insert and reorder documents in CouchDB indefinitely, by minting new keys at will, without the overhead of a secondary document to store the ordering. I think this is the right way to solve this problem!
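A minimal sketch of how this looks with PouchDB (document shape and variable names are made up; check Mudder.js's README for its exact API):

```js
const mudder = require('mudder');

// Mint a sort key lexicographically between two neighbouring slides' keys.
const prevKey = 'a'; // sort key of the slide before the gap
const nextKey = 'b'; // sort key of the slide after the gap
const [newKey] = mudder.base62.mudder(prevKey, nextKey, 1);

// Inside an async function: store the key in the _id so allDocs
// returns the slides already in order, with no ordering document at all.
await db.put({ _id: 'slide:' + newKey, type: 'slide', content: '...' });
const ordered = await db.allDocs({
  startkey: 'slide:',
  endkey: 'slide:\ufff0', // high-codepoint sentinel to bound the range
  include_docs: true
});
```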
Based on what I've read, I would choose the "ordering document" approach (i.e., a slideshow document holding an array of ids, one per slide document). It is really straightforward and accomplishes the use case, so I wouldn't let these concerns get in the way of clean/intuitive code.
You are right that this document can grow potentially very large, compounded by the write-heavy nature of that specific document. This is why compaction exists and is the solution here, so you should not fight against CouchDB on this point.
It is a common misconception that you can use CouchDB's revision history to keep a comprehensive history of your database. The revisions are merely there to aid in write concurrency; they are not a full version control system.
CouchDB has auto-compaction enabled by default, and without it your database will grow in size unchecked. Thus, you should abandon the idea of tracking document history using this approach, and instead adopt another, safer alternative. (a list of these alternatives is beyond the scope of this answer)
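If you are on PouchDB, note that automatic compaction is opt-in per database; a minimal sketch (database name made up):

```js
// Old revisions of each document are pruned as you write, keeping size in check.
const db = new PouchDB('slideshow', { auto_compaction: true });
```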
This is a fundamental, novice-level question that will not be short. It is specific to Backendless.
I have a number of scenarios I would like to be able to address, as I am working with a small set of tables that are all interrelated in some form and need to be explored from various directions.
A basic example would be something like PersonTable and AddressTable: PersonTable contains a list of people with their lastName, firstName, etc., while AddressTable contains addresses with their various attributes such as streetName, houseNumber, etc.
Let's say I want to provide users two distinct views in a main navigation and allow them to drill down further.
View1: You click "People", you get a list of people from the PersonTable. This list appears in a secondary navigation window. Clicking an individual person will provide you the address/addresses associated with that person.
However, I also want to be able to do this in reverse:
View2: You click "Address", you get a list of addresses from the AddressTable. This list appears in a secondary navigation window. Clicking an individual address will provide you with a person/people associated with that address.
So from a uni-directional approach, there would be a relationship from PersonTable to AddressTable. This is all well and good for View 1: one query provides the data for the secondary navigation, and the results of that query can include the relationship data needed for the drill-down.
However, if I wanted to support View 2, I would have to perform two queries given the direction of the relationship and where I am starting.
If you scale this to a larger set of data with more tables and fields, my concern may become more apparent: I actually want to show some data from the parent of the relationship in the initial secondary-navigation items. That means an initial query of the table to list the items, plus a query for each individual item (to obtain the data I need from its parent in the relationship) to complete the data shown in the initial list. (Clicking an item would then provide even more detail.) Obviously the relationship can be reversed, and I would then be pulling child data rather than parent data, but when I want to come at the data from the other direction (the other view) I am in the same situation again.
TL;DR: I need to be able to traverse tables in pretty much any direction and drill into data while attempting to minimize the number of queries required to do so for any given case. Is this a case where a large number of relationships is warranted?
Getting to the root of the question: My understanding is that, while Backendless does support them, bi-directional relationships are generally frowned upon (at least in the SQL world).
So, really, what is best practice? Is it simply a logical "Create relationships when they help you reduce queries"?
Bidirectional is frowned upon here too, though it does work. You may find a few bugs as it isn't used much.
The reason is that it isn't required: you already know you can make a request to get the inverse content.
But, the reason you should not use them is that auto-loading all of that extra data when you might not use it is more costly than making explicit requests when you do...
Also, you can limit your query impact in terms of network traffic by creating a custom service which does all the leg work.
"However, if I wanted to support View 2, I would have to perform two queries given the direction of the relationship and where I am starting."
Performing two queries is not necessarily required in Backendless, as the query syntax supports "backward lookup": knowing a "child" object, you can look up its parent using the following syntax in the "whereClause":
childRelation.objectId = 'childObjectId'
For example, for your Person and Address tables, suppose the relation column in the parent (Person) table is called "addresses" and it is a one-to-many relation. Then the query sent to the Person table is:
addresses.objectId = 'specific-objectId-value-from-Address'
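For what it's worth, here is a sketch of issuing that query from the JavaScript SDK (variable names made up; check the SDK docs for your version):

```js
// Find the Person whose related "addresses" contains the given Address object.
const addressId = 'specific-objectId-value-from-Address';
const query = Backendless.DataQueryBuilder.create()
  .setWhereClause(`addresses.objectId = '${addressId}'`);

Backendless.Data.of('Person').find(query)
  .then(people => console.log(people));
```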
Keep in mind that you can test your whereClause queries using Backendless console. Here's an article about that feature:
https://backendless.com/feature-14-sql-based-search-for-data-objects-using-console/
Hope this helps.
Here is something I've wondered about for quite some time and have not yet seen a real (good) solution for. It's a problem I imagine many games have, and one I can't easily think of how to solve (well). Ideas are welcome, but since this is not a concrete problem, don't bother asking for more details; just make them up! (And explain what you made up.)
Ok, so, many games have the concept of (inventory) items, and often, there are hundreds of different kinds of items, all with often very varying data structures - some items are very simple ("a rock"), others can have insane complexity or data behind them ("a book", "a programmed computer chip", "a container with more items"), etc.
Now, programming something like that is easy - just have everything implement an interface, or maybe extend an abstract root item. Since objects in the programming world don't have to look the same on the inside as on the outside, there is really no issue with how much and what kind of private fields any type of item has.
But when it comes to database serialization (binary serialization is of course no problem), you face a dilemma: how would you represent all that in, say, a typical SQL database?
Some attempts at a solution that I have seen, none of which I find satisfying:
1. Binary serialization of the items; the database just holds an ID and a blob.
Pros: takes like 10 seconds to implement.
Cons: basically sacrifices every database feature; hard to maintain; near impossible to refactor.
2. A table per item type.
Pros: clean, flexible.
Cons: with a wide variety of items you end up with hundreds of tables, and every search for an item has to query them all, since SQL has no concept of a table/type 'reference'.
3. One table with a lot of fields that aren't used by every item.
Pros: takes like 10 seconds to implement, still searchable.
Cons: wastes space, hurts performance, and it's hard to tell from the database which fields are in use.
4. A few tables with a few 'base profiles' for storage, where similar items get thrown together and reuse the same fields for different data.
Pros: I've got nothing.
Cons: wastes space, hurts performance, and it's hard to tell from the database which fields are in use.
What ideas do you have? Have you seen another design that works better or worse?
It depends on whether you need to sort, filter, count, or otherwise analyze those attributes.
If you use EAV, then you will screw yourself nicely. Try doing reports on an EAV schema.
The best option is to use Table Inheritance:
```sql
-- Illustrative DDL; column names and types are placeholders.
CREATE TABLE product (
    id   INTEGER PRIMARY KEY,
    type VARCHAR(32) NOT NULL,  -- discriminator: which subtype table to join
    att1 TEXT                   -- attribute shared by all products
);

CREATE TABLE product_x (
    id   INTEGER PRIMARY KEY REFERENCES product (id),
    att2 TEXT,
    att3 TEXT
);

CREATE TABLE product_y (
    id   INTEGER PRIMARY KEY REFERENCES product (id),
    att4 TEXT,
    att5 TEXT
);
```
For attributes that you don't need to search/sort/analyze, use a blob or XML column.
I have two alternatives for you:
One table for the base type and supplemental tables for each “class” of specialized types.
In this schema, properties common to all “objects” are stored in one table, so you have a unique record for every object in the game. For special types like books, containers, usable items, etc, you have another table for each unique set of properties or relationships those items need. Every special type will therefore be represented by two records: the base object record and the supplemental record in a particular special type table.
PROS: You can use column-based features of your database like custom domains, checks, and xml processing; you can have simpler triggers on certain types; your queries differ exactly at the point of diverging concerns.
CONS: You need two inserts for many objects.
Use a “kind” enum field and a JSONB-like field for the special type data.
This is kind of like your #1 or #3, except with some database help. Postgres added JSONB, giving you an improvement over the old EAV pattern. Other databases have a similar complex field type. In this strategy you roll your own mini schema that you stash in the JSONB field. The kind field declares what you expect to find in that JSONB field.
PROS: You can extract special-type data in your queries; you can add check constraints and have a simple schema to deal with; you can benefit from indexing even though your data is heterogeneous; your queries and inserts are simple.
CONS: Your data types within JSONB-like fields are pretty limited and you have to roll your own validation.
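A sketch of what the second strategy could look like from application code, using node-postgres against a hypothetical item table with kind and data (JSONB) columns:

```js
const { Client } = require('pg');

async function findBooksByAuthor(author) {
  const client = new Client(); // connection settings taken from the environment
  await client.connect();
  // kind narrows the row shape; data holds the type-specific fields.
  const { rows } = await client.query(
    "SELECT id, data->>'title' AS title FROM item " +
    "WHERE kind = 'book' AND data->>'author' = $1",
    [author]
  );
  await client.end();
  return rows;
}
```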
Yes, it is a pain to design database schemas like this. I'm designing a notification system and ran into the same problem. My notification system is, however, less complex than yours: the data it holds is at most ids and usernames. My current solution is a mix of 1 and 3: I serialize the data that differs from notification to notification, and use columns for the two usernames (some notifications have two, some only one). I shy away from approach 2 because I hate that design, but that's probably just me.
However, if you can afford it, I would suggest thinking outside the realm of RDBMSes: it sounds like a non-relational store (especially a key/value one) may be a better fit for this data, especially if the items differ a lot from one another.
I'm sure this has been asked here a million times before, but in addition to the options you have discussed in your question, you can look at an EAV schema, which is very flexible but has its own set of cons.
Another alternative is database systems which are not relational. There are object databases as well as various key/value stores and document databases.
Typically all of these things break down to some extent when you need to query against the flexible attributes. This is kind of an intrinsic problem, however: conceptually, what does it really mean to accurately query things that are unstructured?
First of all, do you actually need the concurrency, scalability and ACID transactions of a real database? Unless you are building an MMO, your game structures will likely fit in memory anyway, so you can search and otherwise manipulate them there directly. In that scenario, the "database" is just a store for serialized objects, and you can replace it with the file system.
If you conclude that you do (need a database), then the key is in figuring out what "atomicity" means from the perspective of the data management.
For example, if a game item has a bunch of attributes, but none of these attributes are manipulated individually at the database level (even though they could well be at the application level), then it can be considered as "atomic" from the data management perspective. OTOH, if the item needs to be searched on some of these attributes, then you'll need a good way to index them in the database, which typically means they'll have to be separate fields.
Once you have identified attributes that should be "visible" versus the attributes that should be "invisible" from the database perspective, serialize the latter to BLOBs (or whatever), then forget about them and concentrate on structuring the former.
That's where the fun starts, and you'll probably need an "all of the above" strategy to get reasonable results.
BTW, some databases support "deep" indexes that can go into heterogeneous data structures. For example, take a look at Oracle's XMLIndex, though I doubt you'll use Oracle for a game.
You seem to be trying to solve this for a gaming context, so maybe you could consider a component-based approach.
I have to say that I personally haven't tried this yet, but I've been looking into it for a while and it seems to me something similar could be applied.
The idea would be that every entity in your game is basically a bag of components. These components can be Position, Energy, or, for your inventory case, Collectable, for example. For the Collectable component you can then add custom fields such as category, numItems, etc.
When you're going to render the inventory, you can simply query your entity system for items that have the Collectable component.
How can you save this to a DB? You can define the components independently in their own tables, and then for the entities (each in their own table as well) you would add a "Components" column holding an array of IDs referencing those components. These IDs would effectively act like foreign keys, though I'm aware this is not exactly how you model things in relational databases; you get the idea.
Then, when you load the entities and their components at runtime, based on the component being loaded you can set the corresponding flag in their bag of components so that you know which components this entity has, and they'll then become queryable.
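A minimal in-memory sketch of the idea (all names invented):

```js
// Components live in their own store; entities just hold component ids.
const components = {
  c1: { type: 'Position', x: 0, y: 0 },
  c2: { type: 'Collectable', category: 'book', numItems: 1 },
  c3: { type: 'Position', x: 5, y: 2 }
};

const entities = [
  { id: 'e1', components: ['c1', 'c2'] }, // a collectable item
  { id: 'e2', components: ['c3'] }        // scenery: position only
];

// "Render the inventory": query entities that have a Collectable component.
const collectables = entities.filter(e =>
  e.components.some(cid => components[cid].type === 'Collectable')
);
```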
Here's an interesting read about component-based entity systems.
I have an application I am working on where I have a set of data that, while not technically static, will not change very often (say, 3 or 4 times a year on average). However, some of this data is interrelated.
An example of this type of data would be states and counties - ideally, we would like to know all of the states available when putting in an address or location, but we would also like to know the counties available for each state, so we can display that information appropriately to the user (i.e. filtering out the inappropriate counties when a user has a state selected).
In the past, I have done this in a relational database by having a state and county table, where the county is linked back to the state it belongs in, and the state and counties are linked to any tables that need their information.
This data is not owned, however, and in the Google datastore it seems that the locking transaction mechanism will cause locks to occur even though we are not actively modifying this data. What is the best way to handle this type of data? Is it to give each of these entities no parent (a parent of None/null)? Will that cause locking problems in the future?
I'd consider storing this in an optimized data structure inside your code and updating it manually. The performance gain will be huge, and since Google charges you for datastore operations, you will end up being thankful for it.
The idea is to mix these fixed data structures with your database: you give each country (or whatever) an id, and you reference that id in your models.
A simple approach is making a list of countries and each have a list of states in them. You can load them in def main():, before you run the app. Of course this will bring all sorts of problems if you are not careful, but if you are, you should be fine.
A more advanced one would be to keep in memory only the most used, and lazy load and dump countries on the fly.
Hi, I'm using CakePHP and I'm wondering if it's advisable to store things that don't change a lot, like a list of cities, in the database?
If your application already needs a database, why would you keep data anywhere else?
If the list doesn't change (per installation) and it's reasonably small and frequently used, then it might be worth reading it once on initialization and caching the result to improve performance and reduce the load on the database.
You get all sorts of queries and retrievals out of the box, the same way you access any other of your data. Databases are as cheap as flat files today, but you get a full service.
I see this question has had an answer accepted - I still want to chime in with my $0.02
The way I typically handle arrays of static data (country list, timezone list, immutable sets you would use an enum for...) is to use this array datasource.
It allows you to map relationships between db models and array-based models, and to use the usual find syntax / Containable on the relationships.
http://github.com/jrbasso/array_datasource
If it is pretty much a static list, then you can store it either in the db or a file, but keep it in memory for use. In other words, load it once whether from db or file. What you don't want to do is keep taking a hit loading it. Especially if you use it on most page views. Those little bits of time add up if you have a large number of visitors.
The flip side, of course, is if you find yourself doing this for large lists or lots and lots of little lists. Then you could run into problems of keeping too much in memory.
Bill the Lizard is right that it matters whether or not the list links to other tables. If it does, then you will need it in the db for any queries that have to include it.