Querying Nested Documents in Couchbase - database

I am writing a Reddit-like comment store using Couchbase. For each comment, I am storing its parentId and a list of its childrenIds. Each top-level comment on a web page will have its parentId as null.
I want to retrieve a comment block efficiently. By a comment block, I mean a top-level comment along with all its children comments. So the first step in this can be to write a map function that emits the ids of all top-level comments.
How do I go about fetching the entire tree once I have the root? A very naive approach would be to find the children and query them recursively. But this defeats the purpose of not using a relational database for this project (since I am dealing with highly nested data and relational databases are terrible at storing it).
Can someone guide me on this?

OK, so each top-level comment has a tree of sub-comments below it. I think you could safely put the entire tree of sub-comments in the document of the top-level comment in most cases. The default limit on document sizes is 20 MB, which is a heck of a lot for text.
The question then is what to do with comments that inspire a LOT of sub-comments. I suggest you start spilling parts of the sub-comment tree to other documents when that happens, so strictly speaking there can be a tree of sub-comment trees, although typically you only need the one document. Design things so these subsidiary documents only get fetched on demand, and you never have to fetch absolutely the entire sub-comment tree.
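A minimal sketch of the spill-over idea, in plain Python rather than a Couchbase SDK. The field names (`id`, `children`, `ref`), the `subtree::` key prefix, and the node-count threshold are all assumptions for illustration; a real design would more likely check the serialized size of the document, and `store` merely stands in for the bucket.

```python
from typing import Dict


def count_nodes(comment: dict) -> int:
    """Number of comments in this subtree, including the comment itself."""
    return 1 + sum(count_nodes(child) for child in comment.get("children", []))


def spill_large_subtrees(comment: dict, max_nodes: int, store: Dict[str, dict]) -> dict:
    """Return a copy of `comment` in which every reply subtree larger than
    `max_nodes` is moved into `store` and replaced by a small reference stub."""
    kept = []
    for child in comment.get("children", []):
        if count_nodes(child) > max_nodes:
            key = f"subtree::{child['id']}"  # hypothetical key scheme
            # The spilled document is itself checked, giving a tree of trees.
            store[key] = spill_large_subtrees(child, max_nodes, store)
            kept.append({"id": child["id"], "ref": key})
        else:
            kept.append(child)
    return {**comment, "children": kept}


def load_subtree(stub: dict, store: Dict[str, dict]) -> dict:
    """Fetch a spilled subtree on demand, e.g. when the user expands it."""
    return store[stub["ref"]]
```

The top-level document stays small and is enough to render the first screen; stubs are only resolved when a reader expands that part of the thread.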

Related

Perform operations directly on database (esp. Firestore)

Just a question regarding NoSQL DBs. As far as I know, operations are done by the app/website outside the DB. For instance, if I need to add a value to a list, I need to:
download the initial list
add the new value to the list on my device
upload the whole updated list.
In the end, a lot of data travels over the wire (twice the initial list) with no added value.
Is there any way to request directly the DB for simple operations like this?
db.collection("collection_key").document("document_key").add("mylist", value)
Or simply increment a field?
Same for knowing the number of documents in a collection: do I need to download the whole set of documents to get the number?
A couple of different answers:
In Firestore, many intrinsic operations can be done with FieldValue operations, such as increment/decrement (by a supplied value, so really add/subtract), array unions, field deletes, etc. Just search the documentation for FieldValue. Whether this is true for NoSQL in general, I can't say.
Knowing the number of documents, on the other hand, is not trivially done in Firestore, but frankly I can't think of any situations other than artificially contrived examples where you would need to know. It is easy enough to set up ways to "count" documents as you create/delete them, and keep that count separately, if for some reason you find yourself needing it.
Or were you just trying to generically put down NoSQL as a concept?
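The "count documents as you create/delete them" suggestion can be sketched as follows. This is a plain in-memory Python illustration, not Firestore code: the `CountedCollection` class and its `meta` document are hypothetical, and in Firestore the counter write would presumably use the FieldValue increment mentioned above, issued together with the document write.

```python
class CountedCollection:
    """In-memory stand-in for a collection that keeps its own document count."""

    def __init__(self) -> None:
        self.docs = {}             # document id -> document data
        self.meta = {"count": 0}   # separate counter document (hypothetical)

    def create(self, doc_id: str, data: dict) -> None:
        # Only a genuinely new document changes the count; overwrites do not.
        if doc_id not in self.docs:
            self.meta["count"] += 1
        self.docs[doc_id] = data

    def delete(self, doc_id: str) -> None:
        # Deleting a missing document must not decrement the counter.
        if doc_id in self.docs:
            del self.docs[doc_id]
            self.meta["count"] -= 1

    def count(self) -> int:
        # Reads one small value instead of downloading every document.
        return self.meta["count"]
```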

Arbitrary document ordering in CouchDB/PouchDB

I’m building what can be treated as a slideshow app with CouchDB/PouchDB: each “slide” is its own Couch document, and slides can be reordered or deleted, and new slides can be added in between existing slides or at the beginning or end of the slideshow. A slideshow could grow from one to ≲10,000 slides, so I am sensitive to space- and time-efficiency.
I made the slide creation/editing functionality first, completely underestimating how tricky it is to keep track of slide ordering. This is hard because the order of each slide-document is completely independent of the slide-doc itself, i.e., it’s not something I can sort by time or some number contained in the document. I see numerous questions on StackOverflow about how to keep track of ordering in relational databases:
Efficient way to store reorderable items in a database
What would be the best way to store records order in SQL
How can I reorder rows in sql database
Storing item positions (for ordering) in a database efficiently
How to keep ordering of records in a database table
Linked List in SQL
but all these involve either
using a floating-point secondary key for reordering/creation/deletion, with periodic normalization of indexes (i.e., imagine two documents are order-index 1.0 and 2.0, then a third document in between gets key 1.5, then a fourth gets 1.25, …, until ~31 docs are inserted in between and you get floating-point accuracy problems);
a linked list approach where a slide-document has a previous and next field containing the primary key of the documents on either side of it;
a very straightforward approach of updating all documents for each document reordering/insertion/deletion.
None of these are appropriate for CouchDB: #1 incurs a huge amount of incidental complexity in SQL or CouchDB. #2 is unreliable due to lack of atomic transactions (CouchDB might update the previous document with its new next but another client might have updated the new next document meanwhile, so updating the new next document will fail with 409, and your linked list is left in an inconsistent state). For the same reason, #3 is completely unworkable.
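To make the precision problem of approach #1 concrete, here is a small Python sketch that keeps inserting a new item directly after the first one and counts how many midpoints can be generated before the new key collides with a neighbour. The function name is made up for illustration, and the exact count depends on the float width and on the key values involved.

```python
def insertions_until_collision(lo: float = 1.0, hi: float = 2.0) -> int:
    """Repeatedly insert a key halfway between `lo` and the most recently
    inserted key, and return how many insertions succeed before the
    midpoint is no longer distinct from its neighbours."""
    count = 0
    while True:
        mid = (lo + hi) / 2
        if mid == lo or mid == hi:
            # No representable float is left between the two neighbours.
            return count
        hi = mid  # the next insertion goes between lo and the new key
        count += 1
```

With Python's 64-bit floats and the keys 1.0 and 2.0, this worst-case pattern runs out after 52 insertions; narrower float types or larger key magnitudes hit the limit sooner, which is why the scheme needs periodic renormalization.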
One CouchDB-oriented approach I’m evaluating would create a document that just contains the ordering of the slides: it might contain a primary-key-to-order-number hash object as well as an array that converts order-number-to-primary-key, and just update this object when slides are reordered/inserted/deleted. The downside to this is that Couch will keep a copy of this potentially large document for every order change (reorder/insert/delete)—CouchDB doesn’t support compacting just a single document, and I don’t want to run compaction on my entire database since I love preserving the history of each slide-document. Another downside is that after thousands of slides, each change to ordering involves transmitting the entire object (hundreds of kilobytes) from PouchDB/client to Couch.
A tweak to this approach would be to make a second database just to hold this ordering document and turn on auto-compaction on it. It’ll be more work to keep track of two database connections, and I’ll eventually have to put a lot of data down the wire, but I’ll have a robust way to order documents in CouchDB.
So my questions are: how do CouchDB people usually store the order of documents? And can more experienced CouchDB people see any flaws in my approach outlined above?
Thanks to a tip by @LynHeadley, I wound up writing a library that could subdivide the lexicographical interval between strings: Mudder.js. This allows me to infinitely insert and move around documents in CouchDB, by creating new keys at will, without any overhead of a secondary document to store the ordering. I think this is the right way to solve this problem!
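The idea of subdividing the interval between two string keys can be sketched as below. This is not Mudder.js's API or algorithm, just a small Python illustration. It assumes keys use only lowercase `a` to `z` and never end in `a` (the lowest symbol), so that lexicographic order matches the order of the keys read as base-26 fractions; `key_between` and its helpers are hypothetical names.

```python
from typing import Optional

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)


def _to_int(key: str, width: int) -> int:
    """Read `key`, right-padded with 'a' to `width` symbols, as a base-26 integer."""
    value = 0
    for ch in key.ljust(width, ALPHABET[0]):
        value = value * BASE + ALPHABET.index(ch)
    return value


def _to_key(value: int, width: int) -> str:
    """Inverse of _to_int; trailing 'a' symbols are dropped."""
    symbols = []
    for _ in range(width):
        value, digit = divmod(value, BASE)
        symbols.append(ALPHABET[digit])
    return "".join(reversed(symbols)).rstrip(ALPHABET[0])


def key_between(lo: str = "", hi: Optional[str] = None) -> str:
    """Return a key that sorts strictly between `lo` and `hi`.
    An empty `lo` means "before everything"; `hi=None` means "after everything"."""
    # One extra symbol of width guarantees room for a midpoint.
    width = max(len(lo), len(hi or "")) + 1
    lo_val = _to_int(lo, width)
    hi_val = BASE ** width if hi is None else _to_int(hi, width)
    if lo_val >= hi_val:
        raise ValueError("lo must sort strictly before hi")
    return _to_key((lo_val + hi_val) // 2, width)
```

For example, `key_between("b", "c")` yields `"bn"`. Keys grow by roughly one character for every few insertions into the same gap, so there is no fixed precision limit, only slowly lengthening keys.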
Based on what I've read, I would choose the "ordering document" approach (i.e., a slideshow document that holds an array of the ids of its slide documents). This is really straightforward and accomplishes the use-case, so I wouldn't let these concerns get in the way of clean/intuitive code.
You are right that this document can grow potentially very large, compounded by the write-heavy nature of that specific document. This is why compaction exists and is the solution here, so you should not fight against CouchDB on this point.
It is a common misconception that you can use CouchDB's revision history to keep a comprehensive history of your database. The revisions are merely there to aid in write concurrency, not as a full version control system.
CouchDB has auto-compaction enabled by default, and without it your database will grow in size unchecked. Thus, you should abandon the idea of tracking document history using this approach, and instead adopt another, safer alternative. (A list of these alternatives is beyond the scope of this answer.)

Structuring my data - Firebase

I'm creating a prototype group list application. I want the following objects:
User
List
Item
Comment
I think that I should structure this as follows:
http://myapp.firebase.io/user/
http://myapp.firebase.io/user/uid/lists/
http://myapp.firebase.io/list/
http://myapp.firebase.io/item/listid/
http://myapp.firebase.io/comment/itemid
where http://myapp.firebase.io/user/uid/lists/ points to list unique id's, http://myapp.firebase.io/item/listid/ points to all item objects for a given list, and http://myapp.firebase.io/comment/itemid points to all comments for a given item.
Does this structure make sense? The reason I did it this way instead of nesting further (i.e. http://myapp.firebase.io/list/listid/item/ for items and http://myapp.firebase.io/list/listid/item/itemid/comment for comments) is because it says in the documentation that whenever you fetch an object you fetch all children. Sometimes (perhaps even most of the time) I want to fetch a list's items, but not each item's comments. I might only want to do that when a user clicks on the item.
In a NoSQL database you should model your data for how you intend to use it. I highly recommend reading this article on NoSQL data modeling.
The top-level structure seems fine and does not violate Firebase's recommendation to limit nesting of data. But there are many other places where you might still make mistakes (which is one of the reasons this question is a bit too broad for Stack Overflow, but I'll try to answer some of it anyway).
I'd separate out the user's lists into a separate top-level node:
/userlists/$uid/$listid
That way the /users/$uid nodes would just contain the user's profile information and you could cheaply show a list of users. You might even consider splitting the most visible aspect of the user profile into another top-level node, to make the showing of such a list even cheaper.
/usernames/$uid
You'll be duplicating data in this case. But storage is (relatively) cheap, and optimizing for the more common reading of data is one of the reasons NoSQL databases can scale so well.
As you may notice, I focus on showing a list of user names, retrieving the lists for a user and accessing the profile for a specific user. These are use-cases and we're modeling the data to fit them.
To repeat the main point: model your data for how your app accesses it.
Write out your list of use-cases and see how you can most easily access the data for each one. Liberally denormalize and occasionally duplicate the data to fit the use-cases. Use multi-location updates to keep denormalized and duplicated data in sync with its main entity.
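A rough sketch of what such a multi-location update looks like for the duplicated user name. It is written as plain Python over a nested dict rather than against a Firebase SDK; the `name` field under `/users/$uid` and both function names are assumptions. In a real client, a path-to-value dictionary like the one built here would be handed to a single update call on the database root so that all paths change together.

```python
def rename_user_update(uid: str, new_name: str) -> dict:
    """Build one path -> value map covering every copy of the user's name."""
    return {
        f"users/{uid}/name": new_name,   # main profile entity (assumed field)
        f"usernames/{uid}": new_name,    # denormalized copy for cheap listing
    }


def apply_update(tree: dict, update: dict) -> None:
    """Local stand-in for the database: write every path of the update."""
    for path, value in update.items():
        *parents, leaf = path.split("/")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
```

Building the update in one place makes it harder to forget a duplicated location when the data model grows.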

MongoDB vs SQL Server for storing recursive trees of data

I'm currently speccing out a project that stores threaded comment trees.
For those of you unfamiliar with what I'm talking about, I'll explain: every comment has a parent comment, rather than just belonging to a thread. Currently, I'm working on a relational SQL Server model for storing this data, simply because it's what I'm used to. It looks like this:
Id int --PK
ThreadId int --FK
UserId int --FK
ParentCommentId int --FK (relates back to Id)
Comment nvarchar(max)
Time datetime
What I do is select all of the comments by ThreadId, then in code, recursively build out my object tree. I'm also doing a join to get things like the User's name.
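For illustration, the "select by ThreadId, then build the tree in code" step might look like the sketch below. It assumes each row arrives as a dict keyed by the column names above, that top-level comments have `ParentCommentId` set to `None`, and that rows are already in the desired display order; the `Replies` key and the function name are made up. It uses two linear passes instead of recursion.

```python
from typing import Dict, List


def build_comment_tree(rows: List[dict]) -> List[dict]:
    """Turn flat comment rows into nested trees and return the top-level comments."""
    # First pass: one node per row, each with an empty list of replies.
    nodes: Dict[int, dict] = {row["Id"]: {**row, "Replies": []} for row in rows}
    roots: List[dict] = []
    # Second pass: attach every node to its parent, or to the root list.
    for row in rows:
        node = nodes[row["Id"]]
        parent_id = row["ParentCommentId"]
        if parent_id is None:
            roots.append(node)
        else:
            nodes[parent_id]["Replies"].append(node)
    return roots
```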
It just seems to me that a document store like MongoDB, which is NoSQL, might be a better choice for this sort of model. But I don't know anything about it.
What would be the pitfalls if I do choose MongoDB?
If I'm storing it as a Document in MongoDB, would I have to include the User's name on each comment to prevent myself from having to pull up each user record by key, since it's not "relational"?
Do you have to aggressively cache "related" data on the objects you need them on when you're using MongoDB?
EDIT: I did find this article about storing trees of information in MongoDB. Given that one of my requirements is the ability to show a logged-in user a list of their recent comments, I'm now strongly leaning towards just using SQL Server, because I don't think I'll be able to do anything clever with MongoDB that will result in real performance benefits. But I could be wrong. I'm really hoping an expert (or two) on the matter will chime in with more information.
The main advantage of storing hierarchical data in Mongo (and other document databases) is the ability to store multiple copies of the data in ways that make queries more efficient for different use cases. In your case, it would be extremely fast to retrieve the whole thread if it were stored as a hierarchical nested document, but you'd probably also want to store each comment un-nested or possibly in an array under the user's record to satisfy your 2nd requirement. Because of the arbitrary nesting, I don't think that Mongo would be able to effectively index your hierarchy by user ID.
As with all NoSQL stores, you get more benefit by being able to scale out to lots of data nodes, allowing for many simultaneous readers and writers.
Hope that helps

Django - Optional recursive relationship

I am trying to use Django to create a recursive relationship, which gives users a folder-like hierarchical structure in which to place resources.
What would be the best way to achieve this?
I know I could use treebeard or mptt to create a nested set but I have read that making changes to the tree structure (something that would be happening a lot in this case) can be quite an intensive operation as a lot of fields have to be updated.
On the other hand, I could create a folder model with a ForeignKey to self, but how do I manage the top-level folders with no foreign key value? Will Django complain if I just set this value to be NULL?
Any advice appreciated.
Thanks.
Treebeard actually supports three different tree implementations; just choose the one that suits your needs.
Adjacency List (fast writes at the cost of slow reads)
Materialized Path (probably the fastest way of working with trees in SQL)
Nested Sets (very efficient reads at the cost of high maintenance on write/delete operations)
Docs are here: https://tabo.pe/projects/django-treebeard/docs/tip/
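To show why the first two options trade read cost against write cost, here is a plain-Python sketch over dictionaries. It is not django-treebeard's API; the `parent_of` and `path_of` mappings, the `/` separator, and the function names are illustrative assumptions.

```python
from typing import Dict, List, Optional


def descendants_adjacency(parent_of: Dict[str, Optional[str]], root: str) -> List[str]:
    """Adjacency list: each node stores only its parent, so collecting a
    subtree takes one lookup pass per level of depth."""
    found: List[str] = []
    frontier = [root]
    while frontier:
        frontier = [node for node, parent in parent_of.items() if parent in frontier]
        found.extend(frontier)
    return found


def descendants_path(path_of: Dict[str, str], root: str) -> List[str]:
    """Materialized path: each node stores its full path, so a subtree is a
    single prefix match (a LIKE 'prefix%' style query in SQL)."""
    prefix = path_of[root] + "/"
    return [node for node, path in path_of.items() if path.startswith(prefix)]


def move_path(path_of: Dict[str, str], node: str, new_parent: str) -> None:
    """Moving a folder with materialized paths rewrites the path of the node
    and of every descendant; with an adjacency list it is one field update."""
    old_prefix = path_of[node]
    new_prefix = path_of[new_parent] + "/" + node
    for name, path in path_of.items():
        if path == old_prefix or path.startswith(old_prefix + "/"):
            path_of[name] = new_prefix + path[len(old_prefix):]
```

If folders are moved often and trees are shallow, the adjacency list's cheap writes may matter more than its slower subtree reads.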

Resources