How to store directory / hierarchy / tree structure in the database?

How do I store a directory / hierarchy / tree structure in the database, namely in MS SQL Server?
@olavk: It doesn't look like you've seen my own answer. The way I use is way better than recursive queries :)
P.P.S. This is the way to go!

There are many ways to store hierarchies in SQL databases. Which one to choose depends on which DBMS product you use and how the data will be used. As you have used the MSSQL2005 tag, I think you should start by considering the "Adjacency List" model; if you find that it doesn't perform well for your application, then have a look at Vadim Tropashko's comparison, which highlights the differences between the models with a focus on multiple performance characteristics.
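Here is a minimal sketch of the Adjacency List model. Each row stores its parent's id, and a recursive common table expression walks the subtree. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere. The table, columns and sample rows are hypothetical. SQL Server 2005 and later accept the same kind of recursive CTE, written with plain WITH instead of WITH RECURSIVE.

```python
import sqlite3

# Adjacency list: each row points at its parent (NULL for the root).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),
        name      TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO node (id, parent_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "/"),
        (2, 1, "home"),
        (3, 2, "alice"),
        (4, 3, "notes.txt"),
        (5, 1, "etc"),
    ],
)

def descendants(conn, root_id):
    """Return (id, name, depth) for root_id and everything below it."""
    # Seed with the starting row, then repeatedly join in its children.
    return conn.execute("""
        WITH RECURSIVE subtree(id, name, depth) AS (
            SELECT id, name, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.name, s.depth + 1
            FROM node AS n
            JOIN subtree AS s ON n.parent_id = s.id
        )
        SELECT id, name, depth FROM subtree ORDER BY id
    """, (root_id,)).fetchall()

print(descendants(conn, 2))  # [(2, 'home', 0), (3, 'alice', 1), (4, 'notes.txt', 2)]
```

Writes stay cheap (moving a folder is one UPDATE of parent_id). The cost is paid at read time, when the recursion runs once per level.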

If using SQL Server 2008 is an option, you may want to check out the new hierarchyid data type.

There is also the Nested Set Model of trees, which has some advantages over the ParentID model. See http://www.evanpetersen.com/item/nested-sets.html and http://falsinsoft.blogspot.nl/2013/01/tree-in-sql-database-nested-set-model.html
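Below is a small sketch of the Nested Set model, again using sqlite3 with hypothetical names. A depth-first walk gives every node a left/right pair, so each subtree occupies a contiguous interval. "Everything under X" then becomes a single range join with no recursion.

```python
import sqlite3

# Hypothetical parent -> children map for a small directory tree.
children = {
    "/": ["home", "etc"],
    "home": ["alice", "bob"],
    "alice": [],
    "bob": [],
    "etc": [],
}

def number_nested_set(root):
    """Assign (lft, rgt) to every node with a depth-first walk."""
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        lft = counter              # number on the way down
        for child in children[node]:
            visit(child)
        counter += 1
        bounds[node] = (lft, counter)  # number on the way back up

    visit(root)
    return bounds

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany(
    "INSERT INTO tree (name, lft, rgt) VALUES (?, ?, ?)",
    [(name, l, r) for name, (l, r) in number_nested_set("/").items()],
)

def subtree(conn, name):
    """All nodes whose left value falls inside the given node's interval."""
    rows = conn.execute("""
        SELECT child.name
        FROM tree AS parent
        JOIN tree AS child ON child.lft BETWEEN parent.lft AND parent.rgt
        WHERE parent.name = ?
        ORDER BY child.lft
    """, (name,))
    return [n for (n,) in rows]

print(subtree(conn, "home"))  # ['home', 'alice', 'bob']
```

The trade-off is on the write side. Inserting or moving a node means renumbering the intervals to its right.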

This is more of a bookmark for me than a question, but it might help you too. I've used this article's approach to store a directory / tree structure in the database.
There are some useful code snippets in the article as well.
Hope this helps.
I'm not affiliated with that website in any way.

I faced a similar problem in one of my projects. We had a huge hierarchy that keeps growing.
I needed to traverse it fast and then find the right group after some complex validations.
I didn't want to go to SQL Server and scratch my head over how to do this efficiently there, when I knew that recursive queries were the only viable solution. But do you really know whether any optimization is possible for recursive queries? Is there any guarantee that your hierarchy will not grow in the future, and that one fine day you won't find your recursive queries too slow to use in production?
So I decided to give Neo4j a shot. It's a graph database with many useful built-in algorithms, amazingly fast traversal, and decent documentation and examples.
Store the hierarchy in Neo4j and access it through a Thrift service (or something else).
Yes, you will have to write code that integrates your SQL queries with Neo4j, but you will have a more scalable and future-proof solution.
Hope you find this useful.

Are you using SQL Server 2005? Recursive queries make querying hierarchical data much more elegant.
Edit: I do think materialized paths are a bit of a hack. The path contains non-normalized, redundant data, and you have to use triggers or something similar to keep it updated. E.g. if a node changes parent, the paths of its whole subtree have to be updated. And subtree queries have to use some ugly substring matching rather than an elegant and fast join.
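To make the objection concrete, here is a minimal materialized-path sketch (sqlite3, hypothetical schema). Subtree reads are a prefix match, but re-parenting a node rewrites the path of every row in its subtree. Paths end with a separator so that "/1/2/" does not also match "/1/22/". The sketch does not guard against moving a node under one of its own descendants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each row stores the ids from the root down to itself, e.g. "/1/2/".
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT, path TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO node (id, name, path) VALUES (?, ?, ?)",
    [(1, "root", "/1/"), (2, "a", "/1/2/"), (3, "b", "/1/2/3/"), (4, "c", "/1/4/")],
)

def get_path(conn, node_id):
    (path,) = conn.execute("SELECT path FROM node WHERE id = ?", (node_id,)).fetchone()
    return path

def subtree_ids(conn, node_id):
    """The node and its descendants: every path that starts with its path."""
    rows = conn.execute(
        "SELECT id FROM node WHERE path LIKE ? || '%' ORDER BY path",
        (get_path(conn, node_id),),
    )
    return [n for (n,) in rows]

def move_subtree(conn, node_id, new_parent_id):
    """Re-parent a node; returns how many rows had their path rewritten."""
    old_path = get_path(conn, node_id)
    new_path = f"{get_path(conn, new_parent_id)}{node_id}/"
    with conn:
        cur = conn.execute(
            # Swap the old prefix for the new one on the whole subtree.
            "UPDATE node SET path = ? || substr(path, ?) WHERE path LIKE ? || '%'",
            (new_path, len(old_path) + 1, old_path),
        )
    return cur.rowcount
```

For example, moving node 2 under node 4 rewrites two rows (node 2 and its child 3). In a deep tree, that count becomes the size of the whole subtree.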

The question is similar to this question that was closed. I found answers to both questions very helpful in my pursuits, and they eventually led me to the MongoDB manual that presents 5 different ways to model tree structures:
https://docs.mongodb.com/manual/applications/data-models-tree-structures/
While MongoDB is not a relational database, the models presented are applicable to relational databases, as well as other formats such as JSON. You clearly need to figure out which model is right based on the pros/cons presented.
The author of this question found a solution that combined both the Parent and Materialized Paths models. Maintaining the depth and parent could present some problems (extra logic, performance), but there are clearly upsides for certain needs. For my project, Materialized Paths will work best and I overcame some issues (sorting and path length) through techniques from this article.

The typical way is a table with a foreign key (e.g. "ParentId") that references the table itself.
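The same self-referencing ParentId table also answers the reverse question: "what is the path from this node up to the root?" You answer it by walking parent pointers upward. The sketch below uses sqlite3 with hypothetical names. Note that SQLite only enforces the foreign key when PRAGMA foreign_keys is switched on, whereas SQL Server enforces a declared foreign key by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
    CREATE TABLE folder (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES folder(id),  -- the table points at itself
        name      TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO folder (id, parent_id, name) VALUES (?, ?, ?)",
    [(1, None, "/"), (2, 1, "home"), (3, 2, "alice")],
)

def path_to_root(conn, folder_id):
    """Follow parent_id upward, then return names ordered root first."""
    rows = conn.execute("""
        WITH RECURSIVE up(id, parent_id, name, depth) AS (
            SELECT id, parent_id, name, 0 FROM folder WHERE id = ?
            UNION ALL
            SELECT f.id, f.parent_id, f.name, up.depth + 1
            FROM folder AS f
            JOIN up ON f.id = up.parent_id
        )
        SELECT name FROM up ORDER BY depth DESC
    """, (folder_id,))
    return [name for (name,) in rows]

print(path_to_root(conn, 3))  # ['/', 'home', 'alice']
```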

Related

Representing treelike/networklike guide in a database

I am thinking about creating something like a tree- or network-like guide with Java objects and mapping it into a DB.
Each step leads to another and so on; the next question/task/whatever depends on the previous action (see picture). It should be possible to create cycles, e.g. for repeating some earlier steps.
What database should I prefer? A standard relational one, maybe connecting a table with itself (foreign key -> primary key) to connect the nodes, or a document-based (graph-based) one like OrientDB that creates real trees? What about object-oriented databases like db4o?
What would have the better performance and/or be easier to realize?
Thanks in advance.
Additional thoughts:
I would probably add different actions (web service calls, whatever) and/or media (text, images, videos) to one node (step), leading to other steps, maybe returning to a former one, and so on.
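In a relational database, one way to allow cycles is to keep the transitions in a separate edges table instead of a single parent column. A step can then have many successors and can loop back to an earlier one. Below is a sketch with sqlite3 (table and column names are hypothetical). The recursive query uses UNION rather than UNION ALL, so rows already seen are discarded and the traversal terminates even on a cycle. That deduplication is SQLite (and PostgreSQL) behaviour. SQL Server requires UNION ALL in recursive CTEs, so there you would need an explicit visited-path check instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE step (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- One row per possible transition, so a step can have several
    -- successors and the graph may loop back to an earlier step.
    CREATE TABLE transition (
        from_step INTEGER NOT NULL REFERENCES step(id),
        to_step   INTEGER NOT NULL REFERENCES step(id),
        PRIMARY KEY (from_step, to_step)
    );
    INSERT INTO step VALUES (1, 'Start'), (2, 'Check input'), (3, 'Fix input'), (4, 'Done');
    -- 2 -> 3 -> 2 is a cycle ("repeat a former step")
    INSERT INTO transition VALUES (1, 2), (2, 3), (3, 2), (2, 4);
""")

def reachable(conn, start):
    """Ids of all steps reachable from start, including start itself."""
    rows = conn.execute("""
        WITH RECURSIVE r(id) AS (
            SELECT ?
            UNION  -- not UNION ALL: rows already in r are discarded, so cycles end
            SELECT t.to_step FROM transition AS t JOIN r ON t.from_step = r.id
        )
        SELECT id FROM r ORDER BY id
    """, (start,))
    return [s for (s,) in rows]

print(reachable(conn, 3))  # [2, 3, 4]
```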

I think you're on the right track with graph databases. OrientDB looks like it would work really well. Of course, I'll also throw out Neo4j, which should work just as well.
A relational database in my opinion is easy to toss out because it doesn't map well to the problem space. You might be able to create good structures to store relationships etc., but the queries will be horrific and complex.
The most important question I would ask is, "What does the query language look like?" Storing data usually isn't that hard to do. You can design structures and load data just fine.
When you need to query and think about your data though ... is the query language easy to reason about? Does it represent the concepts that are important to you?
For graphs, I really like Gremlin. https://github.com/thinkaurelius/titan/wiki/Gremlin-Query-Language
It has a pretty natural syntax for conveying graph concepts. I'd check that out as an alternative to OrientDB.

Which database is best to work with graphs and tree structured data?

I am planning to work with Dapper.NET for a family site.
A lot of tree like data will be present in the structure.
Which database provides the best queries to work with cyclic/acyclic tree relations?
I want to compare the ease of use and performance of hierarchical queries,
i.e. things like CTEs in SQL Server or CONNECT BY/START WITH in Oracle.
Is Dapper the best choice of Micro ORM for this kind of tree-structured data?
I need opinions on choosing the right database and the right Micro ORM for this.
Sorry for my bad English.

My question still stands: How much data do you expect?
But apart from that, it's not just the type of database you choose for your data; it's also the table structure. Hierarchical trees can be stored in various ways depending on your needs.
Table structure
Some structures are very fast for traversal reads but slow on inserts/updates (e.g. nested sets); others (adjacency lists) are the other way around. For a 99:1 read:write ratio (the vast majority of today's applications read much more than they write), I would likely choose a modified nested set structure with left, right, depth and parent columns. This gives you the best options for read scenarios.
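The sketch below shows why nested sets are slow on writes. It inserts a leaf into a "modified nested set" with left, right, depth and parent columns, using sqlite3 and hypothetical data. It uses the plain shifting approach rather than the negative-values trick mentioned in this answer. Every node to the right of the insertion point, plus the right edges of the ancestors, has to be updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tree (
        name   TEXT PRIMARY KEY,
        lft    INTEGER NOT NULL,
        rgt    INTEGER NOT NULL,
        depth  INTEGER NOT NULL,
        parent TEXT
    )
""")
# Hypothetical starting tree: root (1,6) with two leaves a (2,3) and b (4,5).
conn.executemany(
    "INSERT INTO tree VALUES (?, ?, ?, ?, ?)",
    [("root", 1, 6, 0, None), ("a", 2, 3, 1, "root"), ("b", 4, 5, 1, "root")],
)

def insert_last_child(conn, parent, name):
    """Add name as parent's last child; returns the number of row updates."""
    with conn:
        p_rgt, p_depth = conn.execute(
            "SELECT rgt, depth FROM tree WHERE name = ?", (parent,)
        ).fetchone()
        # Open a gap of 2 at the parent's right edge. Every rgt at or after it
        # (the parent, its ancestors, later nodes) and every lft after it moves.
        updates = conn.execute(
            "UPDATE tree SET rgt = rgt + 2 WHERE rgt >= ?", (p_rgt,)
        ).rowcount
        updates += conn.execute(
            "UPDATE tree SET lft = lft + 2 WHERE lft > ?", (p_rgt,)
        ).rowcount
        conn.execute(
            "INSERT INTO tree VALUES (?, ?, ?, ?, ?)",
            (name, p_rgt, p_rgt + 1, p_depth + 1, parent),
        )
    return updates
```

Inserting "a1" under "a" already costs four row updates in this three-node tree. Near the left edge of a large tree, almost every row moves. The parent column keeps "direct children" queries trivial, and depth keeps "children only" range queries trivial.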
Database type
Unless you're aiming at huge amounts of data, I suggest you go with whichever SQL database you know best (MSSQL, MySQL, Oracle). But if your database will contain an enormous number of hierarchy nodes, then flirting with a specialised graph-oriented database may be a better option.
80 million nodes
If you opt for a modified nested set solution (also using negative values, so the number of updates on insert/update halves), you'd have a hierarchy table with left, right, ID and ParentID columns, which would come to approx. 1.2 GB. But that's your top estimate after at least two years of usage.
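As a rough check of the 1.2 GB figure, assume each of the four columns (left, right, ID, ParentID) is a 4-byte integer, and ignore per-row overhead and indexes. Both of those are assumptions; the answer does not state them.

```latex
\[
80 \times 10^{6}\ \text{rows} \times 4\ \text{columns} \times 4\ \text{B}
  = 1.28 \times 10^{9}\ \text{B} \approx 1.19\ \text{GiB}
\]
```

Row headers, page fill and the indexes a nested set needs on left and right would all add to this. Treat it as the raw-data size rather than the on-disk footprint.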
My suggestion
Go quick & go light: don't overengineer by using the best possible database to store your hierarchy if it turns out it's not needed after all. Therefore I'd suggest you use a relational DB initially so you can get to market quickly, even though the solution will start to struggle after some millions of records. But before your database starts to struggle (we're talking years here), you'll gain two things:
You'll see whether your product takes off in the first place (there are many genealogy services already), so you won't have invested in learning a new technology; and because you'd be using proven, supported technology, you'd get to market quickly
If your product does succeed (and I genuinely hope it does), you will still have enough time to learn a different storage solution and implement it; with proper code layers it shouldn't be too hard to switch storage later on when required

What type of NoSQL database is best suited to store hierarchical data?

Say for example I want to store posts of a forum with a tree structure:
original post
+ re: original post
+ re: original post
  + re2: original post
    + re3: original post
  + re2: original post

MongoDB and CouchDB offer solutions, but no built-in functionality. See this SO question on representing hierarchies in a relational database; most other NoSQL solutions I've seen are similar in this regard, in that you have to write your own algorithms to recalculate that information as nodes are added, deleted and moved. Generally speaking, you're choosing between fast reads (e.g. nested sets) and fast writes (adjacency lists). See the aforementioned SO question for more options along these lines; the flat table approach appears most aligned with your question.
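Here is a sketch of the "write your own algorithm" part for the forum example, using the parent-reference (adjacency list) shape. Each post stores only its parent's id, which is cheap to write. The application then regroups the posts to display the thread. The document shape and field names are hypothetical, written as plain Python dicts rather than calls to a particular database driver.

```python
# Hypothetical forum posts as documents that reference their parent post.
posts = [
    {"_id": 1, "parent": None, "title": "original post"},
    {"_id": 2, "parent": 1, "title": "re: original post"},
    {"_id": 3, "parent": 1, "title": "re: original post"},
    {"_id": 4, "parent": 3, "title": "re2: original post"},
    {"_id": 5, "parent": 4, "title": "re3: original post"},
    {"_id": 6, "parent": 3, "title": "re2: original post"},
]

def render_thread(posts):
    """Rebuild the tree in application code and return indented lines."""
    by_parent = {}
    for p in posts:
        by_parent.setdefault(p["parent"], []).append(p)
    lines = []

    def walk(parent_id, depth):
        # Replies are shown in _id order (a stand-in for posting time).
        for p in sorted(by_parent.get(parent_id, []), key=lambda d: d["_id"]):
            prefix = "" if depth == 0 else "  " * (depth - 1) + "+ "
            lines.append(prefix + p["title"])
            walk(p["_id"], depth + 1)

    walk(None, 0)
    return lines
```

Printing the returned lines gives the same indented layout as the thread in the question. Every read of a thread pays for this regrouping, which is the fast-write/slow-read side of the trade-off.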
One standard that does abstract away these considerations is the Java Content Repository (JCR); Apache Jackrabbit and JBoss eXo are both implementations. Note that behind the scenes, both still do some sort of algorithmic calculation to maintain the hierarchy as described above. In addition, the JCR also handles permissions, file storage and several other aspects, so it may be overkill for your project.

What you possibly need is a document-oriented database like MongoDB or CouchDB.
See examples of different techniques which allow you to store hierarchical data in MongoDB:
http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
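One of the tree patterns MongoDB documents is an array of ancestors. Each document stores the ids of all its ancestors, so finding a subtree needs no recursion. Below is a sketch in plain Python with hypothetical documents. In MongoDB itself, the equivalent filter would be { ancestors: "Programming" }, since an equality match on an array field matches any element, and an index on that field keeps it fast.

```python
# Hypothetical category documents using the "array of ancestors" pattern:
# each one lists the ids of all its ancestors, root first.
categories = [
    {"_id": "Books", "ancestors": []},
    {"_id": "Programming", "ancestors": ["Books"]},
    {"_id": "Databases", "ancestors": ["Books", "Programming"]},
    {"_id": "MongoDB", "ancestors": ["Books", "Programming", "Databases"]},
    {"_id": "Languages", "ancestors": ["Books", "Programming"]},
]

def descendants(docs, node_id):
    """One pass, no recursion: a descendant has node_id among its ancestors."""
    return [d["_id"] for d in docs if node_id in d["ancestors"]]

def depth(doc):
    """Depth falls out of the array length."""
    return len(doc["ancestors"])
```

The cost moves to writes. Moving "Programming" under another parent means rewriting the ancestors array of every document below it.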

The most common one is IBM's IMS. There is also the Caché database.
See this question posted on the DBA section of Stack Exchange.

Faced with the same issue, I decided to create my own (very simple) solution using Lua + Redis https://github.com/qbolec/Redis-Tree/

eXist-db implements a hierarchical data model for XML persistence.

Graph databases would probably also solve this problem. If Neo4j is not enough for you in terms of scaling, consider Titan, which runs on various storage back ends, including HBase, and should scale very well. It is not as mature as Neo4j, but it is a very promising project.

LDAP, obviously. OpenLDAP would make short work of it.

In mathematics, and more specifically in graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one path. So any graph DB will certainly do the job. By the way, a graph as simple as a tree can easily be mapped to any relational or non-relational DB. To store hierarchical data in a relational DB, take a look at this awesome presentation by Bill Karwin. There are also ORMs with facilities to store trees. For example, TypeORM supports the Adjacency list and Closure table patterns for storing hierarchical structures.
TypeORM is used in TypeScript/JavaScript development. Check the popular ORMs for your environment to find one that supports trees.
The king of non-relational DBs [IMHO] is MongoDB. Check out its documentation to find out how it stores trees. Trees are the most common kind of graph and they are used everywhere, so any well-established DB solution should have a way to deal with them.

Here's a non-answer for you: SQL Server 2008! It's great for recursive queries. Or you can go the old-fashioned route and store the hierarchy data in a separate table to avoid recursion.
I think relational databases lend themselves very well to tree data, both in query performance and ease of use, with one caveat: you will be inserting into an indexed table, and probably several other indexed tables, every time someone makes a post. Insert performance could be an issue on a Facebook-caliber forum.

Check out MarkLogic. You can download a demo copy from the website. It is a database for unstructured data and falls under the NoSQL classification of databases. I know "unstructured data" is a pretty loaded term, but just think of it as data that does not fit well into the rows and columns of an RDBMS (like hierarchical data).

I just spent the weekend at a training course using MUMPS as the back end for a full-stack JavaScript browser application development framework. Great stuff! I'd recommend the GT.M distribution of MUMPS under the GPL, or try http://sourceforge.net/projects/mumps/?source=recommended for vanilla MUMPS. Check out http://robtweed.wordpress.com/ for the ewd.js framework and more info on MUMPS.

A NoSQL storage service with native support for hierarchical data is Amazon Web Services' Simple Storage Service (AWS S3). Path-based keys are hierarchical by nature, and the blob values can be typed using attributes (MIME type, e.g. application/json, text/csv, etc.). Advantages of S3 include the ability to scale to extremely large overall capacity, versioning, and nearly unlimited concurrent writes. It is also purely usage-based, so wide variations in demand do not require complex scaling infrastructure or over-provisioned capacity. Disadvantages include no support for references/relationships. When this answer was written, S3 also lacked conditional writes (optimistic concurrency) and consistent reads (apart from read-after-write); AWS has since added strong read-after-write consistency (2020) and conditional writes (2024).
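Strictly speaking, S3 keys are flat strings. The hierarchy comes from listing with a prefix and a delimiter: ListObjectsV2 returns the matching keys plus "CommonPrefixes" that act like sub-folders. The sketch below mimics that listing in plain Python over a hypothetical key list instead of calling AWS. It only shows how one level of the tree is derived from path-like keys.

```python
# Hypothetical object keys; the "folders" are implied by the "/" separators.
keys = [
    "photos/2023/beach.jpg",
    "photos/2023/city.jpg",
    "photos/2024/snow.jpg",
    "photos/readme.txt",
    "docs/report.csv",
]

def list_level(keys, prefix="", delimiter="/"):
    """Mimic a prefix/delimiter listing: (objects at this level, sub-prefixes)."""
    files, prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Roll everything deeper than one level up into a common prefix.
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            files.append(key)
    return sorted(files), sorted(prefixes)

print(list_level(keys, "photos/"))
# (['photos/readme.txt'], ['photos/2023/', 'photos/2024/'])
```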

ClickHouse DB has explicit support for hierarchical data.

Can I use RavenDB (NoSQL) or should I just be using MySQL(RDBMS)?

I am starting on an ASP.NET MVC 3 General Management System (Project Management being the first component). I have been reading up a bit on RavenDB and it sounds pretty interesting. One of the biggest things I like about it is that I would not need any type of ORM to handle the data from the DB. This will make my code a lot cleaner and quicker. However, coming from a background of working exclusively with MySQL for the past 6+ years, I tend to think about my data very relationally. There are a few things that it seems NoSQL would not be good for. I want to throw these out there; maybe these issues can be handled in a NoSQL solution and I am just thinking too relationally (then again, maybe this project should be done with MySQL). These are the issues I am thinking of:
Unique Identifiers: I am going to want unique identifiers for a lot of things. For stuff like projects, the name should be unique and I could use that. However, for tasks under a project the title may not be unique, and this is where I would use an auto-increment field, but I can't do that in RavenDB (from what I can tell).
Linking: For fields like status and type, I would just link with a foreign key. For one-to-many relationships, I can just use the text instead of trying to link a foreign key (which you don't have in NoSQL), but many-to-many linking becomes a problem. For example, I intend to have a tagging system (like the one here) where most items can have one or many tags attached, and then I can search on those tags for the items. Is there a way to do this in NoSQL?
Is an RDBMS really the best tool for the job here, or am I just not thinking the "NoSQL" way, and I can accomplish this with NoSQL (RavenDB)?

I know this is an old post; perhaps the docs weren't as good when it was originally written. But for reference, in case others stumble here:
Raven comes with a HiLo document id generation strategy by default. Storing a new document without specifying an id yourself will get an auto incrementing id such as "projects/1", "projects/2", etc. Read more here.
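The idea behind HiLo is sketched generically in Python below. This is not RavenDB's actual implementation, and the class and parameter names are hypothetical. Each client reserves a block of ids from a shared counter in one round trip, then hands them out locally. Ids stay unique across clients without a server call per document.

```python
import itertools

class HiLoIdGenerator:
    """Generic HiLo sketch: reserve `capacity` ids per call to next_hi."""

    def __init__(self, collection, next_hi, capacity=32):
        self.collection = collection
        self.next_hi = next_hi    # callable standing in for the server round trip
        self.capacity = capacity
        self.current = 0
        self.limit = 0            # forces a reservation on first use

    def new_id(self):
        if self.current >= self.limit:
            hi = self.next_hi()   # block hi covers ids hi*capacity+1 .. (hi+1)*capacity
            self.current = hi * self.capacity
            self.limit = self.current + self.capacity
        self.current += 1
        return f"{self.collection}/{self.current}"

server = itertools.count()  # stands in for the shared "hi" counter in the database
a = HiLoIdGenerator("projects", lambda: next(server), capacity=3)
b = HiLoIdGenerator("projects", lambda: next(server), capacity=3)
ids = [a.new_id(), b.new_id(), a.new_id()]
print(ids)  # ['projects/1', 'projects/4', 'projects/2']
```

The ids are unique but not strictly sequential across clients, which is the usual trade-off of HiLo against a database-side auto-increment.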
The best guidance on the different ways to handle document relationships is here in the documentation. For the situation you described, you don't really need a separate document at all. You can simply embed a string array of tag names into each item. Documents are not flat, they can be structured. And yes, you can still query on them.
Hopefully you've discovered this on your own since the original post.

Ayende wrote a post, "Modeling reference data in RavenDB", which answers some of your questions about linking. You will have copies of the data in both the reference document and your other documents, and that redundancy is "ok" for document databases. You can still build indexes or query on either the Id or the text that you store.
I would favor SQL for a transactional system such as an Accounts Receivable application, where you need to perform ad hoc queries. With a document database, you really need to think through how you will fetch your data and build indexes up front to answer those questions. RavenDB also has a dynamic indexing feature that learns from and caches the queries fired at the database.
For project management, where the majority of items would be tasks, I would think RavenDB would fit your needs.

Is there a database like this?

Background: Okay, so I'm looking for what I guess is an object database. However, the (admittedly few) object databases that I've looked at have been simple persistence layers, and not full-blown DBMSs. I don't know if what I'm looking for is even considered an object database, so really any help in pointing me in the right direction would be very appreciated.
I don't want to give you two pages describing what I'm looking for so I'll use an example to illustrate my point. Let's say I have a "BlogPost" object that I need to store. Something like this, in pseudocode:
class BlogPost
    title: String
    body: String
    author: User
    tags: List<String>
    comments: List<Comment>
(Assume Comment is its own class.)
Now, in a relational database, author would be stored as a foreign key pointing to a User.id, and the tags and comments would be stored as one-to-many or many-to-many relationships using a separate table to store the relationships. What I'd like is a database engine that does the following:
Stores related objects (author, tags, etc.) with a direct reference instead of using foreign keys, which require an additional lookup; in other words, objects nested inside other objects should be natively supported by the database
Allows me to add a comment or a tag to the blog post without retrieving the entire object, updating it, and then putting it back into the database (like a document-oriented database -- CouchDB being an example)
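For comparison, the relational layout the question describes (author as a foreign key, comments and tags in their own tables) can be sketched with sqlite3; all names are hypothetical. Reading the post back takes a join plus one query per list, which is the extra lookup the question wants to avoid. Adding a comment, on the other hand, is a single INSERT that never loads the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- author is a foreign key: reading it back needs a join
    CREATE TABLE blog_post (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        body      TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)
    );
    -- comments: one-to-many, one row per comment
    CREATE TABLE comment (
        id      INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES blog_post(id),
        content TEXT NOT NULL
    );
    -- tags: a separate table keyed by (post, tag)
    CREATE TABLE post_tag (
        post_id INTEGER NOT NULL REFERENCES blog_post(id),
        tag     TEXT NOT NULL,
        PRIMARY KEY (post_id, tag)
    );
    INSERT INTO author VALUES (1, 'Ann');
    INSERT INTO blog_post VALUES (1, 'Hello', 'First post', 1);
""")

def add_comment(conn, post_id, content):
    # A single INSERT: the post itself is never read or rewritten.
    with conn:
        conn.execute("INSERT INTO comment (post_id, content) VALUES (?, ?)",
                     (post_id, content))

def add_tag(conn, post_id, tag):
    with conn:
        conn.execute("INSERT OR IGNORE INTO post_tag VALUES (?, ?)", (post_id, tag))

def load_post(conn, post_id):
    """Reassemble the object; each related list costs its own query."""
    title, author = conn.execute(
        "SELECT p.title, a.name FROM blog_post AS p "
        "JOIN author AS a ON a.id = p.author_id WHERE p.id = ?",
        (post_id,)).fetchone()
    tags = [t for (t,) in conn.execute(
        "SELECT tag FROM post_tag WHERE post_id = ? ORDER BY tag", (post_id,))]
    comments = [c for (c,) in conn.execute(
        "SELECT content FROM comment WHERE post_id = ? ORDER BY id", (post_id,))]
    return {"title": title, "author": author, "tags": tags, "comments": comments}
```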
I guess what I'm looking for is a navigational database, but I don't know. Is there anything even remotely similar to what I'm thinking of? If so, what is it called? (Or better yet, give me an actual working database.) Or am I being too picky?
Edit:
Just to clarify, I am NOT looking for an ORM or an abstraction layer or anything like that. I am looking for an actual database that does this internally. Sorry if I'm being difficult, but I've searched and I couldn't find anything.
Edit:
Also, something for the JVM would be excellent, but at this point I really don't care what platform it runs on.

I think what you are describing could easily be modeled in a graph database. Then you get the benefit of navigating to the nodes/edges where you want to make changes without any need to retrieve anything else. For the JVM there's the Neo4j open source graph database (where I'm part of the team). You can read about it over at High Scalability, as part of an overview at thinkvitamin or in this stackoverflow thread. As for the tags, I think storing them in a graph database can give you some extra advantages if you want to find related tags and similar stuff. Just drop a line on the mailing list, and I'm sure the community will help you out.

You could try out db4o which is available in C# and Java.

I think you're looking for this: http://www.odbms.org/. This site has some good info on object databases, including Objectivity, which is a pretty good object database.

Elephant does this: http://common-lisp.net/project/elephant/

Exactly what you've described can be done with (N)Hibernate running on an ordinary RDBMS.
The advantage of using such a persistence layer with an ordinary database is that you get a standard database system combined with convenient programming. You declare your classes in a very natural way, and (N)Hibernate translates between references/lists and foreign key relationships.
Java tutorial: http://docs.jboss.org/hibernate/stable/core/reference/en/html/tutorial-firstapp.html
.NET tutorial: https://web.archive.org/web/20081212181310/http://blogs.hibernatingrhinos.com/nhibernate/archive/2008/04/01/your-first-nhibernate-based-application.aspx
If you insist that you don't want to use a well-supported standard RDBMS and would rather trust your data to something more exotic and less heavily tested, you're looking for an Object Relational Database.
However, such a product would probably be best implemented by making it be a layer over a standard RDBMS anyway. This is probably why ORMs like (N)Hibernate are the most popular solution - they allow standard RDBMS software (and widely available management/user skills) to be applied, and yet the programming experience is 99% object-based.

This is exactly what LINQ was designed for.
Microsoft LINQ defines a set of proprietary query operators that can be used to query, project and filter data in arrays, enumerable classes, XML (XLINQ), relational database, and third party data sources. While it allows any data source to be queried, it requires that the data be encapsulated as objects. So, if the data source does not natively store data as objects, the data must be mapped to the object domain. Queries written using the query operators are executed either by the LINQ query processing engine or, via an extension mechanism, handed over to LINQ providers which either implement a separate query processing engine or translate to a different format to be executed on a separate data store (such as on a database server as SQL queries (DLINQ)). The results of a query are returned as a collection of in-memory objects that can be enumerated using a standard iterator function such as C#'s foreach.

There's a variety of terms, all linked to Object-Relational Mapping, aka ORM, which is probably going to be the most useful one for you to look up. ORM libraries exist for many programming languages.

Oracle's nested tables provide some part of that functionality, though in updates, you cannot just add a row to the nested table - you have to replace the whole nested table.

I guess you're looking for an ORM with "EntityFirst" approach.
In the EntityFirst approach, the developer is barely (if at all) concerned with the database. You just build your entities or objects; the ORM then takes care of storing them in the database and retrieving them at will.
The only EntityFirst ORM I know of is Signum. It's a wonderful framework built on top of .NET. I recommend going through some videos on the Signum Framework website; I'm sure you'll find it useful.
Link: http://www.signumframework.com
Thanks.

ZODB perhaps?
A good introduction can be found here:
http://www.ibm.com/developerworks/aix/library/au-zodb/

You could try out STSdb, DB4O, Perst ..., which are available in C# and Java.
