Do graph databases deprecate relational databases? - database

I'm new to DBs of any kind. It seems you can represent any relational database in graph form (although it might be a very flat graph), and any graph database in a relational database (with enough tables).
A graph can avoid a lot of lookups in other tables by having a hard link from one entry to another, so in many/most cases I can see the speed advantage of a graph. If your data is naturally hierarchical, and especially if it forms a tree, I see the logical/reasoning benefit to a graph over relational. I imagine a node of a graph which links to other nodes probably contains multiple maps or lists... which is effectively containing a relational DB within nodes of a graph.
Are there any disadvantages to a graph db vs a relational db? (Note: I'm not looking to things like missing features in implementations, but instead the theoretical pros/cons)
When should I still use a relational database? Even if I logically have a single mapping of an int to int I could do it in a graph.

Graph databases were deprecated by relational-ish technology some 20 to 30 years ago.
The major theoretical disadvantage is that graph databases use TWO basic concepts to represent information (nodes and edges), whereas a relational database uses only one (the relation). This bleeds over into the language for data manipulation, in that a graph-based language must provide two distinct sets of operators : one for operating on nodes, and one for operating on edges. The relational model can suffice with only one.
More operators means more operators to implement for the DBMS builder, more opportunity for bugs, and for the user it means more distinct language constructs to learn. For example, adding information to a database is just INSERT in relational, in graph-based it can be either STORE (nodes) or CONNECT (edges). Removing information is just DELETE (relational), as opposed to either ERASE (nodes) or DISCONNECT (edges).

Building on Erwin Smout's fine answer, an important reason why the relational model supplanted the graph one is that a graph has a greater degree of "bias" baked into its structure than relations do. The edges of a graph are navigational links which user queries are expected to traverse in a particular way. A relational model of the same data assumes much less about how the data will be used. Users are free to join and manipulate relational data in ways that the database designer might not have foreseen. The disruptive costs of re-engineering graph database structures to support new requirements were a factor which drove the adoption of the relational model and its SQL-based offshoots in the 1980s.

Relational databases were designed to aggregate data, graph to find relations.
If You have for example a financial domain, all connections are known, You only aggregate data by other data to find sums and so on.
Graph databases are better in more chaotic domain where to connections are more important, and not all connections are apparent, for example:
networks of people, with different relations with one and other
films and people creating them. Not just actors but the whole crew.
natural language processing and finding connections between recognized words

Data model is important, but what matters more is how you access your data. Notice, there are very few (none, actually) sharded or otherwise distributed graph databases out there. If you compare insertion speed into a typical relational database and a graph database, your relational database will most likely win.
Yes, graph model is more versatile than relational model, but it doesn't make it universal - in some cases, this versatility is a roadblock for optimizations.
In fact, modern graph databases are a niche solutions for a narrow set of tasks - finding a route from A to B, working with friends in a social network, information technology in medicine.
For most business applications relational databases continue to prevail.

I'm missing the performance aspect in the answers above.
Performance of graph based data bases is inherently worse for scalar and maybe even tree based models. Only if you have a real graph, they may exhibit better performance.
Also most graph DBs do not feature ACID support such as almost any RDBMS.
From my real life experience I can tell almost any evolving data model will sooner or later become a graph and that's why graph DBs are superior in terms of flexibility and agility (they keep pace with the evolution of your data model).
That's why I don't think that RDBs will prevail for "For most business applications" as #Kostja says. I think they will prevail where ACID capability is essential.

Related

Graph database vs. RDB with link/bridge tables

I work in the fraud/AML (anti-money laundering) field, and we are exploring using a graph database to unearth hidden connections and links. I've read a fair amount abut graph databases lately (mostly neo4j, but I think the concepts are similar across different products?), and from what I can tell, they seem to be well-suited to this domain. The issue is that I'm having a hard time getting buy-in from tech management, as they seem to think that we can do the same things with our existing data reporting model, which is in Hadoop, and is essentially a data warehouse which has specific tables that provide many-to-many link tables between the core tables (I believe Kimball calls them 'bridge' tables?).
In a way, they seem to provide the same functionality as the relationship tables in a graph DB. Given that we have already constructed the link tablesin Hadoop, would a graph database provide any performance advantage for the kinds of things we may want to do (e.g. How is Customer A connected to Customer B), or have we largely negated any performance advantage of a graph DB by building all of the link tables?
On similar hardware platforms, a relational database will never be able to keep up with a well constructed graph database when performing "path-between" queries. Never.
Every graph database product has its own internal storage representation, but they are all fundamentally designed to store nodes and edges and support navigational queries across those nodes and edges. Without the addition of new graph-support features, relational database will struggle to provide graph-like capabilities.
The other advantage of using a native graph database is that the graph query languages are specifically designed to support path-between queries. In Objectivity/DB, a massively scalable and distributable object/graph database, we can use the DO query language to find all of the paths between two entities up to a specified number of degrees apart in milliseconds or seconds. A DO query might look like the following:
Match p = (:Account { accountId = "1234"})
-[*..100]->
(:Account { accountId = "5678"})
return p;
Here, we are saying: Find all paths (p) from Account 1234 to Account 5678, where they are between 1 and 100 degrees apart.
To create and execute this same query in a relational database would be much more complicated (without the addition of graph features to the database) and the execution of a query like this in a relational database would be much more resource intensive (memory, cpu, I/O).
If you have the opportunity to explore graph database for your project, make sure you understand your scalability and data distribution requirements. That information will be key to selecting the correct product.
Disclaimer: I am the Director of Field Operations for Objectivity.

Difference between RDBMS and ORDBMS

It happened to me when I was reading about PostgreSQL on its wiki page where it refers to itself as an ORDBMS. I always knew about Microsoft SQL Server which is an RDBM system. Can someone help me understand the main differences between Relational Database Management System(RDBMS) and Object Relational Database Management System (ORDBMS) and in what scenarios should I use one of them?
Also, my key question is related to the fact that in the world of Microsoft SQL Server we often use a layer of Entity Framework (EF) to do the object relational mapping on the application side. So, in ORDBMS world are all the responsibilities of an ORM already fulfilled by the database itself in entirety or there could be use cases or scenarios where I would end up using an ORM like Entity Framework on top of ORDBMS as well? Do people really even use ORMs on top of an ORDBMS system?
Taken from http://www.aspfree.com/c/a/database/introduction-to-rdbms-oodbms-and-ordbms/:
RDBMS
The main elements of RDBMS are based on Ted Codd’s 13 rules for a relational system, the concept of relational integrity, and normalization. The three fundamentals of a relational database are that all information must be held in the form of a table, where all data are described using data values. The second fundamental is that each value found in the table columns does not repeat. The final fundamental is the use of Standard Query Language (SQL).
Benefits of RDBMS are that the system is simple, flexible, and productive. Because the tables are simple, data is easier to understand and communicate with others. RDBMS are flexible because users do not have to use predefined keys to input information. Also, RDBMS are more productive because SQL is easier to learn. This allows users to spend more time inputting instead of learning. More importantly, RDBMS’s biggest advantage is the ease with which users can create and access data and extend it if needed. After the original database is created, new data categories can be added without the existing application being changed.
There are limitations to the relational database management system. First, relational databases do not have enough storage area to handle data such as images, digital and audio/video. The system was originally created to handle the integration of media, traditional fielded data, and templates. Another limitation of the relational database is its inadequacy to operate with languages outside of SQL. After its original development, languages such as C++ and JavaScript were formed. However, relational databases do not work efficiently with these languages. A third limitation is the requirement that information must be in tables where relationships between entities are defined by values.
ORDMS
Object-Relational database (ORDBMS) is the third type of database common today. ORDBMS are systems that “attempt to extend relational database systems with the functionality necessary to support a broader class of applications and, in many ways, provide a bridge between the relational and object-oriented paradigms.”
ORDBMS was created to handle new types of data such as audio, video, and image files that relational databases were not equipped to handle. In addition, its development was the result of increased usage of object-oriented programming languages, and a large mismatch between these and the DBMS software.
One advantage of ORDBMS is that it allows organizations to continue using their existing systems, without having to make major changes. A second advantage is that it allows users and programmers to start using object-oriented systems in parallel.
There are challenges in implementing an ORDBMS. The first is storage and access methods. The second is query processing, and the third is query optimization.
Most database players don't support it, or don't support it exclusively. It is complex, and not used broadly. Even if "data" is OO in nature, the databases existed decades ago, and they cannot take ORDBMS (or OODBMS). Learning curve also imposes problems.
ORDBMS/OODBMS are like virtual registry view you see in Registry Editor. Contents are tree-styled objects. But internally they might be stored as flat/hierarchical or in relational manner. You really don't care - the APIs provide you the view of registry information.
Similarly, even if major players don't support (and won't support) OO nature of database, they may provide some extensions. Or, you may have to craft your own framework for OO data. A movie database, having actors and directors can be represented using relations (tables). Actors, directors, shooting-locations would also be classes/objects, and can easily be represented using tables, and referential integrity imposed by database/DB designer.
You, as a developer would make this relational nature of data to an object-oriented form having Movie as a class, referencing actors/directors (1:1 or 1:N). I am not aware how/what EE facilitates this, but it would be doing mapping this way only.
Object-Relational Databases
Object oriented technology on top of relational technology and in the relational context.
Objects are stored in tables of objects rather than in tables of rows.
Support of major object oriented features: complex types, inheritance, aggregation, methods
Advantage: Extension of a well-known technology
Disadvantages: Mixture of both technologies may result in difficult to understand schemas
Has Performance problems
Object-relational systems include features such as complex object extensibility, encapsulation, inheritance, and better interfaces to OO languages.
ORDBMSs allow developers to embed new classes of data objects into the relational data model abstraction (and on top of SQL).
Following diagram shows how data can be accessed.

NoSQL / RDBMS hybrid with referential integrity (delete cascade)?

Is there a database out there that gives you the benefit of referential integrity and being able to use a SQL type language for querying, but also lets entities be loosely defined with respect to their data attributes and also the relationships between them?
E.g. take a RBAC type model where you have Permissions, Users, User Groups & Roles. A complex/flexible model could have the following rules:
Roles can have one or more permissions and a permission can belong to one or more Roles
Users can have one or more permissions and a permission can belong to one or more Users
Users Groups can have one or more permissions and a permission can belong to one or more Users Groups
Users can have one or more roles and a role can belong to one or more Users
User Groups can have one or more roles and a role can belong to one or more User Groups
Roles can have one or more roles and a role can belong to one or more Roles
To model the above in an RDBMS would involve the creation of lots of intersection tables. Ideally, all I'd like to define in the database is the entities themselves (User, Role, etc) plus some mandatory attributes. Everything else would then be dynamic (i.e. no DDL required), e.g. I could create a User with a new attribute which wasn't pre-defined. I could also create a relationship between entities that hasn't been predefined, though the database would handle referential integrity like a normal RDBMS.
The above can be achieved to some degree in a RDBMS by creating a table to store entities and another one to store relationships etc, but this overly complicates the SQL needed to perform simple queries and may also have performance implications.
Most NoSQL databases are built to scale very well. This is done at the cost of consistency, of which referential integrity is part of. So most NoSQL don't support any type of relational constraints.
There's one type of NoSQL database that does support relations. In fact, it's designed especially for relations: the graph database. Graph databases store nodes and explicit relations (edges) between these nodes. Both nodes and edges can contain data in the form of key/value pairs, without being tied to a predefined schema.
Graph databases are optimized for relational queries and nifty graph operations, such as finding the shortest path between two nodes, or finding all nodes within a given distance from the current node. You wouldn't need this in a role/permission scenario, but if you do, it'll be a lot harder to achieve using an RDBMS.
Another option is to make your entire data layer a hybrid, by using a RDBMS to store the relations and a document database to store the actual data. This would complicate your application slightly, but I don't think it's such a bad solution. You'll be using two different technologies, both dealing with the problems they were designed to deal with.
Given the requirements you specify in your question, a graph database is probably the sort of thing you are looking for, but there are other options. As #Niels van der Rest said, the two constraints of "no a priori schema" and "referential integrity" are very hard to reconcile. You might be able to find a Topic-Map based database that might do so, but I'm not familiar with specific implementations so I couldn't say for sure.
If you decide you really can't do without referential integrity, I fear you probably are stuck with an RDBMS. There are some tricks you can use that might avoid some of the problems you anticipate, I cover a couple in https://stackoverflow.com/questions/3395606..., which might give you some ideas. Still, for this sort of data-model requiring dynamic, post-priori schema, with meta-schema elements, an RDBMS is always going to be awkward.
If you are willing to forgo referential integrity, then you still have three approaches to consider.
Map/Reduce - in two flavours: distributed record-oriented (think, MongoDB), and column-oriented (think, Cassandra). Scales really really well, but you won't have your SQL-like syntax; joins suck; and matching your architecture to your specific query types is critical. In your case your focus on the entities and their attributes, rather than the relationships between the entities themselves, so I would probably consider a distributed record-oriented store; but only if I expected to need to scale beyond a single node—they do scale really really well.
Document-store - technically in two flavours, but one of them is a distributed record-oriented map/reduce datastore discussed above. The other is an inverted-index (think, Lucene/Solr). Do NOT disregard the power of an inverted-index; they can resolve obscenely complex record predicates amazingly fast. What they can't do is handle well is queries that include correlation or large relational joins. Still, you will be amazed at the incredible flexibility, sufficiently complex record predicates gives you.
Graph-store - come in a few flavours the first is the large-scale, ad-hoc key-value store (think, DBM/TokyoTyrant); the second is the tuple-space (think, Neo4j); the third is the RDF database (think, Sesame/Mulgara). I have a soft-spot for RDF, having helped develop mulgara, so I am not the most objective commenter. Still, if your scalability constraints will permit you to use an RDF-store, I find the inferencing permitted by RDF's denotational semantics (rare amongst noSQL datastore options) invaluable.
Some NoSQL solutions support security and SQL. One of these is OrientDB. The security system is (quite) well explained here.
Furthermore supports SQL.
There's the Gremlin language, supported by the Neo4j graph database. Regarding your example, have a look at Access control lists the graph database way and here. There's also a web-based tool including a REST API to Neo4j and a Gremlin console, see neo4j/webadmin.
You may want to check out MongoDB it is a document based database and so has a flexible schema. It is awesome and worth the time to see if it would suite your needs.

What is the difference between graph-based databases and object-oriented databases?

What is the difference between graph-based databases (http://neo4j.org/) and object-oriented databases (http://www.db4o.com/)?
I'd answer this differently: object and graph databases operate on two different levels of abstraction.
An object database's main data elements are objects, the way we know them from an object-oriented programming language.
A graph database's main data elements are nodes and edges.
An object database does not have the notion of a (bidirectional) edge between two things with automatic referential integrity etc. A graph database does not have the notion of a pointer that can be NULL. (Of course one can imagine hybrids.)
In terms of schema, an object database's schema is whatever the set of classes is in the application. A graph database's schema (whether implicit, by convention of what String labels mean, or explicit, by declaration as models as we do it in InfoGrid for example) is independent of the application. This makes it much simpler, for example, to write multiple applications against the same data using a graph database instead of an object database, because the schema is application-independent. On the other hand, using a graph database you can't simply take an arbitrary object and persist it.
Different tools for different jobs I would think.
Yes, the API seems like the major difference, but is not really a superficial one. Conceptually a set of objects will form a graph and you could think of an API that treats this graph in a uniform way. Conversely, you could in theory mine a generic graph structure for patterns and map them to objects exposed via some API. But the design of the API of an actual product will generally have consequence on how data is actually stored, how it can be queried, so it would be far from trivial to, say, create a wrapper and make it look like something else. Also, an object-oriented database must offer some integrity guarantees and a typing structure that a graph database won't normally do. In fact, serious OO database are far from "free form" :)
Take a look at [HyperGraphDB][1] - it is both a full object-oriented database (like db4o) and a very advanced graph database both in terms of representational and querying capabilities. It is capable of storing generalized hypergraphs (where edges can point to more than one node and also to other edges as well), it has a fully extensible type system embedded as a graph etc.
Unlike other graph databases, in HyperGraphDB every object becomes a node or an edge in the graph, with none-to-minimal API intrusion and you have the choice of representing your objects as a graph or treating them in a way that is orthogonal to the graph structure (as "payload" values of your nodes or edges). You can do sophisticated traversals, customized indexing and querying.
An explanation why HyperGraphDB is in fact an ODMS, see the blog post Is HyperGraphDB an OO Database? at Kobrix's website.
As Will descibes from another angle, a graphdb will keep your data separated from your application classes and objects. A graphdb also has more built-in functionality to deal with graphs, obviously - like shortest path or deep traversals.
Another important difference is that in a graphdb like neo4j you can traverse the graph based on relationship (edge) types and directions without loading the full nodes (including node properties/attributes). There's also the choice of using neo4j as backend of an object db, still being able to use all the graphy stuff, see: jo4neo This project has a different approach that could also count as an object db on top of neo4j: neo4j.rb. A new option is to use Spring Data Graph, which gives graphdb support through annotations.
The same question was asked in the comments to this blogpost.
From a quick browse of both their websites:
The major difference is the way the APIs are structured, rather than the kind of free-form database you can build with them.
db4o uses an object mapping - you create a Java/C# class, and it uses reflection to persist it in the database.
neo4j has an explicit manipulation API.
Neo4j seemed, in my humble opinion, much nicer to interact with.
You might also consider a key-value store - you could make exactly the same free-form database with one of those.
The difference at low-level is not so huge. Both manage relationships as direct links without costly joins. Furthermore both have a way to traverse relationships with the Query language, but the graph database has operators to go recursively at Nth level.
But the biggest difference is in the domain: in a Graph databases all is based on the 2 types: vertexes and edges, even if usually you can define your own types as a sort of subtypes of Vertex or Edge.
In the ODBMS you have no Vertex and Edge concepts, unless you write your own.
With graph databases, you have a slight semblance of a chance that it is based on mathematical graph theory. With Object-oriented databases, you have the certainty that it is based on nothing at all (and most certainly no mathematical theory at all).

In which cases am I better using DBs like Couch or Mongo as opposed to traditional RDBMS

In which cases am I better using DBs like Couch or Mongo as opposed to traditional RDBMS.
What is the problem they are suppose to solve (or solve more efficiently) then RDBMSs?
Relational databases traditionally store data in rows within tables. This isn't a natural fit for documents, which have a hierarchical structure.
Using Couch/Mongo (or similar) means you don't have to decompose hierarchical objects into a set of tables/rows. This decomposition can be painful to implement, and extensibility can be a problem when the shape of the object changes. Object-Relational mappers (ORMs) aim to solve this problem automatically.
I can't comment on the efficiency, I'm afraid. I suspect that's strongly implementation-specific (in terms of how you decompose your objects, what you're querying on etc.)
LDAP database is traditionally used in organizations to group employees under different categories. Because LDAP DB stores data in Tree structure, the data retrieval is super fast compared to the RDBMS. Atleast that is my experience in comparing LDAP DB over Relational ones.
To understand Relational versus NoSQL DB, you need to understand CAP theorem.
RDBMS gives importance to Consistence + Availability over Partition Tolerance
NoSQL DB gives importance to Partition Tolerance + Availability + EVENTUAL Consistency
In what I have seen and done it is the purpose that determines the type of DB to use.

Resources