Is there a difference between a graph and a hypergraph database?
Is every hypergraph database system also a graph database system?
I am asking for a side-by-side comparison. If it is possible to show this in one row:
Graph support: No/Graph/Hypergraph
Or if it is better to use two rows:
Graph support: No/Yes
Hypergraph support: No/Yes
Or do "graph" and "hypergraph" mean the same thing in the database context?
How a particular graph database handles its edges is an implementation detail, so the question cannot really be answered for "[hyper]graph databases in general".
From the point of view of mathematical graph theory, however, there is a difference:
Edges as known from standard graphs model (directed or undirected) 1:1 connections.
Hyperedges as known from hypergraphs model (directed or undirected) n:n connections.
Graph vs. Hypergraph:
A simple graph can be considered a special case of the hypergraph, namely the 2-uniform hypergraph. However, when stated without any qualification, an edge is always assumed to consist of at most 2 vertices, and a graph is never confused with a hypergraph.
(Source)
Undirected hyperedges:
A[n] [undirected] hyperedge is an edge that is allowed to take on any number of vertices, possibly more than 2. A graph that allows any hyperedge is called a hypergraph.
(Source)
Directed hyperedges:
Directed hypergraphs (Ausiello et al., 1985; Gallo et al., 1993) are a generalization of directed graphs (digraphs) and they can model binary relations among subsets of a given set.
(Source)
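To make the distinction concrete, here is a minimal in-memory sketch, not how any particular database stores its edges. It assumes a hyperedge is just a set of vertices; the names `is_2_uniform` and `clique_expansion` are made up for illustration. The clique expansion is one common (and lossy) way to approximate a hypergraph with ordinary edges.

```python
from itertools import combinations

# Each hyperedge is a frozenset holding any number of vertices.
hyperedges = [frozenset({"a", "b"}), frozenset({"b", "c", "d"})]


def is_2_uniform(edges):
    """Return True if every hyperedge joins exactly two vertices."""
    return all(len(edge) == 2 for edge in edges)


def clique_expansion(edges):
    """Replace each hyperedge by ordinary edges between all pairs of its vertices.

    This loses information: afterwards one 3-vertex hyperedge cannot be
    told apart from three separate 2-vertex edges.
    """
    pairs = set()
    for edge in edges:
        for u, v in combinations(sorted(edge), 2):
            pairs.add((u, v))
    return pairs
```

A list of hyperedges for which `is_2_uniform` holds is an ordinary undirected graph, which matches the "special case" remark quoted above.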
I am interested in storing a set of users that have personality scores.
I would like to get them to be more connected (closer?) to each other based on formulas that are applied to their scores. The more similar the users are, the more connected or closer to each other they are (like in a cluster). The closer nodes are to one another, the more similar they are.
I currently do this over multiple steps (some in SQL and other in code) from a relational database.
Most posts and documentation out there seem to focus on how to get started and what the advantages are at a high level compared to relational databases.
I am wondering if Graph databases are better suited for this and would do most of the heavy lifting out of the box or more natively. Any details are greatly appreciated.
You could consider modeling it like this:
This introduces a vertex type/label named Score_range alongside the label User (with a score property).
User vertices are connected to a Score_range vertex: for example, a User with score 101 is connected to Score_range(vertexID=100), which stands for the range [100, 110).
Vertices with closer scores are thus more connected/clustered in this graph. In your application, you need to update the connections in the graph database whenever the scores are recalculated or changed.
Then, whether you run a clustering algorithm (e.g. Louvain) on the whole graph or use a graph query to find a path between any two user nodes (e.g. FIND PATH in Nebula Graph, an open-source distributed graph database that speaks openCypher), the closeness will be reflected.
However, because this closeness is actually numerical and sortable, simply handling this closeness relationship may not need a graph database, judging from the context you have provided.
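As a rough illustration of the bucketing idea, here is a small sketch in plain Python with no graph database involved. The bucket width of 10 is taken from the [100, 110) example; the function names and the sample users are hypothetical.

```python
from collections import defaultdict

BUCKET_WIDTH = 10  # assumed width, matching the [100, 110) example


def score_range_id(score, width=BUCKET_WIDTH):
    """Return the lower bound of the half-open range that contains `score`."""
    return (score // width) * width


def build_range_members(user_scores):
    """Map each Score_range vertex ID to the set of users connected to it."""
    members = defaultdict(set)
    for user, score in user_scores.items():
        members[score_range_id(score)].add(user)
    return members


def same_range_neighbours(user, user_scores):
    """Users reachable from `user` in two hops through a shared Score_range vertex."""
    bucket = score_range_id(user_scores[user])
    return {
        other
        for other, score in user_scores.items()
        if other != user and score_range_id(score) == bucket
    }
```

Note that two users with scores 109 and 110 land in different ranges even though they are close, which is one reason a plain numeric comparison may serve better here.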
PS. I drew a picture of a graph in the above schema:
I have to choose a graph database system and am very surprised that the mainstream ones don't support this feature.
Why is it such a no-go for database systems? And why don't developers out there seem to ask for it? There must be a reason I'm not aware of.
Thanks for your help.
To my understanding, a "pure" bidirectional graph database cannot support cases where there are also unidirectional relationships, as on Twitter, for example.
So the question becomes "why are there no hybrid (bidirectional and unidirectional) graph databases?" There are two problems with this solution:
It might not save storage as you expect, because for a bidirectional relationship a hybrid graph database would need to store three edges instead of just one: A -> B, B -> A, and A <-> B. The reason is that some very common queries involve unidirectional relationships.
The cost of some basic queries is rather high. For example, there are two frequently asked questions in graph databases:
Find all friends of A
Find all friends of B
Commonly, a graph database stores all friends of A as adjacent edges (AB, AC, AD, …). To find all of A's friends, it just needs to locate A and scan forward to the first edge whose prefix is not A. Suppose A has m friends and there are n records in the database in total; then the query complexity is O(log(n)) + O(m). The same logic applies to B. However, if a bidirectional edge is used, say A<->B, the cost of querying A's friends is the same, but querying B's friends would be O(n) because a full database scan is required.
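The cost difference can be sketched with a sorted list standing in for the database's edge storage. This is only an illustration under that assumption; real engines use their own on-disk structures, and the function names here are made up.

```python
from bisect import bisect_left

# Directed edges stored as sorted (source, target) pairs,
# i.e. the "AB, AC, AD, ..." layout with both directions present.
edges = sorted([("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"), ("C", "A"), ("D", "A")])


def friends_by_prefix(sorted_edges, vertex):
    """O(log n) to find the first edge of `vertex`, then O(m) to read its m friends."""
    i = bisect_left(sorted_edges, (vertex,))
    friends = []
    while i < len(sorted_edges) and sorted_edges[i][0] == vertex:
        friends.append(sorted_edges[i][1])
        i += 1
    return friends


def friends_by_scan(single_edges, vertex):
    """O(n): each A<->B edge is stored once, so either endpoint may match."""
    friends = []
    for left, right in single_edges:
        if left == vertex:
            friends.append(right)
        elif right == vertex:
            friends.append(left)
    return friends
```

With the single-edge storage `[("A", "B"), ("A", "C"), ("A", "D")]`, B's friends cannot be found by prefix, so `friends_by_scan` has to look at every record.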
This is for a project that will map metadata. There are many more nodes but this particular one became a debate in the team.
Which model would yield the best query performance? Or does it not matter?
Option 1
Permission metadata is explicit as edges between nodes.
Option 2
Permission metadata is inside the properties of the edge.
Option 3
???
Let me comment for ArangoDB here, being one of its developers.
There is a third possibility, namely to have a single vertex collection and multiple edge collections for the different access methods. You would then "officially" have 3 graphs that share the same vertex set.
I would expect that this is better in performance, because each access type would only have to deal with a single type of edge and access would be fast.
Obviously it all depends on your queries. My statement holds for queries like "what are all the Entities a Person can update?" or "who can select this Entity?".
I could imagine that your standard query is more "Can this person delete that Entity?" or "Which access rights does this person have for that Entity?".
These two questions are probably not efficient with any of the approaches suggested, because as far as I see, all of them would then require a search, either in the outgoing edges of the Person or in the incoming edges of the Entity.
What would be needed here is a kind of "vertex-centric index", that is, an index that can be used for the set of outgoing or incoming edges of a given vertex. If, for example, you used your option 2 (or indeed 1, this does not matter so much) and had a sorted index on all edges, sorted first by Person and then by Entity, then finding the (probably singleton) set of edges from a given Person to a given Entity would be a lookup with time complexity O(log(#edges)).
We at ArangoDB are currently busy adding this feature, which will appear in one of the next two releases.
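The sorted-index lookup described above can be sketched in plain Python. This is not ArangoDB's API or implementation, just a sorted list searched with binary search; the edge records and the `permissions` helper are hypothetical.

```python
from bisect import bisect_left

# Hypothetical edge records in the style of option 2: (person, entity, permission).
edges = [
    ("bob", "orders", "select"),
    ("alice", "users", "select"),
    ("alice", "orders", "update"),
]

# The "index": all edges sorted first by person, then by entity.
index = sorted(edges)


def permissions(sorted_index, person, entity):
    """Binary-search to the first (person, entity) edge, then read the matches."""
    i = bisect_left(sorted_index, (person, entity))
    found = []
    while i < len(sorted_index) and sorted_index[i][:2] == (person, entity):
        found.append(sorted_index[i][2])
        i += 1
    return found
```

A question such as "Can alice update orders?" then becomes `"update" in permissions(index, "alice", "orders")`, without walking all of alice's outgoing edges.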
I can only speak for Neo4j here:
I don't know that it would matter much, but definitely benchmark! Both relationships and properties are stored as linked lists, so it will still need to traverse them. But if you have more relationships between Person and Entity nodes then putting them in properties starts to become more attractive.
I recommend checking out the free O'Reilly book Graph Databases to learn more about the internals of Neo4j. But benchmarks will always be the gold standard.
I want to store a graph of millions of nodes where each node links to another in an undirected manner (if A points to B, B automatically points to A). I have examined Neo4j and OrientDB as possible solutions, but they seem to be oriented towards directed graphs, and Neo4j not being free for >1 million nodes rules it out for me.
Can you help me decide which of the other NoSQL DBs (Redis, CouchDB, MongoDB, ...) would best suit something like this, and how it could be implemented? I want to run property-free (just give me the linked elements) breadth-first queries with 2 depth levels (having A<->B, B<->C, C<->D, querying A should give me B and C, but not D).
OrientDB has no limitation on the number of nodes. Furthermore the default model is bi-directional. You can use it for FREE also for commercial purposes, since the applied license is Apache 2.
The GraphDB is documented here: http://code.google.com/p/orient/wiki/GraphDatabase. Basically, you can use the native API or the Blueprints implementation. The native API has an evolution of the SQL language with special operators for graphs. Example:
SELECT FROM Account WHERE friends TRAVERSE (1,7) (address.city.country.name = 'New Zealand')
That means: give me all the accounts that have a friend who lives in New Zealand. Friends are followed up to the 7th level of depth.
The second one allows you to use the full Blueprints stack, such as the Gremlin language, to create your super-complex queries.
Neo4j always stores relationships/edges as directed, but when traversing/querying you can easily treat the graph as undirected by using Direction.BOTH or in some cases by not defining a direction at all. (This way there's no need for "double" edges to cover both directions, you simply ignore the direction - and there's no performance penalty when traversing edges "backwards".)
The limit of 1 million "primitives" was removed quite a while ago. If your code is open source, you can use the community version for any size of DB. For other cases there are the commercial versions, which include one free alternative.
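Independently of which database you pick, the query itself is small. Below is a minimal in-memory sketch of the idea of storing each edge once and ignoring direction when traversing, with a breadth-first search limited to 2 levels. It does not use the Neo4j or OrientDB APIs, and the function names are made up.

```python
from collections import defaultdict, deque


def build_adjacency(directed_edges):
    """Each edge appears once in the input; record it under both endpoints."""
    adjacency = defaultdict(set)
    for source, target in directed_edges:
        adjacency[source].add(target)
        adjacency[target].add(source)  # lets traversal ignore direction
    return adjacency


def neighbours_within(adjacency, start, max_depth=2):
    """Breadth-first search returning every node at most `max_depth` hops away."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                found.add(neighbour)
                queue.append((neighbour, depth + 1))
    return found
```

With the edges A-B, B-C and C-D from the question, `neighbours_within(build_adjacency([("A", "B"), ("B", "C"), ("C", "D")]), "A")` gives B and C but not D.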
There must be a standard data structure to hold, for instance, dog breeding information, plant genetic crossbreeding, and complex human relationships.
One might think it would be an easy tree structure, but the combination of two (or more, for genetic engineering) parents per offspring, multiple different offspring per parent set, multiple moves of parents (stud horses mate with many other horses), adoption, etc., makes this a very fragmented structure.
I expect someone has tackled this before though. Any resources I should look into?
I think what you have is just a simple relational database, where the main relation is "child_of", "direct_descendant", etc.
Of course, the particular data structure here is acyclic, and you might want to do transitive queries (descendant of descendant of ...), which are not usually supported by standard SQL engines.
So if you want to do it in memory, you could use a directed acyclic graph (DAG).
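A minimal in-memory sketch of such a DAG, assuming each individual simply maps to the set of its parents. The record format, the names and the helper functions are hypothetical. The transitive "ancestor of ancestor" query is a plain graph walk.

```python
from collections import defaultdict


def build_parent_map(records):
    """Build child -> set of parents from (child, parents) records."""
    parent_map = defaultdict(set)
    for child, parent_ids in records:
        parent_map[child].update(parent_ids)
    return parent_map


def ancestors(parent_map, individual):
    """Return all transitive ancestors of `individual` (parents, grandparents, ...)."""
    result = set()
    stack = list(parent_map.get(individual, ()))
    while stack:
        node = stack.pop()
        if node not in result:
            result.add(node)
            stack.extend(parent_map.get(node, ()))
    return result


# A sire ("rex") with offspring from two different mates.
records = [
    ("pup", ("rex", "bella")),
    ("kit", ("rex", "daisy")),
    ("rex", ("max", "luna")),
]
pedigree = build_parent_map(records)
```

Because a child maps to a set of parents rather than exactly one, the same structure covers two or more parents per offspring and one parent appearing in many parent sets.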
Smells like a DAG. If the directed and acyclic is too limiting, you might want to look at the graph theory data-structures.
When graphs are used for abstract problems, vertices represent entities and edges represent the relationships between them.