Literature about graph databases talks about nodes, vertices and points. What is the difference between them?
There is no difference between the terms node, vertex and point. It is just a matter of the term that the author has decided to use.
This is for a project that will map metadata. There are many more nodes but this particular one became a debate in the team.
Which model would yield the best query performance? Or does it not matter?
Option 1
Permission metadata is explicit as edges between nodes.
Option 2
Permission metadata is inside the properties of the edge.
Option 3
???
Let me comment for ArangoDB here, being one of its developers.
There is a third possibility, namely to have a single vertex collection and multiple edge collections, one for each access method. You would then "officially" have 3 graphs that share the same vertex set.
I would expect that this is better in performance, because each access type would only have to deal with a single type of edge and access would be fast.
Obviously it all depends on your queries. My statement holds for queries like "what are all the Entities a Person can update?" or "who can select this Entity?".
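To make the idea concrete, here is a minimal in-memory sketch in plain Python. It is not ArangoDB code, and the class and method names (AccessGraph, grant, entities_for, persons_for) are made up for illustration. It only shows why keeping one edge set per access type lets the two queries above touch a single type of edge.

```python
from collections import defaultdict


class AccessGraph:
    """Hypothetical model: one shared vertex set, one edge set per access type."""

    def __init__(self):
        # access type -> person -> set of entities (outgoing edges)
        self._out = defaultdict(lambda: defaultdict(set))
        # access type -> entity -> set of persons (incoming edges)
        self._in = defaultdict(lambda: defaultdict(set))

    def grant(self, person, access, entity):
        """Add one edge to the edge set of the given access type."""
        self._out[access][person].add(entity)
        self._in[access][entity].add(person)

    def entities_for(self, person, access):
        """'What are all the Entities a Person can update?'"""
        return set(self._out.get(access, {}).get(person, ()))

    def persons_for(self, entity, access):
        """'Who can select this Entity?'"""
        return set(self._in.get(access, {}).get(entity, ()))


g = AccessGraph()
g.grant("alice", "update", "orders")
g.grant("alice", "select", "users")
g.grant("bob", "select", "orders")
print(g.entities_for("alice", "update"))
print(g.persons_for("orders", "select"))
```

Each query only reads the edge set of one access type, so edges of the other types never have to be scanned or filtered out.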
I could imagine that your standard query is more "Can this person delete that Entity?" or "Which access rights does this person have for that Entity?".
These two questions are probably not efficient with any of the approaches suggested, because as far as I see, all of them would then require a search, either in the outgoing edges of the Person or in the incoming edges of the Entity.
What would be needed here is a kind of "vertex-centric index", that is, an index that can be used for the set of outgoing or incoming edges of a given vertex. If, for example, you used your option 2 (or indeed 1, this does not matter so much) and had a sorted index on all edges, sorted first by Person and then by Entity, then finding the (probably singleton) set of edges from a given Person to a given Entity would be a lookup with time complexity O(log(#edges)).
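The sorted-index idea can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not how ArangoDB implements it: edges are (person, entity, access) tuples kept in a sorted list, and the name EdgeIndex and its methods are hypothetical. A binary search finds the first edge for a (Person, Entity) pair, which is the O(log(#edges)) step.

```python
import bisect


class EdgeIndex:
    """Hypothetical sorted index over edges, ordered by person, then entity."""

    def __init__(self, edges):
        # edges: iterable of (person, entity, access) tuples
        self._edges = sorted(edges)

    def between(self, person, entity):
        """Return the access types on edges from person to entity."""
        # A 2-tuple sorts before every 3-tuple that starts with the same
        # two values, so bisect_left lands on the first matching edge.
        i = bisect.bisect_left(self._edges, (person, entity))
        result = []
        while i < len(self._edges) and self._edges[i][:2] == (person, entity):
            result.append(self._edges[i][2])
            i += 1
        return result

    def allowed(self, person, access, entity):
        """'Can this person delete that Entity?'"""
        return access in self.between(person, entity)


index = EdgeIndex([
    ("bob", "orders", "select"),
    ("alice", "orders", "update"),
    ("alice", "orders", "select"),
    ("alice", "users", "select"),
])
print(index.between("alice", "orders"))
print(index.allowed("bob", "delete", "orders"))
```

The lookup costs one binary search plus a short scan over the matching edges, regardless of how many other edges the Person or the Entity has.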
We at ArangoDB are currently busy adding this feature, which will appear in one of the next two releases.
I can only speak for Neo4j here:
I don't know that it would matter much, but definitely benchmark! Both relationships and properties are stored as linked lists, so it will still need to traverse them. But if you have more relationships between Person and Entity nodes then putting them in properties starts to become more attractive.
I recommend checking out the free O'Reilly book Graph Databases to learn more about the internals of Neo4j. But benchmarks will always be the gold standard.
I have a graph computation that passes 'visited' Vertex IDs around, and I need to output information from those in the output phase. How do I look up a Vertex from its ID? I found Partition.getVertex(), but IIUC there is no guarantee that an arbitrary Vertex will be in a particular partition. Thanks in advance.
AFAIK you can’t simply look up arbitrary vertices. That’s why you use the computation phase to store all the necessary information inside the vertices, so that you can print it afterwards.
Doing it differently would to my knowledge completely screw up Giraph’s programming paradigm.
Is there a difference between a graph and a hypergraph database?
Is every hypergraph database system also a graph database system?
I am asking for a side-by-side comparison. If it is possible to show this in one row:
Graph support: No/Graph/Hypergraph
Or if it is better to use two rows:
Graph support: No/Yes
Hypergraph support: No/Yes
Or do "graph" and "hypergraph" mean the same thing in the database context?
How a certain graph database handles its edges is an implementation detail. Hence an answer cannot really be given in regards to "[hyper]graph databases in general".
From the point of mathematical graph theory however there is a difference:
Edges as known from standard graphs model (directed or undirected) 1:1 connections.
Hyperedges as known from hypergraphs model (directed or undirected) n:n connections.
Graph vs. Hypergraph:
A simple graph can be considered a special case of the hypergraph, namely the 2-uniform hypergraph. However, when stated without any qualification, an edge is always assumed to consist of at most 2 vertices, and a graph is never confused with a hypergraph.
(Source)
Undirected hyperedges:
A[n] [undirected] hyperedge is an edge that is allowed to take on any number of vertices, possibly more than 2. A graph that allows any hyperedge is called a hypergraph.
(Source)
Directed hyperedges:
Directed hypergraphs (Ausiello et al., 1985; Gallo et al., 1993) are a generalization of directed graphs (digraphs) and they can model binary relations among subsets of a given set.
(Source)
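The distinction quoted above can be illustrated with a small Python sketch. It models undirected edges and hyperedges alike as sets of vertices; the helper name is_k_uniform and the sample data are invented for this example and do not come from any database product.

```python
def is_k_uniform(hyperedges, k):
    """True if every hyperedge connects exactly k vertices."""
    return all(len(edge) == k for edge in hyperedges)


# An ordinary undirected graph: every edge joins exactly 2 vertices.
graph_edges = [frozenset({"A", "B"}), frozenset({"B", "C"})]

# A hypergraph: one hyperedge joins 3 vertices at once.
hyper_edges = [frozenset({"A", "B", "C"}), frozenset({"C", "D"})]

print(is_k_uniform(graph_edges, 2))
print(is_k_uniform(hyper_edges, 2))
```

In this representation a simple graph is just a hypergraph that happens to pass the 2-uniform check, which is the "special case" relationship described in the first quotation.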
Was looking for some online articles that might have a good idea or two in how to approach this but not finding much. Most of it deals with the various ways to generate and follow a path to a known destination type.
Basically the idea is that you have an existing node graph and want to utilize that data in order to locate entrances into the current area an AI unit would want to defend. So think of an AI unit assigned a task of defending a point with a radius and wanting to choose the best direction to face while waiting for enemies to show up that would be pathing to the defend point.
The critical point that I was looking for input on would be how one might identify the entrance points. Or the nodes that are the doorways into the area being defended.
I want to store a graph of millions of nodes where each node links to another in an undirected manner (if A points to B, then B automatically points to A). I have examined Neo4j and OrientDB as possible solutions, but they seem to be oriented toward directed graphs, and Neo4j not being free for >1 million nodes is not a solution for me.
Can you help me work out which of the other NoSQL DBs (Redis, CouchDB, MongoDB, ...) would suit this best, and how it could be implemented? I want to run property-free (just give me the linked elements) breadth-first queries 2 levels deep (having A<->B, B<->C, C<->D, querying A should give me B and C, but not D).
OrientDB has no limitation on the number of nodes. Furthermore the default model is bi-directional. You can use it for FREE also for commercial purposes, since the applied license is Apache 2.
The GraphDB is documented here: http://code.google.com/p/orient/wiki/GraphDatabase. Basically you can use either the native API or the Blueprints implementation. The native API offers an extension of the SQL language with special operators for graphs. Example:
SELECT FROM Account WHERE friends TRAVERSE (1,7) (address.city.country.name = 'New Zealand')
That means: give me all the accounts that have a friend living in New Zealand. Friends are followed up to 7 levels deep.
The second one lets you use the full Blueprints stack, such as the Gremlin language, to create your super-complex queries.
Neo4j always stores relationships/edges as directed, but when traversing/querying you can easily treat the graph as undirected by using Direction.BOTH or in some cases by not defining a direction at all. (This way there's no need for "double" edges to cover both directions, you simply ignore the direction - and there's no performance penalty when traversing edges "backwards".)
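As a language-neutral illustration of "ignoring the direction", here is a plain Python sketch. It does not use the Neo4j API, and the function names build_undirected and within_depth are made up. Each edge is given once, made reachable from both endpoints, and then a breadth-first search is cut off at depth 2, which matches the query described in the question.

```python
from collections import defaultdict, deque


def build_undirected(edges):
    """Make each (a, b) edge reachable from both of its endpoints."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def within_depth(adj, start, max_depth=2):
    """Breadth-first search returning all nodes at most max_depth hops away."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                queue.append((nxt, depth + 1))
    return found


adj = build_undirected([("A", "B"), ("B", "C"), ("C", "D")])
print(sorted(within_depth(adj, "A")))
```

With the edges A-B, B-C and C-D, querying A reaches B and C but stops before D, and querying D works the same way in the opposite direction.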
The 1 million "primitives" limit was removed quite a while ago. If your code is open source, you can use the community version for any size of DB. For other cases there are the commercial versions, which include one free alternative.