Gremlin query to get all the directly as well as indirectly related vertices

Gremlin query to get all the directly as well as indirectly related vertices - database

I have a graph database in which there is a root vertex with some "id= Xyz".This vertex is related to 3 more vertices with an edge having relationship of a "child".
Now these 3 vertices themselves have 2 connected vertices each with the same relationship as "child".
I want to get the info of all the directly or indirectly connected vertices keeping the nested structure maintained.
The JSON output should be in the nested form for indirect vertices.
Can we do this ?
And what to do if the depth of tree increase to n
Please help

Not sure how you want your data to look like, but you can do this in several ways:
Using path for a full tree:
g.V().hasLabel('root').emit().repeat(out()).path()
If you want only the two levels:
g.V().hasLabel('root').emit().repeat(out()).times(2).path()
You can also use project step if you want specific data structure:
g.V().hasLabel('root').project('v', 'c').
by(id).
by(out().project('v', 'c').by(id).
by(out().id().fold()).fold())
example: https://gremlify.com/at

Related

How to create a graph with sugraph with jgrapht

I want to create a graph with subgraphs (I think cluster is a synonymous - not sure ?). Someone could explain how to create this type of graph and how to export this composite graph. I readed this post Creating graph with clusters using jgrapht but i am not sure to understand this sentence "You could create a graph where each vertex is a graph by itself; edges between these vertices represent relations between these special vertices.". Does it mean that in this case verticles are relations between subgraphs. How to build this verticles? Thanks

In JGraphT, vertices and edges are arbitrary objects: a Graph<V,E> consists of vertices of type V, and edges of type E. See the wiki guide https://jgrapht.org/guide/UserOverview for details.
So you could simply define a graph where V is also a graph: Graph<Graph<String,DefaultEdge>,DefaultEdge>. In this example, each vertex in the graph, is a graph by itself, consisting of String vertices and DefaultEdge edges.

OrientDB SQL - how to traverse path based on weight?

Let's say we have a simple graph stored in OrientDB, with an edge type called weightedEdge that has a property weight. I'd like to be able to traverse from a starting node to an arbitrary depth, but only traverse to nodes that have the maximum value of weight on their incoming edge compared to all other edges at the same depth. Is this possible using OrientDB SQL?
So in the given example above, I would only want to hop along the red arrows.
Thank you!

May be you can use a let block and then property selection via [ ]:
traverse out()[weight = $maxW] ...
LET $maxW = SELECT max(weight) FROM ...

Create a consistent topology using pgrouting

I'm developing an application that needs routing information for certain cities. First, I downloaded a openstreetmap datafile (*.osm) and then I imported it into a postgreSQL database using osm2pgrouting tool (http://workshop.pgrouting.org/chapters/installation.html).
After this, I have the following tables:
nodes: that contains simple locations points
ways: that contains ways with some nodes involved
vertices_tmp: stores nodes that may be used for pgrouting functions like Djistra, A*...etc.
Would I use nodes that isn't in "vertices_tmp" table for calculate distances between nodes? Or I would only do it with the nodes stored in "vertices_tmp"?
Into ways table there are a field named "the_geom" that encapsulates different locations points (nodes). For example:
"MULTILINESTRING((1.5897786 42.5600441,1.5898376 42.5601455,1.589992 42.5605438,1.590095 42.5606795,1.5901782 42.5608026,1.5902238 42.561018,1.5902912 42.5616808,1.5903685 42.561899,1.5904008 42.5620563,1.5903836 42.5624117,1.5904265 42.5627151,1.5904947 42.5628368,1.5905981 42.5629553,1.5906926 42.5631007,1.590802 42.5633238,1.5908604 42.5634883,1.5909501 42.5637139,1.5910869 42.5638755,1.5913053 42.5639639,1.5914994 42.5640237,1.591648 42.5640261,1.5919232 42.5640145,1.5921124 42.5640363,1.5923292 42.5640953,1.592804 42.5643306))"
Can I route with intermediate nodes or only with source/target nodes?
My goal is to be able to routing between different nodes or POIs, depending of its amenity tags, not only driving distance, walking distance too. Furthermore I need to calculate shortest path for source/targets nodes.
Any idea for do this?

You can't use the elements of the nodes table.
If you want to plan route from one POI to another, first you have to find the nearest vertex/edge based on the selected algorithm(Shooting star requires edges, the others use vertices).
After this you can make the routing, just pick an algorithm from THIS SITE
You will find there a good tutorial about different routing solutions and some help for the detailed usage (including how to determine the closest way).

Need help solving a problem using graphs in C

i'm coding a c project for an algorithm class and i really need some help!
Here's the problem:
I have a set of names like this one N = (James,John,Robert,Mary,Patricia,Linda Barbara) wich are stored in an RB tree.
Starting from this set of names a series of couple like those ones are formed:
(James,Mary)
(James,Patricia)
(John,Linda)
(John,Barbara)
(Robert,Linda)
(Robert,Barbara)
Now i need to merge the elements in a way that i can form n subgroups with the constraint that each pairing is respected and the group has the smallest possible cardinality.
With the couples in the example they will form two groups
(James,Mary,Patricia) and (John,Robert,Barbara,Linda).
The task is to return the maximum number of groups formed and the number of males and females in the group with the maximum cardinality.
In this case it would be 2 2 2
I was thinking about building a graph where every name is represented by a vertex and two vertex are in an edge only if they are paired.
I can then use an algorithm (like Kruskal) to find the Minimum spanning tree.Is that right?
The problem is that the graph would not be completely connected.
I also need to find a way to map the names to the edges of the Graph and vice-versa.
Can the edges be indexed by a string?
Every help is really appreciated :)
Thanks in advice!

You don't need to find the minimum spanning tree. That is really for finding the "best" edges in a graph that will still keep the graph connected. In other words, you don't care how John and Robert are connected, just that they are.
You say that the problem is that the graph would not be completely connected, but I think that is actually the point. If you represent graph edges by using the couples as you suggest, then the vertices that are connected form the groups that you are looking for.
In your example, James is connected to Mary and also James is connected to Patricia. No other person connects to any of those three vertices (if they did, you would have another couple that included them), which is why they form a single group of (James, Mary, Patricia). Similarly all of John, Robert, Barbara, and Linda are connected to each other.
Your task is really to form the graph and find all of the connected subgraphs that are disjoint from each other.
While not a full algorithm, I hope that helps get you started.

I think that you can easily solve this with a dfs and connected components. Because every person(node) has a relation with an other one (edge). So you have an outer loop and run an explore function for every node which is unvisited and add the same number for every node explored by the explore function.
e.g
dfs() {
int group 0;
for(int i=0;i<num_nodes;i++) {
if(nodes[i].visited==false){
explore(nodes[i],group);
group++;
}
}
then you simple have to sort the node by the group and then you are ready. if you want to track the path you can use a pre number which indicates which node was explored first, second..etc
(sorry for my bad english)!

The sets of names and pairs of names already form a graph. A data structure with nodes and pointers to other nodes is just another representation, one that you don't necessarily need. Disjoint sets are easier to implement IMO, and their purpose in life is exactly to keep track of sameness as pairs of things are joined together.

Datamodel for Property Graph over HBase/Cassandra

I'm willing to store a Property Graph into HBase. A Property Graph is a graph nodes and edges have properties and multiple edges can link the same tuple of nodes as long as the edges belong to different types.
My query pattern will be either asking for properties and neighborhood or traversing the graph. An example is: Vertex[name=claudio]=>OutgoingEdge[knows]=>Vertex[gender=female], which will give me all the female people that claudio likes.
I know that a graph database does just this, but they usually don't scale on multiple nodes in case of a huge dataset. So I'm willing to implement this on a NoSQL ColumnStore (HBase, Cassandra...)
My datamodel follows.
Vertices Table:
key: vertexid (uuid)
Family "Properties:": <property name>=><property value>, ...
Family "OutgoingEdges:": <edge key>=><other vertexid>, ...
Family "IncomingEdges:": same as outgoing edges...
This table allows me to fetch quickly the properties of a vertex and
its adjacency list. I can't use the vertexid as the other endpoint
because multiple edges (with different types) can connect the same two
vertices.
Edges Table:
key: edge key (composite(<source vertexid>, <destination vertexid>,
<edge typename>)) (i.e. vertexid1_vertexid2_knows)
Family "Properties:": <property name>=><property value>, ...
This table allows me to fetch quickly the properties of an edge.
Edges Types:
key: composite(<source vertexid>, "out|in", <edge typename>) (i.e.
vertexid1_out_knows)
Family "Neighbor:": <destination vertexid>=>null,...
This table allows me to search/scan for edges that are either incoming
or outgoing from a vertex and belong to specific type and would be the
core of the traversing ability of the API (so i want it to be as fast as
possible both in terms of network I/O (RPCs), disk I/O (seek)). It
should also "scale" on the size of the graph, meaning that with the
growth of the graph the cost of this type of operation should depend on
the number of edges outgoing from the vertex and not on the total number
of vertices and edges.
The example above i'd be considering vertexid1 the source vertex with
property name:claudio i'd scan vertexid1_out_knows and receive the list of
vertices connected. After that i can scan on the column
"Properties:gender" on these vertices and look for those having the
"female" value.
Questions:
1) General: do you see a better data model for my operations?
2) Can i fit everything in one table where for certain keys some
families would be empty (i.e. the "OutgoingEdges:" family would not make
sense for the edges)? I'd like that because as you can see all the keys
are composed by the vertexid uuid prefix, so they would be very compact
and fit mostly on the same regionserver.
3) I guess that for the scanning I'd make extensive use of Filters. I
guess regexp Filter will be my friend. Do you have concerns about
performance of filters applied to this data model?

This type of model looks like a sensible starting point for Cassandra (don't know much about HBase) - but for any distributed store you will run up against problems when traversing, because traversals will cross multiple nodes.
This is why dedicated graph databases such as Neo4J use a single-node design, and try to keep all data in RAM.
Looking up properties of particular nodes or edges should work well and scale horizontally - Twitter's FlockDB (now apparently abandoned) was a notable example of this.
You also need to consider whether you need lookups other than by ID (i.e. do you need any indexes)?