How to remove all nodes & edges on AgensGraph? - agens-graph

After testing several cases on AgensGraph.
I try to new test. but undesired result returns.
How can I remove all old data.

"DETACH DELETE" can remote all nodes and related edge.
# create (:v{id:1})-[:e]->(:v{id:2});
GRAPH WRITE (INSERT VERTEX 2, INSERT EDGE 1)
# match (n) detach delete n;
GRAPH WRITE (DELETE VERTEX 2, DELETE EDGE 1)
When absence of "DETACH" token, node with edges causes error.

Related

Delete duplicate nodes between two nodes in Neo4j

Due to unwanted scrip execution my database has some duplicate nodes and it looks like this
From the image, there are multiple nodes with 'see' and 'execute' from p1 to m1.
I tried to eliminate them using this:
MATCH (ur:Role)-[c:CAN]->(e:Entitlement{action:'see'})-[o:ON]->(s:Role {id:'msci'})
WITH collect(e) AS rels WHERE size(rels) > 1
FOREACH (n IN TAIL(rels) | DETACH DELETE n)
Resulting in this:
As you can see here, it deletes all the nodes with 'see' action.
I think I am missing something in the query which I am not sure of.
The good graph should be like this:
EDIT: Added one more scenario with extra relations
this works if there is more than one extra :) and cleanses your entire graph.
// get all the patterns where you have > 1 entitlement of the same "action" (see or execute)
MATCH (n:Role)-->(e:Entitlement)-->(m:Role)
WITH n,m,e.action AS EntitlementAction,
COLLECT(e) AS Entitlements
WHERE size(Entitlements) > 1
// delete all entitlements, except the first one
FOREACH (e IN tail(Entitlements) |
DETACH DELETE e
)
Your query pretty explicitly is matching the "See" action.
MATCH (ur:Role)-[c:CAN]->(e:Entitlement{action:'see'})
You might try the query without specifying the action type.
Edit:
I went back and played with this and this worked for me:
MATCH (ur:Role)-[c:CAN]->(e:Entitlement {action: "see"})-[o:ON]->(s:Role {id:'msci'})
with collect(e) as rels where size(rels) > 1
with tail(rels) as tail
match(n:Entitlement {id: tail[0].id})
detach delete n
You'd need to run two queries one for each action but as long as it's only got the one extra relationship it should work.

Cypher: How to merge two nodes' keys in a third node

in cypher i want to create a node that has attributes of two existing nodes. I know that to "copy" a node the query is: MATCH (old:Type) CREATE (new:Type) SET new=old but this allow me to have the keys from just one node. I need a way to join the keys of two nodes and copy that node
match (a:Ubuntu1604), (h:Host) CREATE (b:Ubuntu1604) SET b=a,b=h return b
will obviously create b equals to h. I need and "append" function
I figured out myself.
Solution is simply
match (a:Ubuntu1604), (h:Host) CREATE (b:Ubuntu1604) SET b=a, b+=h return b

Gremlin - Move multiple edges in single traversal

I am using Gremlin to access data in AWS Neptune. I need to modify 2 edges going out from a single vertex to point to 2 vertices which are different from the ones it points to at the moment.
For instance if the current relation is as shown below:
(X)---(A)---(Y)
(B) (C)
I want it to modified to:
(X) (A) (Y)
/ \
(B) (C)
To ensure the whole operation is done in a single transaction, I need this done in a single traversal (because manual transaction logic using tx.commit() and tx.rollback() is not supported in AWS Neptune).
I tried the following queries to get this done but failed:
1) Add the new edges and drop the previous ones by selecting them using alias:
g.V(<id of B>).as('B').V(<id of C>).as('C').V(<id of A>).as('A').outE('LINK1','LINK2')
.as('oldEdges').addE('LINK1').from('A').to('B').addE('LINK2').from('A').to('C')
.select('oldEdges').drop();
Here, since outE('LINK1','LINK2') returns 2 edges, the edges being added after it, executes twice. So I get double the number of expected edges between A to B and C.
2) Add the new edges and drop the existing edges where edge id not equal to newly added ones.
g.V(<id of B>).as('B').V(<id of C>).as('C').V(<id of A>).as('A')
.addE('LINK1').from('A').to('B').as('edge1').addE('LINK2').from('A').to('C').as('edge2')
.select('A').outE().where(__.hasId(neq(select('edge1').id()))
.and(hasId(neq(select('edge2').id())))).drop();
Here I get the following exception in my gremlin console:
could not be serialized by org.apache.tinkerpop.gremlin.driver.ser.AbstractGryoMessageSerializerV3d0.
java.lang.IllegalArgumentException: Class is not registered: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal
Note: To register this class use: kryo.register(org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal.class);
at org.apache.tinkerpop.shaded.kryo.Kryo.getRegistration(Kryo.java:488)
at org.apache.tinkerpop.gremlin.structure.io.gryo.AbstractGryoClassResolver.writeClass(AbstractGryoClassResolver.java:110)
at org.apache.tinkerpop.shaded.kryo.Kryo.writeClass(Kryo.java:517)
at org.apache.tinkerpop.shaded.kryo.Kryo.writeClassAndObject(Kryo.java:622)
at org.apache.tinkerpop.gremlin.structure.io.gryo.kryoshim.shaded.ShadedKryoAdapter.writeClassAndObject(ShadedKryoAdapter.java:49)
at org.apache.tinkerpop.gremlin.structure.io.gryo.kryoshim.shaded.ShadedKryoAdapter.writeClassAndObject(ShadedKryoAdapter.java:24)
...
Please help.
You can try:
g.V(<id of A>).union(
addE('Link1').to(V(<id of B>)),
addE('Link2').to(V(<id of C>)),
outE('Link1', 'Link2').where(inV().hasId(<id of X>,<id of Y>)).drop()
)

Collecting properties and certain type of out nodes during a gremlin graph traversal

I have a simple graph of the type:
(3) -> (2) -> (1)
| | | |
(D) |(B) (A)
(C)
In this case, I want to traverse, from (3) to (1). Think of this as a tree where node (1) is the root and node (3) is a leaf and the edges between (3) to (2) and (2) to (1) represent parent relationship (2 is parent of 3, etc.).
At the same time, each vertex that represents tree node (1, 2, 3) has one or more edges that represent permissions relationships, so in this case, (A) represents a user and the relationship between (1) to (A) represents a permission (access) the user (A) has over (1). In the case of (2), there are two users that have access to this vertex (B and C).
You can imagine that nodes with numbers are "folders" and have certain attributes (i.e. name) and nodes with letters are users or permissions and have certain attributes too (i.e name, type of access, etc).
I can successfully traverse the graph from 3 to 1, printing properties of each of the nodes in the path (see example below).
So for example if I do:
gremlin> g.V('3').repeat(out()).until(has(id, '1')).path().by('name')
==>[folder1.1, folder1, root]
or
gremlin> g.V('3').repeat(out()).until(has(id, '1')).path().by('name')
==>[folder1.1, folder1, root]
gremlin> g.V('3').repeat(out('parent').simplePath().store('x')).until(has(id, '1')).cap('x')
==>{v[2]=1, v[1]=1}
// Although at this point I missed listing v[3]
The problem is that I need to traverse the tree and collect the permissions in the way, for example (and format may vary) I'm looking for a way to get something in the lines of:
gremlin> ...
==>[ { { 3, name="foo" } , [ { D, permission="x" } ] }, { 2, [ {A, permission="y" }, {B, permission="z"} ] }, { { 3, name="foo" } , [ { D, permission="x" } ] } ]
Basically I want to traverse the path from (3) to (1) (there may be more than just three nodes), collecting attributes of the vertex in the path, and collecting vertex related to certain out edges (just one level, I don't want to expand beyond a single edge in depth for permissions) together with their attributes.
Note that I'm very new to gremlin, and I've been trying and learning in the process for a couple of days without even knowing if it is possible...
Any idea, suggestion would be appreciated, thanks!
When you do this:
gremlin> g.V('3').repeat(out()).until(has(id, '1')).path().by('name')
==>[folder1.1, folder1, root]
recognize that the by() modulator to path() is simply a transformation. The modulator is being applied in a round-robin fashion to each item in the path (since you only supplied one by()). That transformation says, "take the graph element (i.e. Vertex) in the path and extract the 'name' property value and return that rather than the actual vertex."
Given that knowledge you really just need a different form of transformation to get the data that you want from the path. Note that by() has an overload that takes a Traversal as an argument which means you can apply a Traversal to each graph element in the path. I would think that project() would be a good step to use to produce the output you want - something like:
g.V('3').repeat(out()).until(has(id, '1')).
path().
by(project('name','permissions').
by('name').
by(out('permissions').fold()))
Note that the above is saying, "for each vertex in the path, project it to a Map with keys 'name' and 'permissions' where 'name' is the 'name' value from the vertex in the path and 'permissions' is a list of related permission vertices."

What DBMS should I use to store openstreetmap as a graph?

Background:
I need to store the following data in a database:
osm nodes with tags;
osm edges with weights (that is an edge between two nodes extracted from 'way' from an .osm file).
Nodes that form edges, which are in the same 'way' sets should have the same tags as those ways, i.e. every node in a 'way' set of nodes which is a highway should have a 'highway' tag.
I need this structure to easily generate a graph based on various filters, e.g. a graph consisting only of nodes and edges which are highways, or a 'foot paths' graph, etc.
Problem:
I have not heard about the spatial index before, so I just parsed an .osm file into a MySQL database:
all nodes to a 'nodes' table (with respective coordinates columns) - OK, about 9,000,000 of rows in my case:
(INSERT INTO nodes VALUES [pseudocode]node_id,lat,lon[/pseudocode];
all ways to an 'edges' table (usually one way creates a few edges) - OK, about 9,000,000 of rows as well:
(INSERT INTO edges VALUES [pseudocode]edge_id,from_node_id,to_node_id[/pseudocode];
add tags to nodes, calculate weights for edges - Problem:
Here is the problematic php script:
$query = mysql_query('SELECT * FROM edges');
$i=0;
while ($res = mysql_fetch_object($query)) {
$i++;
echo "$i\n";
$node1 = mysql_query('SELECT * FROM nodes WHERE id='.$res->from);
$node1 = mysql_fetch_object($node1);
$tag1 = $node1->tags;
$node2 = mysql_query('SELECT * FROM nodes WHERE id='.$res->to);
$node2 = mysql_fetch_object($node2);
$tag2 = $node2->tags;
mysql_query('UPDATE nodes SET tags="'.$tag1.$res->tags.'" WHERE nodes.id='.$res->from);
mysql_query('UPDATE nodes SET tags="'.$tag2.$res->tags.'" WHERE nodes.id='.$res->to);`
Nohup shows the output for 'echo "$i\n"' each 55-60 seconds (which can take more than 17 years to finish if the size of the 'edges' table is more than 9,000,000 rows, like in my case).
Htop shows a /usr/bin/mysqld process which takes 40-60% of CPU.
The same problem exists for the script which tries to calculate the weight (the distance) of an edge (select all edges, take an edge, then select two nodes of this edge from 'nodes' table, then calculate the distance, then update the edges table).
Question:
How can I make this SQL updates faster? Should I tweak any of MySQL config settings? Or should I use PostgreSQL with PostGIS extension? Should I use another structure for my data? Or should I somehow utilize the spatial index?
If I understand you right there is two things to discuss.
First, your idea of putting the highway-tag on the start and stop nodes. A node can have more than one edge connected, where to put the tag from the second edge? Or third or fourth if it is a crossing? The reason the highway tag is putted in the edges table in the first place is that from a relational point of view that is where it belongs.
Second, to get the whole table and process it outside the database is not the right way. What a relational database is really good at is taking care of this whole process.
I have not worked with mysql, and I fully agree that you will probably get a lot more fun if migrating to PostGIS since PostGIS has a lot better spatial capabilities (even if you don't need any spatial capabilities for this particular task) from what I have heard.
So if we ignore the first problem and just for showing the concept say that there is only two edges connected to one node and that each node has two tag-fields. tag1 and tag2. Then it could look something like this in PostGIS:
UPDATE nodes set tag1=edges.tags from edges where nodes.id=edges.from;
UPDATE nodes set tag2=edges.tags from edges where nodes.id=edges.to;
If you disable the indexes that should be very fast.
Again,
if I have understood you right.
PostgreSQL
Openstreetmap itself uses PostgreSQL, so I guess that's recommended.
See: http://wiki.openstreetmap.org/wiki/PostgreSQL
You can see OSM's database schema at: http://wiki.openstreetmap.org/wiki/Database_Schema
So you can use the same fields, fieldtypes and indexes that OSM uses for maximum compatibility.
MySQL
If you want to import .osm files into a MySQL database, have a look at:
http://wiki.openstreetmap.org/wiki/OsmDB.pm
Here you will find perl code that will create MySQL tables, parse a OSM file and import it into your MySQL database.
Making it faster
If you are updating in bulk, you don't need to update the indexes after every update.
You can just disable the indexes, do all your updates and re-enable the index.
I'm guessing that should be a whole lot faster.
Good luck

Resources