Delete duplicate nodes between two nodes in Neo4j - database

Due to an unwanted script execution, my database has some duplicate nodes, and it looks like this:
From the image, there are multiple nodes with 'see' and 'execute' from p1 to m1.
I tried to eliminate them using this:
MATCH (ur:Role)-[c:CAN]->(e:Entitlement{action:'see'})-[o:ON]->(s:Role {id:'msci'})
WITH collect(e) AS rels WHERE size(rels) > 1
FOREACH (n IN TAIL(rels) | DETACH DELETE n)
Resulting in this:
As you can see here, it deletes all the nodes with 'see' action.
I think I am missing something in the query which I am not sure of.
The good graph should be like this:
EDIT: Added one more scenario with extra relations

This works even if there is more than one extra duplicate :) and it cleans up your entire graph.
// get all the patterns where you have > 1 entitlement of the same "action" (see or execute)
MATCH (n:Role)-->(e:Entitlement)-->(m:Role)
WITH n,m,e.action AS EntitlementAction,
COLLECT(e) AS Entitlements
WHERE size(Entitlements) > 1
// delete all entitlements, except the first one
FOREACH (e IN tail(Entitlements) |
DETACH DELETE e
)
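To make the grouping step concrete, here is a minimal Python sketch of the same idea outside the database: group the matched paths by (source role, target role, action), keep the first entitlement in each group, and mark the rest for deletion. The tuple layout, the entitlement ids (e1 to e6) and the function name are invented for illustration. Note that without an explicit ordering, which entitlement ends up first in a collected list is not guaranteed, so the survivor is arbitrary.

```python
from collections import defaultdict


def find_duplicates(paths):
    """Return the entitlement ids to delete.

    paths: iterable of (source_role, entitlement_id, action, target_role),
    one tuple per matched (:Role)-->(:Entitlement)-->(:Role) pattern.
    """
    groups = defaultdict(list)
    for source, entitlement_id, action, target in paths:
        # Same grouping key as "WITH n, m, e.action" in the Cypher query.
        groups[(source, target, action)].append(entitlement_id)

    to_delete = []
    for entitlements in groups.values():
        # Equivalent of tail(Entitlements): everything except the first.
        to_delete.extend(entitlements[1:])
    return to_delete


# Hypothetical data: p1 has three 'see' and two 'execute' entitlements on m1.
paths = [
    ("p1", "e1", "see", "m1"),
    ("p1", "e2", "see", "m1"),
    ("p1", "e3", "see", "m1"),
    ("p1", "e4", "execute", "m1"),
    ("p1", "e5", "execute", "m1"),
    ("p2", "e6", "see", "m1"),
]
print(find_duplicates(paths))  # ['e2', 'e3', 'e5']
```

Because the action is part of the grouping key, one 'see' and one 'execute' entitlement survive per pair of roles, which is what the original query (collecting everything into a single list) could not do.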

Your query explicitly matches only the "see" action:
MATCH (ur:Role)-[c:CAN]->(e:Entitlement{action:'see'})
You might try the query without specifying the action type.
Edit:
I went back and played with this and this worked for me:
MATCH (ur:Role)-[c:CAN]->(e:Entitlement {action: "see"})-[o:ON]->(s:Role {id:'msci'})
with collect(e) as rels where size(rels) > 1
with tail(rels) as tail
match(n:Entitlement {id: tail[0].id})
detach delete n
You'd need to run two queries, one for each action, but as long as there is only one extra duplicate per action it should work.

Related

Laravel skip and delete records from Database

I'm developing an app which needs to record a list of a user's recent video uploads. Importantly, it only needs to remember the last two videos associated with the user, so I'm trying to find a way to keep just the last two records in the database.
What I've got so far is below. It creates a new record correctly, and I then want to delete all records older than the most recent two.
The problem is that this seems to delete ALL records even though, by my understanding, the skip should miss out the two most recent records:
private function saveVideoToUserProfile($userId, $thumb ...)
{
RecentVideos::create([
'user_id'=>$userId,
'thumbnail'=>$thumb,
...
]);
RecentVideos::select('id')->where('user_id', $userId)->orderBy('created_at')->skip(2)->delete();
}
Can anyone see what I'm doing wrong?
Limit and offset do not work with delete, so you can do something like this:
$ids = RecentVideos::select('id')->where('user_id', $userId)->orderByDesc('created_at')->skip(2)->take(10000)->pluck('id');
RecentVideos::whereIn('id', $ids)->delete();
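The same two-step pattern (select the ids to remove, then delete by id) can be tried outside Laravel. Below is a small sketch using Python's built-in sqlite3 module; the table layout, the integer timestamps and the function name prune_recent_videos are assumptions made for the example, not part of the original app.

```python
import sqlite3


def prune_recent_videos(conn, user_id, keep=2):
    """Delete all of a user's rows except the `keep` newest; return deleted ids."""
    # Step 1: newest first, skip the rows we want to keep, collect the rest.
    # In SQLite, LIMIT -1 means "no limit", so only the OFFSET applies.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM recent_videos WHERE user_id = ? "
        "ORDER BY created_at DESC, id DESC LIMIT -1 OFFSET ?",
        (user_id, keep),
    )]
    # Step 2: delete exactly those ids.
    if ids:
        placeholders = ",".join("?" * len(ids))
        conn.execute(
            f"DELETE FROM recent_videos WHERE id IN ({placeholders})", ids
        )
    return ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_videos "
    "(id INTEGER PRIMARY KEY, user_id INTEGER, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO recent_videos (user_id, created_at) VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30), (1, 40), (2, 15)],
)

deleted = prune_recent_videos(conn, user_id=1)
remaining = [r[0] for r in conn.execute(
    "SELECT created_at FROM recent_videos WHERE user_id = 1 ORDER BY created_at"
)]
print(deleted, remaining)  # [2, 1] [30, 40]
```

Sorting in descending order before skipping is what makes the two newest rows the ones that are kept.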
First off, skip() does not skip the x number of recent records, but rather the x number of records from the beginning of the result set. So in order to get your desired result, you need to sort the data in the correct order. orderBy() defaults to ordering ascending, but it accepts a second direction argument. Try orderBy('created_at', 'DESC'). (See the docs on orderBy().)
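This is how I would recommend writing the query. Note that if delete() ignores the offset, as the other answer says, you will need to fetch the IDs first and delete by ID instead.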
RecentVideos::where('user_id', $userId)->orderBy('created_at', 'DESC')->skip(2)->delete();

Gremlin - Move multiple edges in single traversal

I am using Gremlin to access data in AWS Neptune. I need to modify 2 edges going out from a single vertex to point to 2 vertices which are different from the ones it points to at the moment.
For instance if the current relation is as shown below:
(X)---(A)---(Y)
(B)         (C)
I want it to be modified to:
(X)   (A)   (Y)
     /   \
  (B)     (C)
To ensure the whole operation is done in a single transaction, I need this done in a single traversal (because manual transaction logic using tx.commit() and tx.rollback() is not supported in AWS Neptune).
I tried the following queries to get this done but failed:
1) Add the new edges and drop the previous ones by selecting them using alias:
g.V(<id of B>).as('B').V(<id of C>).as('C').V(<id of A>).as('A').outE('LINK1','LINK2')
.as('oldEdges').addE('LINK1').from('A').to('B').addE('LINK2').from('A').to('C')
.select('oldEdges').drop();
Here, since outE('LINK1','LINK2') returns 2 edges, the addE steps that follow it execute twice, once per edge. So I get double the expected number of edges from A to B and C.
2) Add the new edges and drop the existing edges whose IDs are not equal to those of the newly added ones:
g.V(<id of B>).as('B').V(<id of C>).as('C').V(<id of A>).as('A')
.addE('LINK1').from('A').to('B').as('edge1').addE('LINK2').from('A').to('C').as('edge2')
.select('A').outE().where(__.hasId(neq(select('edge1').id()))
.and(hasId(neq(select('edge2').id())))).drop();
Here I get the following exception in my gremlin console:
could not be serialized by org.apache.tinkerpop.gremlin.driver.ser.AbstractGryoMessageSerializerV3d0.
java.lang.IllegalArgumentException: Class is not registered: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal
Note: To register this class use: kryo.register(org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal.class);
at org.apache.tinkerpop.shaded.kryo.Kryo.getRegistration(Kryo.java:488)
at org.apache.tinkerpop.gremlin.structure.io.gryo.AbstractGryoClassResolver.writeClass(AbstractGryoClassResolver.java:110)
at org.apache.tinkerpop.shaded.kryo.Kryo.writeClass(Kryo.java:517)
at org.apache.tinkerpop.shaded.kryo.Kryo.writeClassAndObject(Kryo.java:622)
at org.apache.tinkerpop.gremlin.structure.io.gryo.kryoshim.shaded.ShadedKryoAdapter.writeClassAndObject(ShadedKryoAdapter.java:49)
at org.apache.tinkerpop.gremlin.structure.io.gryo.kryoshim.shaded.ShadedKryoAdapter.writeClassAndObject(ShadedKryoAdapter.java:24)
...
Please help.
You can try:
g.V(<id of A>).union(
addE('LINK1').to(V(<id of B>)),
addE('LINK2').to(V(<id of C>)),
outE('LINK1', 'LINK2').where(inV().hasId(<id of X>,<id of Y>)).drop()
)
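To see why this avoids the doubling from the first attempt, here is a toy Python model of how traversers flow through the steps. It is not Gremlin and makes no claim about Neptune internals; it only assumes that a step runs once per incoming traverser. The edge tuples and function names are made up for the illustration.

```python
# Each edge is modelled as (from_vertex, label, to_vertex).
OLD_EDGES = [("A", "LINK1", "X"), ("A", "LINK2", "Y")]


def add_after_out_e(old_edges):
    """Attempt 1: the addE steps come after outE(), which emits 2 traversers."""
    traversers = list(old_edges)  # one traverser per old edge
    new_edges = []
    for _ in traversers:          # first addE runs once per traverser
        new_edges.append(("A", "LINK1", "B"))
    for _ in traversers:          # second addE also runs once per traverser
        new_edges.append(("A", "LINK2", "C"))
    return new_edges


def add_inside_union(old_edges):
    """union(): a single traverser (A) enters, each branch runs once for it."""
    new_edges = [("A", "LINK1", "B"), ("A", "LINK2", "C")]
    # Third branch: drop only the old edges that point at X or Y.
    dropped = [e for e in old_edges if e[2] in ("X", "Y")]
    return new_edges, dropped


print(len(add_after_out_e(OLD_EDGES)))  # 4
new_edges, dropped = add_inside_union(OLD_EDGES)
print(len(new_edges), len(dropped))     # 2 2
```

In the union version every branch starts from the single A traverser, so each new edge is created exactly once.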

Methods for avoiding a cross-product APOC query (using a hashmap?)

I currently have a Neo4j database with a simple data structure consisting of about 400 million (:Node {id:String, refs:List[String]}) nodes, each with two properties: an id, which is a string, and refs, which is a list of strings.
I need to search all of these nodes to identify relationships between them. A directed relationship exists if a node's id is in the refs list of another node. A simple query that accomplishes what I want (but is too slow):
MATCH (a:Node), (b:Node)
WHERE ID(a) < ID(b) AND a.id IN b.refs
CREATE (b)-[:CITES]->(a)
I can use apoc.periodic.iterate, but the query is still much too slow:
CALL apoc.periodic.iterate(
"MATCH (a:Node), (b:Node)
WHERE ID(a) < ID(b)
AND a.id IN b.refs RETURN a, b",
"CREATE (b)-[:CITES]->(a)",
{batchSize:10000, parallel:false,iterateList:true})
Any suggestions as to how I can build this database and relationships efficiently? I've vague thoughts about creating a hash table as I first add the Nodes to the database, but am not sure how to implement this, especially in Neo4j.
Thank you.
If you first create an index on :Node(id), like this:
CREATE INDEX ON :Node(id);
then this query should be able to take advantage of the index to quickly find each a node:
MATCH (b:Node)
UNWIND b.refs AS ref
MATCH (a:Node)
WHERE a.id = ref
CREATE (b)-[:CITES]->(a);
Currently, the Cypher execution planner does not support using the index when directly comparing the values of 2 properties. In the above query, the WHERE clause is comparing a property with a variable, so the index can be used.
The ID(a) < ID(b) test was omitted, since your question did not state that ordering the native node IDs in such a way was required.
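The index plays the role of the hash table mentioned in the question. As a rough illustration of why this is so much cheaper than the cross product, here is a Python sketch that does the same thing in memory: build a lookup by id once, then walk each node's refs and look each one up directly. The node dictionaries and the function name are hypothetical, and the sketch ignores everything that makes 400 million nodes hard in practice (memory, batching, transactions).

```python
def build_citations(nodes):
    """Return (citing_id, cited_id) pairs for every ref that names a known node.

    nodes: list of dicts with an 'id' string and a 'refs' list of id strings.
    """
    # One pass to build the lookup, analogous to the index on :Node(id).
    by_id = {node["id"]: node for node in nodes}

    edges = []
    for b in nodes:                 # MATCH (b:Node)
        for ref in b["refs"]:       # UNWIND b.refs AS ref
            if ref in by_id:        # MATCH (a:Node) WHERE a.id = ref
                edges.append((b["id"], ref))  # CREATE (b)-[:CITES]->(a)
    return edges


nodes = [
    {"id": "a", "refs": ["b", "c"]},
    {"id": "b", "refs": ["c"]},
    {"id": "c", "refs": []},
    {"id": "d", "refs": ["missing"]},
]
print(build_citations(nodes))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

The work is proportional to the total number of refs rather than to the number of node pairs.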
[UPDATE 1]
If you want to run the creation step in parallel, try this usage of the APOC procedure apoc.periodic.iterate:
CALL apoc.periodic.iterate(
"MATCH (b:Node) UNWIND b.refs AS ref RETURN b, ref",
"MATCH (a:Node {id: ref}) CREATE (b)-[:CITES]->(a)",
{batchSize:10000, parallel:true})
The first Cypher statement passed to the procedure just returns each b/ref pair. The second statement (which is run in parallel) uses the index to find the a node and creates the relationship. This division of effort puts the more expensive processing in the statement running in a parallel thread. The iterateList: true option is omitted, since we (probably) want the second statement to run in parallel for each b/ref pair.
[UPDATE 2]
You can encounter deadlock errors if parallel executions try to add relationships to the same nodes (since each parallel transaction will attempt to write-lock every new relationship's end nodes). To avoid deadlocks involving just the b nodes, you can do something like this to ensure that a b node is not processed in parallel:
CALL apoc.periodic.iterate(
"MATCH (b:Node) RETURN b",
"UNWIND b.refs AS ref MATCH (a:Node {id: ref}) CREATE (b)-[:CITES]->(a)",
{batchSize:10000, parallel:true})
However, this approach is still vulnerable to deadlocks if parallel executions can try to write-lock the same a nodes (or if any b nodes can also be used as a nodes). But at least hopefully this addendum will help you to understand the problem.
[UPDATE 3]
Since these deadlocks are race conditions that depend on multiple parallel executions trying to lock the same nodes at the same time, you might be able to work around this issue by retrying the "inner statement" whenever it fails. And you could also try making the batch size smaller, to reduce the probability that multiple parallel retries will overlap in time. Something like this:
CALL apoc.periodic.iterate(
"MATCH (b:Node) RETURN b",
"UNWIND b.refs AS ref MATCH (a:Node {id: ref}) CREATE (b)-[:CITES]->(a)",
{batchSize: 1000, parallel: true, retries: 100})

How to query for multiple vertices and counts of their relationships in Gremlin/Tinkerpop 3?

I am using Gremlin/Tinkerpop 3 to query a graph stored in TitanDB.
The graph contains user vertices with properties, for example, "description", and edges denoting relationships between users.
I want to use Gremlin to obtain 1) users by properties and 2) the number of relationships (in this case of any kind) to some other user (e.g., with id = 123). To realize this, I make use of the match operation in Gremlin 3 like so:
g.V().match('user',__.as('user').has('description',new P(CONTAINS,'developer')),
__.as('user').out().hasId(123).values('name').groupCount('a').cap('a').as('relationships'))
.select()
This query works fine, unless there are multiple user vertices returned, for example, because multiple users have the word "developer" in their description. In this case, the count in relationships is the sum of all relationships between all returned users and the user with id 123, and not, as desired, the individual count for every returned user.
Am I doing something wrong or is this maybe an error?
PS: This question is related to one I posted some time ago about a similar query in Tinkerpop 2, where I had another issue: How to select optional graph structures with Gremlin?
Here's the sample data I used:
graph = TinkerGraph.open()
g = graph.traversal()
v123=graph.addVertex(id,123,"description","developer","name","bob")
v124=graph.addVertex(id,124,"description","developer","name","bill")
v125=graph.addVertex(id,125,"description","developer","name","brandy")
v126=graph.addVertex(id,126,"description","developer","name","beatrice")
v124.addEdge('follows',v125)
v124.addEdge('follows',v123)
v124.addEdge('likes',v126)
v125.addEdge('follows',v123)
v125.addEdge('likes',v123)
v126.addEdge('follows',v123)
v126.addEdge('follows',v124)
My first thought was: "Do we really need the match step?" Secondly, of course, I wanted to write this in TP3 fashion and not use a lambda/closure. I tried all manner of things in the first iteration, and the closest I got was this, from Daniel Kuppitz:
gremlin> g.V().as('user').local(out().hasId(123).values('name')
.groupCount()).as('relationships').select()
==>[relationships:[:]]
==>[relationships:[bob:1]]
==>[relationships:[bob:2]]
==>[relationships:[bob:1]]
So here we used the local step to restrict the inner traversal to the current element. This works, but we lost the "user" tag in the select. Why? groupCount is a ReducingBarrierStep, and paths are lost after those steps.
Well, let's go back to match. I figured I could try to make the match step traverse using local:
gremlin> g.V().match('user',__.as('user').has('description','developer'),
gremlin> __.as('user').local(out().hasId(123).values('name').groupCount()).as('relationships')).select()
==>[relationships:[:], user:v[123]]
==>[relationships:[bob:1], user:v[124]]
==>[relationships:[bob:2], user:v[125]]
==>[relationships:[bob:1], user:v[126]]
Ok - success - that's what we wanted: no lambdas and local counts. But it still left me feeling like: "Do we really need the match step?" That's when Mr. Kuppitz closed in on the final answer, which makes copious use of the by step:
gremlin> g.V().has('description','developer').as("user","relationships").select().by()
.by(out().hasId(123).values("name").groupCount())
==>[user:v[123], relationships:[:]]
==>[user:v[124], relationships:[bob:1]]
==>[user:v[125], relationships:[bob:2]]
==>[user:v[126], relationships:[bob:1]]
As you can see, by can be chained (on some steps). The first by() takes no argument, so it returns the "user" vertex as-is; the second by() runs a "local" groupCount for that vertex to produce "relationships".
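For readers who want to check the numbers, the difference between the summed count from the question and the per-user count can be reproduced in plain Python from the sample data above. This is only a model of the result, not of how Gremlin evaluates the traversal; the function names are made up.

```python
from collections import Counter

# Sample data from the answer: vertex id -> name, and (out, label, in) edges.
names = {123: "bob", 124: "bill", 125: "brandy", 126: "beatrice"}
edges = [
    (124, "follows", 125),
    (124, "follows", 123),
    (124, "likes", 126),
    (125, "follows", 123),
    (125, "likes", 123),
    (126, "follows", 123),
    (126, "follows", 124),
]


def global_count(target):
    """One counter shared by all users: what the original match() returned."""
    return Counter(names[dst] for _, _, dst in edges if dst == target)


def per_user_counts(target):
    """A separate counter per user: what the local/by() versions return."""
    result = {user: Counter() for user in names}
    for src, _, dst in edges:
        if dst == target:
            result[src][names[dst]] += 1
    return result


print(global_count(123))  # Counter({'bob': 4})
for user, counts in per_user_counts(123).items():
    print(user, dict(counts))
# 123 {}
# 124 {'bob': 1}
# 125 {'bob': 2}
# 126 {'bob': 1}
```

The per-user figures match the console output above.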

CakePHP 3 Tree Behavior - numeric value out of range on save, delete, or move functions

I'm working with CakePHP 3.0's Tree Behavior and get an interesting issue when I'm trying to delete nodes. There are two types of nodes under a team - internal and external. The internal nodes are teams in the tree structure beneath me that I have complete access over; the external nodes are teams underneath me that have their own teams they've fleshed out, and as such I don't have permission to alter them (I only have permission to remove the entire top-level owner, but not modify the actual structure of their team).
So let's say I have a structure like this:
         x
       / | \
   * o   o   x      o = internal
    /               x = external
   o                * = node I'm trying to delete
  / \
 o   x
I have a delete function I've written to delete team nodes. If I delete the node with an asterisk by it, I expect it to delete every node underneath it. When I hit an external node (the x at the bottom) it should instead change the parent_id to null, since I don't have permission to delete somebody else's external team.
My logic is recursive and basically calls itself for every child underneath, with logic similar to this:
if ($team->external === 1) {
$team->parent_id = null;
$this->save($team);
} else {
$this->delete($team);
}
I won't write out the whole function for the sake of brevity, but this functionality continues until all nodes have been deleted. Unfortunately, I get this error when I try and run the function:
"SQLSTATE[22003]: Numeric value out of range: 1690 BIGINT UNSIGNED value is out of range in '((`my_application`.`teams`.`lft` + 1182) * -(1))'"
So this looks like an issue with the lft or rght values (perhaps) and I'm confused as to why it's getting multiplied by a -1...either way, I'm not exactly sure what the case is here.
I get the same issue when I'm running other functions, like a move command. If I want to move a user underneath a different node, I might do something like this in my controller:
$team->parent_id = $this->request->data['id']; // let's say it's '21'
$this->Teams->save($team);
The same issue happens here where I get a 'numeric value out of range' issue. I've run $this->recover(); on the table a few times just to make sure the table's lft and rght values were all correct and the issue still occurs.
Does anybody have any idea?
Well, this is an implementation gotcha for the tree behavior in its current form. In order to speed up some operations, it relies on being able to set negative values to some of the nodes. Therefore, having an UNSIGNED column will not work correctly.
The easiest solution right now is to remove the UNSIGNED flag in the lft and rght columns.
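The arithmetic behind the error can be checked with a few lines of Python. The expression comes straight from the error message, (lft + 1182) * -(1); the lft value of 5 is a made-up example, and the range constants are the usual 64-bit limits for MySQL's BIGINT types.

```python
# Value ranges for a 64-bit integer column.
UNSIGNED_BIGINT = (0, 2**64 - 1)
SIGNED_BIGINT = (-2**63, 2**63 - 1)


def fits(value, bounds):
    """True if value can be stored in a column with the given (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


lft = 5                       # hypothetical lft value of a node being moved
shifted = (lft + 1182) * -1   # the expression from the error message

print(shifted)                         # -1187
print(fits(shifted, UNSIGNED_BIGINT))  # False: this is the 1690 error
print(fits(shifted, SIGNED_BIGINT))    # True: fine once UNSIGNED is removed
```

Any positive lft produces a negative result here, which is why every save, move or delete that touches the tree fails while the column is UNSIGNED.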

Resources