Gremlin - Update values of multiple edges - graph-databases

I am using AWS Neptune and I have to modify a certain property of a set of EDGEs with specific values. I also need this done in a single transaction. In AWS Neptune, manual transaction logic using tx.commit() and tx.rollback() is not supported. Which means I have to do this operation in a single traversal.
If I was to modify properties of vertices instead of edges, I could have got it done with a query similar to the following one:
g.V(<id 1>).property('name', 'Marko').V(<id 2>).property('name', 'Stephen');
This is because it is possible to select vertices by id in mid traversal, i.e. the GraphTraversal class has V(String ... vertexIds) as a member function.
But this is not the same for the case of edges. I am not able to select edges this way because E(String ... edgeIds) is not a member function of the GraphTraversal class.
Can somebody suggest the correct way I can solve this problem?
Thank you.

Amazon Neptune engine 1.0.1.0.200463.0 added Support for Gremlin Sessions to enable executing multiple Gremlin traversals in a single transaction.
However, you can do it also with a single query like this:
g.E('id1', 'id2', 'id3').coalesce(
has(id, 'id1').property('name','marko'),
has(id, 'id2').property('name','stephan'),
has(id, 'id3').property('name','vadas'))

You can get the same result as a mid traversal E() using V().outE().hasId(<list of IDs>)

Related

How to use apache Gremlin constant in AWS Neptune database?

How to use apache gremlin constant in AWS Neptune database?
g.V().hasLabel('user').has('name', 'Thirumal1').coalesce(id(), constant("1"));
Not getting constant value in the output. The document says, need to use it with sack https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-step-support.html. How to use constant in AWS Neptune.
If the aim is to return a 1 if the vertex does not exist you need to use the fold().coalesce() pattern. As written in the question, the query will end before the coalesce if the has returns no results.
You could do something like this
g.V().hasLabel('user').
has('name', 'Thirumal1').
fold().
coalesce(unfold().id(), constant("1"))
In TinkerPop 3.6 a new mergeV step was added. Once database providers move up to that version, you will be able to use mergeV coupled with onCreate and onMatch.
The coalesce() step returns the first option that returns a value. In your query example the id() step always returns a value so it will never return the constant("1").
The gremlin step support you refer to concerns pushdown of operations to the datastore. Fully down that page it is shown that really support is lacking for just a few gremlin steps.

Neo4J: How do I check each disjoint subgraph in a Neo4J query?

After I query through my database using Neo4J, I get a bunch of disjoint subgraphs like 'islands of nodes'.
What I want though is to get the most recent node for each 'island' (I have date values on each node).
How do I go about doing that?
Firstly you need to calculate yours islands like you said.
To di it yYou can check the neo4j-graph-algo with the procedure algo.unionFind : https://neo4j-contrib.github.io/neo4j-graph-algorithms/#_community_detection_connected_components
Then for each of your island, you have to order the nodes and take the first one.

Nested traversal gremlin query for Titan db

I am wondering how is possible to have a gremlin query which returns results in a nested format. Suppose there is property graph as follows:
USER and PAGE vertices with some properties such as AGE for USER vertex;
FOLLOW edge between USER and PAGE;
I am looking for a single efficient query which gives all Users with age greater than 20 years and all of the followed pages by those users. I can do that using a simple loop from the application side and per each iteration use a simple traversal query. Unfortunately, such solution is not efficient for me, since it will generate lots of queries and network latency could be huge in this case.
Not sure what your definition of "efficient" is, but keep in mind that this is a typical OLAP use-case and you shouldn't expect fast OLTP realtime responses.
That said, the query should be as simple as:
g.V().has("USER", "AGE", gt(20)).as("user").
map(out("FOLLOW").fold()).as("pages").
select("user", "pages")
A small example using the modern sample graph:
gremlin> g = TinkerFactory.createModern().traversal().withComputer()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
gremlin> g.V().has("person", "age", gt(30)).as("user").
map(out("created").fold()).as("projects").
select("user","projects")
==>[user:v[6], projects:[v[3]]]
==>[user:v[4], projects:[v[5], v[3]]]
this is very easy:
g.V().label('user').has('age',gt(20))
.match(__.as('user').out('follows').as('page'))
.select('user','page')
just attention when you are using this query in gremlin, gremlin gives you null pointer exception you can use it in code and check if 'page' exist get that.

How to optimize graph traversals in ArangoDB?

I primarily intended to ask this question : "Is ArangoDB a true graph database ?"
But, this question would sound quite offending.
You, peoples at triAGENS, did a really great job in creating a "multi-paradigm" database.
As a user of PostgreSQL, PostGIS, MongoDB and Neo4J/Titan, I really appreciate to see an "all-in-one" solution :)
But the question remains, basically creating a graph in ArangoDB requires to create two separate collections : one for edges and one for vertices, thus, as far as I understand, it already means that vertices and related edges are not "physically" neighbors.
Moreover, even after creating appropriate index, I'm facing some serious performance issues when doing this kind of stuff in Gremlin
g.v('an_id').out('likes').in('likes').count()
Which returns a result after ~ 3 seconds (perceived time)
I assumed I poorly understood how Gremlin and Blueprint/ArangoDB worked so I tried to rewrite the same query using AQL :
LET lst = (FOR e1 in NEIGHBORS(vertices, edges, "an_id", "outbound", [ { "$label": "likes" } ] )
FOR e2 in NEIGHBORS(vertices, edges, e1.edge._to, "inbound", [ { "$label": "likes" } ] )
RETURN 1
)
RETURN length(lst)
Which gives me a delay of same order of magnitude.
If I tried to run the same query on a Titan or Neo4j database (with the very same data), queries returns almost immediately (perceived time : <200ms)
So it seems to me that ArangoDB graph features are a "smart graph layer" above a "traditionnal document database" but that ArangoDB is not a "native" graph database.
To confirm this feeling, I transform data to load it in PostgreSQL and run a query (with a multiple table JOIN as you can assume) and got similar (to ArangoDB) execution delays
Did I do something wrong (in AQL query) ?
Is there a way to optimize the database to get better traversal times ?
In PostgreSQL, conceptually, I would mix edge and node and use a CLUSTER clause to physically order data, does something similar can be done in ArangoDB ? (I assume that it would be hard, as it would involve to "interlace" edges and nodes, just an intuition)
i am a Core Developer of ArangoDB. Could you give me a bit more information ob the dimensions of data you are using?
Amount of vertices
Amount of edges
Then we can create our own setup with equal dimensions and optimize it.

How to use indexed properties of NodeModels in cypher queries of Neo4django?

I'm a newbie to Django as well as neo4j. I'm using Django 1.4.5, neo4j 1.9.2 and neo4django 0.1.8
I've created NodeModel for a person node and indexed it on 'owner' and 'name' properties. Here is my models.py:
from neo4django.db import models as models2
class person_conns(models2.NodeModel):
owner = models2.StringProperty(max_length=30,indexed=True)
name = models2.StringProperty(max_length=30,indexed=True)
gender = models2.StringProperty(max_length=1)
parent = models2.Relationship('self',rel_type='parent_of',related_name='parents')
child = models2.Relationship('self',rel_type='child_of',related_name='children')
def __unicode__(self):
return self.name
Before I connected to Neo4j server, I set auto indexing to True and and gave indexable keys in conf/neo4j.properties file as follows:
# Autoindexing
# Enable auto-indexing for nodes, default is false
node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled
node_keys_indexable=owner,name
# Enable auto-indexing for relationships, default is false
relationship_auto_indexing=true
# The relationship property keys to be auto-indexed, if enabled
relationship_keys_indexable=child_of,parent_of
I followed Neo4j: Step by Step to create an automatic index to update above file and manually create node_auto_index on neo4j server.
Below are the indexes created on neo4j server after executing syndb of django on neo4j database and manually creating auto indexes:
graph-person_conns lucene
{"to_lower_case":"true", "_blueprints:type":"MANUAL","type":"fulltext"}
node_auto_index lucene
{"_blueprints:type":"MANUAL", "type":"exact"}
As suggested in https://github.com/scholrly/neo4django/issues/123 I used connection.cypher(queries) to query the neo4j database
For Example:
listpar = connection.cypher("START no=node(*) RETURN no.owner?, no.name?",raw=True)
Above returns the owner and name of all nodes correctly. But when I try to query on indexed properties instead of 'number' or '*', as in case of:
listpar = connection.cypher("START no=node:node_auto_index(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives 0 rows.
listpar = connection.cypher("START no=node:graph-person_conns(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives
Exception Value:
Error [400]: Bad Request. Bad request syntax or unsupported method.
Invalid data sent: (' expected but-' found after graph
I tried other strings like name, person_conns instead of graph-person_conns but each time it gives error that the particular index does not exist. Am I doing a mistake while adding indexes?
My project mainly depends on filtering the nodes based on properties, so this part is really essential. Any pointers or suggestions would be appreciated. Thank you.
This is my first post on stackoverflow. So in case of any missing information or confusing statements please be patient. Thank you.
UPDATE:
Thank you for the help. For the benefit of others I would like to give example of how to use cypher queries to traverse/find shortest path between two nodes.
from neo4django.db import connection
results = connection.cypher("START source=node:`graph-person_conns`(person_name='s2sp1'),dest=node:`graph-person_conns`(person_name='s2c1') MATCH p=ShortestPath(source-[*]->dest) RETURN extract(i in nodes(p) : i.person_name), extract(j in rels(p) : type(j))")
This is to find shortest path between nodes named s2sp1 and s2c1 on the graph. Cypher queries are really cool and help traverse nodes limiting the hops, types of relations etc.
Can someone comment on the performance of this method? Also please suggest if there are any other efficient methods to access Neo4j from Django. Thank You :)
Hm, why are you using Cypher? neo4django QuerySets work just fine for the above if you set the properties to indexed=True (or not, it'll just be slower for those).
people = person_conns.objects.filter(name='n2')
The neo4django docs have some other querying examples, as do the Django docs. Neo4django executes those queries as Cypher on the backend- you really shouldn't need to drop down to writing the Cypher yourself unless you have a very particular traversal pattern or a performance issue.
Anyway, to more directly tackle your question- the last example you used needs backticks to escape the index name, like
listpar = connection.cypher("START no=node:`graph-person_conns`(name='s2') RETURN no.owner?, no.name?",raw=True)
The first example should work. One thought- did you flip the autoindexing on before or after saving the nodes you're searching for? If after, note that you'll have to manually reindex the nodes either using the Java API or by re-setting properties on the node, since it won't have been autoindexed.
HTH, and welcome to StackOverflow!

Resources