How to use the Apache Gremlin constant() step in an AWS Neptune database?

How do I use the Apache Gremlin constant() step in an AWS Neptune database?
g.V().hasLabel('user').has('name', 'Thirumal1').coalesce(id(), constant("1"));
I'm not getting the constant value in the output. The documentation says it needs to be used with sack: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-step-support.html. How do I use constant() in AWS Neptune?

If the aim is to return a 1 when the vertex does not exist, you need to use the fold().coalesce() pattern. As written in the question, the traversal will end before the coalesce() if the has() step returns no results.
You could do something like this:
g.V().hasLabel('user').
  has('name', 'Thirumal1').
  fold().
  coalesce(unfold().id(), constant("1"))
In TinkerPop 3.6 a new mergeV step was added. Once database providers move up to that version, you will be able to use mergeV coupled with onCreate and onMatch, as sketched below.
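As a rough illustration, a mergeV upsert could look like this (a sketch based on TinkerPop 3.6 syntax; the created and lastSeen property keys are made up for this example):
g.mergeV([(T.label): 'user', name: 'Thirumal1']).
  option(Merge.onCreate, [created: true]).
  option(Merge.onMatch, [lastSeen: 'today'])
The first map is the search criteria; onCreate supplies properties set only when a new vertex is created, and onMatch supplies properties applied when an existing vertex is found.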

The coalesce() step returns the result of the first child traversal that yields a value. In your query example the id() step always returns a value, so the constant("1") branch is never reached.
The Gremlin step support page you refer to concerns pushdown of operations to the datastore. Further down that page it shows that support is actually lacking for only a few Gremlin steps.

Related

Gremlin - Update values of multiple edges

I am using AWS Neptune and I have to modify a certain property of a set of edges with specific values. I also need this done in a single transaction. In AWS Neptune, manual transaction logic using tx.commit() and tx.rollback() is not supported, which means I have to do this operation in a single traversal.
If I was to modify properties of vertices instead of edges, I could have got it done with a query similar to the following one:
g.V(<id 1>).property('name', 'Marko').V(<id 2>).property('name', 'Stephen');
This is because it is possible to select vertices by id mid-traversal, i.e. the GraphTraversal class has V(String... vertexIds) as a member function.
But the same is not true for edges: I cannot select edges this way because E(String... edgeIds) is not a member function of the GraphTraversal class.
Can somebody suggest the correct way I can solve this problem?
Thank you.
Amazon Neptune engine 1.0.1.0.200463.0 added support for Gremlin sessions, enabling multiple Gremlin traversals to be executed in a single transaction; a console sketch follows.
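As an illustration, in the Gremlin console a sessioned remote connection keeps everything you send in one transaction until the session is closed (a sketch; the config file name and edge IDs are placeholders):
:remote connect tinkerpop.server conf/neptune-remote.yaml session
:remote console
g.E('id1').property('name', 'marko')
g.E('id2').property('name', 'stephen')
:remote close
Closing the session commits the work as a single transaction.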
However, you can also do it with a single query like this:
g.E('id1', 'id2', 'id3').coalesce(
    has(id, 'id1').property('name', 'marko'),
    has(id, 'id2').property('name', 'stephan'),
    has(id, 'id3').property('name', 'vadas'))
You can get the same result as a mid-traversal E() by using V().outE().hasId(<list of IDs>), as sketched below.
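For instance (a sketch with placeholder edge IDs and property values):
g.V().outE().hasId('id1', 'id2', 'id3').
  property('name', 'updated')
Because hasId() filters the outgoing edges, this behaves like a mid-traversal E('id1', 'id2', 'id3'), and you can combine it with the coalesce() pattern above to set different values per edge.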

Neo4j output format

After working with Neo4j, and now coming to the point of considering making my own entity manager (object manager) to work with the fetched data in the application, I wonder about Neo4j's output format.
When I run a query, the result is always returned as tabular data. Why is this?
Sure, tables have a big place in data and processing, but it seems strange that a graph database can only output in this format.
Now when I want to create an object graph in my application, I have to hydrate all the objects, which is not really good for performance and doesn't leverage true graph performance.
Consider MATCH (A)-->(B) RETURN A, B when there is one A and three Bs; it would return:
A B
1 1
1 2
1 3
That's the same A passed down three times over the database connection, while I only need it once, and I know this before the data is fetched.
Something like this seems great: http://nigelsmall.com/geoff
load2neo is nice; a load-from-neo would also be nice, either in the Geoff format or any of the other formats out there: https://gephi.org/users/supported-graph-formats/
Each language could then implement its own functions to create the objects directly.
To clarify:
Relations between nodes are lost in tabular data
Redundant (non-optimal) format for graphs
Edges (relations) and vertices (nodes) are usually not in the same table. (makes queries more complex?)
Another consideration (which might deserve its own post): what's a good way to model relations in an object graph? As objects? Or as data/methods inside the node objects?
@Kikohs
Q: What do you mean by "Each language could then implement its own functions to create the objects directly."?
A: With a (partial) graph provided by the database (as the result of a query), a language like PHP could provide a factory method (preferably in C) to construct the object graph (this is usually an expensive operation). But only if the object graph is well defined in a standard format (because this function should be simple and universal).
Q: Do you want to export the full graph or just the result of a query?
A: The result of a query. However, a query like MATCH (n) OPTIONAL MATCH (n)-[r]-() RETURN n, r should return the full graph.
Q: Do you want to dump the subgraph created from the result of a query to disk?
A: No, existing interfaces like REST are preferred for getting the query result.
Q: Do you want to create the subgraph which comes from a query in memory and then request it in another language?
A: No, I want the result of the query in a format other than tabular (see the examples mentioned above).
Q: If you make a query which only returns the name of a node, would you like to get the full associated node or just the name? Same for the edges.
A: Nodes don't have names. They have properties, labels and relations. I would like enough information to retrieve A) the node ID, its labels and its properties, and B) the relations to other nodes which are in the same result.
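(As an aside, not from the original thread: a plain Cypher query can already return all of that information, only in tabular form, which is exactly the duplication being discussed:
MATCH (n)-[r]-(m)
RETURN id(n), labels(n), n, type(r), id(m)
The question is about getting the same information back as a deduplicated graph structure rather than as rows.)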
Note that the first part of the question is not a concrete "how-to" question, but rather "why is this not possible?" (or, if it is possible, I'd like to be proven wrong on this one). The second is a real "how-to" question, namely "how to model relations". The two questions have in common that they both try to answer "how to get graph data efficiently into PHP".
@Michael Hunger
You have a point when you say that not all result data can be expressed as an object graph. It is reasonable to say that an alternative output format would only complement the table format, not replace it.
I understand from your answer that the natural (raw-ish) output format from the database is the result format with duplicates in it ("streams the data out as it comes"). In that case I understand that it's left to another program (in the dev stack) to do the mapping. So my conclusions on Neo4j implementing something like this:
Pros - not having to do this in every implementation language (of the application)
Cons - 1) no application-specific mapping is possible, 2) no performance gain if the implementation language is fast
"Even if you use geoff, graphml or the gephi format you have to keep all the data in memory to deduplicate the results."
I don't understand this point entirely; are you saying that these formats are not able to hold deduplicated results (in certain cases)? So in fact there is no possible textual format with which a graph can be described without duplication?
"There is also the questions on what you want to include in your output?"
I was under the assumption that the cypher language was powerful enough to specify this in the query. And so the output format would have whatever the database can provide as result.
"You could just return the paths that you get, which are unique paths through the graph in themselves".
Useful suggestion, i'll play around with this idea :)
"The dump command of the neo4j-shell uses the approach of pulling the cypher results into an in-memory structure, enriching it".
Does the enriching process fetch additional data from the database or is the data already contained in the initial result?
There is more to it.
First of all as you said tabular results from queries are really commonplace and needed to integrate with other systems and databases.
Secondly, oftentimes you don't actually return raw graph data from your queries, but aggregated, projected, sliced, extracted information out of your graph. So the relationships to the original graph data are already lost in most of the query results I see being used.
The only time that people need / use the raw graph data is when they want to export subgraph data from the database as a query result.
The problem of doing that as a deduplicated graph is that the db has to fetch all the result data into memory first to deduplicate it, extract the needed relationships, etc.
Normally it just streams the data out as it comes and uses little memory with that.
Even if you use geoff, graphml or the gephi format you have to keep all the data in memory to deduplicate the results (which are returned as paths with potential duplicate nodes and relationships).
There is also the question of what you want to include in your output. Just the nodes and rels returned? Or additionally all the other rels between the nodes that you return? Or all the rels of the returned nodes (but then you also have to include the end-nodes of those relationships)?
You could just return the paths that you get, which are unique paths through the graph in themselves:
MATCH p = (n)-[r]-(m)
WHERE ...
RETURN p
Another way to address this problem in Neo4j is to use sensible aggregations.
E.g. what you can do is use collect to aggregate data per node (i.e. a kind of subgraph):
MATCH (n)-[r]-(m)
WHERE ...
RETURN n, collect([r,type(r),m])
or use the new literal map syntax (Neo4j 2.0)
MATCH (n)-[r]-(m)
WHERE ...
RETURN {node: n, neighbours: collect({ rel: r, type: type(r), node: m})}
The dump command of the neo4j-shell uses the approach of pulling the cypher results into an in-memory structure, enriching it and then outputting it as cypher create statement(s).
A similar approach can be used for other output formats too if you need it. But so far there hasn't been the need.
If you really need this functionality it makes sense to write a server-extension that uses cypher for query specification, but doesn't allow return statements. Instead you would always use RETURN *, aggregate the data into an in-memory structure (SubGraph in the org.neo4j.cypher packages). And then render it as a suitable format (e.g. JSON or one of those listed above).
These could be starting points for that:
https://github.com/jexp/cypher-rs
https://github.com/jexp/cypher_websocket_endpoint
https://github.com/neo4j-contrib/rabbithole/blob/master/src/main/java/org/neo4j/community/console/SubGraph.java#L123
There are also other efforts, like GraphJSON from GraphAlchemist: https://github.com/GraphAlchemist/GraphJSON
And the d3 json format is also pretty useful. We use it in the neo4j console (console.neo4j.org) to return the graph visualization data that is then consumed by d3 directly.
I've been working with Neo4j for a while now and I can tell you that if you are concerned about memory and performance you should drop Cypher altogether, and use indexes and the other graph-traversal methods instead (e.g. retrieve all the relationships of a certain type from or to a start node, and then iterate over the found nodes).
As the documentation says, Cypher is not intended for in-app usage, but more as an administration tool. Furthermore, in production-scale environments, it is VERY easy to crash the server by running the wrong query.
Secondly, there is no mention in the docs of an API method to retrieve the output as a graph-like structure. You will have to process the output of the query and build it yourself.
That said, in the example you give you say that there is only one A and that you know it before the data is fetched, so you don't need to do:
MATCH (A)-->(B) RETURN A, B
but just
MATCH (A)-->(B) RETURN B
(you don't need to receive A three times, because you already know that the returned B nodes are connected to A)
or better (if you need info about the relationships) something like
MATCH (A)-[r]->(B) RETURN r

GQL query with "like" operator [duplicate]

Simple one really. In SQL, if I want to search a text field for a couple of characters, I can do:
SELECT blah FROM blah WHERE blah LIKE '%text%'
The documentation for App Engine makes no mention of how to achieve this, but surely it's a common enough problem?
BigTable, which is the database back end for App Engine, will scale to millions of records. Due to this, App Engine will not allow you to do any query that will result in a table scan, as performance would be dreadful for a well populated table.
In other words, every query must use an index. This is why you can only do =, > and < queries. (In fact you can also do !=, but the API does this using a combination of > and < queries.) This is also why the development environment monitors all the queries you do and automatically adds any missing indexes to your index.yaml file.
There is no way to index for a LIKE query so it's simply not available.
Have a watch of this Google IO session for a much better and more detailed explanation of this.
I'm facing the same problem, but I found something in the Google App Engine docs:
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2",
            "abc",
            u"abc" + u"\ufffd")
This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html
maybe this could do the trick ;)
Although App Engine does not support LIKE queries, have a look at the ListProperty and StringListProperty property types. When an equality test is done on these properties, the test is actually applied to all list members; e.g., list_property = value tests whether the value appears anywhere in the list.
Sometimes this feature can be used as a workaround for the lack of LIKE queries. For instance, it makes it possible to do simple text search, as described on this post and sketched below.
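A minimal sketch of that idea with the old db API (the Post model and its field names are made up for illustration): store the lowercased words of a text field in a StringListProperty, then filter with a plain equality test, which matches any list member.
from google.appengine.ext import db

class Post(db.Model):
    text = db.TextProperty()
    words = db.StringListProperty()  # lowercased words extracted from text

def save_post(text):
    # keep the word index in sync with the text on every save
    post = Post(text=text, words=[w.lower() for w in text.split()])
    post.put()

# word-level stand-in for LIKE '%fern%': equality matches any list member
results = Post.all().filter('words =', 'fern').fetch(10)
Note this matches whole words only, not arbitrary substrings, which is the usual trade-off with this workaround.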
You need to use the search service to perform full-text search queries similar to SQL LIKE.
Gaelyk provides a domain-specific language to perform more user-friendly search queries. For example, the following snippet will find the first ten books, sorted from the latest ones, with a title containing fern and the genre exactly matching thriller:
def documents = search.search {
    select all from books
    sort desc by published, SearchApiLimits.MINIMUM_DATE_VALUE
    where title =~ 'fern'
    and genre = 'thriller'
    limit 10
}
LIKE is written as Groovy's match operator =~.
It supports functions such as distance(geopoint(lat, lon), location) as well.
App Engine launched a general-purpose full-text search service in version 1.7.0 that supports the datastore.
Details in the announcement.
More information on how to use this: https://cloud.google.com/appengine/training/fts_intro/lesson2
Have a look at Objectify here; it is like a Datastore access API. There is a FAQ entry for this question specifically; here is the answer:
How do I do a like query (LIKE "foo%")
You can do something like a startsWith, or an endsWith if you reverse the string when storing and searching. You do a range query with the starting value you want and a value just above it.
String start = "foo";
... = ofy.query(MyEntity.class).filter("field >=", start).filter("field <", start + "\uFFFD");
Just follow here:
http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/search/__init__.py#354
It works!
class Article(search.SearchableModel):
    text = db.TextProperty()
    ...

article = Article(text=...)
article.save()
To search the full text index, use the SearchableModel.all() method to get an instance of SearchableModel.Query, which subclasses db.Query. Use its search() method to provide a search query, in addition to any other filters or sort orders, e.g.:
query = article.all().search('a search query').filter(...).order(...)
I tested this with the GAE Datastore low-level Java API, and it works perfectly:
Query q = new Query(Directorio.class.getSimpleName());
// 'query' here is the search string being matched as a prefix
Filter filterNombreGreater = new FilterPredicate("nombre", FilterOperator.GREATER_THAN_OR_EQUAL, query);
Filter filterNombreLess = new FilterPredicate("nombre", FilterOperator.LESS_THAN, query + "\uFFFD");
Filter filterNombre = CompositeFilterOperator.and(filterNombreGreater, filterNombreLess);
q.setFilter(filterNombre);
In general, even though this is an old post, a way to produce a 'LIKE' or 'ILIKE' is to gather all results from a '>=' query, then loop over the results in Python (or Java), keeping the elements containing what you're looking for.
Let's say you want to filter users given q = 'luigi':
users = []
qry = self.user_model.query(ndb.OR(self.user_model.name >= q.lower(),
                                   self.user_model.email >= q.lower(),
                                   self.user_model.username >= q.lower()))
for _qry in qry:
    if q.lower() in _qry.name.lower() or q.lower() in _qry.email.lower() or q.lower() in _qry.username.lower():
        users.append(_qry)
It is not possible to do a LIKE search on the App Engine datastore; however, creating an ArrayList will do the trick if you need to search for a word in a string.
@Index
public ArrayList<String> searchName;
and then search in the index using Objectify:
List<Profiles> list1 = ofy().load().type(Profiles.class).filter("searchName =", search).list();
and this will give you a list with all the items that contain the word you searched for.
If the LIKE '%text%' always compares to a word or a few words (think permutations) and your data changes slowly (slowly meaning that it's not prohibitively expensive, both price-wise and performance-wise, to create and update indexes), then a Relation Index Entity (RIE) may be the answer.
Yes, you will have to build an additional datastore entity and populate it appropriately. Yes, there are some constraints that you will have to work around (one is the 5000-item limit on the length of a list property in the GAE datastore). But the resulting searches are lightning fast.
For details see my RIE with Java and Objectify and RIE with Python posts; a minimal sketch follows below.
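To give a rough idea of the pattern (a sketch with made-up model names, using the old db API): the RIE is a child entity that carries only an indexed word list, so a keys-only query on it finds the parent keys without ever loading the large parent entities.
from google.appengine.ext import db

class Message(db.Model):
    body = db.TextProperty()

class MessageIndex(db.Model):
    # always created with parent=message, so key.parent() points back to it
    words = db.StringListProperty()

def search_messages(word):
    # keys-only query on the index, then one batch get of the parents
    keys = MessageIndex.all(keys_only=True).filter('words =', word).fetch(20)
    return db.get([k.parent() for k in keys])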
"Like" is often uses as a poor-man's substitute for text search. For text search, it is possible to use Whoosh-AppEngine.

What's your experience developing on Google App Engine?

Is GQL easy to learn for someone who knows SQL? How is Django/Python? Does App Engine really make scaling easy? Is there any built-in protection against "GQL Injections"? And so on...
I'd love to hear the not-so-obvious ups and downs of using app engine.
Cheers!
My experience with Google App Engine has been great, and the 1000-result limit has been removed; here is a link to the release notes:
app-engine release notes
No more 1000 result limit - That's right: with addition of Cursors and the culmination of many smaller Datastore stability and performance improvements over the last few months, we're now confident enough to remove the maximum result limit altogether. Whether you're doing a fetch, iterating, or using a Cursor, there's no limits on the number of results.
The most glaring and frustrating issue is the datastore API, which looks great and is very well thought out and easy to work with if you are used to SQL, but has a 1000 row limit across all query result sets, and you can't access counts or offsets beyond that. I've run into weirder issues, like not actually being able to add or access data for a model once it goes beyond 1000 rows.
See the Stack Overflow discussion about the 1000 row limit
Aral Balkan wrote a really good summary of this and other problems
Having said that, App Engine is a really great tool to have at one's disposal, and I really enjoy working with it. It's perfect for deploying micro web services (e.g. JSON APIs) to use in other apps.
GQL is extremely simple - it's a subset of the SQL 'SELECT' statement, nothing more. It's only a convenience layer over the top of the lower-level APIs, though, and all the parsing is done in Python.
Instead, I recommend using the Query API, which is procedural, requires no run-time parsing, and makes 'GQL injection' vulnerabilities totally impossible (though they are impossible in properly written GQL anyway). The Query API is very simple: call .all() on a Model class, or call db.Query(modelname). The Query object has .filter(field_and_operator, value), .order(field_and_direction) and .ancestor(entity) methods, in addition to all the facilities GQL objects have (.get(), .fetch(), .count(), etc.). Each of the Query methods returns the Query object itself for convenience, so you can chain them:
results = MyModel.all().filter("foo =", 5).order("-bar").fetch(10)
is equivalent to:
results = MyModel.gql("WHERE foo = 5 ORDER BY bar DESC LIMIT 10").fetch()
A major downside when working with App Engine was the 1k query limit, which has been mentioned in the comments already. What I haven't seen mentioned, though, is the fact that there is a built-in sortable order (__key__) with which you can work around this issue.
From the App Engine cookbook:
def deepFetch(queryGen, key=None, batchSize=100):
    """Iterator that yields an entity in batches.

    Args:
      queryGen: should return a Query object
      key: used to .filter() for __key__
      batchSize: how many entities to retrieve in one datastore call

    Retrieved from http://tinyurl.com/d887ll (AppEngine cookbook).
    """
    from google.appengine.ext import db
    # AppEngine will not fetch more than 1000 results
    batchSize = min(batchSize, 1000)
    query = None
    done = False
    count = 0
    if key:
        key = db.Key(key)
    while not done:
        print count
        query = queryGen()
        if key:
            query.filter("__key__ >", key)
        results = query.fetch(batchSize)
        for result in results:
            count += 1
            yield result
        if batchSize > len(results):
            done = True
        else:
            key = results[-1].key()
The above code together with Remote API (see this article) allows you to retrieve as many entities as you need.
You can use the above code like this:
def allMyModel():
    return MyModel.all()

myModels = deepFetch(allMyModel)
At first I had the same experience as others who transitioned from SQL to GQL -- kind of weird to not be able to do JOINs, count more than 1000 rows, etc. Now that I've worked with it for a few months, I absolutely love App Engine. I'm porting all of my old projects onto it.
I use it to host several high-traffic web applications (at peak time one of them gets 50k hits a minute.)
Google App Engine doesn't use an actual relational database; apparently it uses some sort of distributed hash map. This lends itself to some different behaviors that people who are accustomed to SQL just aren't going to see at first. So, for example, getting a COUNT of items, which in regular SQL is expected to be a fast operation, just isn't going to work the same way with GQL, as sketched below.
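To make that concrete, a small sketch with the old db API (MyModel and its property are placeholders): count() walks the index counting matching keys instead of reading a stored total, so it slows down as the data grows, and the limit argument caps how far it will walk.
from google.appengine.ext import db

class MyModel(db.Model):
    foo = db.IntegerProperty()

# counts at most 1000 matching entities; cost grows with the number counted
n = MyModel.all().filter('foo =', 5).count(limit=1000)
For totals you need often, the usual answer is to maintain the count yourself, e.g. with a sharded counter entity updated on writes.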
Here are some more issues:
http://blog.burnayev.com/2008/04/gql-limitations.html
In my personal experience, it's an adjustment, but the learning curve is fine.
