How to use indexed properties of NodeModels in Cypher queries of Neo4django?

I'm a newbie to Django as well as neo4j. I'm using Django 1.4.5, neo4j 1.9.2 and neo4django 0.1.8
I've created a NodeModel for a person node and indexed it on the 'owner' and 'name' properties. Here is my models.py:
from neo4django.db import models as models2

class person_conns(models2.NodeModel):
    owner = models2.StringProperty(max_length=30, indexed=True)
    name = models2.StringProperty(max_length=30, indexed=True)
    gender = models2.StringProperty(max_length=1)
    parent = models2.Relationship('self', rel_type='parent_of', related_name='parents')
    child = models2.Relationship('self', rel_type='child_of', related_name='children')

    def __unicode__(self):
        return self.name
Before connecting to the Neo4j server, I set auto-indexing to true and gave the indexable keys in the conf/neo4j.properties file as follows:
# Autoindexing
# Enable auto-indexing for nodes, default is false
node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled
node_keys_indexable=owner,name
# Enable auto-indexing for relationships, default is false
relationship_auto_indexing=true
# The relationship property keys to be auto-indexed, if enabled
relationship_keys_indexable=child_of,parent_of
I followed Neo4j: Step by Step to create an automatic index to update the above file and manually create node_auto_index on the Neo4j server.
Below are the indexes created on the Neo4j server after running Django's syncdb against the Neo4j database and manually creating the auto indexes:
graph-person_conns lucene
{"to_lower_case":"true", "_blueprints:type":"MANUAL","type":"fulltext"}
node_auto_index lucene
{"_blueprints:type":"MANUAL", "type":"exact"}
As suggested in https://github.com/scholrly/neo4django/issues/123, I used connection.cypher(queries) to query the Neo4j database.
For example:
listpar = connection.cypher("START no=node(*) RETURN no.owner?, no.name?",raw=True)
Above returns the owner and name of all nodes correctly. But when I try to query on the indexed properties instead of node IDs or '*', as in:
listpar = connection.cypher("START no=node:node_auto_index(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives 0 rows.
listpar = connection.cypher("START no=node:graph-person_conns(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives
Exception Value:
Error [400]: Bad Request. Bad request syntax or unsupported method.
Invalid data sent: (' expected but-' found after graph
I tried other strings like name and person_conns instead of graph-person_conns, but each time it gives an error that the particular index does not exist. Am I making a mistake while adding the indexes?
My project mainly depends on filtering the nodes based on properties, so this part is really essential. Any pointers or suggestions would be appreciated. Thank you.
This is my first post on stackoverflow. So in case of any missing information or confusing statements please be patient. Thank you.
UPDATE:
Thank you for the help. For the benefit of others, I would like to give an example of how to use Cypher queries to traverse/find the shortest path between two nodes.
from neo4django.db import connection
results = connection.cypher("START source=node:`graph-person_conns`(person_name='s2sp1'),dest=node:`graph-person_conns`(person_name='s2c1') MATCH p=ShortestPath(source-[*]->dest) RETURN extract(i in nodes(p) : i.person_name), extract(j in rels(p) : type(j))")
This finds the shortest path between the nodes named s2sp1 and s2c1 in the graph. Cypher queries are really cool and help traverse nodes while limiting the hops, the types of relations, etc.
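For instance, here is a hedged sketch of the same query restricted to parent_of relations and at most three hops (legacy Cypher 1.9 syntax, reusing the node names from the example above):
# Sketch only: the same shortest-path search, but bounding the traversal
# to 'parent_of' relationships and a maximum of three hops.
results = connection.cypher(
    "START source=node:`graph-person_conns`(person_name='s2sp1'), "
    "dest=node:`graph-person_conns`(person_name='s2c1') "
    "MATCH p=shortestPath(source-[:parent_of*..3]->dest) "
    "RETURN extract(i in nodes(p) : i.person_name)")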
Can someone comment on the performance of this method? Also please suggest if there are any other efficient methods to access Neo4j from Django. Thank You :)

Hm, why are you using Cypher? neo4django QuerySets work just fine for the above if you set the properties to indexed=True (or not, it'll just be slower for those).
people = person_conns.objects.filter(name='n2')
The neo4django docs have some other querying examples, as do the Django docs. Neo4django executes those queries as Cypher on the backend, so you really shouldn't need to drop down to writing the Cypher yourself unless you have a very particular traversal pattern or a performance issue.
Anyway, to tackle your question more directly: the last example you used needs backticks to escape the index name, like
listpar = connection.cypher("START no=node:`graph-person_conns`(name='s2') RETURN no.owner?, no.name?",raw=True)
The first example should work. One thought: did you flip autoindexing on before or after saving the nodes you're searching for? If after, note that you'll have to manually reindex the nodes, either using the Java API or by re-setting properties on them, since they won't have been autoindexed.
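For example, a minimal sketch of re-triggering the autoindexer from the Django shell, assuming the person_conns model above (re-saving re-sets the properties, which re-indexes each node):
# Sketch: re-save every node so the autoindexer picks up 'owner' and
# 'name'. Assumes autoindexing is already enabled in neo4j.properties.
for p in person_conns.objects.all():
    p.save()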
HTH, and welcome to StackOverflow!

Related

Performance issue with django exclude

I have a Django 1.8 application, and I am using an MS SQL database with pyodbc as the db backend (via the "django-pyodbc-azure" module).
I have the following models:
class Branch(models.Model):
    name = models.CharField(max_length=30)
    startTime = models.DateTimeField()

class Device(models.Model):
    uid = models.CharField(max_length=100, primary_key=True)
    type = models.CharField(max_length=20)
    firstSeen = models.DateTimeField()
    lastSeen = models.DateTimeField()

class Session(models.Model):
    device = models.ForeignKey(Device)
    branch = models.ForeignKey(Branch)
    start = models.DateTimeField()
    end = models.DateTimeField(null=True, blank=True)
I need to query the session model, and I want to exclude some records with specific device values. So I issue the following query:
sessionCount = (Session.objects.filter(branch=branch)
                .exclude(device__in=badDevices)
                .filter(end__gte=F('start') + timedelta(minutes=30))
                .count())
badDevices is a pre-filled list of device ids with around 60 items.
badDevices = ['id-1', 'id-2', ...]
This query takes around 1.5 seconds to complete. If I remove the exclude from the query, it takes around 250 milliseconds.
I printed the generated SQL for this queryset and tried it in my database client. There, both versions executed in around 250 milliseconds.
This is the generated SQL:
SELECT [session].[id], [session].[device_id], [session].[branch_id], [session].[start], [session].[end]
FROM [session]
WHERE ([session].[branch_id] = my-branch-id AND
NOT ([session].[device_id] IN ('id-1', 'id-2', 'id-3',...)) AND
DATEPART(dw, [session].[start]) = 1
AND [session].[end] IS NOT NULL AND
[session].[end] >= ((DATEADD(second, 600, CAST([session].[start] AS datetime)))))
So using the exclude at the database level doesn't seem to affect query performance, but in Django the query runs six times slower when I add the exclude part. What could be causing this?
The general issue seems to be that Django is doing some extra work to prepare the exclude clause. After that step, by the time the SQL has been generated and sent to the database, there isn't anything interesting happening on the Django side that could cause such a significant delay.
In your case, one thing that might be causing this is some kind of pre-processing of badDevices. If, for instance, badDevices is a QuerySet, then Django might be executing the badDevices query just to prepare the actual query's SQL. Something similar might be happening in the case where Device has a non-default primary key.
The other thing that might delay the SQL preparation is, of course, django-pyodbc-azure. Maybe it's doing something strange while compiling the query and becomes a bottleneck.
This is all wild speculation though, so if you're still having this issue then post the Device and Branch models as well, the exact content of badDevices and the SQL generated from the queries. Then maybe some scenarios can be at least eliminated.
EDIT: I think it must be the Device.uid field. Possibly Django or pyodbc is getting confused by the non-default primary key and is fetching all the devices while generating the query. Try two things:
Replace device__in with device_id__in, device__pk__in and device__uid__in and check each one again. Maybe a more explicit query will be easier for Django to translate into SQL. You can even try replacing branch with branch_id, just in case.
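A hedged sketch of that first suggestion (the same queryset from the question, just with more explicit lookups):
# Sketch: explicit *_id lookups may let Django skip per-object work
# when building the NOT IN clause.
sessionCount = (Session.objects
                .filter(branch_id=branch.id)
                .exclude(device_id__in=badDevices)
                .filter(end__gte=F('start') + timedelta(minutes=30))
                .count())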
If the above doesn't work, try replacing the exclude expression with a raw SQL where clause:
# add quotes (because of the hyphens) & join
badDevicesIdString = ", ".join(["'%s'" % id for id in badDevices])
# Replaces .exclude()
... .extra(where=['device_id NOT IN (%s)' % badDevicesIdString])
If neither works, then most likely the problem is with the whole query and not just exclude. There are some more options in that case but try the above first and I will update my answer later if necessary.
Just want to share a similar problem that I had with MySQL and exclude clause performance, and how it was fixed.
When running the exclude clause, the list for the "in" lookup was actually a QuerySet that I had obtained using the values_list method. Checking the exclude query executed by MySQL, the "in" values were not literal values but actually another, nested query. This behavior was hurting performance on certain large queries.
To fix that, instead of passing the queryset, I flattened it into a Python list of values. That way, each value is passed as a literal argument inside the in lookup, and performance improved considerably.
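A minimal sketch of that fix against the question's models (the filter condition and the cutoff variable are hypothetical; the point is the list() call, which forces evaluation up front):
# Sketch: evaluate the inner queryset eagerly so the exclude clause
# receives literal values instead of a nested subquery.
bad_device_ids = list(Device.objects
                      .filter(lastSeen__lt=cutoff)
                      .values_list('uid', flat=True))
sessions = Session.objects.filter(branch=branch).exclude(device_id__in=bad_device_ids)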

pull Drupal field values with db_query() or db_select()

I've created a content type in Drupal 7 with 5 or 6 fields. Now I want to use a function to query them in a hook_view callback. I thought I would query the node table, but all I get back are the nid and title. How do I get back the values for my created fields using the database abstraction API?
Drupal stores the fields in other tables and can automatically join them in. The storage varies depending on how the field is configured, so the easiest way to access them is by using an EntityFieldQuery. It'll handle the complexity of joining all your fields in. There are some good examples of how to use it here: http://drupal.org/node/1343708
But if you're working in hook_view, you should already be able to access the values; they're loaded into the $node object that's passed in as a parameter. Try running:
debug($node);
in your hook, and you should see all the properties.
If you already know the IDs of the nodes (nid) you want to load, you should use node_load_multiple() to load them. This will load the complete node with all field values. To search for node IDs, EntityFieldQuery is the recommended way, but it has some limitations. You can also use the database API to query the node table for the nid (and revision ID, vid) of your nodes, then load them using node_load_multiple().
Loading a complete node can have a performance impact, since it loads far more data than you need. If this proves to be an issue, you can try to directly access the field storage tables (if your field values are stored in your SQL database). The schema of these tables is built dynamically depending on the field types, cardinality, and other settings. You will have to dig into your database schema to figure it out, and it will probably change as soon as you change anything about your fields.
Another solution is to build stub node entities and use field_attach_load() with an $options['field_id'] value to load only the value of a specific field. But this requires a good knowledge and understanding of the Field API.
See the How to use EntityFieldQuery article in the Drupal Community Documentation.
Creating A Query
Here is a basic query looking for all articles with a photo that are
tagged as a particular faculty member and published this year. In the
last 5 lines of the code below, the $result variable is populated with
an associative array with the first key being the entity type and the
second key being the entity id (e.g., $result['node'][12322] = partial
node data). Note the $result won't have the 'node' key when it's
empty, thus the check using isset, this is explained here.
Example:
<?php
$query = new EntityFieldQuery();
$query->entityCondition('entity_type', 'node')
  ->entityCondition('bundle', 'article')
  ->propertyCondition('status', 1)
  ->fieldCondition('field_news_types', 'value', 'spotlight', '=')
  ->fieldCondition('field_photo', 'fid', 'NULL', '!=')
  ->fieldCondition('field_faculty_tag', 'tid', $value)
  ->fieldCondition('field_news_publishdate', 'value', $year . '%', 'like')
  ->fieldOrderBy('field_photo', 'fid', 'DESC')
  ->range(0, 10)
  ->addMetaData('account', user_load(1)); // Run the query as user 1.
$result = $query->execute();
if (isset($result['node'])) {
  $news_items_nids = array_keys($result['node']);
  $news_items = entity_load('node', $news_items_nids);
}
?>
Other resources
EntityFieldQuery on api.drupal.org
Building Energy.gov without Views

SOLR (3.1+) - Multiple Spatial Queries with OR in Same Request

Is it possible to conduct multiple spatial queries within the same SOLR (3.1+) request?
We currently have a need to allow users to search for inventory with a location of their choice via a frontend search form. But we also want to add another spatial search behind the scenes so it will include more inventory. The resulting search would be a Venn-diagram type of search.
Edit 10.4.2011
Example construct: q=*:*&fq={!geofilt}&sfield=Location&(ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)%20OR%20_query_:(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672)
The above construct does not work, but hopefully demonstrates what I am trying to accomplish.
This is old, but it doesn't seem like it ever got a full answer. I had the same issue and found that this syntax works:
q=*:*&fq=(({!geofilt sfield=Location pt=40.68063802521456,-74.00390625 d=80.4672} AND ClientId:"client1") OR ({!geofilt sfield=Location pt=36.1146460,-115.1728160 d=80.4672} AND ClientId:"client2"))
It looks like you want to run N queries in one request in order to get one result set per query?!
If so, Field Collapsing ( http://wiki.apache.org/solr/FieldCollapsing ) is what you are looking for. Unfortunately, Field Collapsing is only available from 3.3.
Depending on your needs, maybe counted results from different faceted searches could also be useful?!
What if you moved your second location query into an additional filter query, like below:
q=*:*&fq={!geofilt}&sfield=Location&(ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)&fq={!geofilt}&sfield=Location&(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672)
Will that provide the results that you are looking for? It might end up being too limiting, but I thought it was worth trying.
You might also try:
q=*:*&fq={!geofilt}&sfield=Location&((ClientId:"client1"&pt=40.68063802521456,-74.00390625&d=80.4672)%20OR%20(ClientId:"client2"&pt=36.1146460,-115.1728160&d=80.4672))

GQL query with "like" operator [duplicate]

Simple one really. In SQL, if I want to search a text field for a couple of characters, I can do:
SELECT blah FROM blah WHERE blah LIKE '%text%'
The documentation for App Engine makes no mention of how to achieve this, but surely it's a common enough problem?
BigTable, which is the database back end for App Engine, will scale to millions of records. Due to this, App Engine will not allow you to do any query that would result in a table scan, as performance would be dreadful for a well-populated table.
In other words, every query must use an index. This is why you can only do =, > and < queries. (In fact you can also do !=, but the API does this using a combination of > and < queries.) This is also why the development environment monitors all the queries you do and automatically adds any missing indexes to your index.yaml file.
There is no way to index for a LIKE query so it's simply not available.
Have a watch of this Google IO session for a much better and more detailed explanation of this.
I'm facing the same problem, but I found something on the Google App Engine pages:
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2",
"abc",
u"abc" + u"\ufffd")
This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html
maybe this could do the trick ;)
Although App Engine does not support LIKE queries, have a look at the ListProperty and StringListProperty properties. When an equality test is done on these properties, the test is actually applied to all list members; e.g., list_property = value tests whether the value appears anywhere in the list.
Sometimes this feature can be used as a workaround for the lack of LIKE queries. For instance, it makes it possible to do simple text search, as described in this post.
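A minimal sketch of that workaround with the old db API (the model and property names here are hypothetical):
from google.appengine.ext import db

class Snippet(db.Model):
    text = db.TextProperty()
    words = db.StringListProperty()  # crude keyword index, filled on save

    def put(self, **kwargs):
        # split the text into unique lowercase words before each save
        self.words = list(set((self.text or '').lower().split()))
        return super(Snippet, self).put(**kwargs)

# An equality filter on a list property matches any member:
matches = Snippet.all().filter('words =', 'fern').fetch(10)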
You need to use the search service to perform full-text search queries similar to SQL LIKE.
Gaelyk provides a domain-specific language to perform more user-friendly search queries. For example, the following snippet will find the first ten books, sorted from the latest ones, with a title containing fern
and the genre exactly matching thriller:
def documents = search.search {
    select all from books
    sort desc by published, SearchApiLimits.MINIMUM_DATE_VALUE
    where title =~ 'fern'
    and genre = 'thriller'
    limit 10
}
Like is written as Groovy's match operator =~.
It supports functions such as distance(geopoint(lat, lon), location) as well.
App Engine launched a general-purpose full-text search service in version 1.7.0 that supports the datastore.
Details in the announcement.
More information on how to use this: https://cloud.google.com/appengine/training/fts_intro/lesson2
Have a look at Objectify here; it is a Datastore access API. There is a FAQ covering this question specifically; here is the answer:
How do I do a like query (LIKE "foo%")
You can do something like a startsWith, or an endsWith if you reverse the string when storing and searching. You do a range query with the starting value you want and a value just above the one you want.
String start = "foo";
... = ofy.query(MyEntity.class).filter("field >=", start).filter("field <", start + "\uFFFD");
Just follow here:
http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/search/__init__.py#354
It works!
class Article(search.SearchableModel):
    text = db.TextProperty()
...
article = Article(text=...)
article.save()
To search the full text index, use the SearchableModel.all() method to get an
instance of SearchableModel.Query, which subclasses db.Query. Use its search()
method to provide a search query, in addition to any other filters or sort
orders, e.g.:
query = article.all().search('a search query').filter(...).order(...)
I tested this with the GAE Datastore low-level Java API, and it works perfectly:
Query q = new Query(Directorio.class.getSimpleName());
Filter filterNombreGreater = new FilterPredicate("nombre", FilterOperator.GREATER_THAN_OR_EQUAL, query);
Filter filterNombreLess = new FilterPredicate("nombre", FilterOperator.LESS_THAN, query+"\uFFFD");
Filter filterNombre = CompositeFilterOperator.and(filterNombreGreater, filterNombreLess);
q.setFilter(filterNombre);
In general, even though this is an old post, a way to produce a LIKE or ILIKE is to gather all results from a '>=' query and then loop over the results in Python (or Java), keeping the elements that contain what you're looking for.
Let's say you want to filter users given q='luigi':
users = []
qry = self.user_model.query(ndb.OR(self.user_model.name >= q.lower(),
                                   self.user_model.email >= q.lower(),
                                   self.user_model.username >= q.lower()))
for _qry in qry:
    if q.lower() in _qry.name.lower() or q.lower() in _qry.email.lower() or q.lower() in _qry.username.lower():
        users.append(_qry)
It is not possible to do a LIKE search on the App Engine datastore; however, creating an ArrayList will do the trick if you need to search for a word within a string.
#Index
public ArrayList<String> searchName;
and then search the index using Objectify:
List<Profiles> list1 = ofy().load().type(Profiles.class).filter("searchName =",search).list();
and this will give you a list of all the items that contain the word you searched for.
If the LIKE '%text%' always compares to a word or a few (think permutations) and your data changes slowly (slowly meaning that it's not prohibitively expensive, both price-wise and performance-wise, to create and update indexes), then a Relation Index Entity (RIE) may be the answer.
Yes, you will have to build an additional datastore entity and populate it appropriately. Yes, there are some constraints that you will have to work around (one is the 5000 limit on the length of a list property in the GAE datastore). But the resulting searches are lightning fast.
For details see my RIE with Java and Objectify and RIE with Python posts.
"Like" is often used as a poor man's substitute for text search. For text search, it is possible to use Whoosh-AppEngine.
