Can someone explain how to define multi column indexes in Grails? The documentation is at best sparse.
This for example does not seem to work at all:
http://grails.org/GORM+Index+definitions
I've had some luck with this, but the results seem random at best. Definitions that work in one domain class do not when applied to another (with different names, of course).
http://www.grails.org/doc/1.1/guide/single.html#5.5.2.6%20Database%20Indices
Some working examples and explanations would be highly appreciated!
The solution that has worked for me for multi-column indexes is:
class ClassName {
    String name
    String description
    String state

    static mapping = {
        name index: 'name_idx'
        description index: 'name_idx'
        state index: 'name_idx'
    }
}
This creates a single index called 'name_idx' containing all three columns.
Downside: the columns appear in the index in alphabetical order, not the order in which they were declared.
To make your index multi-column, list the columns separated by commas (note: no space after the comma, to avoid a known bug). The second URL you point to hits that bug, as it shows:
index:'Name_Idx, Address_Index'
with a space; it should work as
index:'Name_Idx,Address_Index'
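Putting the two points together, a minimal sketch (the Person class and its properties are hypothetical; the index names come from the example above):
class Person {
    String name
    String address

    static mapping = {
        // name goes into both indexes; note: no space after the comma
        name index: 'Name_Idx,Address_Index'
        address index: 'Address_Index'
    }
}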
The first URL you point to was a proposed change (I don't believe it's currently implemented, and I have no idea how likely it ever will be).
AFAIK, the index closure shown on that page (the first URL) was never implemented, so those examples should be ignored; the page is for discussing possible implementations rather than documenting an actual one.
The correct way to define a single-column index name_idx for a name property is
static mapping = {
    name index: 'name_idx'
}
Sorry, but I don't know how to define a multi-column index; try the Grails mailing list if you don't get an answer here. In the unlikely event that multi-column indices can't be declared directly in the domain classes, you could define them in an SQL file that creates them if they don't already exist (or drops and re-creates them). This SQL file could be executed by the init closure in BootStrap.groovy.
I needed to be able to control the order of the columns in my multi-column index and also make it unique. I worked around the GORM / Hibernate limitations by creating the index in Bootstrap using direct SQL:
import groovy.sql.Sql

import javax.sql.DataSource

class BootStrap {
    DataSource dataSource

    def init = { servletContext ->
        if (!MyModel.count()) { // new database
            createIndexes()
            ...
        }
    }

    private void createIndexes() {
        Sql sql = new Sql(dataSource)
        sql.execute("create unique index my_index on my_model(col1,col2);")
    }
}
I would like to avoid costly repeated database queries in AnyLogic. I have seen the following Stack Overflow thread, What is the fastest way to look up continuous data on Anylogic (Java, SQL), where a simple three-step answer is provided, but I'm not sure what the second of the three points actually means:
Save all rows as instances of that class at model start-up into a map - you can use Origin/Destination as the key (use Anylogic's Pair object) and the class instance as the value
I have created a class that takes as inputs the information from each column of my database. I would now like to save each row as an instance of that class - is there an easy way to do this? I may be missing something simple as I'm new to Anylogic.
I'm also unsure how to create the mapping; if anyone could add more detail to point 2 above, I'd be very grateful!
This is effectively the best advice. You created the class, which is a great step; now one element of that class will be used as the key... for example the name. For instance, if your class has firstName as one variable and lastName as another, you will use a string that is the concatenation of firstName and lastName as your key. Of course any key is fine, as long as it is unique across your whole table. An integer id is OK too.
Create a collection of type LinkedHashMap.
Create a class (you did that).
Your collection will take a String as the key (first + last name) and the class as the value of each element (see the sketch below)...
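As a plain-Java sketch, the declaration would look like this (in AnyLogic you would typically add a Collection element from the palette and choose LinkedHashMap with String keys and YourClass values in its properties):
// key: "firstName_lastName", value: the class instance built from one row
LinkedHashMap<String, YourClass> yourCollection = new LinkedHashMap<String, YourClass>();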
Now, when you read your database, you will have something like this:
for (Tuple t : yourQueryResults) {
    YourClass yc = new YourClass(t.get(db.var1), t.get(db.var2));
    String totalName = t.get(db.first_name) + "_" + t.get(db.last_name);
    yourCollection.put(totalName, yc);
}
Now every time you want to find someone by name, for example "John Doe", instead of making a query you will do
yourCollection.get("John_Doe").theVarYouWant;
If you use an id instead of the name, you can use an int as the key, and then you will just do yourCollection.get(theId).theVarYouWant
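For instance, a hypothetical id-keyed variant of the same pattern (db.id standing in for your integer id column):
LinkedHashMap<Integer, YourClass> byId = new LinkedHashMap<Integer, YourClass>();
for (Tuple t : yourQueryResults) {
    byId.put(t.get(db.id), new YourClass(t.get(db.var1), t.get(db.var2)));
}
// later, no database query needed:
byId.get(theId).theVarYouWant;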
In a legacy project we had issues where, if a developer forgot the project_id in a query condition, rows for all projects would be shown instead of the single project they were meant to see. For example, for "Comments":
comments [id, project_id, message ]
If you forget to filter by project_id, you see comments from all projects. Sometimes this is caught by tests, sometimes not, but I would rather have prevention - the dev should see straight away "WRONG/Empty"!
To get around this, the product manager is insisting on separate tables for comments, like this:
project1_comments [id,message]
project2_comments [id,message]
Here, if you got the project part of the table name wrong and something still passed tests and got deployed, you would get nothing or an error.
However the difficulty is then with associated tables. Example "Files" linked to "Comments":
files [ id, comment_id, path ]
3, 1, files/foo/bar
project1_comments
id | message
1 | Hello World
project2_comments
id | message
1 | Bye World
This then turns into a database per project, which seems overkill.
Another possibility: how could one add a Behaviour on the Comments model to ensure any find/select query includes the foreign key, e.g. project_id?
Many thanks in advance.
In a legacy project we had issues where if a developer would forget a project_id in the query condition
CakePHP generates the join conditions based upon the associations you define for the tables. They are automatic when you use contain(), and it's unlikely a developer would make such a mistake with CakePHP.
To get around this, the product manager is insisting on separate tables for comments, like this:
Don't do it. Seems like a really bad idea to me.
Another possibility, how to add a Behaviour on the Comments model to ensure any find/select query does include the foreign key, eg - project_id?
The easiest solution is to just forbid all direct queries on the Comments table.
class Comments extends Table {
    public function find($type = 'all', $options = [])
    {
        throw new \Cake\Network\Exception\ForbiddenException('Comments can not be used directly');
    }
}
Afterwards only Comments read via an association will be allowed (associations always have valid join conditions), but think twice before doing this as I don't see any benefits in such a restriction.
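For example, a read through the association still works (Projects here is a hypothetical table with a hasMany Comments association):
$project = $Projects->find()
    ->where(['Projects.id' => $projectId])
    ->contain(['Comments'])
    ->first();
// $project->comments was joined via the association,
// so the project_id condition is always applied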
You can't easily restrict direct queries on Comments to only those that contain a product_id in the where clause. The problem is that where clauses are an expression tree, and you'd have to traverse the tree and check all different kinds of expressions. It's a pain.
What I would do is restrict Comments so that product_id has to be passed as an option to the finder.
$records = $Comments->find('all', ['product_id'=>$product_id])->all();
What the above does is pass $product_id as an option to the default findAll method of the table. We can then override that method and force product_id to be a required option for all direct Comments queries.
// in the Comments table class; assumes use statements for
// Cake\ORM\Query, Cake\Utility\Hash and Cake\Network\Exception\ForbiddenException
public function findAll(Query $query, array $options)
{
    $product_id = Hash::get($options, 'product_id');
    if (!$product_id) {
        throw new ForbiddenException('product_id is required');
    }
    return $query->where(['product_id' => $product_id]);
}
I don't see an easy way to do the above via a behavior, because the where clause contains only expressions by the time the behavior is executed.
I have an Article type structured like this:
type Article struct {
    Title   string
    Content string `datastore:",noindex"`
}
In an administrative portion of my site, I list all of my Articles. The only property I need in order to display this list is Title; grabbing the content of the article seems wasteful. So I use a projection query:
q := datastore.NewQuery("Article").Project("Title")
Everything works as expected so far. Now I decide I'd like to add two fields to Article so that some articles can be unlisted in the public article list and/or unviewable when access is attempted. Understanding the datastore to be schema-less, I think this might be very simple. I add the two new fields to Article:
type Article struct {
    Title      string
    Content    string `datastore:",noindex"`
    Unlisted   bool
    Unviewable bool
}
I also add them to the projection query, since I want to indicate in the administrative article list when an article is publicly unlisted and/or unviewable:
q := datastore.NewQuery("Article").Project("Title", "Unlisted", "Unviewable")
Unfortunately, this only returns entries that explicitly included Unlisted and Unviewable when they were Put into the datastore.
My workaround for now is to simply stop using a projection query:
q := datastore.NewQuery("Article")
All entries are returned, and the entries that never set Unlisted or Unviewable have them set to their zero value as expected. The downside is that the article content is being passed around needlessly.
In this case, that compromise isn't terrible, but I expect similar situations to arise in the future, and not being able to use projection queries could be a big deal. Projection queries and adding new properties to datastore entries seem like they don't fit together well. I want to make sure I'm not misunderstanding something or missing the correct way to do things.
It's not clear to me from the documentation that projection queries should behave this way (ignoring entries that don't have the projected properties rather than including them with zero values). Is this the intended behavior?
Are the only options in scenarios like this (adding new fields to structs / properties to entries) to either forgo projection queries or run some kind of "schema migration", Getting all entries and then Puting them back, so they now have zero-valued properties and can be projected?
Projection queries source the data for fields from the indexes, not from the entity itself. When you add new properties, pre-existing records do not appear in the indexes you are running the projection query on; they will need to be re-indexed.
You are asking for those specific properties, and they don't exist for the old records; hence the current behaviour.
You should probably think of a projection query as a request for entities that have a value in each requested index, in addition to any filter you place on the query.
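If you do need projections over the old records, a one-off migration along these lines would re-Put every entity so the new fields get written and indexed. This is only a sketch, assuming the classic appengine/datastore API; migrateArticles is a hypothetical name:
import (
    "appengine"
    "appengine/datastore"
)

// migrateArticles re-Puts every Article so that the new Unlisted and
// Unviewable properties are written and therefore appear in the indexes.
func migrateArticles(c appengine.Context) error {
    q := datastore.NewQuery("Article")
    for t := q.Run(c); ; {
        var a Article
        key, err := t.Next(&a)
        if err == datastore.Done {
            break // no more entities
        }
        if err != nil {
            return err
        }
        if _, err := datastore.Put(c, key, &a); err != nil {
            return err
        }
    }
    return nil
}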
I'm a newbie to Django as well as Neo4j. I'm using Django 1.4.5, neo4j 1.9.2 and neo4django 0.1.8.
I've created a NodeModel for a person node and indexed it on the 'owner' and 'name' properties. Here is my models.py:
from neo4django.db import models as models2

class person_conns(models2.NodeModel):
    owner = models2.StringProperty(max_length=30, indexed=True)
    name = models2.StringProperty(max_length=30, indexed=True)
    gender = models2.StringProperty(max_length=1)
    parent = models2.Relationship('self', rel_type='parent_of', related_name='parents')
    child = models2.Relationship('self', rel_type='child_of', related_name='children')

    def __unicode__(self):
        return self.name
Before I connected to the Neo4j server, I set auto indexing to true and gave the indexable keys in the conf/neo4j.properties file as follows:
# Autoindexing
# Enable auto-indexing for nodes, default is false
node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled
node_keys_indexable=owner,name
# Enable auto-indexing for relationships, default is false
relationship_auto_indexing=true
# The relationship property keys to be auto-indexed, if enabled
relationship_keys_indexable=child_of,parent_of
I followed Neo4j: Step by Step to create an automatic index to update above file and manually create node_auto_index on neo4j server.
Below are the indexes created on neo4j server after executing syndb of django on neo4j database and manually creating auto indexes:
graph-person_conns lucene
{"to_lower_case":"true", "_blueprints:type":"MANUAL","type":"fulltext"}
node_auto_index lucene
{"_blueprints:type":"MANUAL", "type":"exact"}
As suggested in https://github.com/scholrly/neo4django/issues/123 I used connection.cypher(queries) to query the neo4j database
For Example:
listpar = connection.cypher("START no=node(*) RETURN no.owner?, no.name?",raw=True)
Above returns the owner and name of all nodes correctly. But when I try to query on indexed properties instead of starting from node(<number>) or node(*), as in:
listpar = connection.cypher("START no=node:node_auto_index(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives 0 rows.
listpar = connection.cypher("START no=node:graph-person_conns(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives
Exception Value:
Error [400]: Bad Request. Bad request syntax or unsupported method.
Invalid data sent: (' expected but-' found after graph
I tried other strings like name, person_conns instead of graph-person_conns, but each time it gives an error that the particular index does not exist. Am I making a mistake while adding the indexes?
My project mainly depends on filtering the nodes based on properties, so this part is really essential. Any pointers or suggestions would be appreciated. Thank you.
This is my first post on Stack Overflow, so please be patient with any missing information or confusing statements. Thank you.
UPDATE:
Thank you for the help. For the benefit of others I would like to give an example of how to use cypher queries to traverse/find the shortest path between two nodes.
from neo4django.db import connection
results = connection.cypher("START source=node:`graph-person_conns`(person_name='s2sp1'),dest=node:`graph-person_conns`(person_name='s2c1') MATCH p=ShortestPath(source-[*]->dest) RETURN extract(i in nodes(p) : i.person_name), extract(j in rels(p) : type(j))")
This is to find shortest path between nodes named s2sp1 and s2c1 on the graph. Cypher queries are really cool and help traverse nodes limiting the hops, types of relations etc.
Can someone comment on the performance of this method? Also please suggest if there are any other efficient methods to access Neo4j from Django. Thank You :)
Hm, why are you using Cypher? neo4django QuerySets work just fine for the above if you set the properties to indexed=True (or not, it'll just be slower for those).
people = person_conns.objects.filter(name='n2')
The neo4django docs have some other querying examples, as do the Django docs. Neo4django executes those queries as Cypher on the backend- you really shouldn't need to drop down to writing the Cypher yourself unless you have a very particular traversal pattern or a performance issue.
Anyway, to more directly tackle your question- the last example you used needs backticks to escape the index name, like
listpar = connection.cypher("START no=node:`graph-person_conns`(name='s2') RETURN no.owner?, no.name?",raw=True)
The first example should work. One thought: did you flip the autoindexing on before or after saving the nodes you're searching for? If after, note that you'll have to manually reindex the nodes, either using the Java API or by re-setting properties on them, since they won't have been autoindexed.
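A minimal sketch of the re-setting approach through neo4django (untested; assumes re-saving a node rewrites its properties and so triggers the autoindexer):
# re-save every node so the autoindexer picks it up
for p in person_conns.objects.all():
    p.name = p.name  # re-set an indexed property, value unchanged
    p.save()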
HTH, and welcome to StackOverflow!
I want to get an entity key knowing entity ID and an ancestor.
ID is unique within entity group defined by the ancestor.
It seems to me that it's not possible using the ndb interface. As I understand the datastore, this may be because the operation would require a full index scan.
The workaround I used is to create a computed property in the model which contains the id part of the key. I'm now able to do an ancestor query and get the key:
from google.appengine.ext import ndb

class SomeModel(ndb.Model):
    ID = ndb.ComputedProperty(lambda self: self.key.id())

    @classmethod
    def id_to_key(cls, identifier, ancestor):
        return cls.query(cls.ID == identifier,
                         ancestor=ancestor.key).get(keys_only=True)
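A hypothetical call, assuming ancestor is an already-fetched parent entity:
key = SomeModel.id_to_key(42, ancestor)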
It seems to work, but are there any better solutions to this problem?
Update
It seems that for the datastore the natural solution is to use full paths instead of identifiers. Initially I thought it'd be too burdensome. After reading dragonx's answer I redesigned my application, and to my surprise everything looks much simpler now. Additional benefits are that my entities use less space and I don't need additional indexes.
I ran into this problem too. I think you do have the solution.
The better solution would be to stop using IDs to reference entities, and store either the actual key or a full path.
Internally, I use keys instead of IDs.
On my rest API, I used to do http://url/kind/id (where id looked like "123") to fetch an entity. I modified that to provide the complete ancestor path to the entity: http://url/kind/ancestor-ancestor-id (789-456-123), I'd then parse that string, generate a key, and then get by key.
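As a sketch, parsing such a path back into a key could look like this (the kind names and the two-level ancestor depth are hypothetical):
def key_from_path(path_str):
    # "789-456-123" -> ndb.Key('GrandKind', 789, 'ParentKind', 456, 'Kind', 123)
    grand_id, parent_id, child_id = (int(p) for p in path_str.split('-'))
    return ndb.Key('GrandKind', grand_id, 'ParentKind', parent_id, 'Kind', child_id)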
Since you have full information about your ancestor and you know your id, you could directly create your key and get the entity, as follows:
my_key = ndb.Key(Ancestor, ancestor.key.id(), SomeModel, id)
entity = my_key.get()
This way you avoid making a query that costs more than a get operation both in terms of money and speed.
Hope this helps.
I want to make a little addition to dragonx's answer.
In my application on front-end I use string representation of keys:
str(instance.key())
When I need to make changes to an instance, even if it is a descendant, I use only the string representation of its key. For example, key_str below is an argument from a request, used to delete an instance:
instance = Kind.get(key_str)
instance.delete()
My solution is to use urlsafe to get the item without worrying about the parent id:
pk = ndb.Key(Product, 1234)
usafe = LocationItem.get_by_id(5678, parent=pk).key.urlsafe()
# now we can fetch the entity again from just the urlsafe string
item = ndb.Key(urlsafe=usafe).get()
print item