ndb retrieving entity key by ID without parent - google-app-engine

I want to get an entity's key, knowing the entity's ID and its ancestor.
The ID is unique within the entity group defined by the ancestor.
It seems to me that this isn't possible using the ndb interface. As I understand the datastore, this may be because the operation would require a full index scan.
The workaround I used is to create a computed property in the model that contains the ID part of the key. I can now do an ancestor query and get the key:
class SomeModel(ndb.Model):
    ID = ndb.ComputedProperty(lambda self: self.key.id())
    @classmethod
    def id_to_key(cls, identifier, ancestor):
        return cls.query(cls.ID == identifier,
                         ancestor=ancestor.key).get(keys_only=True)
It seems to work, but are there any better solutions to this problem?
Update
It seems that for the datastore the natural solution is to use full paths instead of identifiers. Initially I thought it would be too burdensome. After reading dragonx's answer I redesigned my application. To my surprise, everything looks much simpler now. Additional benefits are that my entities will use less space and I won't need additional indexes.

I ran into this problem too. I think you do have the solution.
The better solution would be to stop using IDs to reference entities, and store either the actual key or a full path.
Internally, I use keys instead of IDs.
In my REST API, I used to use http://url/kind/id (where id looked like "123") to fetch an entity. I changed that to include the complete ancestor path to the entity: http://url/kind/ancestor-ancestor-id (789-456-123). I then parse that string, generate a key, and get the entity by key.
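The parsing step described above might look roughly like the sketch below. It is only an illustration: the kind names (`Grandparent`, `Parent`, `Child`) and the helper `path_to_flat_key` are hypothetical, and it assumes every ID in the URL is an integer. The result is a flat kind/id list, which is the shape of path that `ndb.Key(flat=...)` accepts.

```python
def path_to_flat_key(kinds, path):
    """Turn '789-456-123' plus a list of kind names into a flat key path.

    `kinds` names the kind of each path element, outermost ancestor first.
    Both the helper and the kind names are made up for this example.
    """
    ids = [int(part) for part in path.split("-")]
    if len(ids) != len(kinds):
        raise ValueError("expected %d ids, got %d" % (len(kinds), len(ids)))
    flat = []
    for kind, entity_id in zip(kinds, ids):
        flat.extend([kind, entity_id])
    return flat


flat = path_to_flat_key(["Grandparent", "Parent", "Child"], "789-456-123")
print(flat)
# In a request handler the key could then be built from this list,
# e.g. ndb.Key(flat=flat).get(), instead of running a query.
```

Validating the number of IDs up front means a malformed URL fails early instead of producing a key for the wrong entity.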

Since you have full information about your ancestor and you know your id, you could directly create your key and get the entity, as follows:
my_key = ndb.Key(Ancestor, ancestor.key.id(), SomeModel, id)
entity = my_key.get()
This way you avoid making a query that costs more than a get operation both in terms of money and speed.
Hope this helps.

I want to make a small addition to dragonx's answer.
On the front end of my application I use the string representation of keys:
str(instance.key())
When I need to change an instance, even if it is a descendant, I use only the string representation of its key. For example, given key_str, an argument from the request, I delete the instance like this:
instance = Kind.get(key_str)
instance.delete()

My solution is to use the urlsafe key, so the item can be fetched without worrying about the parent ID:
pk = ndb.Key(Product, 1234)
usafe = LocationItem.get_by_id(5678, parent=pk).key.urlsafe()
# the urlsafe string encodes the full key, so the item can be fetched from it
item = ndb.Key(urlsafe=usafe).get()
print item

Related

How NamespaceManager and Query by key works together in objectify

I have two organisations in my datastore, each inside its own namespace. Let's say organisation1 is in namespace1 and organisation2 is in namespace2. I retrieve an organisation by its web-safe key. Let's say the web-safe key of organisation1 is orgWebSafeKey1 and the web-safe key of organisation2 is orgWebSafeKey2. I am using the following code to get an organisation:
NamespaceManager.set("namespace1");
Organisation organisation = (Organisation) ofy().load().key(Key.create(orgWebSafeKey1)).now();
The above code works as I expected, because organisation1 is in namespace1 and I am trying to get that organisation in its own namespace.
But if I just change the web-safe key of the organisation, then I would expect the query below to return a null organisation, because there is no organisation with key orgWebSafeKey2 inside namespace1. In practice it gives me organisation2.
NamespaceManager.set("namespace1");
Organisation organisation = (Organisation) ofy().load().key(Key.create(orgWebSafeKey2)).now();
If the above result is correct and expected according to Objectify and the datastore, can I assume that a query by key works globally, across all namespaces?
I would also like confirmation that in this case Key.create(orgWebSafeKey2) does not change the namespace of the key, and that the lookup runs in the namespace of the key rather than the one set by NamespaceManager.set("namespace1").
A Datastore Key contains the following components:
Project/App ID
Namespace
Entity Path (Ancestor Kind + ID/Name(zero or more), Final Entity Kind + ID/Name)
Since namespace is part of the key, lookup of an entity by Key always finds the right entity regardless of the namespace set by the NamespaceManager. In other words, a Key is a GUID that uniquely identifies an entity across all apps/projects.
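A toy model may make this easier to see. The sketch below is not the Objectify or Datastore API: `Key`, `store` and `current_namespace` are hypothetical stand-ins, and the app ID `my-app` is invented. It only shows that when the namespace is a field of the key, a lookup by key ignores whatever namespace is "current".

```python
from collections import namedtuple

# Hypothetical stand-in for a Datastore key: app ID, namespace, entity path.
Key = namedtuple("Key", ["app", "namespace", "path"])

key1 = Key("my-app", "namespace1", ("Organisation", 1))
key2 = Key("my-app", "namespace2", ("Organisation", 1))

# Toy "datastore": a dict indexed by the complete key.
store = {key1: "organisation1", key2: "organisation2"}

# Plays the role of NamespaceManager.set("namespace1") in this sketch.
current_namespace = "namespace1"

# The lookup uses the namespace stored inside the key, not current_namespace.
found = store.get(key2)
print(found)
```

In this model the current namespace would only matter when a new key is created without an explicit namespace.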
Refer to the below link for more details/answers for your questions:
https://cloud.google.com/appengine/docs/java/multitenancy/multitenancy#Java_Using_namespaces_with_the_Datastore

JPA search by Key without Knowing Parent Key

Ok so I have an application that uses GAE and consequently the datastore.
Say I have multiple companies A, B and C and I have within each company Employees X,Y and Z. The relationship between a company and employee will be OneToMany, with the company being the owner. This results in the Company Key being of the form
long id = 4504699138998272; // Random Example
Key CompanyKey = KeyFactory.createKey(Company.class.getSimpleName(), id);
and the employee key would be of the form
long id2 = 5630599045840896;
Key EmployeeKey = KeyFactory.createKey(CompanyKey,Employee.class.getSimpleName(),id2);
This all works fine until I get to the front end, in the JSP rendering. Sometimes I need to generate a report or open an employee's profile, in which case the div containing their information gets an id as follows:
<div class="employeeInfo" id="<%=employee.getKey().getId()%>" > .....</div>
This div has an onclick / submit event that sends the modifications to the employee profile to a servlet via Ajax, at which point I have to specify the primary key of the employee (which I thought I could easily get from the div id), but it didn't work server side.
The problem is that I know the kind (String) portion and the long ID portion of the employee's key, but not the parent key. To save time I tried this, and it didn't work:
Key key = KeyFactory.createKey(Employee.class.getSimpleName(), id);
Employee X = em.find(Employee.class,key);
X is always returned null.
I would really appreciate any idea of how to find or "query" entities by key without knowing their parent's key (as I would hate having to readjust the entity classes).
Thanks a lot!
An Entity key and its parents cannot be separated. It's called ancestor path, a chain composed of entity kinds and ids.
So, in your example ancestor paths will look like this:
CompanyKey: ("Company", 4504699138998272)
EmployeeKey: ("Company", 4504699138998272, "Employee", 5630599045840896)
A key composed only of ("Employee", 5630599045840896) is a completely different key from EmployeeKey, even though both keys end with the same values. Think of concatenating the elements into a single "string" and comparing the final values: they will never match.
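The comparison can be sketched with plain Python tuples standing in for the ancestor paths. This is only a model of the idea, not the Datastore `Key` class; the variable names are made up, and the IDs are the ones from the question.

```python
# Ancestor paths modelled as flat tuples of (kind, id, kind, id, ...).
company_key = ("Company", 4504699138998272)
employee_key = company_key + ("Employee", 5630599045840896)
bare_key = ("Employee", 5630599045840896)

# The last (kind, id) pair is identical...
same_tail = employee_key[-2:] == bare_key
# ...but the keys as a whole are not equal.
same_key = employee_key == bare_key

print(same_tail, same_key)  # prints: True False
```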
One thing you can do is use encoded keys instead of their id values:
String encodedKey = KeyFactory.keyToString(EmployeeKey);
Key decodedKey = KeyFactory.stringToKey(encodedKey);
decodedKey.equals(EmployeeKey); // true
More about Ancestor Paths:
https://developers.google.com/appengine/docs/java/datastore/entities#Java_Ancestor_paths
KeyFactory Java doc:
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/datastore/KeyFactory#keyToString(com.google.appengine.api.datastore.Key)

Django Query Optimisation

I am currently working on a telecom analytics project and am new to query optimisation. Showing the result in the browser takes a full minute, even though only 45,000 records are accessed. Could you suggest ways to reduce the time it takes to show the results?
I wrote the following query to find the average call duration for the people in an age group:
sigma = 0
popn = len(Demo.objects.filter(age_group=age))
card_list = [Demo.objects.filter(age_group=age)[i].card_no
             for i in range(popn)]
for card in card_list:
    dic = Fact_table.objects.filter(card_no=card).aggregate(Sum('duration'))
    sigma += dic['duration__sum']
avgDur = sigma / popn
Above code is within for loop to iterate over age-groups.
Model is as follows:
class Demo(models.Model):
    card_no = models.CharField(max_length=20, primary_key=True)
    gender = models.IntegerField()
    age = models.IntegerField()
    age_group = models.IntegerField()
class Fact_table(models.Model):
    pri_key = models.BigIntegerField(primary_key=True)
    card_no = models.CharField(max_length=20)
    duration = models.IntegerField()
    time_8bit = models.CharField(max_length=8)
    time_of_day = models.IntegerField()
    isBusinessHr = models.IntegerField()
    Day_of_week = models.IntegerField()
    Day = models.IntegerField()
Thanks
Try this:
from django.db.models import Sum
sigma = 0
demo_by_age = Demo.objects.filter(age_group=age)
popn = demo_by_age.count()  # One
card_list = demo_by_age.values_list('card_no', flat=True)  # Two
dic = Fact_table.objects.filter(card_no__in=card_list).aggregate(Sum('duration'))  # Three
sigma = dic['duration__sum']
avgDur = sigma / popn
A statement like card_list=[Demo.objects.filter(age_group=age)[i].card_no for i in range(popn)] will generate popn separate queries and database hits. The query in the for-loop will also hit the database popn times. As a general rule, you should try to minimize the number of queries you run, and you should only select the records you need.
With a few adjustments to your code this can be done in just one query.
There's generally no need to manually specify a primary_key, and in all but some very specific cases it's even better not to define any. Django automatically adds an indexed, auto-incremental primary key field. If you need the card_no field as a unique field, and you need to find rows based on this field, use this:
class Demo(models.Model):
    card_no = models.SlugField(max_length=20, unique=True)
    ...
SlugField automatically adds a database index to the column, essentially making selections by this field as fast as when it is a primary key. This still allows other ways to access the table, e.g. foreign keys (as I'll explain in my next point), to use the (slightly) faster integer field specified by Django, and will ease the use of the model in Django.
If you need to relate an object to an object in another table, use models.ForeignKey. Django gives you a whole set of new functionality that not only makes it easier to use the models, it also makes a lot of queries faster by using JOIN clauses in the SQL query. So for your example:
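class Fact_table(models.Model):
    # on_delete is required in newer Django versions; CASCADE matches the old default
    card = models.ForeignKey(Demo, related_name='facts', on_delete=models.CASCADE)
    ...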
The related_name argument allows you to access all Fact_table objects related to a Demo instance by using instance.facts in Django. (See https://docs.djangoproject.com/en/dev/ref/models/fields/#module-django.db.models.fields.related)
With these two changes, your query (including the loop over the different age_groups) can be changed into a blazing-fast one-hit query giving you the average duration of calls made by each age_group:
from django.db.models import Avg
age_groups = Demo.objects.values('age_group').annotate(duration_avg=Avg('facts__duration'))
for group in age_groups:
    print("Age group: %s - Average duration: %s" % (group['age_group'], group['duration_avg']))
.values('age_group') selects just the age_group field from the Demo's database table. .annotate(duration_avg=Avg('facts__duration')) takes every unique result from values (thus each unique age_group), and for each unique result will fetch all Fact_table objects related to any Demo object within that age_group, and calculate the average of all the duration fields - all in a single query.
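To see what "all in a single query" means at the database level, here is a small self-contained sketch using Python's built-in sqlite3 module. The SQL is hand-written and only approximates the kind of grouped JOIN such an annotation needs; it is not the exact statement Django emits, and the table names, column names and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo (
        id INTEGER PRIMARY KEY, card_no TEXT UNIQUE, age_group INTEGER);
    CREATE TABLE fact (
        id INTEGER PRIMARY KEY,
        card_id INTEGER REFERENCES demo(id),
        duration INTEGER);
    INSERT INTO demo (id, card_no, age_group)
        VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 2);
    INSERT INTO fact (card_id, duration)
        VALUES (1, 10), (1, 20), (2, 30), (3, 40);
""")

# One statement: join the calls to their cards, group by age group,
# and let the database compute the average duration per group.
rows = conn.execute("""
    SELECT demo.age_group, AVG(fact.duration) AS duration_avg
    FROM demo
    LEFT OUTER JOIN fact ON fact.card_id = demo.id
    GROUP BY demo.age_group
    ORDER BY demo.age_group
""").fetchall()

for age_group, duration_avg in rows:
    print("Age group: %s - Average duration: %s" % (age_group, duration_avg))
```

The database does the joining and averaging, so the application makes one round trip regardless of how many cards or age groups exist.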

How to use ndb key with integer_id?

I read this in the documentation:
https://developers.google.com/appengine/docs/python/ndb/keyclass#Key_integer_id
Returns the integer id in the last (kind, id) pair, or None if the key
has an string id or is incomplete.
So I think the ID of a key can be an int, and I wrote:
r = ndb.Key(UserSession, int(id)).get()
if r:
    return r.session
but the development server always raises:
File "/home/bitcoin/down/google_appengine/google/appengine/datastore/datastore_stub_util.py", line 346, in CheckReference
raise datastore_errors.BadRequestError('missing key id/name')
BadRequestError: missing key id/name
If I change int(id) to str(id), it seems to work.
So my question is: how do I use an ndb key with an integer ID?
The model is:
class UserSession(ndb.Model):
    session = ndb.BlobProperty()
The type of the id you use when reading the entity must match the type of the id you used when you wrote the entity. Normally, integer ids are assigned automatically when you write a new entity without specifying an id or key; you then get the id out of the key returned by entity.put(). It is generally not recommended to assign your own integer ids; when the app assigns the keys, the convention is that they should be strings.
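The type-matching rule can be illustrated with a toy store. This is not ndb: `store` and `get` are hypothetical stand-ins, and the sketch does not reproduce the BadRequestError from the question. It only shows that an integer ID and a string ID with the same digits identify different keys.

```python
# Toy datastore keyed by (kind, id); the entity was written with a string id.
store = {("UserSession", "42"): "session-data"}


def get(kind, entity_id):
    """Look up an entity by its (kind, id) pair; returns None if absent."""
    return store.get((kind, entity_id))


by_int = get("UserSession", 42)    # id type differs from the one written
by_str = get("UserSession", "42")  # id type matches the one written
print(by_int, by_str)
```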
There's an easier way to fetch:
UserSession.get_by_id(int(id))
https://developers.google.com/appengine/docs/python/ndb/modelclass#Model_get_by_id
If that doesn't work, I suspect that id is wrong or empty.
There must be something wrong with your variable 'id'.
Your code here should be no problem, and it's better to use long instead of int.
You can try your code in the interactive console of the development server with a specific integer ID.
It may be easier to identify your entities in the sessions by their keys instead of their IDs. There really is no need to extract the ID from the key to identify the session (other than maybe saving a bit of memory). I think your way of thinking is based on a relational database. I learned that using the key actually makes identifying entities and sessions easier.
'id' is also a python builtin function. Maybe you are taking that by mistake.

get_by_id() will not return model instance

I have a Model called Version that looks like this:
from google.appengine.ext import db
import piece
class Version(db.Model):
    "A particular version of a piece of writing."
    parent_piece = db.ReferenceProperty(piece.Piece, collection_name='versions')
    "The Piece to which this version belongs."
    note = db.TextProperty()
    "A note from the Author about this version."
    content = db.TextProperty()
    "The actual content of this version of the Piece."
    published_on = db.DateProperty(auto_now_add=True)
    "The date on which the version was published."
I would like to access instances of Version via their IDs, using Version.get_by_id(), but this call always returns None. I can see in the Datastore Viewer that they have ID values, and in the debugger, I can query for them but not use them:
>>> for each_ver in version.Version.all():
...     print each_ver.key().id()
...
34
35
36
31
32
>>> a = version.Version.get_by_id(34)
>>> type(a)
<type 'NoneType'>
I see that there are plenty of questions here where people are able to use get_by_id() effectively just as I wish, and they do not see the results that I am seeing.
Could the problem be that each Version instance is a child in an Entity Group rather than a root of an Entity Group? Each Version lives in an Entity Group that looks like Member->Piece->Version. If that is the problem, is there a way that I can refer to Version entity without using its entire key? If that is not the problem, can anyone tell me what I can do to make get_by_id() work as expected?
"Could the problem be that each Version instance is a child in an Entity Group rather than a root of an Entity Group?"
Yes. An entity's key includes the keys of any parent entities.
"If that is the problem, is there a way that I can refer to Version entity without using its entire key?"
No. An entity is uniquely identified only by its entire key, which includes the keys of all the parent entities. If you know the kinds of its parent entities, though, you can use db.Key.from_path to construct the key from the chain of IDs or key names.
I had the same problem, but with ndb.Model, and I found that I needed to convert the ID to an int. So maybe using version.Version.get_by_id(int(34)) can solve your problem.
