What are the rules that apply to the key_name in App Engine? - google-app-engine

I'm trying to use an App Engine User object's user_id (returned by the User.user_id() method) as a key_name in my own User class. The problem is that it keeps telling me that it's an invalid key_name. I've tried sha2'ing it, using both the digest() and the hexdigest() methods to reduce the number of possible characters, but still with no good result. Is this because the value is too long, or because key names can't contain certain characters? Also, how can I modify a user_id so that it stays unique but is also usable as a key_name for an entity? Extra bonus if it uses a hash so that the user_id can't be guessed.
Here is the code where the error occurred:
def get_current_user():
    return User.get(db.Key(hashlib.sha1(users.get_current_user().user_id()).hexdigest()))
I'm now doing some more testing, considering the suggestions from the comments and the answer.

I'm not sure why it isn't working for you; the following has no issues when I run it in the dev console.
from google.appengine.ext import db
from google.appengine.api import users
user = users.get_current_user()
name = user.user_id()
print db.Key.from_path ('User', name)
However, if you are hashing it (which it sounds like you may be), be aware that you may get a collision. I would avoid using a hash and would consider some other means of anonymization if you are giving the key to clients, such as another model whose key you can give away and that has the user's key stored in it. Another method would be to encrypt the id (using the same key for all users) rather than hash it.
If you are doing something that generates binary data (encryption or a hash digest), App Engine (the SDK at least) has issues with it, so you need to encode it first and use the encoded value as the key_name.
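import base64
import hashlib
name = user.user_id()
hashed_name = hashlib.sha1(name).digest()
# Encode the binary digest (not the raw id) so the key_name is plain ASCII.
encoded_name = base64.b64encode(hashed_name)
db.Key.from_path('User', encoded_name)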
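One way to combine the two concerns above (an id that can't be guessed, and no binary data in the key_name) is a keyed hash followed by a URL-safe encoding. The following is a minimal sketch, not the answerer's code: it swaps the plain SHA-1 for HMAC-SHA256, and `SECRET`, `make_key_name`, and the "u" prefix are assumptions made for illustration. It uses only the standard library, so in a real app the result would still have to be passed to db.Key.from_path('User', ...).

```python
import base64
import hashlib
import hmac

# Hypothetical application-wide secret; in a real app it would live in config.
SECRET = b"replace-with-a-long-random-secret"


def make_key_name(user_id, secret=SECRET):
    """Derive an ASCII-only, hard-to-guess key name from a user id."""
    # HMAC is a keyed hash: without the secret, nobody can recompute the
    # mapping from a user id to its key name.
    digest = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).digest()
    # The raw digest is binary, so encode it with the URL-safe base64 alphabet
    # (letters, digits, '-' and '_') and drop the '=' padding.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    # The letter prefix is a conservative choice (an assumption, not a rule
    # taken from the answer) so the name never starts with a digit.
    return "u" + encoded


# Example user id; the value is made up for illustration.
key_name = make_key_name("185804764220139124118")
```

The same user id always maps to the same key name, so the entity can be looked up again later. A collision between two users is still theoretically possible, as the answer warns, but it is very unlikely with a 256-bit digest.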

Related

How to insert/get entities from the google cloud Data Store (python)

I am trying to write a Python program that uses Google Cloud Datastore, and I am having some trouble with the Datastore input/output system. This is my first time working with Google Cloud, and I am also somewhat new to Python.
I am trying to build a very simple database with only one entity model, 'Command', which has two variables: 'name', which I want to be the key, and 'value'. All the entities will have one parent, because the Google Cloud guide said this puts all the entities in the same entity group, which helps sort them? (I am not sure about this, so an explanation would be nice.)
class Command(ndb.Model):
    value = ndb.IntegerProperty()
    # no 'name' variable, since it's the key.

def parent_key():
    return ndb.Key(Command, DEFAULT_PARENT_NAME)
When the user uses a 'set' command, the code will either insert a new entity with the given name and value or, if the name already exists, change the existing value to the given value.
(Assume 'variable_name' is the name and 'variable_value' is the value.)
This is the code for the 'set' command:
variable_name = self.request.get('name')
variable_value = self.request.get('value')
newcommand = Command(id=variable_name, value=int(variable_value), parent=parent_key()) # create a new command model
newcommand.put()
This inserts a new command, but doesn't check whether it is already in the datastore.
I want the 'get' command to extract the value stored under an existing name in the database (or return an error if it doesn't exist), given the name as a string.
In the online manual I found how to extract data from the database given a key, but here I don't have a key; I have a string.
I don't know how to complete the 'set' and 'get' commands and would appreciate some help with this.
Thanks in advance!

How to use ndb key with integer_id?

I saw this in the documentation:
https://developers.google.com/appengine/docs/python/ndb/keyclass#Key_integer_id
Returns the integer id in the last (kind, id) pair, or None if the key
has a string id or is incomplete.
So I think the id of a key can be an int, and I wrote:
r = ndb.Key(UserSession, int(id)).get()
if r:
    return r.session
But the dev server always raises:
File "/home/bitcoin/down/google_appengine/google/appengine/datastore/datastore_stub_util.py", line 346, in CheckReference
raise datastore_errors.BadRequestError('missing key id/name')
BadRequestError: missing key id/name
When I change int(id) to str(id), it seems to work.
So my question is: how do I use an ndb key with an integer id?
The model is:
class UserSession(ndb.Model):
    session = ndb.BlobProperty()
The type of the id you use when reading the entity must match the type of the id you used when you wrote the entity. Normally, integer ids are assigned automatically when you write a new entity without specifying an id or key; you then get the id out of the key returned by entity.put(). It is generally not recommended to assign your own integer ids; when the app assigns the keys, the convention is that they should be strings.
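The type mismatch can be pictured with a plain dictionary. This is only a toy stand-in for the datastore (the `store` dict and the `put` and `lookup` helpers are hypothetical), but it shows why an entity written under the string id '42' is not found under the integer id 42:

```python
# Toy stand-in for the datastore: entities indexed by (kind, id) pairs.
store = {}


def put(kind, entity_id, value):
    store[(kind, entity_id)] = value


def lookup(kind, entity_id):
    # Returns None when no entity has exactly this kind and id.
    return store.get((kind, entity_id))


put("UserSession", "42", b"session-bytes")   # written with a string id

found_with_str = lookup("UserSession", "42")  # same type as when written
found_with_int = lookup("UserSession", 42)    # different type, so no match
```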
There's an easier way to fetch:
UserSession.get_by_id(int(id))
https://developers.google.com/appengine/docs/python/ndb/modelclass#Model_get_by_id
If that doesn't work, I suspect that id is wrong or empty.
There must be something wrong with your variable 'id'.
Your code here should be no problem, and it's better to use long instead of int.
You can try your code on interactive console of development server with specific integer id.
It may be easier to identify your entities in the sessions with their keys instead of their ids. There really is no need to extract the ID from the key to identify the session (other than maybe saving a bit of memory). I think your way of thinking is based on an RDB. I learned that using the key actually makes entity/session identification easier.
'id' is also a python builtin function. Maybe you are taking that by mistake.

ndb retrieving entity key by ID without parent

I want to get an entity key knowing entity ID and an ancestor.
ID is unique within entity group defined by the ancestor.
It seems to me that it's not possible using ndb interface. As I understand datastore it may be caused by the fact that this operation requires full index scan to perform.
The workaround I used is to create a computed property in the model, which will contain the id part of the key. I'm able now to do an ancestor query and get the key
class SomeModel(ndb.Model):
    ID = ndb.ComputedProperty(lambda self: self.key.id())

    @classmethod
    def id_to_key(cls, identifier, ancestor):
        return cls.query(cls.ID == identifier,
                         ancestor=ancestor.key).get(keys_only=True)
It seems to work, but are there any better solutions to this problem?
Update
It seems that for the datastore the natural solution is to use full paths instead of identifiers. Initially I thought it'd be too burdensome. After reading dragonx's answer I redesigned my application. To my surprise everything looks much simpler now. Additional benefits are that my entities will use less space and I won't need additional indexes.
I ran into this problem too. I think you do have the solution.
The better solution would be to stop using IDs to reference entities, and store either the actual key or a full path.
Internally, I use keys instead of IDs.
On my REST API, I used to do http://url/kind/id (where id looked like "123") to fetch an entity. I modified that to provide the complete ancestor path to the entity: http://url/kind/ancestor-ancestor-id (789-456-123). I'd then parse that string, generate a key, and then get by key.
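The parsing step could look roughly like the sketch below. It is an illustration under assumptions, not the answerer's code: the function name `parse_key_path` and the kind names are hypothetical, and it assumes every id in the path is an integer.

```python
def parse_key_path(path, kinds):
    """Turn '789-456-123' into a flat [kind, id, kind, id, ...] list."""
    ids = [int(part) for part in path.split("-")]
    if len(ids) != len(kinds):
        raise ValueError("expected %d ids, got %d" % (len(kinds), len(ids)))
    flat = []
    for kind, entity_id in zip(kinds, ids):
        flat.extend([kind, entity_id])
    return flat


# Hypothetical kind names, ordered from the root ancestor down to the entity.
flat_path = parse_key_path("789-456-123", ["Account", "Project", "Task"])
```

With ndb, the flat list could then be passed on as ndb.Key(*flat_path), and the entity fetched with .get() on the resulting key.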
Since you have full information about your ancestor and you know your id, you could directly create your key and get the entity, as follows:
my_key = ndb.Key(Ancestor, ancestor.key.id(), SomeModel, id)
entity = my_key.get()
This way you avoid making a query that costs more than a get operation both in terms of money and speed.
Hope this helps.
I want to make a small addition to dragonx's answer.
On the front end of my application I use the string representation of keys:
str(instance.key())
When I need to make changes to an instance, even if it is a descendant, I use only the string representation of its key. For example, I have key_str, an argument from a request to delete an instance:
instance = Kind.get(key_str)
instance.delete()
My solution is to use urlsafe to get an item without worrying about the parent id:
pk = ndb.Key(Product, 1234)
usafe = LocationItem.get_by_id(5678, parent=pk).key.urlsafe()
# the urlsafe string encodes the full key path, so the item can now be
# fetched without knowing the parent
item = ndb.Key(urlsafe=usafe).get()
print item

Django profile id may not be null using get_or_create, how does it relate to the db?

I loosely followed the steps in Django User Profiles - Simple yet powerful.
Not exactly the same, because I am in the middle of developing the idea.
From that site I used, in particular, this line:
User.profile = property(lambda u:
    UserProfile.objects.get_or_create(user=u)[0])
I always got an error message when creating the object, typically
"XX" may not be null. I solved part of the problems by playing with the models
and (in my present case) sqliteman, until I got the same
message on the id: "xxx.id may not be null".
On the net I found a description of a possible solution that involved resetting
the database, which I was not that happy to do, in particular because, for the
different solutions, it might have involved resetting the application DB.
But because the UserProfile model was fairly new and still empty,
I worked on the DB directly: I dropped the table by hand and
asked syncdb to rebuild it (kind of risky, though).
Now this is the diff of the sqlite dump:
294,298c290,294
< CREATE TABLE "myt_userdata" (
< "id" integer NOT NULL PRIMARY KEY,
< "user_id" integer NOT NULL UNIQUE REFERENCES "auth_user" ("id"),
< "url" varchar(200),
< "birthday" datetime
---
> CREATE TABLE myt_userdata (
> "id" INTEGER NOT NULL,
> "user_id" INTEGER NOT NULL,
> "url" VARCHAR(200),
> "birthday" DATETIME
Please note that both versions are generated by django. The ">" version was generated with a simple model definition which had indeed the connection with the user table via:
user = models.ForeignKey(User, unique=True)
The new "<" version has much more information and it is working.
My question:
Why does Django complain that myt_userdata.id may not be null?
The subsidiary question:
Does Django try to relate to the underlying DB structure, and if so, how?
(For example, does the NOT NULL message come from the model or from the DB?)
The additional question:
I have been a bit reluctant to use South: too complicated, additional modules
that I might have to maintain across development and production, and maybe not that easy
if I want to switch DB engines (I am using sqlite only at the development stage; I plan to move to
mysql).
South might have worked in this case. Would it? Would you suggest using it
anyway?
Edit, FYI:
This is my latest model (the working one):
class UserData(models.Model):
    user = models.ForeignKey(User, unique=True)
    url = models.URLField("Website", blank=True, null=True)
    birthday = models.DateTimeField('Birthday', blank=True, null=True)

    def __unicode__(self):
        return self.user.username

User.profile = property(lambda u: UserData.objects.get_or_create(user=u,defaults={'birthday': '1970-01-01 00:00:00'})[0])
Why does Django complain that myt_userdata.id may not be null?
Because in that table id is not a primary key, so it is not populated automatically. You also don't provide it when creating the model instance, so the DB does not know what to do.
Does Django try to relate to the underlying DB structure, and if so, how? (For example, does the NOT NULL message come from the model or from the DB?)
It's an error from the DB, not from Django.
You can use the sql command to understand exactly what is executed on syncdb. The variant above seems to be a correct table definition generated from a correct Django model, and I have no idea how you got the variant below. Write a correct and clear model, and you'll get a correct and working table schema after syncdb.
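The difference between the two table definitions can be reproduced outside Django with the standard-library sqlite3 module. This is a minimal sketch: the table names `broken` and `working` and the helper `try_insert` are made up for the illustration, and the columns are trimmed to the two that matter. The exact wording of the error message depends on the SQLite version, so the sketch does not rely on it.

```python
import sqlite3

# "id" is NOT NULL but not a primary key, as in the ">" variant of the diff.
BROKEN = 'CREATE TABLE broken ("id" INTEGER NOT NULL, "user_id" INTEGER NOT NULL)'
# "id" is an integer primary key, as in the "<" variant of the diff.
WORKING = ('CREATE TABLE working ("id" integer NOT NULL PRIMARY KEY, '
           '"user_id" integer NOT NULL UNIQUE)')


def try_insert(conn, table):
    """Insert a row without an id, as an ORM does for a new object."""
    try:
        cur = conn.execute("INSERT INTO %s (user_id) VALUES (?)" % table, (1,))
        return ("ok", cur.lastrowid)
    except sqlite3.IntegrityError as exc:
        return ("error", str(exc))


conn = sqlite3.connect(":memory:")
conn.execute(BROKEN)
conn.execute(WORKING)

broken_result = try_insert(conn, "broken")    # the database rejects the NULL id
working_result = try_insert(conn, "working")  # SQLite assigns the id itself
```

In the second table the id column is an integer primary key, which SQLite fills in automatically when no value is given; in the first table nothing supplies a value, so the NOT NULL constraint fails inside the database.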

How do Django queries 'cast' argument strings into the appropriate field-matching types?

Let's take the Django tutorial. In the first part we can find this model:
class Poll(models.Model):
question = models.CharField(max_length=200)
pub_date = models.DateTimeField('date published')
with which Django generates the following SQL:
CREATE TABLE "polls_poll" (
"id" serial NOT NULL PRIMARY KEY,
"question" varchar(200) NOT NULL,
"pub_date" timestamp with time zone NOT NULL
);
One can note that Django automatically added an AutoField, gloriously named id, which is akin to an IntegerField in that it handles integers.
On part 3, we build a custom view, reachable through the following url pattern:
(r'^polls/(?P<poll_id>\d+)/$', 'polls.views.detail'),
The tutorial helpfully explains that a subsequent HTTP request will result in the following call:
detail(request=<HttpRequest object>, poll_id='23')
A few scrolls later, we can find this snippet:
def detail(request, poll_id):
    try:
        p = Poll.objects.get(pk=poll_id)
Notice how the URL tail component becomes the poll_id argument with a string value of '23', which is happily processed by the Manager (and therefore QuerySet) get method to produce the result of an SQL query containing a WHERE clause with an integer value of 23, presumably looking like this one:
SELECT * FROM polls_poll WHERE id=23
Certainly Django performed the conversion from the fact that the id field is an AutoField one. The question is how, and when. Specifically, I want to know which internal methods are called, and in what order (kind of like what the doc explains for form validation).
Note: I took a look at the sources in django.db.models and found a few *prep* methods, but I don't know when or where they are called, let alone whether they're what I'm looking for.
PS: I know it's not casting stricto sensu, but I think you get the idea.
I think it's in django.db.models.query.get_where_clause
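As a rough mental model only (this is a toy stand-in, not Django's source, and `ToyIntegerField` and `build_where` are hypothetical names), the conversion can be pictured as the field object preparing the raw lookup value before it is handed to the database driver as a query parameter. The method name get_prep_value mirrors one of the *prep* methods mentioned in the question; where and in what order Django actually calls it is not something this sketch claims to show.

```python
class ToyIntegerField(object):
    """Toy field that knows how to prepare Python values for an integer column."""

    def get_prep_value(self, value):
        # The string '23' captured from the URL becomes the integer 23 here.
        if value is None:
            return None
        return int(value)


def build_where(field, column, raw_value):
    """Return a parameterized WHERE fragment and its prepared parameters."""
    return '"%s" = %%s' % column, [field.get_prep_value(raw_value)]


where_sql, where_params = build_where(ToyIntegerField(), "id", "23")
```

The value stays a separate parameter rather than being pasted into the SQL string, so the driver receives an integer, not text.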
