Can't update datastore in Google App Engine - google-app-engine

I created a datastore object similarly to the guestbook tutorial:
class myDS(ndb.Model):
    a = ndb.StringProperty(indexed=True)
And I have handlers to access and update it:
class Handler1(webapp.RequestHandler):
    def get(self):
        my_ds = myDS()
        my_ds.a = "abc"  # Trying to update the value
class Handler2(webapp.RequestHandler):
    def get(self):
        my_ds = myDS()
        self.response.write(my_ds.a)  # Trying to access my_ds.a after it was updated in Handler1
def main():
    application = webapp.WSGIApplication([
        ('/set', Handler1),
        ('/get', Handler2)])
I call:
Myapp.com/set
Myapp.com/get : Prints None (Didn't update to "abc")
Why wasn't the value of a updated?
How can I update across the handlers?

The first thing to note is that in Handler1 your code doesn't actually save the entity to the datastore. For that you need to invoke the .put() method:
my_ds = myDS()
my_ds.a = "abc" #Trying to update the value
my_ds_key = my_ds.put() # save my_ds to the datastore and get its key
The second thing to note is that in Handler2 the my_ds = myDS() call doesn't retrieve an entity from the datastore as you might expect; it just creates a new entity (which is also not saved to the datastore). To retrieve an entity from the datastore you need to do a lookup by the entity's key (or obtain it via a query):
my_ds = my_ds_key.get()
These are very basic concepts of using the datastore, and it is worth getting more familiar with them. You should go through Creating, Retrieving, Updating, and Deleting Entities (and maybe other related chapters grouped under the Google Cloud Datastore section in the left-side navigation bar of that doc page).
Finally, to be able to make such a lookup you need some way to determine the entity key obtained in Handler1, or to transfer it to Handler2, as each request to these handlers is independent of the others. Possibly of interest: Passing data between pages in a redirect() function in Google App Engine
Example of passing the key's string representation via webapp2 sessions:
In Handler1
my_ds_key = my_ds.put() # save my_ds to the datastore and get its key
self.session['urlsafe'] = my_ds_key.urlsafe()
In Handler2
urlsafe = self.session.get('urlsafe')
if urlsafe:
    my_ds = ndb.Key(urlsafe=urlsafe).get()
Example of passing the key's string representation using a URL query string like /get?urlsafe=<key's urlsafe representation> (maybe hashed if you want, as it will be visible in the browser):
In Handler1
my_ds_key = my_ds.put() # save my_ds to the datastore and get its key
self.redirect('/get?urlsafe=%s' % my_ds_key.urlsafe())
In Handler2
urlsafe = self.request.get('urlsafe')
if urlsafe:
    my_ds = ndb.Key(urlsafe=urlsafe).get()
Example of getting the entity in Handler2 via a query (the example assumes the right entity is returned by the query):
results = myDS.query().fetch(limit=1)
if results:
    my_ds = results[0]
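The difference between constructing an entity and loading one can be illustrated without App Engine at all. The following is a toy in-memory stand-in, not the ndb API: FakeStore, FakeEntity and their methods are hypothetical names invented for this sketch, and only the put-then-get-by-key flow mirrors what is described above.

```python
import itertools


class FakeEntity(object):
    """Toy stand-in for an ndb.Model subclass with one property, `a`."""

    def __init__(self, a=None):
        self.a = a
        self.key = None


class FakeStore(object):
    """Toy stand-in for the datastore: a dict keyed by integer ids."""

    def __init__(self):
        self._rows = {}
        self._ids = itertools.count(1)

    def put(self, entity):
        # Persist a copy of the entity's fields and hand back its key.
        if entity.key is None:
            entity.key = next(self._ids)
        self._rows[entity.key] = {"a": entity.a}
        return entity.key

    def get(self, key):
        # Load a stored entity by key, or None if nothing was saved under it.
        row = self._rows.get(key)
        if row is None:
            return None
        entity = FakeEntity(a=row["a"])
        entity.key = key
        return entity


store = FakeStore()

# "Handler1": set the value AND save it, keeping the returned key.
first = FakeEntity()
first.a = "abc"
saved_key = store.put(first)

# "Handler2", wrong way: a fresh instance knows nothing about saved data.
fresh = FakeEntity()

# "Handler2", right way: look the entity up by the key from Handler1.
loaded = store.get(saved_key)
```

Here fresh.a is still None, while loaded.a carries the saved value, which is the same contrast as between myDS() and my_ds_key.get().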

Related

Cakephp 3 - How to integrate external sources in table?

I'm working on an application that has its own database and gets user information from another service (LDAP in this case, through an API package).
Say I have a table called Articles, with a column user_id. There is no Users table; instead a user or set of users is retrieved through the external API:
$user = LDAPConnector::getUser($user_id);
$users = LDAPConnector::getUsers([1, 2, 5, 6]);
Of course I want retrieving data from inside a controller to be as simple as possible, ideally still with something like:
$articles = $this->Articles->find()->contain('Users');
foreach ($articles as $article) {
    echo $article->user->getFullname();
}
I'm not sure how to approach this.
Where should I place the code in the table object to allow integration with the external API?
And as a bonus question: How to minimise the number of LDAP queries when filling the Entities?
i.e. it seems to be a lot faster by first retrieving the relevant users with a single ->getUsers() and placing them later, even though iterating over the articles and using multiple ->getUser() might be simpler.
The most simple solution would be to use a result formatter to fetch and inject the external data.
The more sophisticated solution would be a custom association and a custom association loader, but given how database-centric associations are, you'd probably also have to come up with a table and possibly a query implementation that handles your LDAP datasource. While it would be rather simple to move this into a custom association, containing the association will look up a matching table, cause the schema to be inspected, etc.
So I'll stick with providing an example for the first option. A result formatter would be pretty simple, something like this:
$this->Articles
    ->find()
    ->formatResults(function (\Cake\Collection\CollectionInterface $results) {
        $userIds = array_unique($results->extract('user_id')->toArray());
        $users = LDAPConnector::getUsers($userIds);
        $usersMap = collection($users)->indexBy('id')->toArray();
        return $results
            ->map(function ($article) use ($usersMap) {
                if (isset($usersMap[$article['user_id']])) {
                    $article['user'] = $usersMap[$article['user_id']];
                }
                return $article;
            });
    });
The example makes the assumption that the data returned from LDAPConnector::getUsers() is a collection of associative arrays, with an id key that matches the user id. You'd have to adapt this accordingly, depending on what exactly LDAPConnector::getUsers() returns.
That aside, the example should be rather self-explanatory: first obtain a unique list of the user IDs found in the queried articles, then obtain the LDAP users using those IDs, then inject the users into the articles.
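The same three steps can be written as a framework-free sketch. It is in Python rather than PHP, and it is an illustration of the batching idea only: fetch_users is a hypothetical stand-in for LDAPConnector::getUsers(), the directory contents are made up, and the call counter exists just to show that a single lookup serves all articles.

```python
def fetch_users(user_ids):
    """Hypothetical stand-in for LDAPConnector::getUsers(); one call = one round trip."""
    fetch_users.calls += 1
    directory = {1: "Ada", 2: "Grace", 5: "Edsger"}  # made-up data
    return [{"id": uid, "name": directory[uid]} for uid in user_ids if uid in directory]


fetch_users.calls = 0


def attach_users(articles):
    # Step 1: unique user IDs referenced by the articles.
    user_ids = sorted({article["user_id"] for article in articles})
    # Step 2: a single batched lookup instead of one lookup per article.
    users_by_id = {user["id"]: user for user in fetch_users(user_ids)}
    # Step 3: inject the matching user; articles without a match stay untouched.
    for article in articles:
        user = users_by_id.get(article["user_id"])
        if user is not None:
            article["user"] = user
    return articles


articles = attach_users([
    {"title": "First", "user_id": 1},
    {"title": "Second", "user_id": 2},
    {"title": "Third", "user_id": 1},
    {"title": "Orphan", "user_id": 6},
])
```

However many articles are passed in, fetch_users runs once, which is the answer to the bonus question about minimising LDAP queries.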
If you wanted to have entities in your results, then create entities from the user data, for example like this:
$userData = $usersMap[$article['user_id']];
$article['user'] = new \App\Model\Entity\User($userData);
For better reusability, put the formatter in a custom finder. In your ArticlesTable class:
public function findWithUsers(\Cake\ORM\Query $query, array $options)
{
    return $query->formatResults(/* ... */);
}
Then you can just do $this->Articles->find('withUsers'), just as simple as containing.
See also
Cookbook > Database Access & ORM > Query Builder > Adding Calculated Fields
Cookbook > Database Access & ORM > Retrieving Data & Results Sets > Custom Finder Methods

How to get Entry Id for a Newly Created Entity in Breeze

I want to create a registration form that is filled in in batches, with a continue button; getting the id of the entry will help me call the save method.
I want to get the primary key of a newly created entry immediately using BreezeJS. Please, I need help with this.
Thanks
Not entirely sure I understand your question, but it sounds like you want to get the id of a newly saved record immediately after the save. If so then the answer below applies.
When the save promise resolves, it returns both the list of saved entities and a keyMappings array for any entities whose ids changed as a result of the save, i.e. a mapping from temporary to real ids. (Documented here: http://www.breezejs.com/sites/all/apidocs/classes/EntityManager.html#method_saveChanges)
myEntityManager.saveChanges().then(function (saveResult) {
    // entities is an array of entities that were just saved.
    var entities = saveResult.entities;
    // keyMappings maps each temporary id to the real id assigned by the save.
    var keyMappings = saveResult.keyMappings;
    keyMappings.forEach(function(km) {
        var tempId = km.tempValue;
        var newId = km.realValue;
    });
});
On the other hand if you have an entity and you just want its 'key' you can use the EntityAspect.getKey method. (see http://www.breezejs.com/sites/all/apidocs/classes/EntityAspect.html#method_getKey)
// assume order is an order entity attached to an EntityManager.
var entityKey = order.entityAspect.getKey();

SQLAlchemy is deletion on before_commit safe?

I am using SQLAlchemy and try to manage a model "Media" which has a many-to-one relationship with a "Booking". Is it safe to call scoped_session.delete() from within a before_commit event?
def before_commit(session):
    r""" Invokes the ``before_commit`` method on all items in the session.
    This allows the models to perform an update-action depending on their
    new data. """
    for item in session.deleted:
        if hasattr(item, 'before_commit'):
            item.before_commit(session, 'deleted')
    for item in session.dirty:
        if hasattr(item, 'before_commit'):
            item.before_commit(session, 'dirty')
    for item in session.new:
        if hasattr(item, 'before_commit'):
            item.before_commit(session, 'new')
event.listen(db.session.__class__, 'before_commit', before_commit)
class Booking(db.Model):
    # ...
    media = db.relationship(Media, backref='booking')
    def before_commit(self, session, status):
        r""" Validates the booking's data. If the booking is being deleted,
        all its media will be deleted with it. """
        if status == 'deleted':
            # Delete all the media that is associated with this booking.
            for media in self.media:
                session.delete(media)
A mass delete() using either session.execute("delete...") or session.query(cls).delete() should be fine; it just emits that SQL on the current connection.
As far as session.delete(obj), it looks like before_commit() is invoked before the final flush(), so in that sense you can treat it like a before_flush() event. Try it out and you should see the DELETE being emitted, and if so then you're fine.

Is it possible to set two fields as indexes on an entity in ndb?

I am new to ndb and GAE and have a problem coming up with a good solution for setting indexes.
Let's say we have a user model like this:
class User(ndb.Model):
    name = ndb.StringProperty()
    email = ndb.StringProperty(required = True)
    fb_id = ndb.StringProperty()
Upon login if I was going to check against the email address with a query, I believe this would be quite slow and inefficient. Possibly it has to do a full table scan.
q = User.query(User.email == EMAIL)
user = q.fetch(1)
I believe it would be much faster, if User models were saved with the email as their key.
user = User(id=EMAIL)
user.put()
That way I could retrieve them like this a lot faster (so I believe)
key = ndb.Key('User', EMAIL)
user = key.get()
So far if I am wrong please correct me. But after implementing this I realized there is a chance that facebook users would change their email address, that way upon a new oauth2.0 connection their new email can't be recognized in the system and they will be created as a new user. Hence maybe I should use a different approach:
Using the social-media-provider-id (unique for all provider users)
and
provider-name (in the rare case that a Twitter user and a Facebook user share the same provider-id)
However in order to achieve this, I needed to set two indexes, which I believe is not possible.
So what could I do? Shall I concatenate both fields as a single key and index on that?
e.g. the new idea would be:
class User(ndb.Model):
    name = ndb.StringProperty()
    email = ndb.StringProperty(required = True)
    provider_id = ndb.StringProperty()
    provider_type = ndb.StringProperty()
saving:
provider_id = '1234'
provider_type = 'fb'
user = User(id=provider_id + provider_type)
user.put()
retrieval:
provider_id = '1234'
provider_type = 'fb'
key = ndb.Key('User', provider_id + provider_type)
user = key.get()
This way we don't care any more if the user changes the email address on his social media.
Is this idea sound?
Thanks,
UPDATE
Tim's solution sounded so far the cleanest and likely also the fastest to me. But I came across a problem.
class AuthProvider(polymodel.PolyModel):
    user_key = ndb.KeyProperty(kind=User)
    active = ndb.BooleanProperty(default=True)
    date_created = ndb.DateTimeProperty(auto_now_add=True)
    @property
    def user(self):
        return self.user_key.get()
class FacebookLogin(AuthProvider):
    pass
View.py: Within facebook_callback method
provider = ndb.Key('FacebookLogin', fb_id).get()
# Problem is right here: provider is always None. It only works if I use the PolyModel like this:
# ndb.Key('AuthProvider', fb_id).get()
# But this defeats the whole purpose of having different subclasses as different providers.
# Maybe I am using the key handling wrong?
if provider:
    user = provider.user
else:
    provider = FacebookLogin(id=fb_id)
if not user:
    user = User()
    user_key = user.put()
    provider.user_key = user_key
    provider.put()
return user
One slight variation on your approach, which would allow a more flexible model, is to create a separate entity keyed by the provider_id and provider_type (or by whatever other auth scheme you come up with).
This entity then holds a reference (key) to the actual user details.
You can then:
Do a direct get() for the auth details, then get() the actual user details.
Change the auth details without actually rewriting/rekeying the user details.
Support multiple auth schemes for a single user.
I use this approach for an application that has > 2000 users, most use a custom auth scheme (app specific userid/passwd) or google account.
e.g.
class AuthLogin(polymodel.PolyModel):
    user_key = ndb.KeyProperty(kind=User)
    status = ndb.StringProperty()  # maybe you need to disable a particular login without deleting it.
    date_created = ndb.DateTimeProperty(auto_now_add=True)
    @property
    def user(self):
        return self.user_key.get()
class FacebookLogin(AuthLogin):
    pass  # some additional Facebook-specific properties
class TwitterLogin(AuthLogin):
    pass  # some additional Twitter-specific properties
etc...
By using PolyModel as the base class you can do AuthLogin.query().filter(AuthLogin.user_key == user.key) and get all auth types defined for that user, as they all share the same base class AuthLogin. You need this because otherwise you would have to query for each supported auth type in turn, as you cannot do a kindless query without an ancestor, and in this case we can't use the User as the ancestor because then we couldn't do a simple get() from the login id.
However, some things to note: all subclasses of AuthLogin will share the same kind in the key, "AuthLogin", so you still need to concatenate the auth_provider and auth_type for the key's id so that you can ensure you have unique keys. E.g.
dev~fish-and-lily> from google.appengine.ext.ndb.polymodel import PolyModel
dev~fish-and-lily> class X(PolyModel):
...     pass
...
dev~fish-and-lily> class Y(X):
...     pass
...
dev~fish-and-lily> class Z(X):
...     pass
...
dev~fish-and-lily> y = Y(id="abc")
dev~fish-and-lily> y.put()
Key('X', 'abc')
dev~fish-and-lily> z = Z(id="abc")
dev~fish-and-lily> z.put()
Key('X', 'abc')
dev~fish-and-lily> y.key.get()
Z(key=Key('X', 'abc'), class_=[u'X', u'Z'])
dev~fish-and-lily> z.key.get()
Z(key=Key('X', 'abc'), class_=[u'X', u'Z'])
This is the problem you ran into. By adding the provider type as part of the key you now get distinct keys.
dev~fish-and-lily> z = Z(id="Zabc")
dev~fish-and-lily> z.put()
Key('X', 'Zabc')
dev~fish-and-lily> y = Y(id="Yabc")
dev~fish-and-lily> y.put()
Key('X', 'Yabc')
dev~fish-and-lily> y.key.get()
Y(key=Key('X', 'Yabc'), class_=[u'X', u'Y'])
dev~fish-and-lily> z.key.get()
Z(key=Key('X', 'Zabc'), class_=[u'X', u'Z'])
dev~fish-and-lily>
I don't believe this is any less convenient a model for you.
Does all that make sense ;-)
While @Greg's answer seems OK, I think it's actually a bad idea to use an external type/id as the key for your entity, because this solution doesn't scale very well.
What if you would like to implement your own username/password at some point?
What if the user deletes their Facebook account?
What if the same user wants to sign in with a Twitter account as well?
What if the user has more than one Facebook account?
So the idea of having the type/id as the key looks weak. A better solution would be to have a field for every type that stores only the id, for example facebook_id, twitter_id, google_id etc., and then query on these fields to retrieve the actual user. This only happens during the sign-in and sign-up process, so it's not that often. Of course you will have to add some logic to add another provider to an already existing user, or to merge users if the same user signed in with a different provider.
Still, this last solution won't work if you want to support multiple sign-ins from the same provider. In order to achieve that you will have to create another model that stores only the external providers/ids and associate them with your user model.
As an example of the second solution you could check my gae-init project, where I'm storing the 3 different providers in the User model and working on them in the auth.py module. Again, this solution doesn't scale very well with more providers and doesn't support multiple IDs from the same provider.
Concatenating the user-type with their ID is sensible.
You can save on your read and write costs by not duplicating the type and ID as properties though - when you need to use them, just split the ID back up. (Doing this will be simpler if you include a separator between the parts, '%s|%s' % (provider_type, provider_id) for example)
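A minimal sketch of joining and splitting such an id, assuming "|" never occurs in a provider type. The helper names make_user_id and split_user_id are made up for this illustration; the resulting string is what would be passed as the entity's id.

```python
SEPARATOR = "|"


def make_user_id(provider_type, provider_id):
    """Join provider type and provider-side id into one key id string."""
    provider_type = str(provider_type)
    if SEPARATOR in provider_type:
        raise ValueError("provider_type must not contain %r" % SEPARATOR)
    return "%s%s%s" % (provider_type, SEPARATOR, provider_id)


def split_user_id(user_id):
    """Recover (provider_type, provider_id) from an id built by make_user_id."""
    # Split on the first separator only, so the provider id may contain it.
    provider_type, provider_id = user_id.split(SEPARATOR, 1)
    return provider_type, provider_id


fb_id = make_user_id("fb", 1234)
tw_id = make_user_id("tw", 1234)
```

Note that the provider id comes back as a string after splitting, so convert it if the provider's ids are numeric.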
If you want to use a single model, you can do something like:
class User(ndb.Model):
    name = ndb.StringProperty()
    email = ndb.StringProperty(required = True)
    providers = ndb.KeyProperty(repeated=True)
auser = User(id="auser", name="A user", email="auser@example.com")
auser.providers = [
    ndb.Key("ProviderName", "fb", "ProviderId", 123),
    ndb.Key("ProviderName", "tw", "ProviderId", 123)
]
auser.put()
To query for a specific FB login, you simply do:
fbkey = ndb.Key("ProviderName", "fb", "ProviderId", 123)
for entry in User.query(User.providers==fbkey):
    # Do something with the entry
    pass
As ndb does not provide an easy way to create a unique constraint, you could use the _pre_put_hook to ensure that providers is unique.

How can I mimic 'select_related' using google-appengine and django-nonrel?

django nonrel's documentation states: "you have to manually write code for merging the results of multiple queries (JOINs, select_related(), etc.)".
Can someone point me to any snippets that manually add the related data? @nickjohnson has an excellent post showing how to do this with the straight AppEngine models, but I'm using django-nonrel.
For my particular use I'm trying to get the UserProfiles with their related User models. This should be just two simple queries, then match the data.
However, using django-nonrel, a new query gets fired off for each result in the queryset. How can I get access to the related items in a 'select_related' sort of way?
I've tried this, but it doesn't seem to work as I'd expect. Looking at the rpc stats, it still seems to be firing a query for each item displayed.
all_profiles = UserProfile.objects.all()
user_pks = set()
for profile in all_profiles:
    user_pks.add(profile.user_id)  # a way to access the pk without triggering the query
users = User.objects.filter(pk__in=user_pks)
for profile in all_profiles:
    profile.user = get_matching_model(profile.user_id, users)
def get_matching_model(key, queryset):
    """Generator expression to get the next match for a given key"""
    try:
        return (model for model in queryset if model.pk == key).next()
    except StopIteration:
        return None
UPDATE:
Ick... I figured out what my issue was.
I was trying to improve the efficiency of the changelist_view in the django admin. It seemed that the select_related logic above was still producing additional queries for each row in the result set when a foreign key was in my 'list_display'. However, I traced it down to something different. The above logic does not produce multiple queries (but if you more closely mimic Nick Johnson's way it will look a lot prettier).
The issue is that in django.contrib.admin.views.main on line 117 inside the ChangeList method there is the following code: result_list = self.query_set._clone(). So, even though I was properly overriding the queryset in the admin and selecting the related stuff, this method was triggering a clone of the queryset which does NOT keep the attributes on the model that I had added for my 'select related', resulting in an even more inefficient page load than when I started.
Not sure what to do about it yet, but the code that selects related stuff is just fine.
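Why a clone loses the attached objects can be shown with a toy model of a lazily evaluated queryset. ToyQuerySet and Row are hypothetical classes written for this sketch, not Django code; the only behaviour borrowed from Django is that a clone starts without the original's cached results and so builds fresh instances when iterated.

```python
class Row(object):
    """Stand-in for a model instance."""

    def __init__(self, pk):
        self.pk = pk


class ToyQuerySet(object):
    """Hypothetical, minimal imitation of a lazily evaluated queryset."""

    def __init__(self, pks):
        self._pks = list(pks)
        self._cache = None

    def __iter__(self):
        # The first iteration "hits the database" and caches the instances.
        if self._cache is None:
            self._cache = [Row(pk) for pk in self._pks]
        return iter(self._cache)

    def _clone(self):
        # A clone carries the query over, but not the cached instances.
        return ToyQuerySet(self._pks)


qs = ToyQuerySet([1, 2])
for row in qs:
    row.user = "user-%d" % row.pk  # attach "related" data to cached instances

kept = [hasattr(row, "user") for row in qs]
lost = [hasattr(row, "user") for row in qs._clone()]
```

Iterating qs again reuses the cached rows, so the attached attribute survives; iterating the clone creates new rows without it, so each access would trigger its own query again.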
I don't like answering my own question, but the answer might help others.
Here is my solution that will get related items on a queryset based entirely on Nick Johnson's solution linked above.
from collections import defaultdict
def get_with_related(queryset, *attrs):
    """
    Adds related attributes to a queryset in a more efficient way
    than simply triggering the new query on access at runtime.
    attrs must be valid either foreign keys or one to one fields on the queryset model
    """
    # Makes a list of the entity and related attribute to grab for all possibilities
    fields = [(model, attr) for model in queryset for attr in attrs]
    # we'll need to make one query for each related attribute because
    # I don't know how to get everything at once. So, we make a list
    # of the attribute to fetch and pks to fetch.
    ref_keys = defaultdict(list)
    for model, attr in fields:
        ref_keys[attr].append(get_value_for_datastore(model, attr))
    # now make the actual queries for each attribute and store the results
    # in a dict of {pk: model} for easy matching later
    ref_models = {}
    for attr, pk_vals in ref_keys.items():
        related_queryset = queryset.model._meta.get_field(attr).rel.to.objects.filter(pk__in=set(pk_vals))
        ref_models[attr] = dict((x.pk, x) for x in related_queryset)
    # Finally put related items on their models
    for model, attr in fields:
        setattr(model, attr, ref_models[attr].get(get_value_for_datastore(model, attr)))
    return queryset
def get_value_for_datastore(model, attr):
    """
    Django's foreign key fields all have attributes 'field_id' where
    you can access the pk of the related field without grabbing the
    actual value.
    """
    return getattr(model, attr + '_id')
To be able to modify the queryset on the admin to make use of the select related we have to jump through a couple hoops. Here is what I've done. The only thing changed on the 'get_results' method of the 'AppEngineRelatedChangeList' is that I removed the self.query_set._clone() and just used self.query_set instead.
class UserProfileAdmin(admin.ModelAdmin):
    list_display = ('username', 'user', 'paid')
    select_related_fields = ['user']
    def get_changelist(self, request, **kwargs):
        return AppEngineRelatedChangeList
class AppEngineRelatedChangeList(ChangeList):
    def get_query_set(self):
        qs = super(AppEngineRelatedChangeList, self).get_query_set()
        related_fields = getattr(self.model_admin, 'select_related_fields', [])
        return get_with_related(qs, *related_fields)
    def get_results(self, request):
        paginator = self.model_admin.get_paginator(request, self.query_set, self.list_per_page)
        # Get the number of objects, with admin filters applied.
        result_count = paginator.count
        # Get the total number of objects, with no admin filters applied.
        # Perform a slight optimization: Check to see whether any filters were
        # given. If not, use paginator.hits to calculate the number of objects,
        # because we've already done paginator.hits and the value is cached.
        if not self.query_set.query.where:
            full_result_count = result_count
        else:
            full_result_count = self.root_query_set.count()
        # MAX_SHOW_ALL_ALLOWED is defined in django.contrib.admin.views.main
        can_show_all = result_count <= MAX_SHOW_ALL_ALLOWED
        multi_page = result_count > self.list_per_page
        # Get the list of objects to display on this page.
        if (self.show_all and can_show_all) or not multi_page:
            result_list = self.query_set
        else:
            try:
                result_list = paginator.page(self.page_num+1).object_list
            except InvalidPage:
                raise IncorrectLookupParameters
        self.result_count = result_count
        self.full_result_count = full_result_count
        self.result_list = result_list
        self.can_show_all = can_show_all
        self.multi_page = multi_page
        self.paginator = paginator

Resources