I was wondering if somebody could help. I'm using the blobcache module outlined in this post here
This works fine, but I'm looking to speed up retrieval from memcache by using get_multi(). However, my current code cannot find the keys when using get_multi().
My current get def looks like this:
def get(key):
    chunk_keys = memcache.get(key)
    if chunk_keys is None:
        return None
    chunk_keys = ",".join(chunk_keys)
    str(chunk_keys)
    chunk = memcache.get_multi(chunk_keys)
    if chunk is None:
        return None
    try:
        return chunk
    except Exception:
        return None
My understanding per the documentation is that you only need to pass through a string of keys to get_multi.
However, this is not returning anything at the moment.
Can someone point out what I'm doing wrong here?
Pass it a list of strings (keys), instead of a single string with commas in it; see the sketch after the documentation excerpt below.
get_multi(keys, key_prefix='', namespace=None, for_cas=False)
keys = List of keys to look up. A Key can be a string or a tuple of
(hash_value, string), where the hash_value, normally used for sharding
onto a memcache instance, is instead ignored, as Google App Engine
deals with the sharding transparently.
Multi Get Documentation
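For example, here's a minimal sketch of the corrected get(), assuming the value cached under key is already a list of chunk-key strings (as in the blobcache pattern):

from google.appengine.api import memcache

def get(key):
    # The cached value is assumed to be a list of chunk-key strings.
    chunk_keys = memcache.get(key)
    if chunk_keys is None:
        return None
    # Pass the list directly; get_multi() returns a dict mapping each
    # found key to its cached value.
    chunks = memcache.get_multi(chunk_keys)
    if len(chunks) != len(chunk_keys):
        return None  # one or more chunks were evicted
    # Reassemble the blob in chunk-key order (assuming string chunks).
    return "".join(chunks[k] for k in chunk_keys)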
This is ridiculously trivial, but I've spent half an hour trying to solve it.
class SocialPost(model.Model):
    total_comments = model.IntegerProperty(default=0)

    def create_reply_comment(self, content, author):
        ...
        logging.info(self)
        self.total_comments = self.total_comments + 1
        self.put()
In the log file, I can see that total_comments is 0, but in the admin console it is 1. The other fields are correct, except for this one.
Probably there's something wrong with that default=0, but I can't find what is wrong.
Edit: full code of my function:
def create_reply_comment(self, content, author):
    floodControl = memcache.get("FloodControl-" + str(author.key))
    if floodControl:
        raise base.FloodControlException
    new_comment = SocialComment(parent=self.key)
    new_comment.author = author.key
    new_comment.content = content
    new_comment.put()
    logging.info(self)
    self.latest_comment_date = new_comment.creation_date
    self.latest_comment = new_comment.key
    self.total_comments = self.total_comments + 1
    self.put()
    memcache.add("FloodControl-" + str(author.key), datetime.now(), time=SOCIAL_FLOOD_TIME)
Where I call the function:
if cmd == "create_reply_post":
    post = memcache.get("SocialPost-" + str(self.request.get('post')))
    if post is None:
        post = model.Key(urlsafe=self.request.get('post')).get()
        memcache.add("SocialPost-" + str(self.request.get('post')), post)
    node = node.get()
    if not node.get_subscription(user).can_reply:
        self.success()
        return
    post.create_reply_comment(feedparser._sanitizeHTML(self.request.get("content"), "UTF-8"), user)
You're calling memcache.add before you make your change to total_comments, so when you read it back from memcache on subsequent calls, you're getting an out-of-date value from the cache. Your create_reply_comment needs to either delete or overwrite the "SocialPost-" + str(self.request.get('post')) cache key.
[edit] Though your post title says you're using NDB (model.Model, though? Hmm.), so you could just skip the memcache bits entirely and let NDB do its thing.
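For example, a minimal sketch of the invalidation, assuming the cache key is built from the post's urlsafe key string (as the handler's model.Key(urlsafe=...) call suggests):

def create_reply_comment(self, content, author):
    # ... flood check and comment creation as in the original ...
    self.total_comments = self.total_comments + 1
    self.put()
    # Drop the stale cached copy so the next read re-fetches and
    # re-caches the updated entity; self.key.urlsafe() is assumed to
    # match the 'post' request parameter used to build the cache key.
    memcache.delete("SocialPost-" + self.key.urlsafe())
    memcache.add("FloodControl-" + str(author.key), datetime.now(), time=SOCIAL_FLOOD_TIME)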
I'm reading the documentation on the Full Text Search API (Java) in Google App Engine at https://developers.google.com/appengine/docs/java/search/overview. They have an example of getting the index:
public Index getIndex() {
    IndexSpec indexSpec = IndexSpec.newBuilder()
        .setName("myindex")
        .setConsistency(Consistency.PER_DOCUMENT)
        .build();
    return SearchServiceFactory.getSearchService().getIndex(indexSpec);
}
How about creating an index? How do I create one?
Thanks
You just did. You just created one.
public class IndexSpec
Represents information about an index. This class is used to fully specify the index you want to retrieve from the SearchService. To build an instance use the newBuilder() method and set all required parameters, plus optional values different than the defaults.
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/search/IndexSpec
You can confirm this by looking at the SearchService:
SearchService is also responsible for creating new indexes. For example:
SearchService searchService = SearchServiceFactory.getSearchService();
index = searchService.getIndex(IndexSpec.newBuilder().setName("myindex"));
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/search/SearchService
Anyway, it seems your code will create a new index if it doesn't exist. That's what the docs suggest:
// Get the index. If not yet created, create it.
Index index = searchService.getIndex(
    IndexSpec.newBuilder()
        .setIndexName("indexName")
        .setConsistency(Consistency.PER_DOCUMENT));
https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/search/Index
Now, what happens if you run the code again and change the Consistency? Do you have the same index with a different consistency? Is the index overwritten? I don't know. I would use the SearchService to look up existing indexes, rather than code that might create them, to avoid inadvertently changing the specs while just trying to get an index.
An Index is implicitly created when a document is written. Consistency is an attribute of the index, i.e. you can't have two indexes of the same name with different consistencies.
In my app, for one of the handlers, I need to get a bunch of entities and execute a function for each one of them.
I have the keys of all the entities I need. After fetching them, I need to execute 1 or 2 instance methods on each one, and this slows my app down quite a bit. Doing this for 100 entities takes around 10 seconds, which is way too slow.
I'm trying to find a way to get the entities and execute those functions in parallel to save time, but I'm not really sure which way is best.
I tried the _post_get_hook, but then I have a future object and need to call get_result() and execute the function in the hook. That works kind of OK in the SDK, but produces a lot of 'maximum recursion depth exceeded while calling a Python object' errors, and I can't really understand why; the error message is not very elaborate.
Is the Pipeline API or ndb tasklets what I'm searching for?
At the moment I'm going by trial and error, but I would be happy if someone could point me in the right direction.
EDIT
My code is something similar to a filesystem: every folder contains other folders and files. The path of a Collection is set on another entity, so to serialize a Collection entity I need to get the referenced entity and read the path. On a Collection, the serialized_assets() function gets slower the more entities it contains. If I could execute a serialize function for each contained asset side by side, it would speed things up quite a bit.
class Index(ndb.Model):
    path = ndb.StringProperty()

class Folder(ndb.Model):
    label = ndb.StringProperty()
    index = ndb.KeyProperty()
    # contents is a list of keys of contained Folders and Files
    contents = ndb.KeyProperty(repeated=True)

    def serialized_assets(self):
        assets = ndb.get_multi(self.contents)
        serialized_assets = []
        for a in assets:
            kind = a._get_kind()
            assetdict = a.to_dict()
            if kind == 'Collection':
                assetdict['path'] = a.path
                # other operations ...
            elif kind == 'File':
                assetdict['another_prop'] = a.another_property
                # ...
            serialized_assets.append(assetdict)
        return serialized_assets

    @property
    def path(self):
        return self.index.get().path

class File(ndb.Model):
    filename = ndb.StringProperty()
    # other properties....

    @property
    def another_property(self):
        # compute something here
        return computed_property
EDIT2:
@ndb.tasklet
def serialized_assets(self, keys=None):
    assets = yield ndb.get_multi_async(keys)
    raise ndb.Return([asset.serialized for asset in assets])
Is this tasklet code OK?
Since most of the execution time of your functions is spent waiting for RPCs, NDB's async and tasklet support is your best bet. That's described in some detail here. The simplest usage for your requirements is probably to use the Query.map() method, like this (from the docs):
@ndb.tasklet
def callback(msg):
    acct = yield msg.author.get_async()
    raise ndb.Return('On %s, %s wrote:\n%s' % (msg.when, acct.nick(), msg.body))

qry = Message.query().order(-Message.when)
outputs = qry.map(callback, limit=20)
for output in outputs:
    print output
The callback function is called for each entity returned by the query, and it can do whatever operations it needs (using _async methods and yield to do them asynchronously), returning the result when it's done. Because the callback is a tasklet, and uses yield to make the asynchronous calls, NDB can run multiple instances of it in parallel, and even batch up some operations.
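Adapted to the serialized_assets() case from your question, a sketch might look like this (untested; it assumes contents holds ndb.Key objects, as the ndb.get_multi() call implies, and serialize_one is a made-up name):

from google.appengine.ext import ndb

@ndb.tasklet
def serialize_one(key):
    asset = yield key.get_async()
    assetdict = asset.to_dict()
    if asset._get_kind() == 'Collection':
        # Fetch the referenced Index entity asynchronously instead of
        # the synchronous self.index.get() in the path property.
        index = yield asset.index.get_async()
        assetdict['path'] = index.path
    raise ndb.Return(assetdict)

def serialized_assets(self):
    # Start all tasklets first, then wait; the event loop runs the
    # underlying RPCs in parallel while the futures are outstanding.
    futures = [serialize_one(key) for key in self.contents]
    return [f.get_result() for f in futures]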
The pipeline API is overkill for what you want to do. Is there any reason why you couldn't just use a taskqueue?
Use the initial request to get all of the entity keys, and then enqueue a task for each key, having the task execute the two functions for that entity. The concurrency will then be based on the number of concurrent requests configured for that task queue.
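A sketch of that approach (the /serialize_worker URL, the handler class, and the two method names are placeholders):

import webapp2
from google.appengine.api import taskqueue
from google.appengine.ext import ndb

# In the initial request: enqueue one task per entity key.
for key in keys:
    taskqueue.add(url='/serialize_worker',
                  params={'key': key.urlsafe()})

# In the worker handler: fetch the entity and run its methods.
class SerializeWorker(webapp2.RequestHandler):
    def post(self):
        entity = ndb.Key(urlsafe=self.request.get('key')).get()
        entity.first_method()   # placeholder
        entity.second_method()  # placeholder

One thing to note: the tasks run after the original request returns, so this fits background work better than building a response inline.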
I've got this code (Java, GAE):
// Much earlier:
playerKey = KeyFactory.keyToString(somePlayer.key);
// Then, later...
PersistenceManager pm = assassin.PMF.get().getPersistenceManager();
Key targetKey = KeyFactory.stringToKey(playerKey);
Query query = pm.newQuery(Player.class);
query.setFilter("__key__ == keyParam");
query.declareParameters("com.google.appengine.api.datastore.Key keyParam");
List<Player> players = (List<Player>) query.execute(targetKey); // <-- line 200
which generates this error:
javax.jdo.JDOFatalUserException: Unexpected expression type while parsing query. Are you certain that a field named __key__ exists on your object?
at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:354)
at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:252)
at myapp.Player.validPlayerWithKey(Player.java:200)
// [etc., snip]
But I'm not sure what it wants. I'm trying to search on the JDO id field, which I thought I read had the special name __key__ in the documentation.
I've tried it with both
query.setFilter("__key__ == keyParam");
and
query.setFilter("ID == keyParam");
with the same results. So, what am I doing wrong? Or, more importantly, how do I do it correctly?
Thanks!
Edit: For completeness's sake, here is the final, working code (based on Gordon's answer, which I have accepted as correct):
Player result = null;
if (playerKey == null)
{
    log.log(Level.WARNING, "Tried to find player with null key.");
}
else
{
    PersistenceManager pm = assassin.PMF.get().getPersistenceManager();
    try {
        result = (Player) pm.getObjectById(Player.class, playerKey);
    } catch (javax.jdo.JDOObjectNotFoundException notFound) {
        // Player not found; we will return null.
        result = null;
    }
    pm.close();
}
return result;
If your objective is to get an object by key, then you should use the PersistenceManager's getObjectById() method. More details here.
As an aside, trying to construct a query to get something by its key is something you shouldn't need to do. Although this is how you would work with an SQL database, the Google datastore does things differently, and this is one of those cases where, rather than going through the trouble of constructing a query, Google App Engine lets you get what you want directly. After all, you should only have one entity in the database with a particular key, so there's nothing in the rest of the machinery of a GQL query that you need in this case, and it can all be skipped for efficiency.
I would recommend using JPA ( http://code.google.com/appengine/docs/java/datastore/usingjpa.html ) to access your data in GAE. It has the very important advantage that you can use the widely known and well-documented JPA standard (and its JPQL query language) to do this kind of thing in a portable way: if you stick to the JPA standard, your code will work on GAE, with Hibernate, or with EclipseLink without modification.
On this question I solved the problem of querying Google Datastore to retrieve stuff by user (com.google.appengine.api.users.User) like this:
User user = userService.getCurrentUser();
String select_query = "select from " + Greeting.class.getName();
Query query = pm.newQuery(select_query);
query.setFilter("author == paramAuthor");
query.declareParameters("java.lang.String paramAuthor");
greetings = (List<Greeting>) query.execute(user);
The above works fine, but after a bit of messing around I realized this syntax is not very practical as the need to build more complicated queries arises, so I decided to build my filters manually. Now I have, for example, something like the following (the filter is usually passed in as a string variable, but here it is built inline for simplicity):
User user = userService.getCurrentUser();
String select_query = "select from " + Greeting.class.getName();
Query query = pm.newQuery(select_query);
query.setFilter("author == '"+ user.getEmail() +"'");
greetings = (List<Greeting>) query.execute();
Obviously this won't work, even though this field = 'value' syntax is supported by JDOQL and works fine on other fields (String types and enums). The other strange thing is that, looking at the Data Viewer in the App Engine dashboard, the author field is stored as type User but the value is 'user#gmail.com'. And then again, in the parameter case above that works fine, I am declaring the parameter as a String but passing in an instance of User (user), which gets serialized with a simple toString() (I guess).
Anyone have any idea?
Using string substitution in query languages is always a bad idea. It's far too easy for a user to break out and mess with your environment, and it introduces a whole collection of encoding issues, etc.
What was wrong with your earlier parameter substitution approach? As far as I'm aware, it supports everything, and it sidesteps any parsing issues. As far as the problem with knowing how many arguments to pass goes, you can use Query.executeWithMap or Query.executeWithArray to execute a query with an unknown number of arguments.