GAE, memcache before DB update

I'm having some trouble with memcache and GAE datastore operations.
If I update memcache right after a datastore operation (x.put(), for example), my memcache function often returns the old value. If I use sleep(), the cache is correct more often, but that is not the right fix, in my opinion:
sleep(0.2)
data = Picture.all().order('-created').fetch(300)
memcache.set('pictures_all', data)
What do I need to do to get a correct memcache?
ANSWER:
You need to use a parent (ancestor) with the query. All Picture entities must have the same parent; then you get strongly consistent results:
data = Picture.all().order('-created').ancestor(main_key()).fetch(300)
memcache.set('pictures_all', data)

If you already have the data, just update that one entry in memcache; there is no need to retrieve everything again. Something like:
data.put()
memcache.set(key, data)

You're on the right track that the problem is with eventual consistency.
Using STRONG_CONSISTENCY does solve the problem, but it'll give you scalability problems down the road - ones that will be difficult to resolve.
The solution for this is, annoyingly, more complex than it should be. I'm also not sure whether there's really a bulletproof solution given the eventual consistency behavior.
The pseudocode should look something like this:
all_pictures = memcache.get('pictures_all')
if not all_pictures:
    all_pictures = convert_to_list(Picture.all().order('-created').fetch(300))
if newdata not in all_pictures:
    add_to_list_in_proper_order(all_pictures, newdata)
memcache.set('pictures_all', all_pictures)
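
The pseudocode above leaves its helpers undefined. Here is a minimal runnable sketch of the same idea, under loud assumptions: a plain dict stands in for memcache, pictures are (created, name) tuples, and `load_pictures_from_db` and `add_picture_to_cache` are hypothetical names rather than App Engine APIs.

```python
import bisect

# Hypothetical stand-in for memcache: a plain dict, NOT the App Engine API.
cache = {}


def load_pictures_from_db():
    # Hypothetical placeholder for the datastore query, which may be stale.
    return [(3, 'c'), (1, 'a')]


def add_picture_to_cache(new_picture, key='pictures_all'):
    """Insert new_picture into the cached list, keeping newest first."""
    pictures = cache.get(key)
    if not pictures:
        pictures = list(load_pictures_from_db())
    if new_picture not in pictures:
        # The list is sorted by descending `created`; negate the timestamps
        # so bisect can search an ascending sequence.
        ascending_keys = [-created for created, _ in pictures]
        index = bisect.bisect_left(ascending_keys, -new_picture[0])
        pictures.insert(index, new_picture)
    cache[key] = pictures
    return pictures
```

Calling `add_picture_to_cache` right after saving a new picture keeps the cached list current even if the query that originally filled it did not yet see the new entity. Like the pseudocode, it does not guard against two requests updating the cache at the same time.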

config = db.create_config(deadline=10, read_policy=db.STRONG_CONSISTENCY)
data = Picture.all().order('-created').fetch(300, config=config)
memcache.set('pictures_all', data)
I guess this is the solution.
EDIT: No, this doesn't work.

Great.
I had the same problem, and the solution was exactly what the asker gave: the use of ancestors.
To read:
data = Picture.all().order('-created').ancestor(main_key()).fetch(300)
To save:
pic = Picture(parent=main_key(), ...)
pic.put()

Related

Objectify async: at what point RPC call is made?

Quite often I want to make two or more independent queries to fetch entities from Datastore. But I'm not sure if they are really parallel. For example:
loadResult1 = ofy().load().key(Key.create(Foo.class, 1));
loadResult2 = ofy().load().key(Key.create(Bar.class, 1));
loadResult1.now();
loadResult2.now();
Is there any benefit of arranging the code like this?
Same goes for search queries
iterable1 = ofy().load().type(Foo.class).iterable();
iterable2 = ofy().load().type(Bar.class).iterable();
iterable1.hasNext();
iterable2.hasNext();
Will the iterable2 load in parallel with iterable1?
Side question: is .iterable() in this regard any different from .list()?
I tried to debug the code, but it doesn't look like the call is made until the call to .now(), or the first call to .next()/.hasNext(). Is that really so?

Yes - until you materialize a result, the queries proceed asynchronously in parallel.
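
The general pattern of issuing every request first and materializing the results afterwards can be sketched in Python with `concurrent.futures`. This is only an analogy: it is not Objectify code, it says nothing about when Objectify itself issues its RPCs, and `fetch` and `load_both` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Both tasks must reach the barrier together. If they ran one after the
# other, the first wait() would time out and raise BrokenBarrierError.
barrier = threading.Barrier(2)


def fetch(name):
    # Hypothetical stand-in for a remote call.
    barrier.wait(timeout=5)
    return name + ' loaded'


def load_both():
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Start both requests first ...
        future1 = pool.submit(fetch, 'foo')
        future2 = pool.submit(fetch, 'bar')
        # ... and only then block on the results.
        return future1.result(), future2.result()
```

If `future1.result()` were called before the second `submit`, the two calls would run back to back and the barrier would time out, which is the sequential behaviour the question is trying to avoid.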

Using Array[Boolean] in Scala to find out progress of foreach

I have a class in Scala that has a method to perform a bunch of calculations sequentially using foreach on a list which is provided in the constructor. The class has a field val progress: Array[Boolean] = list.map(_ => false).toArray. Some of these calculations can take a long time so at the end of each one I set the appropriate index in progress to true. Then I can get progress to determine where I am in the calculations from outside the class.
This does not seem like the best approach in Scala (because I'm using a mutable data structure) so any advice to improve it would be much appreciated.

I don't think your approach is bad. The alternative is to use a var progress: List[Boolean] as an immutable data structure and have a long series of immutable lists pointed at by that variable. You don't really gain anything: you lose the ability to reserve the exact memory you need in a single step, and the repeated memory allocation makes this slower.
Mutable data structures exist because they are incredibly useful and often necessary, which is also why you can still define a var instead of a val. The point is not that one is "bad" and the other "good"; it is a matter of knowing when you can use val and give up mutability in exchange for safety. In your example you just can't.
Side note: Instead of using
val progress: Array[Boolean] = list.map(_ => false).toArray
This is much clearer and faster IMHO:
val progress = Array.fill(list.size)(false)

Well, it depends on what you want to do with that information. If you are interested in specific events (e.g., 50% done or something like that), you could pass a listener into your foreach method and ask to be notified. But if you really need to inquire about the current state at any time, then ... well, if you need to know the state, then you have to keep the state, there is no way around that :)
An array of booleans seems to be overkill (you could just keep the current index instead), but you mentioned that you were planning to keep some additional info around as well, so it looks reasonable.
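
The two suggestions in this answer, keeping a single counter and notifying a listener, can be sketched as follows. The sketch is in Python rather than Scala, and `ProgressTracker`, `on_progress`, and `run` are hypothetical names chosen for illustration.

```python
class ProgressTracker:
    """Runs work over a list of items and records how many are done."""

    def __init__(self, items, on_progress=None):
        self.items = list(items)
        self.completed = 0  # number of items finished so far
        self.on_progress = on_progress  # optional callback(done, total)

    def run(self, work):
        for item in self.items:
            work(item)
            self.completed += 1
            if self.on_progress is not None:
                self.on_progress(self.completed, len(self.items))
```

Outside code can either read `completed` at any time or pass a callback that reacts to specific milestones, such as reaching half of the total.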

Storing the value with the Ref, as long as it's not in the datastore

I have a List<Ref<Entity>>. I add new entries to the list like this:
entities.add(Ref.create(new_entry));
modified.add(new_entry);
When I store the entity that contains the list, I store the list itself and all the entities that are in the modified list. This works fine.
The problem is, that I have to work with the entities-list, while I add new entries to it. This requires iterating the list multiple times. The problem here is, that the refs in the list point to old entries (which are already in the datastore) and new entries (which are not yet in the datastore).
This causes the Ref.get()-method to return null for all the yet unstored entries in the list (the ones that are still in the modified-list).
I worked around this by doing this when inserting:
Ref<T> ref = new DeadRef<>(
Key.create(data),
data
);
this.entities.add(ref);
this.modified.add(data);
This way, I can mix stored and unstored entries in one list and Ref.get() always returns a value.
This works, but I have noticed that the refs in the entities-list stay DeadRefs when I store them to the datastore and load them in again.
Will this be a problem? Is there maybe even a better way to accomplish this?

This seems like a bad idea, although I don't know what specific problems you will run into.
The "right answer" is to save your entities first.
Edit: Also look at the documentation for ofy().defer().save(), which can prevent you from issuing a lot of unnecessary save operations.

Django: lock particular rows in table

I have the following django method:
def setCurrentSong(request, player):
    try:
        newCurrentSong = ActivePlaylistEntry.objects.get(
            song__player_lib_song_id=request.POST['lib_id'],
            song__player=player,
            state=u'QE')
    except ObjectDoesNotExist:
        toReturn = HttpResponseNotFound()
        toReturn[MISSING_RESOURCE_HEADER] = 'song'
        return toReturn
    try:
        currentSong = ActivePlaylistEntry.objects.get(song__player=player, state=u'PL')
        currentSong.state = u'FN'
        currentSong.save()
    except ObjectDoesNotExist:
        pass
    except MultipleObjectsReturned:
        # This is bad. It means that
        # this function isn't getting executed atomically like we hoped it would be.
        # I think we may actually need a mutex to protect this critical section :(
        ActivePlaylistEntry.objects.filter(song__player=player, state=u'PL').update(state=u'FN')
    newCurrentSong.state = u'PL'
    newCurrentSong.save()
    PlaylistEntryTimePlayed(playlist_entry=newCurrentSong).save()
    return HttpResponse("Song changed")
Essentially, I want it to be the case that, for a given player, only one ActivePlaylistEntry has a 'PL' (playing) state at any given time. However, I have actually experienced cases where, as a result of quickly calling this method twice in a row, I get two songs for the same player with a state of 'PL'. This is bad, as I have other application logic that relies on a player having only one playing song at any given time (plus, semantically, it doesn't make sense to be playing two different songs at the same time on the same player).
Is there a way for me to do this update atomically? Just running the method as a transaction with the commit_on_success decorator doesn't seem to work. Is there a way to lock the table for all songs belonging to a particular player? I was thinking of adding a lock column to my model (a boolean field) and either spinning on it or pausing the thread for a few milliseconds and checking again, but these feel super hackish and dirty. I was also thinking about creating a stored procedure, but that's not really database independent.

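Locking queries (select_for_update) were added in Django 1.4.
with transaction.commit_manually():
    # list() forces evaluation; a lazy queryset never issues SELECT ... FOR UPDATE
    list(ActivePlaylistEntry.objects.select_for_update().filter(...))
    aple = ActivePlaylistEntry.objects.get(...)
    aple.state = ...
    aple.save()
    transaction.commit()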
But you should consider refactoring so that a separate table with a ForeignKey is used to indicate the "active" song.
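
One way to read the refactoring suggestion is a table keyed by player that points at the currently playing entry, so a second "playing" row for the same player cannot exist. The sketch below uses the standard-library `sqlite3` module instead of Django models, and the table names, column names, and `set_current_song` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')
conn.executescript('''
    CREATE TABLE playlist_entry (
        id INTEGER PRIMARY KEY,
        player_id INTEGER NOT NULL,
        title TEXT NOT NULL
    );
    -- player_id is the primary key, so each player has at most one row here.
    CREATE TABLE current_song (
        player_id INTEGER PRIMARY KEY,
        entry_id INTEGER NOT NULL REFERENCES playlist_entry(id)
    );
''')


def set_current_song(player_id, entry_id):
    # A single statement replaces the player's row, so two quick calls
    # can never leave two current songs behind.
    with conn:
        conn.execute(
            'INSERT OR REPLACE INTO current_song (player_id, entry_id) '
            'VALUES (?, ?)',
            (player_id, entry_id))
```

With this layout the "only one playing song per player" rule is enforced by the schema rather than by application code that has to win a race.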

CakePHP philosophical question

So I have a lot of different model types: comments, posts, reviews, etc. I want to join them into one integrated feed. Is there a CakePHP-style way of merging all this data for display, ordered by timestamp?
There are a lot of messy ways to do this but I wonder if there is some standard way which I am missing. Thanks!

Since the items are from different tables, it's difficult to retrieve them sorted together from the database in any case. Depending on how well organized your data is though, something as non-messy as this should do:
$posts = $this->Post->find(...);
$reviews = $this->Review->find(...);
$comments = $this->Comment->find(...);
$feed = array_merge($posts, $reviews, $comments);
// Sort the merged feed by each item's 'created' timestamp, oldest first
usort($feed, function ($a, $b) {
    $a = current($a);
    $b = current($b);
    return strtotime($a['created']) - strtotime($b['created']);
});

philosophical? lol
No, I don't think there is one. Although you could write an afterSave() in app_model. Check for the data you're looking for, and if found, put it in Cache. It will probably be messy, but at least it's in one place, and doesn't affect the performance much.

There's definitely no way to merge the data automatically, but instead of firing separate queries for each relationship, you can grab it all at once using CakePHP's Containable behavior.
