Objectify async: at what point is the RPC call made? - google-app-engine

Quite often I want to make two or more independent queries to fetch entities from the Datastore, but I'm not sure whether they really run in parallel. For example:
loadResult1 = ofy().load().key(Key.create(Foo.class, 1));
loadResult2 = ofy().load().key(Key.create(Bar.class, 1));
loadResult1.now();
loadResult2.now();
Is there any benefit of arranging the code like this?
The same goes for search queries:
iterable1 = ofy().load().type(Foo.class).iterable();
iterable2 = ofy().load().type(Bar.class).iterable();
iterable1.hasNext();
iterable2.hasNext();
Will the iterable2 load in parallel with iterable1?
Side question: is .iterable() in this regard any different from .list()?
I tried to debug the code, but it doesn't look like the call is made until the call to .now(), or the first call to .next()/.hasNext(). Is it really so?

Yes - until you materialize a result, the queries proceed asynchronously in parallel.
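For contrast, a minimal sketch (assuming Foo and Bar are registered @Entity classes): calling .now() immediately after each load serializes the RPCs, while holding on to the LoadResults first lets both RPCs run in parallel.

// Sequential: the first RPC must complete before the second is issued
Foo fooA = ofy().load().key(Key.create(Foo.class, 1)).now();
Bar barA = ofy().load().key(Key.create(Bar.class, 1)).now();

// Parallel: both RPCs are in flight before either result is materialized
LoadResult<Foo> fooResult = ofy().load().key(Key.create(Foo.class, 1));
LoadResult<Bar> barResult = ofy().load().key(Key.create(Bar.class, 1));
Foo fooB = fooResult.now(); // blocks until the first RPC completes
Bar barB = barResult.now(); // often already complete by this point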

Related

What is the purpose of next('r') in the context of an RxJS Subject

I'm still fairly new to the RxJS world (please pardon my semantics), but I've seen a few examples of code that creates a Subject to do some work, and then calls next(0), or next('r') on the subscription. It appears to re-run the stream, or rather fetch the next value from the stream.
However, when I tried using this to call an API for some data, it completely skips over the work it's supposed to do as defined in the stream (assuming it would "run" the stream again and get new data from the server), and instead my subscriber gets the 'r' or zero value back when I try to call next like that.
I get that making the subscription "starts execution of the stream", so to speak, but if I want to "re-run" it, I have to unsubscribe, and resubscribe each time.
Is it a convention of some kind to call next with a seemingly redundant value? Am I just using it in the wrong way, or is there a good use-case for calling next like that? I'm sure there's something fundamental that I'm missing, or my understanding of how this works is very wrong.
It's a good question; I definitely recommend reading about hot and cold Observables.
Cold Observables execute each time someone subscribes to them:
const a$ = of(5).pipe(tap(console.log))
a$.subscribe(); // the 'tap' will be executed here
a$.subscribe(); // and here, again.
hot Observables do not care about subscriptions in terms of execution:
const a$ = of(5).pipe(
tap(console.log),
shareReplay(1)
);
a$.subscribe(); // the 'tap' will be executed here
a$.subscribe(); // but not here! console.logs only once
In your example you are using a plain Subject, which does not replay anything: a subscriber only receives the values pushed after it subscribes.
You can try to use BehaviorSubject or ReplaySubject - both of them replay values to late subscribers, but be aware that they behave differently.
In your example you can modify your Subject like the following:
const mySubject = new Subject();
const myStream$ = mySubject.pipe(
shareReplay(1)
);
myStream$.subscribe(x => console.log(x))
mySubject.next(1);
mySubject.next(2);
mySubject.next(3);
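To get the "re-run the stream" behaviour from the original question, a common pattern is to treat the Subject purely as a trigger and move the actual work into the pipe with switchMap; a minimal sketch ('/api/data' is a hypothetical endpoint):

import { Subject } from 'rxjs';
import { switchMap } from 'rxjs/operators';
import { ajax } from 'rxjs/ajax';

const reload$ = new Subject<void>();
const data$ = reload$.pipe(
  // each next() re-runs the request; switchMap drops a stale in-flight call
  switchMap(() => ajax.getJSON('/api/data')) // hypothetical endpoint
);

data$.subscribe(data => console.log(data)); // subscribers see fresh data, not the trigger value
reload$.next(); // runs the request
reload$.next(); // runs it again - no unsubscribe/resubscribe needed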

Using Array[Boolean] in Scala to find out progress of foreach

I have a class in Scala that has a method to perform a bunch of calculations sequentially using foreach on a list which is provided in the constructor. The class has a field val progress: Array[Boolean] = list.map(_ => false).toArray. Some of these calculations can take a long time, so at the end of each one I set the appropriate index in progress to true. Then I can read progress from outside the class to determine where I am in the calculations.
This does not seem like the best approach in Scala (because I'm using a mutable data structure) so any advice to improve it would be much appreciated.
I don't think your approach is bad. The alternative is to use a var progress: List[Boolean] as an immutable data structure, replacing the list pointed at by that variable on every update. You don't really gain anything: you lose the ability to reserve the exact memory you will need in a single step, and the extra allocation is going to make this slower.
There is a reason why mutable data structures exist: they are incredibly useful and very much needed, the same reason you can still define var instead of val. The important point is not that one is "bad" and the other "good"; it is a matter of knowing when you can use val and sacrifice mutability in exchange for safety. In your example, you just can't.
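For concreteness, a sketch of the immutable alternative described above (markDone is a hypothetical name); each update allocates a whole new list:

@volatile private var progress: List[Boolean] = List.fill(list.size)(false)

def markDone(i: Int): Unit =
  progress = progress.updated(i, true) // replaces the list; readers see the old or new snapshot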
Side note: Instead of using
val progress: Array[Boolean] = list.map(_ => false).toArray
This is much clearer and faster IMHO:
val progress = Array.fill(list.size)(false)
Well, it depends on what you want to do with that information. If you are interested in specific events (e.g., 50% done or something like that), you could pass a listener into your foreach method and ask to be notified. But if you really need to inquire about the current state at any time, then ... well, if you need to know the state, then you have to keep the state, there is no way around that :)
An array of booleans seems to be overkill (you could just keep the current index instead), but you mentioned that you were planning to keep some additional info around as well, so it looks reasonable.
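A minimal sketch of the listener idea mentioned above (Calculator, doExpensiveWork, and onProgress are hypothetical names):

class Calculator[A](list: List[A]) {
  // volatile so a monitoring thread always sees the latest count
  @volatile private var completed: Int = 0

  def progress: Int = completed // items finished so far

  def run(onProgress: Int => Unit = _ => ()): Unit =
    list.zipWithIndex.foreach { case (item, i) =>
      doExpensiveWork(item)
      completed = i + 1
      onProgress(completed) // e.g. notify only when completed == list.size / 2
    }

  private def doExpensiveWork(a: A): Unit = Thread.sleep(100) // stand-in for the real calculation
}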

GAE, memcache before DB update

I have some trouble with memcache and GAE DB operations.
If I update memcache right after a DB operation (x.put(), for example), my memcache function often returns an old value. If I use sleep(), the cache is correct more often, but this is not right, in my opinion:
sleep(0.2)
data = Picture.all().order('-created').fetch(300)
memcache.set('pictures_all', data)
What do I need to do to get a correct memcache?
ANSWER:
You need to use an ancestor query: give all Picture entities the same parent, and then you get strongly consistent results:
data = Picture.all().order('-created').ancestor(main_key()).fetch(300)
memcache.set('pictures_all', data)
If you have the data, just update that one entry in memcache; there is no need to fetch everything again. Something like:
data.put()
memcache.set(key, data)
You're on the right track that the problem is with eventual consistency.
Using STRONG_CONSISTENCY does solve the problem, but it'll give you scalability problems down the road - ones that will be difficult to resolve.
The solution for this is, annoyingly, more complex than it should be. I'm also not sure whether there's really a bulletproof solution given the eventual consistency behavior.
The pseudocode should look something like this:
all_pictures = memcache.get('pictures_all')
if not all_pictures:
    all_pictures = convert_to_list(Picture.all().order('-created').fetch(300))
if newdata not in all_pictures:
    add_to_list_in_proper_order(all_pictures, newdata)
memcache.set('pictures_all', all_pictures)
config = db.create_config(deadline=10, read_policy=db.STRONG_CONSISTENCY)
data = Picture.all().order('-created').fetch(300, config=config)
memcache.set('pictures_all', data)
I guess this is the solution.
EDIT: No, this does not work.
I had the same problem, and the solution was exactly what the asker gave: the use of ancestors.
To read:
data = Picture.all().order('-created').ancestor(main_key()).fetch(300)
To save:
pic = Picture(parent=main_key(), ...)
pic.put()
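Putting the accepted approach together, a minimal sketch (Picture and main_key() are the question's own names; main_key() is assumed to return the shared ancestor key):

from google.appengine.api import memcache

def add_picture(**fields):
    # Same entity group, so the ancestor query below is strongly consistent
    pic = Picture(parent=main_key(), **fields)
    pic.put()
    # Refresh the cache with an ancestor query; the new put() is guaranteed visible
    data = Picture.all().order('-created').ancestor(main_key()).fetch(300)
    memcache.set('pictures_all', data)
    return pic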

Django: lock particular rows in table

I have the following Django method:
def setCurrentSong(request, player):
    try:
        newCurrentSong = ActivePlaylistEntry.objects.get(
            song__player_lib_song_id=request.POST['lib_id'],
            song__player=player,
            state=u'QE')
    except ObjectDoesNotExist:
        toReturn = HttpResponseNotFound()
        toReturn[MISSING_RESOURCE_HEADER] = 'song'
        return toReturn

    try:
        currentSong = ActivePlaylistEntry.objects.get(song__player=player, state=u'PL')
        currentSong.state = u'FN'
        currentSong.save()
    except ObjectDoesNotExist:
        pass
    except MultipleObjectsReturned:
        # This is bad. It means that this function isn't getting executed
        # atomically like we hoped it would be. I think we may actually need
        # a mutex to protect this critical section :(
        ActivePlaylistEntry.objects.filter(song__player=player, state=u'PL').update(state=u'FN')

    newCurrentSong.state = u'PL'
    newCurrentSong.save()
    PlaylistEntryTimePlayed(playlist_entry=newCurrentSong).save()
    return HttpResponse("Song changed")
Essentially, I want it to be so that for a given player, only one ActivePlaylistEntry has a 'PL' (playing) state at any given time. However, I have actually experienced cases where, as a result of quickly calling this method twice in a row, I get two songs for the same player with a state of 'PL'. This is bad, as I have other application logic that relies on a player having only one playing song at any given time (plus, semantically, it doesn't make sense to be playing two different songs at the same time on the same player). Is there a way for me to do this update atomically? Just running the method as a transaction with the on_commit_success decorator doesn't seem to work. Is there a way to lock the table for all songs belonging to a particular player? I was thinking of adding a lock column (a boolean field) to my model and either spinning on it or pausing the thread for a few milliseconds and checking again, but these feel super hackish and dirty. I was also thinking about creating a stored procedure, but that's not really database independent.
Locking queries were added in Django 1.4:
with transaction.commit_manually():
    # The queryset must be evaluated for the SELECT ... FOR UPDATE to be
    # issued; the matching rows then stay locked until the commit.
    aple = ActivePlayListEntry.objects.select_for_update().get(...)
    aple.state = ...
    aple.save()
    transaction.commit()
But you should consider refactoring so that a separate table with a ForeignKey is used to indicate the "active" song.
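Applied to the question, a minimal sketch of the locking approach (Django 1.4-era API, matching the answer above; select_for_update() needs an open transaction):

from django.db import transaction

@transaction.commit_on_success  # atomic() in modern Django
def set_current_song(request, player):
    # Evaluating the queryset issues SELECT ... FOR UPDATE, so a second
    # concurrent call blocks here until this transaction commits.
    playing = list(ActivePlaylistEntry.objects
                   .select_for_update()
                   .filter(song__player=player, state=u'PL'))
    for entry in playing:
        entry.state = u'FN'
        entry.save()
    # ... then mark the new song as playing, as in the question ...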

threadpools - boss/worker vs peer (workcrew) models

I'm aiming to use a threadpool with pthreads and am trying to choose between these two models of threading. It seems to me that the peer model is more suitable when working with fixed input, whereas the boss/worker model is better for dynamically changing work items. However, I'm a little unsure of how exactly to get the peer model to work with a threadpool.
I have a number of tasks that all need to be performed on the same data set. Here's some simple pseudocode for how I would look at tackling this:
data = [0 ... 999]
data_index = 0
data_size = 1000
tasks = [0 ... 99]
task_index = 0
threads = [0 ... 31]

thread_function()
{
    while (true)
    {
        index = data_index++ (using atomics)
        if index >= data_size
        {
            sync
            if thread_index == 0
            {
                data_index = 0
                task_index++
                sync
            }
            else
            {
                sync
            }
            continue
        }
        tasks[task_index](data[index])
    }
}
(Firstly, it seems like there should be a way of making this use just one synchronisation point, but I'm not sure whether that's possible?)
The above code seems like it will work well for the case where the tasks are known in advance, though I guess a threadpool is unnecessary for this particular problem. However, even if the data items are still predefined across all tasks, if the tasks are not known in advance, it seems like the boss/worker model is better suited? Is it possible to use the boss/worker model but still allow the tasks to be picked up by the threads themselves (as above), where the boss essentially suspends itself until all tasks are complete? (Maybe this is still termed the peer model?)
My final question is regarding the synchronisation: barrier or condition variable, and why?
If anyone can make any suggestions as to how to better approach this problem, or even poke holes in any of my assumptions, that would be great. Unfortunately, I'm restricted from using a higher-level library such as TBB for tackling this.
Edit: I should point out, in case this isn't clear, that each task needs to be completed in its entirety before moving on to the next.
I'm a bit confused by your description here; I hope the below is relevant.
I always looked at this pattern and found it very useful: the "boss" is responsible for detecting work and dispatching it to a worker pool based on some algorithm; from that point on, the worker is independent.
In this scenario, the worker is always waiting for work, is not aware of any other instance, processes requests, and when it finishes may trigger a notification of completion.
This has the advantage of a good separation between the work itself and the algorithm that balances the load between the threads.
The other option is for the "boss" to maintain a pool of work items, and the workers to always pick them up as soon as they are free. But I guess this is more complex to implement and requires a larger amount of synchronization. I do not see the benefit of this second approach over the previous one.
Control logic and worker state is maintained by the "boss" in both scenarios.
Since the parallelised work is done per task, the "boss" object handles a task; in a simple implementation, this "boss" blocks until the task is finished, allowing the next "boss" in line to be called.
Regarding the sync, unless I'm missing something here, you only need to sync once, when all the workers finish, and this sync is done at the "boss", where the workers just send notifications that they have finished.
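For what it's worth, a minimal pthreads sketch of the question's peer model, using a barrier for the two synchronisation points in the pseudocode (a sketch under those assumptions, not a definitive implementation; run_task stands in for the real work):

#include <pthread.h>
#include <stdatomic.h>

#define DATA_SIZE   1000
#define NUM_TASKS   100
#define NUM_THREADS 32

static int data[DATA_SIZE];
static atomic_int data_index;
static int task_index; /* written only by thread 0, between the two barriers */
static pthread_barrier_t barrier;

static void run_task(int task, int item)
{
    /* stand-in for tasks[task_index](data[index]) */
    (void)task; (void)item;
}

static void *thread_function(void *arg)
{
    long thread_index = (long)arg;

    while (task_index < NUM_TASKS) {
        int index = atomic_fetch_add(&data_index, 1);
        if (index >= DATA_SIZE) {
            /* first sync: no thread is still reading task_index or the data */
            pthread_barrier_wait(&barrier);
            if (thread_index == 0) {
                atomic_store(&data_index, 0);
                task_index++;
            }
            /* second sync: the reset and task advance are visible to everyone */
            pthread_barrier_wait(&barrier);
            continue;
        }
        run_task(task_index, data[index]);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    pthread_barrier_init(&barrier, NULL, NUM_THREADS);
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, thread_function, (void *)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}

With a barrier, the two sync points in the pseudocode map directly onto two pthread_barrier_wait calls; collapsing them into a single sync point would instead need a condition variable plus a generation counter, which is more code for little gain here.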
