I have a function in my app that does some processing in a transaction - creates or fails to create an entity depending on the attributes of others in the entity group.
I have been doing some testing in which this function is called in quick succession; a few calls per second is possible in these tests.
The function triggers some deferred tasks that read from the entity group, but do not write to it.
I noticed something funny - when these tasks are triggered immediately, and interleave with the main function calls, I get contention errors quite frequently.
If I put a countdown of a couple of seconds on the deferred tasks, the main functions process successfully.
That suggests to me that the deferred tasks are causing contention on the entity group the main function writes to - but I thought reads from an entity group couldn't do this? Do lookups by key name cause contention? Do queries with filters?
It's kind of puzzling me. Should this be happening? I've read elsewhere that there is a limit of 1 write per second per entity group, but my tests routinely break that limit...at least when my spin-off deferred tasks are delayed for a couple of seconds.
This is on production, by the way.
Thanks for any insight!
What I'm doing is creating a transaction where
1) An entity A has a counter updated to +1
2) A new entity B is written to the datastore.
It looks like this:
WrappedBoolean result = ofy().transact(new Work<WrappedBoolean>() {
    @Override
    public WrappedBoolean run() {
        // Increment count of EntityA.numEntityBs by 1 and save it
        entityA.numEntityBs = entityA.numEntityBs + 1;
        ofy().save().entity(entityA).now();
        // Create a new EntityB and save it
        EntityB entityB = new EntityB();
        ofy().save().entity(entityB).now();
        // Return that everything is ok
        return new WrappedBoolean("True");
    }
});
What I am doing is keeping a count of how many EntityBs entityA has. The two operations need to be in a transaction so that either both saves happen or neither happens.
However, it is possible that many users will be executing the api method that contains the above transaction. I fear that I may run into problems of too many people trying to update entityA. This is because if multiple transactions try to update the same entity, the first one to commit wins but all the others fail.
This leads me to two questions:
1) Is the transaction I wrote a bad idea and destined to cause writes not being made if a lot of calls are made to the API method? Is there a better way to achieve what I am trying to do?
2) What if there are a lot of updates being made to an entity not in a transaction (such as updating a counter the entity has) - will you eventually run into a scaling problem if a lot of updates are being made in a short period of time? How does datastore handle this?
Sorry for the long winded question but I hope someone could shed some light on how this system works for me with the above questions. Thanks.
Edit: When I say a lot of updates being made to an entity over a short period of time, consider something like Instagram, where you want to keep track of how many "likes" a picture has. Some users have millions of followers and when they post a new picture, they can get something like 10-50 likes a second.
The datastore allows about 1 write/second per entity group. What might not appear obvious is that standalone entities (i.e. entities with no parent and no children) still belong to one entity group - their own. Repeated writes to the same standalone entity are thus subject to the same rate limit.
Exceeding the write limit will eventually cause write ops to fail with something like TransactionFailedError(Concurrency exception.)
Repeated writes to the same entity done outside transactions can overwrite each other. Transactions can help with this - conflicting writes are automatically retried a few times. Your approach looks OK from this perspective. But it only works if the average write rate remains below the limit.
You probably want to read Avoiding datastore contention. You need to shard your counter to be able to count events at more than 1/second rates.
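To make the sharding idea concrete, here is a minimal sketch. It assumes an in-memory dict stands in for the datastore; `ShardedCounter` and `num_shards` are hypothetical names for illustration, not part of any datastore API. Each increment goes to a randomly chosen shard, so writes are spread over several entity groups, and reading the counter means summing the shards.

```python
import random


class ShardedCounter:
    """In-memory stand-in for a sharded datastore counter (illustrative only)."""

    def __init__(self, name, num_shards=20):
        self.name = name
        self.num_shards = num_shards
        # Each key would be a separate entity (its own entity group) in the datastore.
        self.shards = {}

    def _shard_key(self, index):
        return "%s-shard-%d" % (self.name, index)

    def increment(self, rng=random):
        # Pick a shard at random so concurrent writers rarely hit the same one.
        key = self._shard_key(rng.randrange(self.num_shards))
        self.shards[key] = self.shards.get(key, 0) + 1

    def total(self):
        # Reading the counter means summing every shard.
        return sum(self.shards.values())


counter = ShardedCounter("likes", num_shards=10)
for _ in range(500):
    counter.increment()
```

With N shards the sustainable write rate is roughly N times the single-entity-group rate, at the cost of N reads to get the total.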
I have a bit of a strange problem. I have a module running on gae that puts a whole lot of little tasks on the default task queue. The tasks access the same ndb module. Each task accesses a bunch of data from a few different tables then calls put.
The first few tasks work fine but as time continues I start getting these on the final put:
suspended generator _put_tasklet(context.py:358) raised TransactionFailedError(too much contention on these datastore entities. please try again.)
So I wrapped the put with a try and put in a randomised timeout so it retries a couple of times. This mitigated the problem a little, it just happens later on.
Here is some pseudocode for my task:
def my_task(request):
    stuff = get_ndb_instances()  # this accesses a few things from different tables
    better_stuff = process(stuff)  # pretty much just a summation
    try_put(better_stuff)
    return {'status': 'Groovy'}

def try_put(oInstance, iCountdown=10):
    # Last attempt: let any error propagate
    if iCountdown < 1:
        return oInstance.put()
    try:
        return oInstance.put()
    except Exception:
        import time
        import random
        logger.info("sleeping")
        time.sleep(random.random() * 20)
        return try_put(oInstance, iCountdown - 1)
Without using try_put the queue gets about 30% of the way through until it stops working. With the try_put it gets further, like 60%.
Could it be that a task is holding onto ndb connections after it has completed somehow? I'm not making explicit use of transactions.
EDIT:
There seems to be some confusion about what I'm asking. The question is: why does ndb contention get worse as time goes on? I have a whole lot of tasks running simultaneously and they access ndb in a way that can cause contention. If contention is detected then a randomly timed retry happens and this eliminates contention perfectly well. For a little while. Tasks keep running and completing, and the more that successfully return, the more contention happens, even though the processes using the contended data should be finished. Is there something going on that's holding onto datastore handles that shouldn't be? What's going on?
EDIT2:
Here is a little bit about the key structures in play:
My ndb models sit in a hierarchy where we have something like this (the direction of the arrows specifies parent child relationships, ie: Type has a bunch of child Instances etc)
Type->Instance->Position
The ids of the Positions are limited to a few different names, there are many thousands of instances and not many types.
I calculate a bunch of Positions and then do a try_put_multi (similar to try_put in an obvious way) and get contention. I'm going to run the code again pretty soon and get a full traceback to include here.
Contention will get worse over time if you continually exceed 1 write/transaction per entity group per second. The answer is in how Megastore/Paxos work and how Cloud Datastore handles contention in the backend.
When 2 writes are attempted at the same time on different nodes in Megastore, one transaction will win and the other will fail. Cloud Datastore detects this contention and will retry the failed transaction several times. Usually this results in the transaction succeeding without any errors being raised to the client.
If sustained writes above the recommended limit are being attempted, the chance that a transaction needs to be retried multiple times increases. The number of transactions in an internal retry state also increases. Eventually, transactions will start reaching our internal retry limit and will return a contention error to the client.
A randomized sleep is not the right way to handle these error responses. You should instead look into exponential back-off with jitter (example).
Similarly, the core of your problem is a high write rate into a single entity group. You should look into whether the explicit parenting is required (removing it if not), or whether you should shard the entity group in some manner that makes sense according to your queries and consistency requirements.
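As a rough illustration of exponential back-off with jitter, here is a minimal sketch. `ContentionError`, `backoff_delays` and `put_with_backoff` are hypothetical names; in real code you would catch the datastore's own contention exception instead. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling.

```python
import random
import time


class ContentionError(Exception):
    """Hypothetical stand-in for a datastore contention error."""


def backoff_delays(max_attempts, base=0.1, cap=10.0, rng=random):
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**attempt)]."""
    for attempt in range(max_attempts):
        yield rng.uniform(0, min(cap, base * 2 ** attempt))


def put_with_backoff(put, max_attempts=5, sleep=time.sleep):
    """Call put(); on contention, wait an exponentially growing random delay."""
    for delay in backoff_delays(max_attempts):
        try:
            return put()
        except ContentionError:
            sleep(delay)
    # Final attempt: let any error propagate to the caller.
    return put()
```

Unlike a fixed-range random sleep, the ceiling doubles after each failure, so retries thin out as contention persists instead of continuing to hit the entity group at the same average rate.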
I have a counter in my app where I expect that 99% of the time there will not be contention issues in updating the counter with transactions.
To handle the 1% times when it is busy, I was thinking of updating the counter by using transactions within deferred tasks as follows:
def update_counter(my_key):
    # Pass the key through to the deferred function
    deferred.defer(update_counter_transaction, my_key)

@ndb.transactional
def update_counter_transaction(my_key):
    x = my_key.get()
    x.n += 1
    x.put()
For the occasional instances when contention causes the transaction to fail, the task will be retried.
I'm familiar with sharded counters but this seems easier and suited to my situation.
Is there anything I am missing that might cause this solution to not work well?
A problem may exist with the automatic task retries, which at least theoretically may happen for reasons other than transaction collisions for the intended counter increments. If such an undesired retry successfully re-executes the counter increment code, the counter value may be thrown off (it will be higher than the expected value). That might or might not be acceptable for your app, depending on the use of the counter.
Here's an example of an undesired deferred task invocation: GAE deferred task retried due to "instance unavailable" despite having already succeeded
The answer to that question seems in line with this note in the regular task queue documentation (I saw no such note in the deferred task queues article, but I marked it as possible in my brain):
Note that task names do not provide an absolute guarantee of once-only
semantics. In extremely rare cases, multiple calls to create a task of
the same name may succeed, but in this event, only one of the tasks
would be executed. It's also possible in exceptional cases for a task
to run more than once.
From this perspective it might actually be better to keep the counter incrementing together with the rest of the related logical/transactional operations (if any) than to isolate it as a separate transaction on a task queue.
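If the increment does have to live in its own task, one other way to tolerate duplicate executions (not something suggested above, just a common pattern) is to make the increment idempotent by tagging it with a unique operation ID. The sketch below assumes an in-memory object stands in for the counter entity; `Counter`, `make_op_id` and `apply_increment` are hypothetical names, and in the datastore the check-and-increment would have to run inside a single transaction.

```python
import uuid


class Counter:
    """In-memory stand-in for a counter entity (illustrative only)."""

    def __init__(self):
        self.n = 0
        self.applied_ops = set()  # IDs of increments already applied


def make_op_id():
    # Generated once, when the task is enqueued, and passed along as an argument.
    return uuid.uuid4().hex


def apply_increment(counter, op_id):
    """Increment at most once per op_id; returns False for a duplicate run."""
    if op_id in counter.applied_ops:
        return False  # a retry of a task that already succeeded
    counter.applied_ops.add(op_id)
    counter.n += 1
    return True


counter = Counter()
op = make_op_id()
apply_increment(counter, op)  # first execution
apply_increment(counter, op)  # duplicate execution is a no-op
```

The obvious cost is that the set of applied IDs grows with every increment, so in practice it would need pruning or a separate marker entity per operation.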
I am trying to perform some data processing in a GAE application over data that is stored in the Datastore. The bottleneck point is the throughput in which the query returns entities and I wonder how to improve the query's performance.
What I do in general:
everything works in a task queue, so we have plenty of time (10 minute deadline).
I run a query over the ndb entities in order to select which entities need to be processed.
as the query returns results, I group entities in batches of, say, 1000 and send them to another task queue for further processing.
the stored data is going to be large (say 500K-1M entities) and there is a chance that the 10 minutes deadline is not enough. Therefore, when the task is reaching the taskqueue deadline, I spawn a new task. This means I need an ndb.Cursor in order to continue the query from where it stopped.
The problem is the rate in which the query returns entities. I have tried several approaches and observed the following performance (which is too slow for my app):
Use fetch_page() in a while loop.
The code is straightforward
while has_more and theres_more_time:
    entities, cursor, more = query.fetch_page(1000, ...)
    send_to_process_queue(entities)
    has_more = more and cursor
With this approach, it takes 25-30 seconds to process 10K entities. Roughly speaking, that is 20K entities per minute. I tried changing the page size or the class of the frontend instance; neither made any difference in performance.
Segment the data and fire multiple fetch_page_async() in parallel.
This approach is taken from here (approach C)
The overall performance remains the same as above. I tried various numbers of segments (from 2 to 10) in order to have 2-10 parallel fetch_page_async() calls. In all cases, the overall time remained the same. The more parallel fetch_page_async() calls are made, the longer each one takes to complete. I also tried with 20 parallel fetches and it got worse. Changing the page size or the frontend instance class did not have an impact either.
Fetch everything with a single fetch() call.
Now this is the least suitable approach (if not unsuitable altogether) as the instance may run out of memory, plus I don't get a cursor in case I need to spawn another task (in fact I won't even have the ability to do so; the task will simply exceed the deadline). I tried this out of curiosity in order to see how it performs and I observed the best performance! It took 8-10 seconds for 10K entities, which is roughly 60K entities per minute. Now that is approx. 3 times faster than fetch_page(). I wonder why this happens.
Use query.iter() in a single loop.
This is much like the first approach. This will make use of the query iterator's underlying generator, plus I can obtain a cursor from the iterator in case I need to spawn a new task, so it suits me. With the query iterator, it fetched 10K entities in 16-18 seconds, which is approx. 36-40K entities per minute. The iterator is 30% faster than fetch_page(), but much slower than fetch().
For all the above approaches, I tried F1 and F4 frontend instances without any difference in Datastore performance. I also tried to change the batch_size parameter in the queries, still without any change.
A first question is why do fetch(), fetch_page() and iter() behave so differently and how to make either fetch_page() or iter() do equally well as fetch()? And then another critical question is whether these throughputs (20-60K entities per minute, depending on api call) are the best we can do in GAE.
I'm aware of the MapReduce API but I think it doesn't suit me. AFAIK, the MapReduce API doesn't support queries and I don't want to scan all the Datastore entities (it would be too costly and slow - the query may return only a few results). Last, but not least, I have to stick to GAE. Resorting to another platform is not an option for me. So the question really is how to optimize the ndb query.
Any suggestions?
In case anyone is interested, I was able to significantly increase the throughput of the data processing by re-designing the component - it was suggested that I change the data models but that was not possible.
First, I segmented the data and then processed each data segment in a separate taskqueue.Task instead of calling multiple fetch_page_async from a single task (as I described in the first post). Initially, these tasks were processed by GAE sequentially utilizing only a single Fx instance. To achieve parallelization of the tasks, I moved the component to a specific GAE module and used basic scaling, i.e. addressable Bx instances. When I enqueue the tasks for each data segment, I explicitly instruct which basic instance will handle each task by specifying the 'target' option.
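For anyone wanting to try the same layout, here is a rough sketch of the planning step only. It assumes the data can be segmented by a contiguous integer ID range, which may not match how your entities are keyed; `split_range`, `assign_targets`, the module name "worker" and the target string format are all hypothetical. Each (segment, target) pair would then become one task enqueued with that `target`.

```python
def split_range(start, end, num_segments):
    """Split the half-open integer ID range [start, end) into contiguous segments."""
    size, extra = divmod(end - start, num_segments)
    segments = []
    lo = start
    for i in range(num_segments):
        # The first `extra` segments get one additional ID each.
        hi = lo + size + (1 if i < extra else 0)
        segments.append((lo, hi))
        lo = hi
    return segments


def assign_targets(segments, num_instances, module="worker"):
    """Pair each segment with a hypothetical 'instance.module' target, round-robin."""
    return [
        (segment, "%d.%s" % (i % num_instances, module))
        for i, segment in enumerate(segments)
    ]


plan = assign_targets(split_range(0, 20000, 10), num_instances=5)
```

Round-robin assignment keeps the load even across the basic instances as long as the segments hold similar numbers of entities.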
With this design, I was able to process 20,000 entities in total within 4-5 seconds (instead of 40'-60'!), using 5 B4 instances.
Now, this has additional costs because of the Bx instances. We'll have to fine-tune the type and number of basic instances we need.
The new experimental Data Processing feature (an AppEngine API for MapReduce) might be suitable. It uses automatic sharding to execute multiple parallel worker processes, which may or may not help (like the Approach C in the other linked question).
Your comment about "no need to scan all entities" triggers the thought that custom indexes could help your queries. That may entail schema changes to store the data in a less normal form.
Design a solution from the output perspective - what the simplest query is that produces the required results, then what the entity structure is to support such a query, then what work is needed to create and maintain such an entity structure from the current data.
I'm using Map Reduce (http://code.google.com/p/appengine-mapreduce/) to do an operation over a set of entities. However, I am finding my operations are being duplicated.
Are map reduce maps sometimes called more than once for a specific entity? Is this the case even if they don't fail the initial time?
edit: here are some more details.
def reparent_request(entity):
    # check if the entity has a parent
    if not is_valid_to_reparent(entity):
        return
    # copy it
    try:
        copy = clone_entity(Request, entity, parent=entity.user)
        copy.put()  # we hard put here so we can use the reference later in this function.
    except:
        ...
    ... update some references to the copied object ...
    # delete the original
    yield op.db.Delete(entity)
At the end, I am non-deterministically left with two entities, both with the new parent.
I've reparented a load of entities before - it was a nightmare because of the exact problem you're facing.
What I would do instead is:
Create a new queue. Ensure it's paused and that you have a lot of storage space dedicated to queues. It's only temporary, but you'll need it.
Instead of editing your entities in your map reduce job, add them to the queue with a name that will be unique for each entity. The key works fine.
When adding to the queue, because it's paused you'll get an error if you try to add a task with the same name twice - so catch the error and skip it, because you know that entity must already have been touched by the map reduce job.
When you're confident that every entity has a matching queue task and the map reduce job has finished, unpause your queue. The queue will do the reparenting.
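The enqueue-and-skip step above can be sketched like this. It is only a simulation: `PausedQueue` and `DuplicateTaskError` are hypothetical in-memory stand-ins for the real task queue and the error it raises when a task name is reused, and the "reparent-" name prefix is just an example.

```python
class DuplicateTaskError(Exception):
    """Hypothetical stand-in for the error raised when a task name is reused."""


class PausedQueue:
    """In-memory stand-in for a paused, named-task queue (illustrative only)."""

    def __init__(self):
        self.tasks = {}

    def add(self, name, payload):
        if name in self.tasks:
            raise DuplicateTaskError(name)
        self.tasks[name] = payload


def enqueue_once(queue, entity_key):
    """Map step: enqueue one reparent task per entity, skipping duplicates."""
    try:
        queue.add("reparent-%s" % entity_key, entity_key)
        return True
    except DuplicateTaskError:
        # The mapper already saw this entity on an earlier (repeated) run.
        return False


queue = PausedQueue()
results = [enqueue_once(queue, key) for key in ["a1", "b2", "a1"]]
```

Because the queue is paused while the map reduce job runs, no task is consumed (and its name freed up) before every entity has been seen, so each entity ends up with exactly one pending task.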
A couple of notes:
* the task queue size can get pretty big. Can't remember numbers, but it was gigs. Also the size of the queue doesn't update in real time - so don't worry that it might still say gigs of tasks when the queue is nearly empty.
* the reliability of the queue storage is an unknown I believe. It didn't happen to us, but queue items could disappear I guess. Fortunately, you can rerun this process multiple times to ensure it worked, especially if you're deleting entities.
* you may want to ensure your queue has a concurrency limit on it. Without one, a delay in the execution of a couple of tasks can absolutely cripple your application. Learnt that the hard way! I think 30 concurrent tasks went quite well for us.
Hope that's useful, let me know if you come up with any improvements!
App Engine mapreduce runs on the task queue, and like anything else that uses the task queue, tasks have to be idempotent - that is, running them multiple times should have the same effect as running them once. Tasks will occasionally be run more than once; the mapreduce library may have its own reasons for rerunning mapper tasks, too.
In your situation, I'd suggest creating the new entity with a key whose ID is the same as the old entity; that way running it multiple times will just overwrite the same entity.
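A minimal sketch of why reusing the ID makes the copy idempotent, assuming a plain dict keyed by `(parent, entity_id)` tuples stands in for the datastore; `copy_under_parent` is a hypothetical helper, not part of the mapreduce or datastore APIs.

```python
def copy_under_parent(store, old_key, new_parent):
    """Copy the entity at old_key under new_parent, keeping the same ID."""
    _, entity_id = old_key
    # The new key is derived from the old one, so every run computes the same key.
    new_key = (new_parent, entity_id)
    store[new_key] = dict(store[old_key])
    return new_key


store = {(None, 42): {"title": "request"}}
first = copy_under_parent(store, (None, 42), "user-7")
# Simulated rerun of the mapper before the original was deleted:
second = copy_under_parent(store, (None, 42), "user-7")
del store[(None, 42)]
```

With an auto-allocated ID each run would create a fresh entity, which is how you end up with two copies under the new parent; with a derived key the second run simply overwrites the first.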