Getting "DeadlineExceededError" using GAE when doing many (~10K) DB updates - google-app-engine

I am using Django 1.4 on GAE + Google Cloud SQL - my code works perfectly fine (on dev with a local sqlite3 db for Django) but chokes with Server Error (500) when I try to "refresh" the DB. This involves parsing certain files, creating ~10K records, and saving them (I'm saving them in batches using commit_on_success).
Any advice?

This error is raised for front-end requests after 60 seconds (the limit was increased from the original 30 seconds).
Solution options:
Use the task queue (a time limit of 10 minutes applies there, which is enough in practice).
Divide your task into smaller batches (a sketch follows this answer).
How we do it: we divide the work into smaller chunks on the client side and call them repeatedly.
Both solutions work fine; which to pick depends on how you make these calls and whether you need the results back. The task queue does not return results to the client.
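A minimal sketch of the batching approach, assuming Django 1.4 (which the question uses) and a hypothetical refresh_db function that already has the ~10K unsaved model instances; committing one transaction per chunk keeps each unit of work small:

from django.db import transaction

CHUNK_SIZE = 500  # tune so each chunk commits comfortably fast

@transaction.commit_on_success
def save_chunk(chunk):
    # one transaction per chunk instead of one per row (or one huge one)
    for record in chunk:
        record.save()

def refresh_db(records):
    # records: the ~10K unsaved model instances built from the parsed files
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) >= CHUNK_SIZE:
            save_chunk(chunk)
            chunk = []
    if chunk:
        save_chunk(chunk)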

For tasks that take longer than 30s you should use the task queue.
Database operations can also time out when batch operations are too big; try smaller batches.
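If you go the task-queue route, here is a minimal Python sketch using the deferred library (this assumes the deferred builtin is enabled in app.yaml; refresh_db is the hypothetical function that parses the files and saves in batches, as sketched above):

from google.appengine.ext import deferred

def refresh_db():
    # parse the files and save the records in small batches
    # (see the batching sketch above)
    pass

# enqueue from the request handler; the task itself gets up to 10 minutes to run
deferred.defer(refresh_db)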

Google App Engine has a maximum time allowed for a request. If a request takes longer than 30 seconds, this error is raised. If you have a large quantity of data to upload, either import it directly from the admin console, break the request into smaller chunks, or use the command line python manage.py dbshell to upload the data from your computer.

Related

NDB query().iter() of 1000<n<1500 entities is wigging out

I have a script that, using Remote API, iterates through all entities for a few models. Let's say two models, called FooModel with about 200 entities, and BarModel with about 1200 entities. Each has 15 StringPropertys.
for model in [FooModel, BarModel]:
    print 'Downloading {}'.format(model.__name__)
    new_items_iter = model.query().iter()
    new_items = [i.to_dict() for i in new_items_iter]
    print new_items
When I run this in my console, it hangs for a while after printing 'Downloading BarModel'. It hangs until I hit ctrl+C, at which point it prints the downloaded list of items.
When this is run in a Jenkins job, there's no one to press ctrl+C, so it just runs continuously (last night it ran for 6 hours before something, presumably Jenkins, killed it). Datastore activity logs reveal that the datastore was taking 5.5 API calls per second for the entire 6 hours, racking up a few dollars in GAE usage charges in the meantime.
Why is this happening? What's with the weird behavior of ctrl+C? Why is the iterator not finishing?
This is a known issue currently being tracked on the Google App Engine public issue tracker under Issue 12908. The issue was forwarded to the engineering team and progress on this issue will be discussed on said thread. Should this be affecting you, please star the issue to receive updates.
In short, the issue appears to be with the remote_api script. When querying entities of a given kind, it will hang when fetching 1001 + batch_size entities when the batch_size is specified. This does not happen in production outside of the remote_api.
Possible workarounds
Using the remote_api
One could limit the number of entities fetched per script execution using the limit argument for queries. This may be somewhat tedious but the script could simply be executed repeatedly from another script to essentially have the same effect.
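One way to express that workaround, sketched with NDB's fetch_page() instead of a bare limit so that each remote_api round trip stays small (the page size is illustrative):

def download_all(model, page_size=500):
    # page through the kind with explicit cursors instead of one long iter()
    results = []
    items, cursor, more = model.query().fetch_page(page_size)
    results.extend(i.to_dict() for i in items)
    while more and cursor:
        items, cursor, more = model.query().fetch_page(page_size, start_cursor=cursor)
        results.extend(i.to_dict() for i in items)
    return results

for model in [FooModel, BarModel]:
    print 'Downloading {}'.format(model.__name__)
    print download_all(model)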
Using admin URLs
For repeated operations, it may be worthwhile to build a web UI accessible only to admins. This can be done with the help of the users module as shown here. This is not really practical for a one-time task but far more robust for regular maintenance tasks. As this does not use the remote_api at all, one would not encounter this bug.
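A minimal sketch of such an admin-only handler in Python with webapp2 and the users module (the URL and the job body are illustrative):

import webapp2
from google.appengine.api import users

class RefactorHandler(webapp2.RequestHandler):
    def get(self):
        user = users.get_current_user()
        # only signed-in administrators may trigger the maintenance job
        if not user or not users.is_current_user_admin():
            self.abort(403)
        # ... kick off the maintenance work here, e.g. enqueue tasks ...
        self.response.write('Maintenance job started')

app = webapp2.WSGIApplication([('/admin/refactor', RefactorHandler)])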

How do I execute code on App Engine without using servlets?

My goal is to receive updates from some service (using HTTP request-response) all the time, and when I get specific information, I want to push it to the users. This code must run all the time (let's say every 5 seconds).
How can I run code that does this while the server is up (that is, not triggered by an HTTP request)?
I'm using Java.
Thanks
You need to use
Scheduled Tasks With Cron for Java
You can set your own schedule (e.g. every minute), and it will call a specified handler for you.
You may also want to look at
App Engine Modules in Java
before you implement your code. You may separate your user-facing and backend code into different modules with different scaling options.
UPDATE:
A great comment from @tx802:
I think 1 minute is the highest frequency you can achieve on App Engine, but you can use cron to put 12 tasks on a Push Queue, either with delays of 5s, 10s, ... 55s using TaskOptions.countdownMillis() or with a processing rate of 1 task per 5 seconds.
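The fan-out trick from that comment, sketched in Python for brevity (the Java API offers the same delay through TaskOptions.countdownMillis(), as noted above); the worker URL is illustrative:

from google.appengine.api import taskqueue

def enqueue_poll_tasks():
    # called by a once-a-minute cron job; fans out 12 tasks delayed by
    # 0s, 5s, ..., 55s so the poller effectively runs every 5 seconds
    for i in range(12):
        taskqueue.add(url='/tasks/poll', countdown=i * 5)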

Tasks targeted at dynamic backend fail frequently, silently

I had converted some tasks to run on a dynamic backend.
The tasks are failing silently [no logged error, no retry, nothing] ~20% of the time (min:10%, max:60%, sample:large, long term). Switching the task away from the backend restores retries and gets the failure rate back to ~0%.
Any ideas?
Converting it to a backend exacerbated the problem but wasn't the problem.
I had specified a task_retry_limit and the queue was a push queue. With a backend the number of instances is specified. (I believe you can replicate this issue on the frontend by ramping up requests rapidly, to a big number).
Tasks were failing with 503: Instance Unavailable until they hit the task_retry_limit. This is visible temporarily in Task Queues, but will not show up in the Logs.
I should have been using pull queues. Even if my use case was misguided, I'd still +1 the idea that a task dying after multiple 503: Instance Unavailable responses should log something, so it doesn't look like a phantom task.
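A Python sketch of the setup described above (queue, backend, and URL names are hypothetical); the task_retry_limit is what the retries were exhausted against, so raising it, or leaving the default, gives the backend time to bring up an instance:

from google.appengine.api import taskqueue

task = taskqueue.Task(
    url='/work/convert',
    target='my-backend',  # route the task to the dynamic backend
    retry_options=taskqueue.TaskRetryOptions(
        task_retry_limit=20,     # a low limit here is what made the failures look silent
        min_backoff_seconds=5))
task.add(queue_name='backend-queue')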
Which runtime are you using on the backend?
Try running the backend for a bit without dynamic set to true and exercise the failing component.
On my project, I have seen tasks that target a static backend disappear on occasion, but nowhere near the rate you are seeing.

Refactoring a Google App Engine datastore

In my datastore I had a few hundred entities of kind PlayerStatistic that I wanted to rename to GamePlayRecord. On the dev server it was easy to do this by writing a small script in the Interactive Console. However there is no Interactive Console once the app has been deployed.
Instead, I copied that script into a file and linked the file in app.yaml. I deployed the script, intending to run it once and then delete it. However, I ran into another problem, which is that the script ran for over 30 seconds. The script would always get cut off before it could complete.
My solution ended up being rewriting the script so that it creates and deletes the entities one at a time. That way, even when it timed out, the script could continue where it left off. Since I only have a few hundred entities this took about 5 refreshes.
Is there a better way to run one-time refactoring scripts on Google App Engine? Is there a good way to get around the 30 second limit in order to run these refactoring scripts?
Use the task queue.
Tasks can run for much longer than web requests. You can also split the work into many tasks so they run in parallel and finish faster. When a task finishes, it can programmatically insert a new task, so the whole process is automated and you don't need to refresh manually (a sketch follows below).
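A sketch of that pattern using the deferred library and NDB cursors, assuming PlayerStatistic and GamePlayRecord are ndb models with matching properties; each task copies one batch and re-enqueues itself:

from google.appengine.ext import deferred, ndb
from google.appengine.datastore.datastore_query import Cursor

BATCH_SIZE = 100

def rename_batch(cursor_urlsafe=None):
    # copy one batch of PlayerStatistic entities into GamePlayRecord, then re-enqueue
    cursor = Cursor(urlsafe=cursor_urlsafe) if cursor_urlsafe else None
    entities, next_cursor, more = PlayerStatistic.query().fetch_page(
        BATCH_SIZE, start_cursor=cursor)
    for old in entities:
        GamePlayRecord(**old.to_dict()).put()  # assumes matching properties
        old.key.delete()
    if more and next_cursor:
        deferred.defer(rename_batch, next_cursor.urlsafe())

deferred.defer(rename_batch)  # kick off the chain, e.g. from an admin-only handler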
appengine-mapreduce is a good way to do datastore refactoring. It takes care of a lot of the messy details that you would have to grapple with when writing task code by hand.

Google App Engine Large File Upload

I am trying to upload data to Google App Engine (using GWT). I am using the FileUploader widget and the servlet uses an InputStream to read the data and insert directly to the datastore. Running it locally, I can upload large files successfully, but when I deploy it to GAE, I am limited by the 30 second request time. Is there any way around this? Or is there any way that I can split the file into smaller chunks and send the smaller chunks?
By using the Blobstore you get a 1 GB size limit and a special handler, called unsurprisingly BlobstoreUploadHandler, that shouldn't give you timeout problems on upload.
Also check out http://demofileuploadgae.appspot.com/ (sourcecode, source answer) which does exactly what you are asking.
Also, check out the rest of GWT-Examples.
Currently, GAE imposes a limit of 10 MB on file uploads (and response size), as well as 1 MB limits on many other things; so even if you had a network connection fast enough to push more than 10 MB within a 30-second window, it would be to no avail. Google has said (I heard Guido van Rossum mention it yesterday here at Pycon Italia Tre) that it plans to overcome these limitations in the future (at least for GAE users who pay per-use to exceed quotas -- not sure whether the plans extend to GAE users who are not paying and generally need to accept smaller quotas to get their free use of GAE).
You would need to do the upload to another server - I believe that the 30 second timeout cannot be worked around. If there is a way, please correct me! I'd love to know how!
If your request is running out of request time, there is little you can do. Maybe your files are too big and you will need to chunk them on the client (with something like Flash or Java or an upload framework like pupload).
Once you get the file to the application there is another issue - the datastore limitations. Here you have two options:
you can use the Blobstore service, which has quite a nice API for handling uploads up to 50 megabytes (see the sketch below)
you can use something like bigblobae, which can store virtually unlimited-size blobs in the regular appengine datastore.
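A minimal Python sketch of the Blobstore option (the question uses GWT/Java, where BlobstoreService plays the same role); handler paths are illustrative:

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
import webapp2

class UploadFormHandler(webapp2.RequestHandler):
    def get(self):
        # the form must POST to a one-time upload URL generated by the Blobstore
        upload_url = blobstore.create_upload_url('/upload')
        self.response.write(
            '<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file"><input type="submit"></form>' % upload_url)

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        # the file bytes never pass through this handler; only blob metadata does
        blob_info = self.get_uploads('file')[0]
        self.redirect('/serve/%s' % blob_info.key())

app = webapp2.WSGIApplication([('/', UploadFormHandler), ('/upload', UploadHandler)])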
The 30 second response time limit only applies to code execution. So the uploading of the actual file as part of the request body is excluded from that. The timer will only start once the request is fully sent to the server by the client, and your code starts handling the submitted request. Hence it doesn't matter how slow your client's connection is.
Uploading file on Google App Engine using Datastore and 30 sec response time limitation
The closest you could get would be to split it into chunks as you store it in GAE and then when you download it, piece it together by issuing separate AJAX requests.
I would agree with chunking the data into smaller blobs and having two tables: one contains the metadata (filename, size, number of downloads, etc.) and the other contains the chunks, which are associated with the metadata table by a foreign key. I think it is doable (a sketch follows below)...
Or, once you have uploaded all the chunks, you can simply put them together in one blob and keep a single table.
But the problem is that you will need a thick client to do the chunking, like a Java applet, which needs to be signed and trusted by your clients so it can access the local file system.
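A sketch of that two-table layout in Python NDB, purely for illustration (model and property names are hypothetical):

from google.appengine.ext import ndb

class FileMeta(ndb.Model):
    # one record per uploaded file
    filename = ndb.StringProperty()
    size = ndb.IntegerProperty()
    num_chunks = ndb.IntegerProperty()

class FileChunk(ndb.Model):
    # chunks reference their metadata record and carry an ordering index
    meta = ndb.KeyProperty(kind=FileMeta)
    index = ndb.IntegerProperty()
    data = ndb.BlobProperty()  # keep each chunk under the ~1 MB entity limit

def reassemble(meta_key):
    chunks = FileChunk.query(FileChunk.meta == meta_key).order(FileChunk.index).fetch()
    return ''.join(c.data for c in chunks)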
