How to get details of DownloadError?

I do the following:
try:
    result = urlfetch.fetch(url=some_url,
                            ...
except DownloadError:
    self.response.out.write('DownloadError')
    logging.error('DownloadError')
except Error:
    self.response.out.write('Error')
    logging.error('Error')
Is there any way to get a more detailed description of what happened?

You should use logging.exception to record the exception and its traceback in the ERROR log message:
try:
    result = urlfetch.fetch(url=some_url,
                            ...
except DownloadError, exception:
    self.response.out.write('Oops, DownloadError: %s' % exception)
    logging.exception('DownloadError')
except Error:
    self.response.out.write('Oops, Error')
    logging.exception('Error')

In short, no. A download error is usually a timeout in our experience - something on the back end taking too long to respond (first byte). If it is receiving data, it looks like GAE will wait and throw a Deadline exception instead after your 10 seconds is up.
Does it ever succeed? Your choices on how to deal with download exceptions will vary depending on the back-end.
If you're taking the simplistic route and just retrying, beware of quotas and limiters - chances are your requests are indeed reaching the other system, and just aren't coming back in time. Very easy to blow past limiters this way.
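If you do go the retry route, a minimal sketch might look like the following (fetch_with_retries is a hypothetical helper; the attempt count, pauses, and deadline value are illustrative, and every retry still counts against your urlfetch quota):

import logging
import time

from google.appengine.api import urlfetch
from google.appengine.api.urlfetch_errors import DownloadError

def fetch_with_retries(url, attempts=3, deadline=10):
    """Fetch url, retrying on DownloadError with a short pause between tries."""
    for attempt in range(1, attempts + 1):
        try:
            # Raise the per-request deadline (seconds); timeouts are the usual cause of DownloadError.
            return urlfetch.fetch(url=url, deadline=deadline)
        except DownloadError:
            logging.exception('DownloadError on attempt %d of %d', attempt, attempts)
            if attempt == attempts:
                raise
            time.sleep(0.5 * attempt)  # crude backoff; tune it to respect the remote side's limits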

Related

App Engine generating infinite retries

I have a backend that is normally invoked by a cron job a few times every day. Yesterday, I noticed it was restarting without stopping. I don't see a place in my code where that invocation is happening. Rather, the task queue seems to indicate it is running due to retries caused by errors. One error is that the status is saved to BigQuery and that is failing because a quota is exceeded. But this seems to generate an infinite loop. Is this a bug in App Engine or am I doing something wrong? Is there a way to indicate not to restart a task if it fails? My other App Engine tasks that terminate without a 200 status don't do that...
Here is a trace of the queue from which the restarts keep happening:
Here is the logging showing continuous running:
And here is the HTTP header inside the logging:
UPDATE 1
Here is the cron:
<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <url>/uploadToBigQueryStatus</url>
    <description>Check fileNameSaved Status</description>
    <schedule>every 15 minutes from 02:30 to 03:30</schedule>
    <timezone>US/Pacific</timezone>
    <target>checkuploadstatus-backend</target>
  </cron>
</cronentries>
UPDATE 2
As for the comment about catching the error: the error, I believe, is that the BigQuery job fails because a quota has been hit. The strange thing is that it happened yesterday, and the quota should have been reset, so the error should have gone away for at least a while. I don't understand why the task retries; I never selected that option that I am aware of.
I killed the servlet and emptied the task queue so at least it is stopped. But I don't know the root cause. If a BigQuery table quota was the reason, that shouldn't cause an infinite retry!
UPDATE 3
I have not trapped the servlet call that produced the error that led to the infinite retry. But I checked this cron-activated servlet today and found another non-200 result. The return value this time was 500, caused by a Datastore timeout exception.
Here is the screenshot of the response that shows the 500 return code.
Here is the exception info page 1
And the following data
The offending line is the for loop iterating over the Datastore query results:
if (keys[0] != null) {
    /* Define the query */
    q = new Query(bucket).setAncestor(keys[0]);
    pq = datastore.prepare(q);
    gotResult = false;
    // First system time stamp
    Date date = new Timestamp(new Date().getTime());
    Timestamp timeStampNow = new Timestamp(date.getTime());
    for (Entity result : pq.asIterable()) {
I will add a try-catch around this for loop, as it is crashing in this iteration:
if (keys[0] != null) {
    /* Define the query */
    q = new Query(bucket).setAncestor(keys[0]);
    pq = datastore.prepare(q);
    gotResult = false;
    // First system time stamp
    Date date = new Timestamp(new Date().getTime());
    Timestamp timeStampNow = new Timestamp(date.getTime());
    try {
        for (Entity result : pq.asIterable()) {
Hopefully, the Datastore read will no longer crash the servlet but will instead surface as a handled failure. At least the cron will run again and pick up other unhandled results.
By the way, is this a Java error or an App Engine one? I see a lot of these Datastore timeouts and I will add a try-catch around all the result loops. Still, it should not cause the infinite retry that I experienced. I will see if I can find the actual crash... the problem is that it overloaded my logging. More later.
UPDATE 4
I went back to the logs to see when the infinite loop began. In the logs below, I opened the run that is at the head of the continuous running. You can see that it fails with a 500 every 5th time. It was not the cron that invoked it; it was me calling the servlet to check the BigQuery upload status (I write the job info to the Datastore, then read it back in the servlet, write the job status to BigQuery and, if done, erase the Datastore entry). I cannot explain the steady 500 errors every 5th call, but it is always the Datastore timeout exception.
UPDATE 5
Can the infinite retries be happening because of the queue configuration?
CheckUploadStatus: 20/s, 10, 100, 10, 200, 2
I just noticed another task queue had a 500 return code and it was continuously retrying. I did some searching and found that some people have tried to configure the queues for no retry. They said that didn't work.
See this link:
Google App Engine: task_retry_limit doesn't work?
But one retry is possible? That is far better than infinite.
It is contradictory that Google enforces quotas but seems to prefer infinite retries. I would much prefer blocking the retries by default on a non-200 return code and then having NO QUOTAS!!!
According to Retrying cron jobs that fail:
If a cron job's request handler returns a status code that is not in
the range 200–299 (inclusive) App Engine considers the job to have
failed. By default, failed jobs are not retried.
To set failed jobs to be retried:
Include a retry-parameters block in your cron.xml file.
Choose and set the retry parameters in the retry-parameters block.
Your cron config doesn't specify the necessary retry parameters, so the jobs returning the 500 code should, indeed, not be retried, as you expect.
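For reference, if you did want failed cron jobs to be retried, the retry-parameters block would sit inside the cron entry roughly like the sketch below (the values are illustrative, not taken from your config; check the current cron.xml reference for the exact element names):

<cron>
  <url>/uploadToBigQueryStatus</url>
  <schedule>every 15 minutes from 02:30 to 03:30</schedule>
  <target>checkuploadstatus-backend</target>
  <retry-parameters>
    <job-retry-limit>2</job-retry-limit>
    <job-age-limit>1d</job-age-limit>
    <min-backoff-seconds>10</min-backoff-seconds>
    <max-backoff-seconds>300</max-backoff-seconds>
    <max-doublings>2</max-doublings>
  </retry-parameters>
</cron>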
So this looks like a bug. Possibly a variant of the (older) known issue 10075 - the 503 code mentioned there might have changed in the meantime - but it is also a quota-related failure.
The suggestion from GAEfan's comment is likely a good workaround:
You will need to catch the error, and send a 200 response to stop the
task queue from retrying. – GAEfan

Need to perform bulk delete on Search documents -- routinely getting "took too long to respond" errors

I perform a cron job where I need to update my search indices. As part of updating, I delete old documents with this code:
while True:
    results = index.search(search.Query(
        query_string="locationID="+location_id,
        options=search.QueryOptions(
            limit=100,
            cursor=cursor,
            ids_only=True)))
    cursor = results.cursor
    doc_ids = [tmp_result.doc_id for tmp_result in results]
    index.delete(doc_ids)
    if not cursor:  # if cursor is None, meaning no more results
        break
I am relatively routinely seeing this error in my logs:
DeadlineExceededError: The API call search.DeleteDocument() took too long to respond
and was cancelled.
Is there something I'm doing wrong in my deletion code that causes this error to pop up?
Edit:
Is this just a random error that will show up from time to time? If so, should I just implement a retry with exponential backoff, like so:
def delete_doc_ids(doc_ids, retries):
    success = False
    time_to_sleep = 2**retries*0.1  # 100 ms base delay
    time.sleep(time_to_sleep)
    retries += 1
    try:
        index.delete(doc_ids)
        success = True
        return success, retries
    except:
        logging.info("Failure to delete documents. Retrying in %s seconds" % time_to_sleep)
        return success, retries

# because this step fails a lot, keep running in a while loop until it works, with exponential backoff
deletion_finished = False
retries = 0
# keep trying until delete_doc_ids returns true, with exponential backoff
while not deletion_finished:
    deletion_finished, retries = delete_doc_ids(doc_ids, retries)
Edit 2:
What is the default deadline alluded to here? I dug through the RPC source files and can't find it.
Try sending the job to the task queue, which has a much longer deadline (10 minutes), or to a custom module, which can run indefinitely.
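A sketch of that suggestion, assuming the deletion loop above is wrapped in a function and pushed onto the task queue with the deferred library (the function and index names here are illustrative):

from google.appengine.api import search
from google.appengine.ext import deferred

def delete_location_docs(index_name, location_id):
    """Delete all search documents for a location; running in a task gives it a 10-minute deadline."""
    index = search.Index(name=index_name)
    cursor = search.Cursor()
    while cursor is not None:
        results = index.search(search.Query(
            query_string="locationID=" + location_id,
            options=search.QueryOptions(limit=100, cursor=cursor, ids_only=True)))
        doc_ids = [doc.doc_id for doc in results]
        if doc_ids:
            index.delete(doc_ids)
        cursor = results.cursor  # None once there are no more results

# In the cron handler, enqueue the work instead of running it inline, e.g.:
#     deferred.defer(delete_location_docs, 'myindex', location_id)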

AppEngine Error Handling

All,
Is there a way to error out/exit execution out of a handler? For instance, if the incoming request doesn't contain the correct headers, we want to send a 400 and exit/close the connection. However, whenever we use self.error(400) or self.response.set_status(400), any other code after it executes anyway. So, for example:
class MyPastaHandler(webapp2.RequestHandler):
    def get(self):
        if not self.request.headers.get('My-Custom-Header'):
            self.error(400)
        ...
        [more code]
        self.response.out.write('{"success": "true"}')
When I submit a request without the said custom header, I get back a 400, but I also get the success JSON in the body of the response, which tells me that self.error(400) doesn't stop execution and neither does self.response.set_status(400).
So, the question is: is it possible to literally error out of a handler?
As it turns out, there is a simple way to exit after a 400 (or any custom error). As described by TheFluff in the AppEngine IRC channel, a simple, old-fashioned empty return after the self.error(400) will do the trick.
In webapp2, abort() is a shortcut to raise an HTTP exception: http://webapp-improved.appspot.com/guide/exceptions.html#abort
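A minimal sketch of both approaches, reusing the handler and header names from the question (the second handler name is made up for the example; assumes a standard webapp2 app):

import webapp2

class MyPastaHandler(webapp2.RequestHandler):
    def get(self):
        if not self.request.headers.get('My-Custom-Header'):
            self.error(400)
            return  # the plain return stops the rest of the handler from running
        self.response.out.write('{"success": "true"}')

class MyOtherPastaHandler(webapp2.RequestHandler):
    def get(self):
        if not self.request.headers.get('My-Custom-Header'):
            # abort() raises an HTTP exception, so nothing after it executes
            self.abort(400)
        self.response.out.write('{"success": "true"}')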

Why are these deferred tasks not being executed in the order in which they were added?

I'm using Twilio to send SMSes with App Engine. Twilio doesn't accept SMSes longer than 160 characters, so I have to split them. I am splitting the SMSes and sending them as follows:
def send_sms_via_twilio(mobile_number, message_text):
    client = TwilioRestClient(twilio_account_sid, twilio_auth_token)
    message = client.sms.messages.create(to=mobile_number, from_=my_twilio_number, body=message_text)

split_list = split_sms(long_message)
for each_message in split_list:
    send_sms_via_twilio(mobile_number, each_message)
However, I found that the order of sending varied. For example, sometimes I'd receive message 2/5, then 1/5, then 4/5, etc., and other times the order would be correct. The order of split_list is definitely correct. To overcome the incorrect ordering of the SMSes I tried:
for each_message in split_list:
    deferred.defer(send_sms_via_twilio, mobile_number, each_message, _countdown=1)
However, I encountered the same problem. I then tried:
for each_message in split_list:
    deferred.defer(send_sms_via_twilio, mobile_number, each_message, _countdown=1, _queue="send-text-message")
and defined my queue as
- name: send-text-message
  rate: 1/s
  bucket_size: 10
  max_concurrent_requests: 1
  retry_parameters:
    task_retry_limit: 5
I thought that the issue was concurrency (running on python27) and that if I limited max_concurrent_requests the issue would be solved. However, the issue is still present, i.e. the texts still get sent in the wrong order. I checked the logs but couldn't see any notification of task failure; they just seem to be executing in the wrong order.
Is there something I am missing? How can I fix this issue?
Note that SMS messaging (specifically the underlying protocols like SMPP) is asynchronous by definition. It means there is no way you can specify the order of distinct SMS messages.
There is a way to specify the order of SMS parts by using the UDH (user data header) in the binary body of those messages. But this works only for long SMS messages, those that are too long to be sent in one message. For example, if your message exceeds 160 GSM-7 characters or 80 UTF-16 characters, it will be sent as more than one message with UDH.
In that case the mobile phone won't show message parts as they arrive. It will collect them in memory until the last one arrives and then assemble them in the right order. For the end user this is just a message longer than usual, and you don't have to write "1/3", "2/3", ... in the message.
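For illustration only (this header is built by the sending infrastructure, not something you would normally assemble yourself when using Twilio), here is a sketch of the concatenation UDH that lets the handset reassemble parts regardless of arrival order:

def concat_udh(ref, total, seq):
    """6-byte User Data Header for concatenated SMS (IEI 0x00, 8-bit reference)."""
    # UDHL=5, IEI=0x00 (concatenation), IEDL=3, then reference / total parts / this part's number
    return bytes(bytearray([0x05, 0x00, 0x03, ref & 0xFF, total, seq]))

# All parts of one logical message share the same reference (0x42 is arbitrary here);
# the phone reorders them by the sequence byte, which is why delivery order doesn't matter.
headers = [concat_udh(0x42, 3, seq) for seq in (1, 2, 3)]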
Disclaimer: I work for a company that enables you to send and receive both multiple binary messages with user-specified headers (UDH) and/or standard long messages.
If you are not tied to Twilio, try using SMSified. They automatically split the message for you, ensure it is in the correct order, and add "1/2, 2/2..." to the end of the message. In other words, you just send the complete message to their REST API, no matter the length, and they handle the rest. Since they also use a REST API you can continue to use Python.

What happens when an async put results in a contention exception after the request has ended, on App Engine with NDB?

Using ndb, let's say I put_async'd 40 elements with @ndb.toplevel, wrote an output to the user and ended the request; however, one of those put_asyncs resulted in a contention exception. Would the response be 500 or 200? Or let's say it is a task, would the task get re-executed?
One solution is get_result()'ing all those 40 requests before the request ends and catching those exceptions, if they occur, but I'm not sure whether it will affect performance.
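A minimal sketch of that explicit approach (assuming entities is the list of 40 model instances; TransactionFailedError is the usual exception for Datastore contention, but treat the exact type as an assumption and widen the except clause if needed):

import logging

from google.appengine.api import datastore_errors

def put_all_and_report(entities):
    """Start the puts asynchronously, then block on each future so contention
    errors surface before the response is returned."""
    futures = [entity.put_async() for entity in entities]
    failed = []
    for entity, future in zip(entities, futures):
        try:
            future.get_result()
        except datastore_errors.TransactionFailedError:
            # Contention on the entity group; decide here whether to retry or just report it.
            logging.exception('put_async failed for %s', entity.key)
            failed.append(entity)
    return failed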
As far as I understand, using @ndb.toplevel causes the handler to wait for all async operations to finish before exiting.
From the docs:
As a convenience, you can decorate the request handler with @ndb.toplevel. This tells the handler not to exit until its asynchronous requests have finished. This in turn lets you send off the request and not worry about the result. https://developers.google.com/appengine/docs/python/ndb/async#intro
So by adding @ndb.toplevel, the response doesn't actually get returned until after the async methods have finished executing. Using @ndb.toplevel removes the need to call get_result on all the async calls that were fired off (for convenience). So based on this, the request would still return 500 if the async queries failed, because all the async queries needed to complete before returning. Updated: see below.
If using a task (I assume you mean the task queue), the task queue will retry the request if the request fails.
So your handler could be something like:
def get(self):
    deferred.defer(execute_stuff_in_background, param, param1)
    template.render(...)
and execute_stuff_in_background would do all the expensive puts once the handler had returned. If there was a contention issue in the task, your original handler would still return 200.
If you suspect there is going to be a contention issue, perhaps consider sharding or using a fork-join queue implementation to handle the writes (see implementation here: http://www.youtube.com/watch?v=zSDC_TU7rtc#t=41m35)
Edit: Short answer
The request will fail (return 500) if the async requests fail, because @ndb.toplevel waits for all results to finish before exiting.
Updated: Having looked at @alexis's answer below, I re-ran my original test (where I turned off Datastore writes and called put_async in the handler decorated with @ndb.toplevel), and the response raises 500 intermittently (I assume this depends on execution time). Based on this and @alexis's answer below, don't expect the result to be 500 if an async task throws an exception and the calling function is decorated with @ndb.toplevel.
That's odd, I use toplevel and expect the opposite behavior. And that's what I observe. Did something change since the first answer to this question?
As the doc says:
This in turn lets you send off the request and not worry about the
result.
You can try the following unittest (using testbed):
@ndb.tasklet
def raiseSomething():
    yield ndb.Key('foo', 'bar').get_async()
    raise Exception()

@ndb.toplevel
def callRaiseSomething():
    future = raiseSomething()
    return "hello"
response = callRaiseSomething()
self.assertEqual(response, "hello")
This test passes. NDB logs a warning: "suspended generator raiseSomething(tests.py:90) raised Exception()", but it does not re-raise the exception.
ndb.toplevel only waits for the RPCs, but does nothing with the actual result.
If your decorated function is itself a tasklet, it will call get_result() on it first. At this point exceptions will be raised. Then it will wait for remaining 'orphaned' RPCs, and will only log something if an exception is raised.
So my response is: the request will succeed (return 200)
