'Version is not ready' error on update - GAE Python - google-app-engine

I am unable to update my frontends or my backends; I get the error message 'Version is not ready'. This bug has persisted for nearly 24 hours now. I have a task perpetually running in a queue, and my best guess is that this task is blocking the update. I cannot delete the task, as it is perpetually running, nor can I delete the queue, as I am unable to upload a new queue.yaml definition. The same task previously failed with a maximum recursion error because I had a synchronous RPC inside an asynchronous tasklet.
I'm pretty sure the fix will require someone on the GAE side forcibly resetting the task queue. Thus, this question would be more suitably directed to the GAE team, with details about my app, in a less public forum. From what I can see, though, they do not allow direct support questions and suggest posting here instead. My follow-up question, then, is this: when you have a GAE issue that requires action from the GAE team, how do you get hold of them (other than paying US$500/month for a premium support account)?
EDIT:
The task is/was meant to be running on a backend instance. I intended to shut down all backend and frontend instances via the console, assuming that they would cancel the task and restart themselves. But I found that only one frontend instance was running, and no backends. After shutting down that frontend instance, the dashboard reported that I have 0 instances running, yet the website is still serving and the task remains perpetually running.
EDIT:
Disabling the app stopped the task from running. After reenabling the app, I was able to update it. Though I am left with a ghost task in my queue.

If you have a stuck task queue job, I'd try disabling the queue and killing the instance running that job. If that doesn't work, I'd try disabling the app temporarily.
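Another option, beyond what the answer above suggests, is purging the queue programmatically rather than trying to delete it. A minimal sketch for the GAE Python runtime, assuming the stuck queue is named 'default' (substitute your own queue name):

# Sketch: clear every pending task from a queue with the bundled Python API.
from google.appengine.api import taskqueue

queue = taskqueue.Queue('default')  # 'default' is an assumed queue name
queue.purge()  # removes all tasks currently in the queue; may take up to a minute to take effect

Of course, this only helps if the app already has a handler you can reach to run it, since in the asker's situation new code could not be deployed.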

Related

Cloud Tasks client ignores retry configuration

Basically what the title says. The API and client docs state that a retry can be passed to create_task:
retry (Optional[google.api_core.retry.Retry]): A retry object used
to retry requests. If ``None`` is specified, requests will
be retried using a default configuration.
But this simply doesn't work. Passing a Retry instance does nothing and the queue-level settings are still used. For example:
from google.api_core.retry import Retry
from google.cloud.tasks_v2 import CloudTasksClient

client = CloudTasksClient()
parent = client.queue_path('my-project', 'us-central1', 'my-queue')  # illustrative names
task = {'app_engine_http_request': {'relative_uri': '/foo'}}
retry = Retry(predicate=lambda _: False)  # predicate that never allows a retry
client.create_task(parent=parent, task=task, retry=retry)
This should create a task that is never retried. I've tried all sorts of different configurations, and every time it just uses whatever settings are set on the queue.
You can pass a custom predicate to retry on different exceptions, but there is no formal indication that this parameter prevents retrying. You may check the Retry documentation page for details.
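For illustration, a custom predicate looks like the sketch below. Note that, per google.api_core, this Retry object governs whether the create_task RPC itself is retried, not how the queue retries the task later; the exception type chosen here is just an example.

# Sketch: only retry the create_task API call on transient unavailability.
from google.api_core.exceptions import ServiceUnavailable
from google.api_core.retry import Retry, if_exception_type

rpc_retry = Retry(predicate=if_exception_type(ServiceUnavailable))
# client.create_task(parent=parent, task=task, retry=rpc_retry)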
Google Cloud Support has confirmed that task-level retries are not currently supported. The documentation for this client library is incorrect. A feature request exists here https://issuetracker.google.com/issues/141314105.
Task-level retry parameters are available in the Google App Engine bundled service for task queuing, Task Queues. If your app is on GAE, which I'm guessing it is since your question is tagged with google-app-engine, you could switch from Cloud Tasks to GAE Task Queues.
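For reference, task-level retry parameters with the bundled Python Task Queues API look roughly like the sketch below; the URL and the limits are illustrative, not taken from the question.

# Sketch: per-task retry settings with the bundled GAE Python taskqueue API.
from google.appengine.api import taskqueue

retry_options = taskqueue.TaskRetryOptions(
    task_retry_limit=3,      # retry this task at most 3 times
    min_backoff_seconds=10,  # wait at least 10 seconds before the first retry
)
taskqueue.add(url='/worker', retry_options=retry_options)  # '/worker' is illustrative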
Of course, if your app relies on something that is exclusive to Cloud Tasks like the beta HTTP endpoints, the bundled service won't work (see the list of new features, and don't worry about the "List Queues command" since you can always see that in the configuration you would use in the bundled service). Barring that, here are some things to consider before switching to Task Queues.
Considerations
Supplier preference - Google seems to prefer Cloud Tasks. From the push queues migration guide intro: "Cloud Tasks is now the preferred way of working with App Engine push queues"
Lock in - even if your app is on GAE, moving your queue solution to the GAE bundled one increases your "lock in" to GAE hosting (i.e. it makes it even harder for you to leave GAE if you ever want to change where you run your app, because you'll lose your task queue solution and have to deal with that in addition to dealing with new hosting)
Queues by retry - the GAE Task Queues to Cloud Tasks migration guide section Retrying failed tasks suggests creating a dedicated queue for each set of retry parameters and then enqueuing tasks accordingly. This might be a suitable way to continue using Cloud Tasks; a sketch of this approach follows below.
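A rough sketch of that queues-by-retry idea using the Cloud Tasks admin API; the project, location, and queue names are illustrative:

# Sketch: one Cloud Tasks queue per retry policy.
from google.cloud.tasks_v2 import CloudTasksClient

client = CloudTasksClient()
parent = 'projects/my-project/locations/us-central1'  # illustrative project/location

# max_attempts includes the first attempt, so 1 means the task is never retried.
client.create_queue(parent=parent, queue={
    'name': parent + '/queues/no-retry',
    'retry_config': {'max_attempts': 1},
})
client.create_queue(parent=parent, queue={
    'name': parent + '/queues/retry-five',
    'retry_config': {'max_attempts': 5},
})
# Enqueue each task onto whichever queue matches the retry behaviour it needs.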

GAE: "This is likely to cause a new process to be used for the next request to your application"

I use the GAE standard environment and the cron service. I deployed my app and cron jobs successfully based on app.yaml and cron.yaml.
When I ran the cron jobs for the first time, I got the following logs from Stackdriver:
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
A problem was encountered with the process that handled this request, causing it to exit. This is likely to cause a new process to be used for the next request to your application.
How do I solve this? thanks.

How can Google App Engine be prevented from immediately rescheduling tasks after status code 500?

I have a Google App Engine servlet that is cron configured to run once a week. Since it will take more than 1 minute of execution time, it launches a task (i.e. another servlet, at /task/clear) on the application's default push task queue.
Now what I'm observing is this: if the task causes an exception (e.g. NullPointerException inside its second servlet), this gets translated into HTTP status 500 (i.e. HttpURLConnection.HTTP_INTERNAL_ERROR) and Google App Engine apparently reacts by immediately relaunching the same task again. It announces this by printing:
Web hook at http://127.0.0.1:8888/task/clear returned status code 500. Rescheduling..
I can see how this can sometimes be a feature, but in my scenario it's inappropriate. Can I request that Google App Engine should not do such automatic rescheduling, or am I expected to use other status codes to indicate error conditions that would not cause rescheduling by its rules? Or is this something that happens only on the dev. server?
BTW, I am currently also running other tasks (with different frequencies) on the same task queue, so throttling reschedules on the level of task queue configuration would be inconvenient (so I hope there is another/better option too.)
As per https://developers.google.com/appengine/docs/java/taskqueue/overview-push#Java_Task_execution - the task must return a response code between 200 and 299.
You can either return a success status code, set taskRetryLimit in RetryOptions, or check the X-AppEngine-TaskExecutionCount header when the task launches to see how many times it has been attempted and act accordingly.
I think I've found a solution: in the Java API, there is a method RetryOptions#taskRetryLimit, which serves my case.
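Since the parent thread is GAE Python, here is the same header-check idea sketched for the Python runtime; the handler, the work function, and the attempt threshold are all illustrative.

# Sketch: stop retries after a few attempts by checking X-AppEngine-TaskExecutionCount
# and returning a 2xx status so the queue considers the task done.
import webapp2

class ClearHandler(webapp2.RequestHandler):
    def post(self):
        # Number of times this task has already failed before the current attempt.
        attempts = int(self.request.headers.get('X-AppEngine-TaskExecutionCount', '0'))
        try:
            do_clear()  # illustrative work function
        except Exception:
            if attempts >= 3:
                self.response.set_status(200)  # give up: report success so it is not rescheduled
            else:
                self.response.set_status(500)  # let the queue retry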

Appengine Cold start on every request call (Java)

I've recently started having cold starts on nearly every call to my App Engine app. Initially I thought this was an issue with Cloud Endpoints; however, now I believe it is an App Engine issue or something else in my code.
This started on my most recent deployment. I'm at a loss right now as to what is going on. It has made my app useless. I have tried 1.7.4 and 1.7.5 and both have this problem.
The requests work, other than being extremely slow! Any help would be greatly appreciated, as I cannot continue with 10-15 second request times!
Update: By looking at my running instances I see NO instances running even after making a request. Previously instances would remain running after requests were made. It appears when a request is made an instance is spun up, serves the request, and then dies. There are no errors in my logs. No changes have been made to my app settings or billing. This app does have billing enabled.
Update 2: I have adjusted my idle instance settings (which up to this point have worked and been left untouched). I set them to a min of 1 and a max of 2. The instances are staying alive and serving requests as normal. Previously it was set to automatic-1. Not sure what is going on here, perhaps Google adjusting the request scheduler or something. Not COOL!
I have found an open issue on code.google.com. Apparently this is a real App Engine issue that is affecting some, if not all, apps at the moment. I do not have a solution at this time other than setting the minimum idle instances to 1 (or greater). Even doing this, the new dynamic instances that are spun up die almost immediately after serving a request. Just waiting on a fix from Google at this point.
http://code.google.com/p/googleappengine/issues/detail?id=8844
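The asker's app is Java (configured through appengine-web.xml), but for a Python standard app the equivalent knob can also be pinned in app.yaml rather than only in the dashboard; a sketch with illustrative values:

# Sketch (app.yaml, Python standard): keep idle instances resident to avoid cold starts.
automatic_scaling:
  min_idle_instances: 1
  max_idle_instances: 2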

How to use pull queue on dev server of Google App Engine

We have been using push queues for a very long time and have had no problems consuming tasks on the dev server.
However, while implementing a new service with a pull queue, it became difficult to figure out how to do the same thing on the dev server.
Basically, from the docs, what we can see is that you should use the REST API (we can't use the direct queue API, as the queue is consumed by an external app) to lease/delete tasks with the endpoint
https://www.googleapis.com/taskqueue/v1beta1/projects/taskqueues
But obviously this will not work on the local dev server, and it appears that nothing documents how to handle this.
Just wondering if anyone has ever run into the same issue and can shed some light?
With pull queues, the task consumer can be internal or external.
If you need it to work on the dev server, then just create a handler (e.g. a servlet) and use the internal API to add, lease, and delete tasks.
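A minimal sketch of that internal-API approach in the GAE Python runtime; the queue name, payload, and process() function are illustrative, and the queue must be declared with mode: pull in queue.yaml.

# Sketch: add, lease, and delete pull-queue tasks with the bundled Python API.
from google.appengine.api import taskqueue

queue = taskqueue.Queue('my-pull-queue')  # declared with mode: pull in queue.yaml

# Producer side: pull tasks carry a payload instead of a URL.
queue.add(taskqueue.Task(payload='some work', method='PULL'))

# Consumer side (e.g. inside a handler on the dev server):
tasks = queue.lease_tasks(lease_seconds=60, max_tasks=100)
for task in tasks:
    process(task.payload)  # illustrative work function
queue.delete_tasks(tasks)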
