I've recently started having coldstarts on nearly every call to my appengine app. Initially I thought this was an issue with Cloud Endpoints, however now I believe it is an appengine issue, or something else in my code.
This started on my most recent deployment. I'm at a loss right now as to what is going on. It has made my app useless. I have tried 1.7.4 and 1.7.5 and both have this problem.
The requests work other than being extremely slow! Any help would be greatly appreciated, as I can not continue with 10-15 request times!
Update: By looking at my running instances I see NO instances running even after making a request. Previously instances would remain running after requests were made. It appears when a request is made an instance is spun up, serves the request, and then dies. There are no errors in my logs. No changes have been made to my app settings or billing. This app does have billing enabled.
Update 2: I have adjusted my idle instance settings(which up to this point have worked and have been left untouched). I set to a min of 1 and max of 2. The instances are staying alive and serving requests as normal. Previously it was set to automatic-1. Not sure what is going on here, perhaps Google adjusting the request scheduler or something. Not COOL!
I have found an open issue on code.google.com. Apparently this is a real appengine issue that is effecting some, if not all apps at the moment. I do not have a solution at this time other than setting the idle minimum instances to 1 (or greater). Even doing this, the new dynamic instances that are spun up die almost immediately after serving a request. Just waiting on a fix from google at this point.
http://code.google.com/p/googleappengine/issues/detail?id=8844
Related
Facing the below issue continuously with one POST request which is triggered from Google Cloud Task. The node application is deployed on Google App Engine. tried even increasing the instances but no luck.
Process terminated because the request deadline was exceeded. Please ensure that your HTTP server is listening for requests on 0.0.0.0 and on the port defined by the PORT environment variable. (Error code 123)
Any help would be useful. Thanks in advance
I have run into this issue in the past. The question, as posted, doesn't have enough information to narrow down the issue, but here are a few things you can try:
Increase memory size: It's possible that your instances are running out of memory and failing to start. Try increasing memory size of the instance to see if that helps.
Billing: Make sure all of your billing information is up to date and the issue isn't related to cost.
Warmups: Do you have a warmup, liveness check, or readiness check turned on? If so, make sure they are working properly.
Add Logging: The request has no logs displayed. You should try adding logging.info(...) statements in your code to see how far it gets before failing.
I have an GAE app PHP72, env: standard which is hanging intermittently (once or twice a day for about 5 mins).
When this occurs I see a large spike in GAE dashboard's Traffic Sent graph.
I've reviewed all uses of file_get_contents and curl_exec within the app's scripts, not including those in /vendor/, and don't believe these to be the cause.
Is there a simple way in which I can review more info on these outbound requests?
There is no way to get more details in that dashboard. You're going to need to check your logs at the corresponding times. Obscure things to check for:
Cron jobs coming in at the same times
Task Queues spinning up
I have an application which uses app engine auto scaling. It usually runs 0 instances, except if some authorised users use it.
This application need to run automated voice calls as fast as possible on thousands of people with keypad interactions (no, it's not spam, it's redcall!).
Programmatically speaking, we ask Twilio to initialise calls through its Voice API 5 times/sec and it basically works through webhooks, at least 2, but most of the time 4 hits per call. So GAE need to scale up very quickly and some requests get lost (which is just a hang up on the user side) at the beginning of the trigger, when only one instance is ready.
I would like to know if it is possible to programmatically scale up App Engine (through an API?) before running such triggers in order to be ready when the storm will blast?
I think you may want to give warmup requests a try. As they load your app's code into a new instance before any live requests reach that instance. Thus reducing the time it takes to answer while you GAE instance has scaled down to zero.
The link I have shared with you, includes the PHP7 runtime as I see you are familiar with it.
I would also like to agree with John Hanley, since finding a sweet spot on how many idle instances you have available, would also help the performance of your app.
Finally, the solution was to delegate sending the communication through Cloud Tasks:
https://cloud.google.com/tasks/docs/creating-appengine-tasks
https://github.com/redcall-io/app/blob/master/symfony/src/Communication/Processor/QueueProcessor.php
Tasks can try again hitting the app engine in case of errors, and make the app engine pop new instances when the surge comes.
I'm trying to add timeouts to GWT sessions, by using the following code to check if a session is alive:
public boolean isSessionAlive() {
return System.currentTimeMillis() - getThreadLocalRequest().getSession()
.getLastAccessedTime() < timeout;
}
I based this code on many examples I saw on web for GWT sessions, such as this.
The above code works great while running on a local web server, but after deploying the project to App Engine it doesn't. The following always returns 0 on App Engine:
getThreadLocalRequest().getSession().getLastAccessedTime()
As far as I understand, the last accessed time is updated on each RPC call.
I made several calls, but this value still remains zero and incorrect result is returned.
Does anybody know how to fix this issue?
Things will change after deployed on GAE
Just today attended the session on app engine by #roman irani .
remember that App Engine is a distributed architecture so a difference from Java EE is that you are never guaranteed the same application server instance during request processing as the previous request. While the object is being serialized correctly in memcache, you still have to call setAttribute() every time due to the fact that memory is not shared.
Clear cut picture here to handle the session
I have found a workaround. Adding the following code in war/WEB-INF/web.xml will cause the session to expire after 30 minutes:
<!-- timeout in minutes -->
<session-config>
<session-timeout>30</session-timeout>
</session-config>
Reference: Session Timeouts with GWT RPC calls.
I am unable to update my frontends nor my backends. I get the error message 'Version is not ready'. This bug has persisted for coming up to 24 hours now. I have a task perpetually running in a queue. My best guess is that this task is stopping the update. I am unable to delete the task as it is perpetually running, nor can I delete the queue as I am unable to upload a new queue.yaml definition. The same task previously failed due to a maximum recursion error as I had a synchronous RPC within an asynchronous tasklet.
I'm pretty sure the fix will require someone from the GAE side forcibly resetting the task queue. Thus, this question would be more suitably directed to the GAE team with details about my app in a less public forum. Though, from what I can see, they do not allow direct support questions and suggest posting the question here. My follow up question, then, is when you have a GAE issue that requires action from the GAE team - how do you get hold of them (other than paying US$500/month for a premium support account)?
EDIT:
The task is/was meant to be running on a backend instance. I intended to shutdown all backend and frontend instances via the console assuming that they would cancel the task and restart themselves. But I found that only one frontend instance was running - no backends. After shutting down that frontend instance, the dashboard has reported that I have 0 instances running, yet the website is still serving and the task remains perpetually running.
EDIT:
Disabling the app stopped the task from running. After reenabling the app, I was able to update it. Though I am left with a ghost task in my queue.
If you have a stuck task queue job, I'd try disabling the queue and killing the instance running that job. If that doesn't work, I'd try disabling the app temporarily.