Google App Engine automated shutdown script execution - google-app-engine

All of a sudden, Google App Engine started running the shutdown script. There are no error reports anywhere. Has anyone experienced this behavior?

Machines in flex can get shut down for a few reasons:
The instance went unhealthy and stopped responding to traffic
Traffic slowed down, so the instance got auto-scaled down for you
It was time for a weekly reboot so we could pick up security patches
We're rolling out a change in the next few days that makes figuring this stuff out much easier :D
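As an illustration of the point above (not from the original answer): if the shutdowns themselves are expected and the real concern is cleanup, a JVM shutdown hook gives the app a brief chance to flush state before the instance goes away. A minimal sketch; flushPendingWork is a hypothetical placeholder for whatever cleanup your app actually needs:
public class ShutdownCleanup {
    public static void register() {
        // Runs when the JVM receives a termination signal, e.g. when the flex
        // instance is being stopped or restarted. Keep it short: the instance
        // only gets a brief grace period before it is killed.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> flushPendingWork()));
    }

    // Hypothetical placeholder for app-specific cleanup (flush logs, close connections, etc.).
    private static void flushPendingWork() {
        System.out.println("Instance shutting down; flushing pending work.");
    }
}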

Related

Frequent restarts on GAE application flex environment

I have a GAE application that's set up as a flexible instance, which is expected to be restarted on a weekly basis (and a continually unhealthy instance can be restarted): https://cloud.google.com/appengine/docs/flexible/java/how-instances-are-managed
However, we're seeing these restarts (the "npm run build" command running again) several times per week! For example, in the past three weeks we've had 9 restarts, and I've confirmed that the log entries leading up to each restart are successful 200 responses (no sign of trouble) - all for the active version serving traffic (and not for the other versions that are stopped).
Has anyone seen this symptom before or know of something else that can cause frequent restarts?
Let me know if any other info would be helpful.
An instance restart in the Google App Engine flexible environment can occur for several reasons:
According to the GAE documentation, there is no guarantee that an instance runs indefinitely; it can be restarted due to hardware maintenance, software updates, or unforeseen issues. Besides that, as you stated, all instances are restarted on a weekly basis.
An instance can also be restarted if it fails to respond to a specified number of consecutive health check requests.
If you observe an unusual number of restarts, I recommend opening a ticket with Google Cloud Platform Support. They have internal tools that can check what is going on in the instance and figure out why the restarts are happening.
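To make the health-check angle concrete (an illustration, not part of the original answer): with the legacy health checking used by custom runtimes, the app itself has to answer GET /_ah/health with a 200, and a handler that is slow or failing is enough to get the instance recycled. A minimal sketch using the javax.servlet API:
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Answers legacy health check requests so the instance is not marked unhealthy.
public class HealthCheckServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setStatus(HttpServletResponse.SC_OK);
        resp.getWriter().write("ok");
    }
}
Map it to /_ah/health in web.xml. If you're on the stock Java runtime this path may already be handled for you; the sketch mostly matters if you've customized the runtime.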
DianeKaplan's comment:
Contacting GCP support has given me a few helpful nuggets so far:
The automatic weekly restart of an instance due to maintenance can occur around different times (so it may only be 5 days since the last one, for example)
Our deployments (which result in new GAE versions) also trigger Google Cloud Builds
In some cases, a VM was being created overnight and then immediately deleted, even though it didn't look like autoscaling was needed. Still looking into this, but I was pointed towards the Google Cloud Console section Home > Activity as a good place to find clues

App Engine TaskQueue: Interrupted and 20 Minutes to Restart

It seems that when App Engine task queue tasks get interrupted, they take 20 minutes or more to restart. Is this behavior normal?
I am using the TaskQueue on Google Cloud's App Engine flexible environment. I regularly add tasks to the task queue and they get processed on the system. It appears that occasionally a task gets interrupted in the middle of what it's doing. I don't know why this happens, but I assume it's probably because the instance that it's on restarted itself.
My software is resilient to such restarts, but the problem is that it takes a full 20 minutes for the task to be restarted. Has anyone experienced this before?
I think you're right, an instance grabs the task and then goes down. Taskqueue doesn't realize it and waits for some kind of timeout.
This sounds very similar to an issue I experienced:
app engine instance dies instantly, locking up deferred tasks until they hit 10 minute timeout
So to answer your question, I would say yes, this does happen. As for what to do, I guess it depends on what the task is doing, how often it runs, etc. If the 20-minute lag isn't a big deal I would just live with it, because fixing it can be a bit of a wild goose chase, but here's what I would try:
When launching tasks, launch duplicates as well with a staggered value for countdown/eta (see the sketch after this list)
Set up a separate microservice to handle/execute these tasks; hopefully this will make their execution more predictable, and you'll be able to tweak instance size and scaling settings to better suit them.
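A minimal sketch of the first suggestion using the Java Task Queue API. The /worker URL and workId parameter are made up for illustration, and the handler is assumed to de-duplicate (e.g. via a Datastore flag keyed on workId) so only one copy actually does the work:
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

Queue queue = QueueFactory.getQueue("myQueue");
String workId = "order-1234"; // hypothetical identifier for this unit of work

// Primary task runs as soon as possible.
queue.add(TaskOptions.Builder.withUrl("/worker").param("workId", workId));

// Backup copy runs 5 minutes later; the handler should no-op if the work is already done.
queue.add(TaskOptions.Builder.withUrl("/worker")
        .param("workId", workId)
        .countdownMillis(5 * 60 * 1000));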

Migration from master/slave to HRD and email limited to 100/day

Almost a week ago I migrated my (paid) app from Master/Slave to HRD. Since that time my app has been restricted to 100 emails/day, with a warning indicating "Resource is currently experiencing a short-term quota limit". I know the documentation mentions a limitation until the first successful billing, so I was hoping the limit would disappear once that happened - but alas it has not! I have also filled out the "request additional resources" form hoping that might help.
Has anyone encountered this problem migrating from Master/Slave? Any suggestions of who I can contact or how I can recover from this limit? The migration process was relatively smooth - except for this problem, which has had a significant impact on my customers.
I am having the same problem right now. I just finished my HRD migration yesterday and only today realized that my Mail API requests are failing because the mail quota on my new HRD app is only 100 messages per day. I did not have this limitation before, and I find it pretty disappointing that such a triviality is causing my users trouble despite a successful migration. I submitted a quota increase request and am hoping I don't also have to wait a week before it's applied.
Anyone who is waiting until the last minute to migrate their app to HRD, be warned: make sure you apply for a mail quota increase on your new app, because your old setting will not carry over, even if you have billing enabled on both apps.
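Until the quota increase comes through, it may be worth failing gracefully when the daily mail quota is exhausted rather than letting requests error out. A rough sketch using the low-level Mail API; whether the over-quota condition surfaces as ApiProxy.OverQuotaException in your setup is an assumption worth verifying, and queueForRetryLater is a hypothetical fallback:
import java.io.IOException;
import com.google.appengine.api.mail.MailService;
import com.google.appengine.api.mail.MailServiceFactory;
import com.google.apphosting.api.ApiProxy;

public class QuotaAwareMailer {
    public static void sendOrDefer(String sender, String to, String subject, String body) {
        MailService.Message msg = new MailService.Message(sender, to, subject, body);
        try {
            MailServiceFactory.getMailService().send(msg);
        } catch (ApiProxy.OverQuotaException e) {
            // Daily mail quota (100/day until the increase is granted) appears exhausted.
            queueForRetryLater(msg);
        } catch (IOException e) {
            // Other transient mail failure; also defer.
            queueForRetryLater(msg);
        }
    }

    // Hypothetical placeholder: e.g. store the message in Datastore and re-send via cron.
    private static void queueForRetryLater(MailService.Message msg) { }
}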

GAE Queue Statistics numbers wrong on development console

I'm seeing very strange behavior in some code that checks the QueueStatistics for a queue to see if any tasks are currently running. To the best of my knowledge there are NO tasks running, and none have been queued up for the past 12+ hours. The development console corroborates this, saying that there are 0 tasks in the queue.
Looking at the QueueStatistics information in my debugger, though, confirms that my process is exiting because it's seeing on the order of 500+ (!!!) tasks in the queue. It also says it ran >1000 tasks in the past minute, yet 0 tasks in the past hour. If I parse the ETA Usec value, it "accurately" shows the ETA as falling within a minute of when the QueueStatistics were pulled.
This is happening repeatedly whenever I re-run my servlet, and the first thing the servlet does is check the queue statistics. No other servlets, tasks, or cron jobs are running as this is my local development server. Yet the queue statistics continue to insist I've got hundreds of tasks running.
I couldn't find any other reports of this behavior, but it feels like I must be missing something major here in regards to Queue Statistics. The code I'm using is very simple:
// Bail out if the queue already reports pending tasks.
Queue taskQueue = QueueFactory.getQueue("myQueue");
QueueStatistics stats = taskQueue.fetchStatistics();
if (stats.getNumTasks() > 0) { return; }
What am I missing? Are queue statistics entirely unreliable on the local dev server?
If it works as expected when deployed then that's the standard to go by.
Lots of things on the local dev server don't work as they do in the deployed environment (parallel threads are not parallel, and backend addressing support is somewhat broken at the time of writing), so deploy, deploy, deploy!
Another example is the Channel API. When used locally it falls back to polling; you'll see hundreds of those requests if you look in the logs/browser debugger. But when deployed, all is well and it works as expected.
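One pragmatic workaround along those lines is to simply skip the statistics-based guard when running on the local dev server, since the numbers there can't be trusted. A small sketch of that check (my own suggestion, not part of the original answer):
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.QueueStatistics;
import com.google.appengine.api.utils.SystemProperty;

Queue taskQueue = QueueFactory.getQueue("myQueue");
// Only trust fetchStatistics() in the deployed (production) environment;
// on the local dev server the numbers have been observed to be bogus.
if (SystemProperty.environment.value() == SystemProperty.Environment.Value.Production) {
    QueueStatistics stats = taskQueue.fetchStatistics();
    if (stats.getNumTasks() > 0) { return; }
}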

Apache server seems to fall asleep after not being used for a while

I'm having some issues with a web server. It is not used a lot yet (mainly just by me right now), but it will soon be used as a live server hosting a couple hundred/thousand WordPress sites.
I'm currently having the issue that when the web server isn't used for a bit (some minutes), it seems to 'fall asleep'. From then on, the first request takes an awfully long time to process, and after that it runs smoothly for a while.
The server (VPS) it's on is dedicated to being a web server, so Apache (and MySQL) should be top priority.
Does anybody know what I can do to improve this?
Thanks!
The issue finally solved itself when we started getting more visitors to the server, keeping it alive.
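If you can't wait for real traffic, the same effect can be had with a periodic keepalive request from any always-on machine. The idea is language-agnostic (a cron job with curl works just as well); here it is sketched in Java to match the rest of this page, with a placeholder URL:
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class KeepAlivePinger {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // Placeholder URL; point it at any cheap page on your server.
                HttpURLConnection conn =
                        (HttpURLConnection) new URL("https://example.com/keepalive").openConnection();
                conn.setRequestMethod("GET");
                conn.getResponseCode(); // fire the request and discard the result
                conn.disconnect();
            } catch (Exception e) {
                // A failed ping is not fatal; just try again next interval.
            }
        }, 0, 2, TimeUnit.MINUTES); // ping every couple of minutes to keep workers warm
    }
}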
