Are updated health checks causing App Engine deployment to fail?

We updated our Google App Engine health checks from the legacy version to the new version, and now our deployments are failing. Nothing else in the project has changed. We tested the default settings first and then extended the check intervals and timeouts, just in case.
This is the error:
ERROR: (gcloud.app.deploy) Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.
This is our app.yaml:
liveness_check:
  check_interval_sec: 120
  timeout_sec: 40
  failure_threshold: 5
  success_threshold: 5
  initial_delay_sec: 500
readiness_check:
  check_interval_sec: 120
  timeout_sec: 40
  failure_threshold: 5
  success_threshold: 5
  app_start_timeout_sec: 1500
Unfortunately, no matter the configuration, both the readiness and liveness checks are returning 404s.
What could be causing the problem, and how can we debug it?
Is it possible to rollback to the legacy health checks?

This usually happens when the application is still reading the legacy health check flags, and/or when the app is deployed with gcloud app deploy before the updated health checks have been enabled. You can check this as follows:
1- Make sure the legacy health_check section does not exist in your app.yaml.
2- Run gcloud beta app describe to see whether the splitHealthChecks flag is set to true under featureSettings.
By default, HTTP requests from updated health checks are not forwarded to your application container. If you want to extend health checks to your application, then specify a path for liveness checks or readiness checks.
You can then enable updated health checks by running gcloud beta app update --split-health-checks --project [your-project-id]. See this public issue tracker entry or this article about Updated Health Checks for more details.
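The two checks and the enable command above can be wrapped in a small script. This is a minimal sketch, not an official tool: it assumes a POSIX shell, an app.yaml in the current directory, and your project ID passed as an argument. The function names are hypothetical, and the --format projection is just one way to print the flag on its own.

```shell
#!/bin/sh
# Sketch: check and enable updated (split) health checks for an App Engine app.
# Assumes app.yaml is in the current directory; $1 is your project ID.

# Step 1: fail if app.yaml is missing or still has a legacy health_check section.
check_no_legacy_health_check() {
  if [ ! -f app.yaml ]; then
    echo "app.yaml not found in the current directory" >&2
    return 1
  fi
  if grep -q '^health_check:' app.yaml; then
    echo "app.yaml still contains a legacy health_check section" >&2
    return 1
  fi
}

# Step 2: print featureSettings.splitHealthChecks; it should be true once
# updated health checks are enabled.
show_split_health_checks() {
  gcloud beta app describe --project "$1" \
    --format='value(featureSettings.splitHealthChecks)'
}

# Enable updated health checks for the project.
enable_split_health_checks() {
  gcloud beta app update --split-health-checks --project "$1"
}
```

For example, check_no_legacy_health_check && show_split_health_checks my-project-id, where my-project-id is a placeholder for your own project ID.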

In my case, I solved this issue by manually increasing the memory allocation:
resources:
  cpu: 1
  memory_gb: 2
  disk_size_gb: 10
I found this solution in a Google Groups forum thread:
https://groups.google.com/forum/#!topic/google-appengine/Po_-SkC5DOE

For those of you who want to migrate to the default settings for split health checks, follow these steps:
1) Remove the health_check, liveness_check and readiness_check sections from your app.yaml file.
2) Deploy to a new version. This is important: for example, if your current version is named production, use a different name such as prod in the command gcloud app deploy --version [new-version-name].
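The two steps above can be sketched as a small shell function. This is a minimal sketch, assuming app.yaml sits in the current directory and the new version name is passed as the only argument; the function name is hypothetical, and the grep only looks for the three sections as top-level keys.

```shell
#!/bin/sh
# Sketch: migrate to the default split health check settings.
# Assumes app.yaml is in the current directory; $1 is the new version name.

deploy_with_default_health_checks() {
  new_version="$1"
  if [ -z "$new_version" ]; then
    echo "usage: deploy_with_default_health_checks NEW_VERSION" >&2
    return 1
  fi
  # Step 1: health_check, liveness_check and readiness_check must all be gone
  # (only top-level keys are checked).
  if grep -qE '^(health_check|liveness_check|readiness_check):' app.yaml; then
    echo "remove the health check sections from app.yaml first" >&2
    return 1
  fi
  # Step 2: deploy to a version name that differs from the current one.
  gcloud app deploy app.yaml --version "$new_version"
}
```

For example, run deploy_with_default_health_checks prod when the current version is named production.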

Related

gcloud app deploy does not terminate even when service is running

I am deploying a Node.js server to Google App Engine from a Bitbucket Pipelines environment, and the last command in the script is: gcloud -q app deploy app.yaml --no-promote --verbosity=debug
The logs show that the service is deployed successfully, but the script does not terminate. This is the last part of the log:
> DEBUG: Reading GCS logfile: 206 (read 10 bytes)
> PUSH
> DONE
> DEBUG: Operation [...] complete. Result: {...}
> DEBUG: Reading GCS logfile: 416 (no new content; keep polling)
> --------------------------------------------------------------------------------
> DEBUG: Converted YAML to JSON: "{...}"
> DEBUG: Operation [...] not complete. Waiting to retry.
> Updating service [default] (this may take several minutes)...
> .DEBUG: Operation [...] not complete. Waiting to retry.
> ......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
> ......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
I tried adding readiness_check and liveness_check sections to app.yaml, but it didn't change the behaviour.
readiness_check:
  path: "/api/public/logout"
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300
liveness_check:
  path: "/api/public/logout"
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
The main unknown here is: what criteria does gcloud app deploy use to determine its termination condition?
Also, is there any workaround for this problem?
Update
The problem also happens when running the gcloud app deploy command from my local environment (my laptop).
The problem does NOT happen when removing the --no-promote flag.
The gcloud app deploy command expects a well-formed, valid app.yaml file; this is what determines its termination condition.
As you confirmed that the deployment works without the --no-promote flag, something in the configuration may expect the application to be already deployed and running, which would prevent the script from completing.
Another possible cause is that the Google Cloud SDK version specified in bitbucket-pipelines.yml is outdated. Make sure you use the latest version. The same consideration applies to all dependencies in package.json, which might conflict with one another, especially with older versions of Node.js.
This guide can help with building a sound configuration for Bitbucket-based deployments; although its example uses Python, it can also serve as a template for a Node.js pipeline.
Note: the guide uses an older Google Cloud SDK version (127.0.0), which will make this deployment fail, so it should be replaced with the latest (228.0.0 or higher). The guide also omits enabling another required API: the Cloud Build API. I've notified the team to amend the guide.
I've tested several scenarios with a simple Node.js server and could not reproduce the issue. Check my GitHub repository for the code.
For further help on this topic, please provide more details, such as the contents of the app.yaml, bitbucket-pipelines.yml, and package.json files, as well as a description of the state of App Engine (services, versions).
In order to deploy the test repository to App Engine from Bitbucket, make sure the following is done on the project:
Enable APIs:
App Engine Admin
Cloud Build
Create a service account with the following roles, and generate a key for it:
App Engine: Admin
Cloud Build: Editor
Storage: Object Admin
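The project setup above could be scripted roughly as follows. This is a sketch under assumptions: the project ID and service account name are placeholders passed as arguments, the role IDs roles/appengine.appAdmin, roles/cloudbuild.builds.editor and roles/storage.objectAdmin are taken to correspond to the App Engine Admin, Cloud Build Editor and Storage Object Admin roles listed, and the function name and key file name key.json are hypothetical. Treat the generated key as a secret, for example by storing it in a secured pipeline variable rather than in the repository.

```shell
#!/bin/sh
# Sketch: prepare a project for Bitbucket -> App Engine deployments.
# $1 is the project ID and $2 the service account name (both placeholders).

setup_bitbucket_deployer() {
  project_id="$1"
  sa_name="$2"
  sa_email="${sa_name}@${project_id}.iam.gserviceaccount.com"

  # Enable the App Engine Admin and Cloud Build APIs.
  gcloud services enable appengine.googleapis.com cloudbuild.googleapis.com \
    --project "$project_id" || return 1

  # Create the service account used by the pipeline.
  gcloud iam service-accounts create "$sa_name" --project "$project_id" || return 1

  # Grant App Engine Admin, Cloud Build Editor and Storage Object Admin.
  for role in roles/appengine.appAdmin roles/cloudbuild.builds.editor roles/storage.objectAdmin; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member "serviceAccount:${sa_email}" --role "$role" || return 1
  done

  # Create a JSON key that the pipeline can authenticate with.
  gcloud iam service-accounts keys create key.json --iam-account "$sa_email"
}
```

For example, setup_bitbucket_deployer my-project-id bitbucket-deployer, with both arguments replaced by your own values.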

Error deploying java google app engine flexible application - Timed out waiting for the app infrastructure to become healthy

I'm writing up this issue because I have no idea how to investigate it.
We're having problems deploying an App Engine flexible application.
The problem is that the only error we get is the following:
GCLOUD: ERROR: (gcloud.app.deploy) Error Response: [4] Timed out waiting for the app infrastructure to become healthy.
I have already tried the following:
Tried a simple helloWorld app, to make sure it's not an application issue
Checked quota settings -> All green
Checked the activity stream for warnings or errors
Checked logs for warnings or errors
Granted the Owner role to the service account that deploys the app
App.yaml:
service: test-service # Id of the service
env: flex # Flex environment
runtime: java # Java runtime
runtime_config:
  jdk: openjdk8 # use OpenJDK 8
resources:
  cpu: 1
  memory_gb: 2.8
gcloud version:
Google Cloud SDK 214.0.0
alpha 2018.08.24
app-engine-java 1.9.64
app-engine-python 1.9.74
beta 2018.08.24
bq 2.0.34
cloud-datastore-emulator 2.0.2
core 2018.08.24
gsutil 4.33
kubectl 2018.08.24
pubsub-emulator 2018.08.24
After contacting Google technical support, we found out that the default App Engine service account didn't have the Editor role. After we assigned the Editor role, the deployment worked again.
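For reference, checking and granting that role from the command line might look like the sketch below. It assumes that the "default App Engine service account" is the one named <project-id>@appspot.gserviceaccount.com and that the project ID is passed as an argument; the function names are hypothetical. Editor is a broad role, so this simply mirrors what support suggested rather than a least-privilege setup.

```shell
#!/bin/sh
# Sketch: inspect and fix the roles of the default App Engine service account.
# Assumes the default account is <project-id>@appspot.gserviceaccount.com.

appengine_default_sa() {
  echo "${1}@appspot.gserviceaccount.com"
}

# List the roles currently bound to the default App Engine service account.
show_appengine_default_sa_roles() {
  gcloud projects get-iam-policy "$1" \
    --flatten='bindings[].members' \
    --filter="bindings.members:$(appengine_default_sa "$1")" \
    --format='value(bindings.role)'
}

# Grant the Editor role, as Google support suggested.
grant_editor_to_appengine_default_sa() {
  gcloud projects add-iam-policy-binding "$1" \
    --member "serviceAccount:$(appengine_default_sa "$1")" \
    --role roles/editor
}
```

If roles/editor does not appear in the output of show_appengine_default_sa_roles my-project-id (a placeholder ID), the grant function adds it.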
This error is often reported when your application has reached the quota limit for "In-use IP addresses". A similar error was reported in this Google Cloud Platform issue. The default value for the In-use IP addresses quota is 8, and it can be increased by clicking the 'Edit' button in the Cloud Console; make sure you are editing the value for In-use IP addresses.
The Google engineer confirmed that an improvement to the quota error details is planned for one of the next versions of the gcloud SDK. You can track Cloud SDK updates in this Google Group.
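To see how close a region is to that limit before deploying, something like the following sketch may help. It assumes the quota in question is the regional IN_USE_ADDRESSES metric reported by gcloud compute regions describe, and that you pass the region your App Engine app runs in plus your project ID (both placeholders); the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: show the regional In-use IP addresses quota (metric IN_USE_ADDRESSES).
# $1 is the region your App Engine app runs in, $2 is your project ID.

show_in_use_ip_quota() {
  gcloud compute regions describe "$1" --project "$2" \
    --flatten=quotas \
    --filter='quotas.metric=IN_USE_ADDRESSES' \
    --format='table(quotas.metric,quotas.limit,quotas.usage)'
}
```

If the usage column equals the limit, new flex instances may not get an address until the quota is raised or unused addresses are released.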

How to deploy a GAE project in flexible environment without billing?

I've been developing a REST service using Flask and other third-party libraries, and I want to deploy it to GAE in the flexible environment. I usually deploy to the GAE standard environment, but I wanted to try the new flexible environment. For now, I want to deploy to the flexible environment without enabling billing, and Google support assured me that this was possible.
My code runs fine locally, and I have the following YAML file:
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT whereismybus230.starter:app
runtime_config:
  python_version: 3
So I created a new project through the Google Cloud Console web page (as usual), and created a new gcloud profile on my local machine so that I could deploy to this new project.
Then I run:
gcloud app deploy --verbosity=info
I can see that a Docker image is being built and that at some point it is pushed, but the deployment fails after a few minutes here:
Successfully built sophiabus230 aniso8601 future docopt itsdangerous MarkupSafe
Installing collected packages: Werkzeug, click, MarkupSafe, Jinja2, itsdangerous, Flask, jsonschema, pytz, six, python-dateutil, aniso8601, flask-restplus, beautifulsoup4, future, sophiabus230, coverage, requests, docopt, coveralls
Successfully installed Flask-0.12 Jinja2-2.9.4 MarkupSafe-0.23 Werkzeug-0.11.15 aniso8601-1.2.0 beautifulsoup4-4.5.3 click-6.7 coverage-4.3.4 coveralls-1.1 docopt-0.6.2 flask-restplus-0.9.2 future-0.16.0 itsdangerous-0.24 jsonschema-2.5.1 python-dateutil-2.6.0 pytz-2016.10 requests-2.12.5 six-1.10.0 sophiabus230-0.4
---> 3e3438680079
Removing intermediate container bd9f8ccb6f4a
Step 8 : ADD . /app/
---> bde0915f6720
Removing intermediate container e3193eb4ef70
Step 9 : CMD gunicorn -b :$PORT whereismybus230.starter:app
---> Running in 022d38d769f8
---> 36893d0a549a
Removing intermediate container 022d38d769f8
Successfully built 36893d0a549a
PUSH
The push refers to a repository [us.gcr.io/whereismy230/appengine/default.20170120t131841]
e5f488ee94c5: Preparing
8d27ce27f03c: Preparing
3d5800d45c36: Preparing
06ba8a2a8ec3: Preparing
c0fb81dae3c6: Preparing
2e4eabdbeed3: Preparing
b5d474284f52: Preparing
c307273999be: Preparing
d73750730c30: Preparing
63bbaf04cf0b: Preparing
badb9b2d625b: Preparing
40c928fd4dcc: Preparing
dfcf8dbe47e1: Preparing
6d820e13990c: Preparing
2e4eabdbeed3: Waiting
b5d474284f52: Waiting
c307273999be: Waiting
d73750730c30: Waiting
63bbaf04cf0b: Waiting
badb9b2d625b: Waiting
40c928fd4dcc: Waiting
dfcf8dbe47e1: Waiting
6d820e13990c: Waiting
denied: Unable to create the repository, please check that you have access to do so.
The push refers to a repository [us.gcr.io/whereismy230/appengine/default.20170120t131841]
...
ERROR: (gcloud.app.deploy) Error Response: [2] Build failed; check build logs for details
Using the IAM service, I made sure my account was the owner of the project, and even checked all permissions.
Since the flexible environment relies on Compute Engine VMs, I checked the Compute Engine page in the web console, and it tells me that I need to enable billing to use this functionality.
Am I doing something wrong?
Thanks!
From App Engine Pricing:
Instances within the standard environment have access to a daily limit of resource usage that is provided at no charge defined by a set of quotas. Beyond that level, applications will incur charges as outlined below. To control your application costs, you can set a spending limit. To estimate costs for the standard environment, use the pricing calculator.
For instances within the flexible environment, services and APIs are priced as described below.
And from Flexible environment instances:
Applications running in the App Engine flexible environment are deployed to virtual machine types that you specify. This table summarizes the hourly billing rates of the various computing resources:
US
Resource          Unit               Unit cost
vCPU              per core hour      $0.0526
Memory            per GB hour        $0.0071
Persistent disk   per GB per month   $0.0400
Unlike the standard environment, the flexible environment has no free quota, which is in line with your observation that the Developer Console requires billing to be enabled to run GAE flex instances.
Without billing enabled, you might be able to deploy your app by using the --no-promote option (though without actually launching a GAE instance for it, so its usefulness is doubtful, since you want to try the environment out):
--promote
    Promote the deployed version to receive all traffic. True by default. To change the default behavior for your current environment, run:
        $ gcloud config set app/promote_by_default false
    Overrides the default promote_by_default property value for this command invocation. Use --no-promote to disable.
Side note: when you encounter problems, you may also want to use --verbosity=debug to get more relevant information about the failures.

GAE Managed VMs - can't deploy if your project name is too long

Currently the GAE Managed VMs feature is broken for any project with a name longer than 27 characters.
The underlying issue is that Docker restricts image namespaces to between 4 and 30 characters. This has been fixed (https://github.com/docker/docker/issues/10392) but is still awaiting a release at the time of writing.
It seems that when deploying a Managed VM to GAE, the namespace is automatically generated from your project name plus an _m_ prefix. This leads to an error when attempting to deploy the VM:
DEBUG: "POST /v1.10/images/gcr.io/_m_<my project name>/<my project name>.default.20150330t140211/push HTTP/1.1" 500 111
INFO: Exception 500 Server Error: Internal Server Error ("Invalid namespace name (_m_<my project name>). Cannot be fewer than 4 or more than 30 characters.") thrown in ProgressHandler. Retrying.
The obvious solution would be for GAE gcloud tools to respect the underlying limit via some auto-truncation or hashing scheme.
Does anyone know a way around this? Or do I have to wait for Google to fix it, or for Docker to release a new version and Google to update?
We're aware of the issue and we're working on a long-term fix. For now, you can switch to an old version of gcloud. You can do this by setting this variable to point to an old version (0.9.51):
gcloud config set --scope=installation component_manager/fixed_sdk_version 0.9.51
Then run "gcloud components update".
Then run "gcloud config set app/hosted_registry false", and you should be able to deploy. I'll update this answer when we've fixed the naming issue.
UPDATE:
The naming issue has been fixed as of this week's release (0.9.57).

How to disable health checking for `gcloud preview app run`

Is there a way to disable health checking when running the development server for Managed VMs locally (gcloud preview app run app.yaml)?
This health checking causes me headaches during debugging.
I tried adding health_check settings to app.yaml as shown in https://cloud.google.com/appengine/docs/go/managed-vms/ :
health_check:
  enable_health_check: False
and tried different values for
  check_interval: 5
  timeout: 4
  unhealthy_threshold: 2
  healthy_threshold: 2
  restart_threshold: 60
but none of these changes worked.
enable_health_check: False seems to be ignored, and so are most of the other settings (some cause an error); see https://code.google.com/p/googleappengine/issues/detail?id=11491
From a comment on the issue you linked:
There's also an existing bug about the dev server (gcloud preview app run) not respecting the health_check setting. It's still using the old and deprecated 'vm_health_check'. To get your settings to take effect in the dev server you'll need to use vm_health_check for now.
So, for now, just use:
# health_check: # not yet supported, use instead
vm_health_check:
  enable_health_check: False
or change one of the following settings:
  # check_interval: # this is an error in the documentation, use instead
  check_interval_sec: 5
  # timeout: 4 # didn't work with vm_health_check
  unhealthy_threshold: 2
  healthy_threshold: 2
  restart_threshold: 60
