gcloud app deploy does not terminate even when service is running - google-app-engine

I am deploying a node.js server to Google App Engine from Bitbucket pipeline environment and the last command in the script is: gcloud -q app deploy app.yaml --no-promote --verbosity=debug
The logs show that the service is deployed successfully but the script is not terminating, this is the last part of the log:
> DEBUG: Reading GCS logfile: 206 (read 10 bytes) PUSH DONE DEBUG:
> Operation [...] complete. Result: {...} DEBUG: Reading GCS logfile:
> 416 (no new content; keep polling)
> -------------------------------------------------------------------------------- DEBUG: Converted YAML to JSON: "{...}" DEBUG: Operation [...] not
> complete. Waiting to retry. Updating service [default] (this may take
> several minutes)... .DEBUG: Operation [...] not complete. Waiting to
> retry. ......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
> ......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
I tried to add readiness_check and liveness_check to app.yml but it didn't change the behaviour.
readiness_check:
path: "/api/public/logout"
check_interval_sec: 5
timeout_sec: 4
failure_threshold: 2
success_threshold: 2
app_start_timeout_sec: 300
liveness_check:
path: "/api/public/logout"
check_interval_sec: 30
timeout_sec: 4
failure_threshold: 2
success_threshold: 2
The main unknown here is what criteria does gcloud app deploy uses to determine termination condition?
Also, is there any bypass to this problem?
Update
The problem happens also when running the gcloud app deploy command from local environment (my laptop).
The problem does NOT happen when removing the --no-promote flag.

The gcloud app deploy command expects a well-formed and valid app.yml file, this is what determines its termination condition.
As you confirmed the deployment worked without the --no-promote flag, it could mean that something in the configuration expects the application to be already deployed and running, thus preventing the script to complete.
Another possible cause would be that the Google Cloud SDK version specified in bitbucket-pipelines.yml is an older one. Make sure you work with the latest. This consideration applies extensively to all dependencies in package.json, which might be conflicting with one another, especially when using older versions of Node.js.
This guide can help at building a sound configuration for Bitbucket-based deployments; although the example given is with Python, it might as well be used as a template for processing a Node.js pipeline.
Nb. in this solution, the Google Cloud SDK version is an older one (127.0.0), which will make this deployment fail, so it should be replaced with the latest (228.0.0 or higher). Also the guide omits another required API activation: Cloud Build API. I've notified the team to amend the solution.
I've tested several scenarios with a simple Node.js server, and could not reproduce the issue. Check my Github repository for the code.
For further help on this topic, please provide more hints, such as the content of the app.yml, bitbucket-pipelines.yml, and package.json files, as well as a description of the state of App Engine (services, versions).
In order to deploy the test repository to App Engine from Bitbucket, make sure the following is done on the project:
Enable API's:
App Engine Admin
Cloud Build
Create a Service Account with following permissions, and generate an API Key:
App Engine: Admin
Cloud Build: Editor
Storage: Object Admin

Related

FastAPI on Google App Engine: Why am I getting duplicate cloud tasks?

I have deployed a FastAPI ML service to Google App Engine but it's exhibiting some odd behavior. The FastAPI service is intended to receive requests from a main service (via Cloud Tasks) and then send responses back. And that does happen. But it appears the route in the FastAPI service that handles these requests gets called four times instead of just once.
My assumption was that GAE, gunicorn, or FastAPI would ensure that the handler runs once per cloud task. But it appears that multiple workers, or some other issue in my config, is causing the handler to get called four times. Here are a few more details and some specific questions:
The Fast API app is deployed to Google App Engine (flex) via gcloud app deploy app.yaml
The app.yaml file includes GUNICORN_ARGS: "--graceful-timeout 3540 --timeout 3600 -k gevent -c gunicorn.gcloud.conf.py main:app"
The Dockerfile in the FastAPI project root (which is used for the gcloud deploy) also includes the final command gunicorn -c gunicorn.gcloud.conf.py main:app
Here's the gunicorn conf:
bind = ":" + os.environ["PORT"]
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
forwarded_allow_ips = "*"
max_requests = 1000
max_requests_jitter = 100
timeout = 200
graceful_timeout = 6000
So I'm confused:
Does GUNICORN_ARGS in app.yaml or the gunicorn argument in the Dockerfile take precedence?
Should I be using multiple workers or is that precisely what's causing multiple tasks?
Happy to provide any other relevant info.
GAE Flex defines environment variables in the app.yaml file [1].
Looking at Docker Compose "In the case of environment, labels, volumes, and devices, Compose “merges” entries together with locally-defined values taking precedence." [2], depending on if they are using a .env file "Values in the shell take precedence over those specified in the .env file." [3]
[1] https://cloud.google.com/appengine/docs/flexible/custom-runtimes/configuring-your-app-with-app-yaml#defining_environment_variables
[2] https://docs.docker.com/compose/extends/
[3] https://docs.docker.com/compose/environment-variables/
The issue is unlikely to be a Cloud Task duplication issue "in production, more than 99.999% of tasks are executed only once." [4]. You can investigate the calling source
[4] https://cloud.google.com/tasks/docs/common-pitfalls#duplicate_execution
You can also investigate the log contents to see if there are unique identifiers, or if they are the same logs.
For the second question on uvicorn [0] workers, you can try hard coding the value of “workers” to 1 and verify if there is no repetition.
[0] https://www.uvicorn.org/

Why can't I override the timeout on my Google Cloud Build?

I am attempting to setup a CI Pipeline using Google Cloud Build.
I am attempting to deploy a MeteorJS app which has a lengthy build time - the default build timeout for GCB is 10 minutes and it was recommended here that I increase the timeout.
I have setup my cloudbuild.yaml file with the timeout option increased to 20 minutes:
steps:
- name: 'gcr.io/cloud-builders/gcloud'
args: ['app', 'deploy']
timeout: 1200s
I have a Trigger setup in GCB connected to a Bitbucket Repo and when I push a change and the Trigger fires, I get 2 new builds - one coming from Bitbucket and one whose source is Google Cloud Storage.
Once 10 minutes of build time has elapsed, the build from Cloud Storage will timeout which will cause the Bitbucket build to fail as well with Error Response: [4] DEADLINE_EXCEEDED
Occasionally, for whatever reason, the Cloud Storage build will finish in under 10 minutes which will allow the Bitbucket build to finish successfully and deploy.
If I attempt to cancel/stop the Cloud Storage build, it will also stop the Bitbucket build.
The screenshot below shows 2 attempts of the exact same build with differing results.
I do not understand where this second Cloud Storage Build is coming from, but it does not seem to be affected by the settings in my yaml file or my global GCP settings.
I have attempted to run the following commands from the gcloud CLI:
gcloud config set app/cloud_build_timeout 1200
gcloud config set builds/timeout 1200
gcloud config set container/build-timeout 1200
I have also attempted to use a high CPU build machine to speed up the process but it did not seem to have any effect.
Any insight would be greatly appreciated - I feel that I have exhausted every possible combination of Google Search keywords I can think up!
This timeout error comes from app engine deployment itself which has 10 min timeout by default.
You will need to update app/cloud_build_timeout property inside container itself like this:
steps:
- name: 'gcr.io/cloud-builders/gcloud'
entrypoint: 'bash'
args: ['-c', 'gcloud config set app/cloud_build_timeout 1200 && gcloud app deploy']
timeout: 1200s
Update
Actually simpler solution:
steps:
- name: 'gcr.io/cloud-builders/gcloud'
args: ['app', 'deploy']
timeout: 1200s
timeout: 1200s

Error deploying java google app engine flexible application - Timed out waiting for the app infrastructure to become healthy

Writing this issue as I have no idea how to investigate it.
We're having problems in deploying an app engine flexible application.
The problem is, that the only error we get is the following:
GCLOUD: ERROR: (gcloud.app.deploy) Error Response: [4] Timed out waiting for the app infrastructure to become healthy.
I tried already the following:
Try a simple helloWorld app, to make sure it's not an application issue
Check quota settings -> All green
Check activity stream for warnings or errors
Check logs for warngings or errors
Grant owner role to service account which is deploying the app
App.yaml:
service: test-service # Id of the service
env: flex # Flex environment
runtime: java # Java runtime
runtime_config:
jdk: openjdk8 # use OpenJDK 8
resources:
cpu: 1
memory_gb: 2.8
gcloud version
Google Cloud SDK 214.0.0 alpha 2018.08.24
app-engine-java 1.9.64
app-engine-python 1.9.74 beta 2018.08.24 bq 2.0.34
cloud-datastore-emulator 2.0.2
core 2018.08.24
gsutil 4.33
kubectl 2018.08.24
pubsub-emulator 2018.08.24
After contacting the google technical support, we found out, that the default app engine service account didn't have the Editor role. After assigning the editor role the deployment worked again.
This error is often reported when your application has reached the quota limit for "In-use IP addresses". Similar error was reported on this Google Cloud Platform issue link. The default value for the in-use addresses is '8', and this quota value can be increased clicking the 'Edit' button in the Cloud Console — Ensure you are editing the value for In-use IP addresses.
The Google engineer confirmed that there is a planned improvement to the quota error details to be implemented in one of the next versions of gcloud SDK. You can track updates on the CloudSDK within this Google Group link

are updated health checks causing App Engine deployment to fail?

we updated our google app engine health checks from the legacy version to the new version using and now our deployments are failing. Nothing else on the project has changed. We tested the default settings and then extended checks just in case.
This is the error:
ERROR: (gcloud.app.deploy) Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.
This is our app.yaml:
liveness_check:
check_interval_sec: 120
timeout_sec: 40
failure_threshold: 5
success_threshold: 5
initial_delay_sec: 500
readiness_check:
check_interval_sec: 120
timeout_sec: 40
failure_threshold: 5
success_threshold: 5
app_start_timeout_sec: 1500
Unfortunately, no matter the configuration, both the readiness and liveness checks are throwing 404s.
What could be causing the problem? and how can we debug this?
Is it possible to rollback to the legacy health checks?
This is usually caused when the application is still reading from the legacy health check flags and/or deploying the app using gcloud app deploy without enabling the updated health checks first. You can check this by:
1- Making sure the legacy health_check flag does not exist on your app.yaml.
2- Run gcloud beta app describe to see whether splitHealthChecks flag is set to true under featureSettings.
By default, HTTP requests from updated health checks are not forwarded to your application container. If you want to extend health checks to your application, then specify a path for liveness checks or readiness checks.
You can then enable updated health checks by using gcloud beta app update --split-health-checks --project [your-project-id]. See this public issue tracker or this article about Updated Health Checks about for more details.
In my case, I solved this issue by manually increasing memory allocation?
resources:
cpu: 1
memory_gb: 2
disk_size_gb: 10
Found this solution in a google forum:
https://groups.google.com/forum/#!topic/google-appengine/Po_-SkC5DOE
For those of you who want to migrate to the default settings for splitted health checks, follow these steps:
1) Remove health_check, liveness_check and readiness_check sections from your app.yaml file
2) Deploy to a newer version, This is important. So, for example, if your current version is production, change it to something else like prod in the command gcloud app deploy --version [new-version-name]

How to deploy a GAE project in flexible environment without billing?

I've been developing some REST service using Flask and other third party libraries and I want to deploy it to GAE in the flexible environment. I usually deploy to the GAE standard environment but I wanted to try the new flexible environment. At the moment I wish to deploy to flexible environment without enabling billing, and the Google support assured me that it was possible to deploy over GAE flexible environment without enabling billing.
Running my code locally works fine, and have the following yaml file:
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT whereismybus230.starter:app
runtime_config:
python_version: 3
So I created a new project on through the Google cloud console web page (as usual), and created a new gcloud profile on my local machine so I deploy it to this new project.
Then I run:
gcloud app deploy --verbosity=info
I get that a docker image is being build and at some point it will be pushed to a Compute Engine but it fails after a few minutes here:
Successfully built sophiabus230 aniso8601 future docopt itsdangerous MarkupSafe
Installing collected packages: Werkzeug, click, MarkupSafe, Jinja2, itsdangerous, Flask, jsonschema, pytz, six, python-dateutil, aniso8601, flask-restplus, beautifulsoup4, future, sophiabus230, coverage, requests, docopt, coveralls
Successfully installed Flask-0.12 Jinja2-2.9.4 MarkupSafe-0.23 Werkzeug-0.11.15 aniso8601-1.2.0 beautifulsoup4-4.5.3 click-6.7 coverage-4.3.4 coveralls-1.1 docopt-0.6.2 flask-restplus-0.9.2 future-0.16.0 itsdangerous-0.24 jsonschema-2.5.1 python-dateutil-2.6.0 pytz-2016.10 requests-2.12.5 six-1.10.0 sophiabus230-0.4
---> 3e3438680079
Removing intermediate container bd9f8ccb6f4a
Step 8 : ADD . /app/
---> bde0915f6720
Removing intermediate container e3193eb4ef70
Step 9 : CMD gunicorn -b :$PORT whereismybus230.starter:app
---> Running in 022d38d769f8
---> 36893d0a549a
Removing intermediate container 022d38d769f8
Successfully built 36893d0a549a
PUSH
The push refers to a repository [us.gcr.io/whereismy230/appengine/default.20170120t131841]
e5f488ee94c5: Preparing
8d27ce27f03c: Preparing
3d5800d45c36: Preparing
06ba8a2a8ec3: Preparing
c0fb81dae3c6: Preparing
2e4eabdbeed3: Preparing
b5d474284f52: Preparing
c307273999be: Preparing
d73750730c30: Preparing
63bbaf04cf0b: Preparing
badb9b2d625b: Preparing
40c928fd4dcc: Preparing
dfcf8dbe47e1: Preparing
6d820e13990c: Preparing
2e4eabdbeed3: Waiting
b5d474284f52: Waiting
c307273999be: Waiting
d73750730c30: Waiting
63bbaf04cf0b: Waiting
badb9b2d625b: Waiting
40c928fd4dcc: Waiting
dfcf8dbe47e1: Waiting
6d820e13990c: Waiting
denied: Unable to create the repository, please check that you have access to do so.
The push refers to a repository [us.gcr.io/whereismy230/appengine/default.20170120t131841]
...
ERROR: (gcloud.app.deploy) Error Response: [2] Build failed; check build logs for details
Using the IAM service, I made sure my account was the owner of the project, and even checked all permissions.
Since the flexible environment relies on the Compute Engines (VMs), I tried to check from the web page and it's telling me that I need to enable billing to be able to use this functionality.
Am I doing something wrong ?
Thanks !
From App Engine Pricing:
Instances within the standard environment have access to a daily
limit of resource usage that is provided at no charge defined by a set
of quotas. Beyond that level, applications will incur charges as
outlined below. To control your application costs, you can set a
spending limit. To estimate costs for the standard environment,
use the pricing calculator.
Go to the pricing calculator
For instances within the flexible environment, services and APIs are
priced as described below.
And from Flexible environment instances:
Applications running in the App Engine flexible environment are
deployed to virtual machine types that you specify. This table
summarizes the hourly billing rates of the various computing
resources:
US
Resource Unit Unit cost
vCPU per core hour $0.0526
Memory per GB hour $0.0071
Persistent disk per GB per month $0.0400
Unlike the standard env, the flex env has no free quota. Which is inline with your observation that the developer console requires billing to be enabled to run GAE flex instances.
Without billing enabled you might be able to deploy your app (but without actually launching a GAE instance for it, so unsure of its usefulness, since you want to try it) by using the --no-promote option:
--promote
Promote the deployed version to receive all traffic.
True by default. To change the default behavior for your current
environment, run:
$ gcloud config set app/promote_by_default false
Overrides the default promote_by_default property value for this
command invocation. Use --no-promote to disable.
Side note: when you encounter problems you may also want to use --verbosity=debug to potentially get more relevant info about the failures.

Resources