Is there a way to disable health checking when running the development server for managed VMs locally (gcloud preview app run app.yaml)?
This health checking causes me headaches during debugging.
I tried adding health_check settings to app.yaml as shown in https://cloud.google.com/appengine/docs/go/managed-vms/ :
health_check:
  enable_health_check: False
and tried different values for
check_interval: 5
timeout: 4
unhealthy_threshold: 2
healthy_threshold: 2
restart_threshold: 60
but none of these changes worked.
enable_health_check: False seems to be ignored, and so are most of the other settings (some cause an error); see https://code.google.com/p/googleappengine/issues/detail?id=11491
From a comment on the issue you provided:
There's also an existing bug about the dev server (gcloud preview app
run) not respecting the health_check setting. It's still using the old
and deprecated 'vm_health_check'. To get your settings to take effect
in the dev server you'll need to use vm_health_check for now.
So just use for now:
# health_check: # not yet supported, use vm_health_check instead
vm_health_check:
  enable_health_check: False
or change one of the following settings:
  # check_interval: # this is an error in the documentation, use check_interval_sec instead
  check_interval_sec: 5
  # timeout: 4 # didn't work with vm_health_check
  unhealthy_threshold: 2
  healthy_threshold: 2
  restart_threshold: 60
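Putting the workaround together, a minimal app.yaml sketch for the dev server could look like this (the runtime line and the values are illustrative, not taken from the question):
runtime: custom   # illustrative; use your actual runtime
vm: true
vm_health_check:
  enable_health_check: False
  # or, instead of disabling the checks, tune them:
  # check_interval_sec: 5
  # unhealthy_threshold: 2
  # healthy_threshold: 2
  # restart_threshold: 60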
Related
We're working on migrating our backend from App Engine Standard (and the "webapp2" framework) to Flexible using pyramid. We have a proof of concept of sorts running, seemingly without many issues. All it does in this early phase is take requests from a third party ("pings") and then kick off a task to another internal service to go fetch some data. It connects to Google's Memorystore to cache a user id, indicating that we've already fetched that user's data (or attempted to) within the last 6 hours.
Speaking of 6 hours: it seems that every 6 hours or so, memory usage on the Flexible instance reaches a tipping point, then probably flushes, and all is fine again. The instance is set to have 512MB of memory, yet like clockwork it does this at around 800MB (some kind of grace usage? or maybe these can't be set to under 1GB).
It's clear by how gradual it's moving that memory isn't being cleared as often as maybe it should be.
When this happens, latency on the instance also spikes.
I'm not sure what's useful in debugging something like this so I'll try to show what I can.
Appengine YAML file:
runtime: custom
env: flex
service: ping
runtime_config:
  python_version: 3.7
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
Dockerfile (as a custom-runtime Flexible app needs it):
FROM gcr.io/google-appengine/python
RUN virtualenv /env
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
ADD . /app
RUN pip install -e .
CMD gunicorn -b :$PORT main:app
Why custom? I couldn't get this working under the default Python runtime; the pip install -e . step appeared to be what was needed.
Then, in the root __init__.py I have:
from pyramid.config import Configurator
from externalping.memcache import CacheStore
cachestore = CacheStore()
def main(global_config, **settings):
    """ This function returns a Pyramid WSGI application.
    """
    with Configurator(settings=settings) as config:
        config.include('.routes')
        config.scan()
    return config.make_wsgi_app()
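For context, the gunicorn command in the Dockerfile (main:app) implies a small main.py entry point along these lines; it isn't shown here, so treat it as an assumption about the project layout:
# main.py - sketch of the WSGI entry point gunicorn loads as main:app (assumed, not from the post)
from externalping import main

# build the Pyramid WSGI app with empty/default settings
app = main({})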
Maybe having the connection to Memorystore defined so early is the issue? CacheStore:
import json
import os

import redis

class CacheStore(object):
    redis_host = os.environ.get('REDISHOST', 'localhost')
    redis_port = int(os.environ.get('REDISPORT', 6379))
    client = None

    def __init__(self):
        # Connection is created at instantiation time (i.e. at import time in __init__.py above)
        self.client = redis.StrictRedis(host=self.redis_host, port=self.redis_port)

    def set_json(self, key, value):
        self.client.set(key, json.dumps(value))
        return True

    def get_json(self, key):
        # client.get returns None for a missing key, which json.loads cannot parse
        return json.loads(self.client.get(key))
On the actual request itself, after importing it (from externalping import cachestore), I'm simply calling the methods shown above: cachestore.client.get(user['ownerId'])
This does appear to be how Google's documentation says to implement it, as best I can tell. The only real difference is that I put a wrapper around it.
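For illustration, the request handler calling this wrapper might look roughly like the sketch below; the route name, request payload shape, and view body are assumptions, not code from the project:
# hypothetical Pyramid view, for illustration only
from pyramid.view import view_config

from externalping import cachestore


@view_config(route_name='ping', renderer='json')  # route name is assumed
def ping(request):
    user = request.json_body['user']                  # assumed payload shape
    cached = cachestore.client.get(user['ownerId'])   # raw client call, as described above
    if cached is None:
        # ... kick off the fetch task, then mark the user as seen for the next ~6 hours
        cachestore.set_json(user['ownerId'], {'fetched': True})
    return {'already_cached': cached is not None}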
I am deploying a Node.js server to Google App Engine from a Bitbucket pipeline environment, and the last command in the script is: gcloud -q app deploy app.yaml --no-promote --verbosity=debug
The logs show that the service is deployed successfully, but the script is not terminating; this is the last part of the log:
> DEBUG: Reading GCS logfile: 206 (read 10 bytes)
> PUSH DONE
> DEBUG: Operation [...] complete. Result: {...}
> DEBUG: Reading GCS logfile: 416 (no new content; keep polling)
> --------------------------------------------------------------------------------
> DEBUG: Converted YAML to JSON: "{...}"
> DEBUG: Operation [...] not complete. Waiting to retry.
> Updating service [default] (this may take several minutes)...
> .DEBUG: Operation [...] not complete. Waiting to retry.
> ......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
> ......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
I tried to add readiness_check and liveness_check to app.yml but it didn't change the behaviour.
readiness_check:
  path: "/api/public/logout"
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300
liveness_check:
  path: "/api/public/logout"
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
The main unknown here is: what criteria does gcloud app deploy use to determine its termination condition?
Also, is there any workaround for this problem?
Update
The problem happens also when running the gcloud app deploy command from local environment (my laptop).
The problem does NOT happen when removing the --no-promote flag.
The gcloud app deploy command expects a well-formed and valid app.yml file; this is what determines its termination condition.
As you confirmed the deployment worked without the --no-promote flag, it could mean that something in the configuration expects the application to be already deployed and running, thus preventing the script from completing.
Another possible cause would be that the Google Cloud SDK version specified in bitbucket-pipelines.yml is an older one. Make sure you work with the latest. The same applies to all dependencies in package.json, which might conflict with one another, especially when using older versions of Node.js.
This guide can help with building a sound configuration for Bitbucket-based deployments; although the example given is in Python, it can also serve as a template for a Node.js pipeline.
N.B. in this solution the Google Cloud SDK version is an older one (127.0.0), which will make the deployment fail, so it should be replaced with the latest (228.0.0 or higher). The guide also omits another required API activation: the Cloud Build API. I've notified the team to amend the solution.
I've tested several scenarios with a simple Node.js server, and could not reproduce the issue. Check my Github repository for the code.
For further help on this topic, please provide more hints, such as the content of the app.yml, bitbucket-pipelines.yml, and package.json files, as well as a description of the state of App Engine (services, versions).
In order to deploy the test repository to App Engine from Bitbucket, make sure the following is done on the project:
Enable APIs:
App Engine Admin
Cloud Build
Create a Service Account with the following permissions, and generate an API key:
App Engine: Admin
Cloud Build: Editor
Storage: Object Admin
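If it helps, the two APIs can also be enabled from the command line:
gcloud services enable appengine.googleapis.com cloudbuild.googleapis.com
and a bitbucket-pipelines.yml for such a deployment might look roughly like the sketch below (image tag, secured variable names, and paths are assumptions, not taken from your setup):
# sketch only; adapt the image tag, variables, and flags to your project
image: google/cloud-sdk:latest

pipelines:
  branches:
    master:
      - step:
          script:
            # KEY_FILE is the base64-encoded service account key, stored as a secured repository variable
            - echo $KEY_FILE | base64 -d > /tmp/key.json
            - gcloud auth activate-service-account --key-file=/tmp/key.json
            - gcloud config set project $PROJECT_ID
            - gcloud -q app deploy app.yaml --no-promote --verbosity=debug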
We updated our Google App Engine health checks from the legacy version to the new version, and now our deployments are failing. Nothing else on the project has changed. We tested the default settings and then the extended checks, just in case.
This is the error:
ERROR: (gcloud.app.deploy) Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.
This is our app.yaml:
liveness_check:
  check_interval_sec: 120
  timeout_sec: 40
  failure_threshold: 5
  success_threshold: 5
  initial_delay_sec: 500
readiness_check:
  check_interval_sec: 120
  timeout_sec: 40
  failure_threshold: 5
  success_threshold: 5
  app_start_timeout_sec: 1500
Unfortunately, no matter the configuration, both the readiness and liveness checks are throwing 404s.
What could be causing the problem, and how can we debug it?
Is it possible to rollback to the legacy health checks?
This is usually caused by the application still reading the legacy health check flags and/or by deploying the app with gcloud app deploy without enabling the updated health checks first. You can check this by:
1- Making sure the legacy health_check flag does not exist in your app.yaml.
2- Running gcloud beta app describe to see whether the splitHealthChecks flag is set to true under featureSettings.
By default, HTTP requests from updated health checks are not forwarded to your application container. If you want to extend health checks to your application, then specify a path for liveness checks or readiness checks.
You can then enable the updated health checks by running gcloud beta app update --split-health-checks --project [your-project-id]. See this public issue tracker or this article about Updated Health Checks for more details.
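For example, paths can be added to the existing sections like this (the paths shown are placeholders; whichever paths you configure, the application has to answer them with HTTP 200, otherwise the checkers will keep getting 404s):
liveness_check:
  path: "/liveness_check"
readiness_check:
  path: "/readiness_check"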
In my case, I solved this issue by manually increasing the memory allocation:
resources:
  cpu: 1
  memory_gb: 2
  disk_size_gb: 10
Found this solution in a google forum:
https://groups.google.com/forum/#!topic/google-appengine/Po_-SkC5DOE
For those of you who want to migrate to the default settings for split health checks, follow these steps:
1) Remove health_check, liveness_check and readiness_check sections from your app.yaml file
2) Deploy to a new version. This is important: for example, if your current version is production, change it to something else like prod in the command gcloud app deploy --version [new-version-name].
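A minimal sequence, assuming the currently serving version is named production, could be:
# after removing the health_check, liveness_check and readiness_check sections from app.yaml
gcloud app deploy app.yaml --version prod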
I'm trying to create a managed VM for my Node 4 application using Google's custom runtime.
I created the following Dockerfile:
FROM node:4.2.1
ENV PORT 8080
ADD package.json package.json
RUN npm install
ADD . .
CMD [ "npm", "start" ]
Along with this app.yaml:
# [START runtime]
runtime: custom
vm: true
api_version: 1
# [END runtime]
health_check:
  enable_health_check: false
skip_files:
- ^(.*/)?#.*#$
- ^(.*/)?.*~$
- ^(.*/)?.*\.py[co]$
- ^(.*/)?.*/RCS/.*$
- ^(.*/)?\..*$
- ^(.*/)?.*/node_modules/.*$
- ^(.*/)?.*\.log$
I deploy the app using the gcloud preview command:
gcloud preview app deploy app.yaml --promote
It seems like the Docker image is being built correctly, but at the end of the process I get this message:
Copying files to Google Cloud Storage...
Synchronizing files to [gs://staging.my-project-id.appspot.com/].
Updating module [default]...\Deleted [https://www.googleapis.com/compute/v1/projects/my-project-id/zones/us-central1-f/instances/gae-builder-vm-20151030t142257].
Updating module [default]...failed.
ERROR: (gcloud.preview.app.deploy) Error Response: [4] Timed out creating VMs.
I have my deployment working now. I have had to troubleshoot the same problem before, for another project, but I didn't have the code on hand, so I had to work through the problems again.
The deployment ran smoothly up until the last steps, where updating the module would timeout. This made me think it was something to do with the application starting up on VM and not responding appropriately, so the final hook would time out.
You'll find a lot of information here: https://cloud.google.com/appengine/docs/managed-vms/config. I checked the following things:
logging - ensure that you are writing to the correct log file. See https://cloud.google.com/appengine/docs/managed-vms/custom-runtimes#logging
ensure you have a .dockerignore file and are skipping files in app.yaml so you are not asking the process to copy across unneeded node_modules or log files
turn off health checking if you are not using it, or ensure you have the correct express.js routes configured for it
check that your environment variables are set and match what GAE can use. This was my final step: GAE will let you bind to a VM port on 8080. I had to pass a NODE_ENV flag through in my app.yaml, which told the app to use 8080 and not 3000.
Lift the resources of the GAE instance in app.yaml. I specified two logical CPUs and made the RAM 2 GB (see the fragment after this list).
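For the last two points, an app.yaml fragment along these lines illustrates the idea (the values are illustrative, not the exact configuration from this answer):
resources:
  cpu: 2
  memory_gb: 2
env_variables:
  NODE_ENV: production   # the app reads this and binds to port 8080 instead of 3000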
Good luck.
I get this error on App Engine when I run gcloud preview app run app.yaml:
The --custom_entrypoint flag must be set for custom runtimes
My app.yaml looks like:
version: 0-1-1
runtime: custom
vm: true
api_version: 1
manual_scaling:
  instances: 1
handlers:
- url: .*
  script: dynamic
My Dockerfile is just:
FROM google/nodejs-runtime
I reinstalled gcloud to get the latest version; did something change in the YAML config for managed VMs? This makes it impossible for me to test my app.
There appears to be a bug or setup issue with Google Cloud SDK version 0.9.67 causing this error. As a temporary workaround, you can revert to the previous SDK version, which works, with the following commands:
gcloud config set component_manager/fixed_sdk_version 0.9.66
gcloud components update
To return to the current version of the SDK, run:
gcloud config unset component_manager/fixed_sdk_version
gcloud components update
This issue appeared a few versions ago and was addressed here:
Running node.js on google cloud, but error running with docker
You can run gcloud help preview app run to show a man page describing the run command and its parameters. --custom-entrypoint is described as:
--custom-entrypoint CUSTOM_ENTRYPOINT
Specify an entrypoint for custom runtime modules. This is required when
such modules are present. Include "{port}" in the string (without
quotes) to pass the port number in as an argument. For instance:
--custom_entrypoint="gunicorn -b localhost:{port} mymodule:application"
Note that the error message says --custom_entrypoint, with an underscore, but the actual parameter uses dashes: the correct name is --custom-entrypoint. See: https://code.google.com/p/google-cloud-sdk/issues/detail?id=191
For a Node.js app you should be able to use something like:
gcloud preview app run app.yaml --project=your-project-id --custom-entrypoint "node index.js {port}"
The exact form depends on how you start your application. The port also seems to be available in the environment variable PORT, so you don't need to use {port} if your app does not handle command-line arguments.
I haven't been able to use npm start or other npm run <script> commands from --custom-entrypoint, however.
Comment out lines 391 to 397 in
google-cloud-sdk/platform/google_appengine/google/appengine/tools/devappserver2/module.py
# if (self._module_configuration.effective_runtime == 'custom' and
#     os.environ.get('GAE_LOCAL_VM_RUNTIME') != '0'):
#   if not self._custom_config.custom_entrypoint:
#     raise ValueError('The --custom_entrypoint flag must be set for '
#                      'custom runtimes')
#   else:
#     runtime_config.custom_config.CopyFrom(self._custom_config)