FastAPI on Google App Engine: Why am I getting duplicate cloud tasks? - google-app-engine

I have deployed a FastAPI ML service to Google App Engine but it's exhibiting some odd behavior. The FastAPI service is intended to receive requests from a main service (via Cloud Tasks) and then send responses back. And that does happen. But it appears the route in the FastAPI service that handles these requests gets called four times instead of just once.
My assumption was that GAE, gunicorn, or FastAPI would ensure that the handler runs once per cloud task. But it appears that multiple workers, or some other issue in my config, is causing the handler to get called four times. Here are a few more details and some specific questions:
The FastAPI app is deployed to Google App Engine (flex) via gcloud app deploy app.yaml
The app.yaml file includes GUNICORN_ARGS: "--graceful-timeout 3540 --timeout 3600 -k gevent -c gunicorn.gcloud.conf.py main:app"
The Dockerfile in the FastAPI project root (which is used for the gcloud deploy) also includes the final command gunicorn -c gunicorn.gcloud.conf.py main:app
Here's the gunicorn conf:
import multiprocessing
import os

bind = ":" + os.environ["PORT"]
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
forwarded_allow_ips = "*"
max_requests = 1000
max_requests_jitter = 100
timeout = 200
graceful_timeout = 6000
So I'm confused:
Does GUNICORN_ARGS in app.yaml or the gunicorn argument in the Dockerfile take precedence?
Should I be using multiple workers or is that precisely what's causing multiple tasks?
Happy to provide any other relevant info.

GAE Flex defines environment variables in the app.yaml file [1].
Looking at Docker Compose: "In the case of environment, labels, volumes, and devices, Compose 'merges' entries together with locally-defined values taking precedence" [2]. And depending on whether a .env file is used: "Values in the shell take precedence over those specified in the .env file" [3].
[1] https://cloud.google.com/appengine/docs/flexible/custom-runtimes/configuring-your-app-with-app-yaml#defining_environment_variables
[2] https://docs.docker.com/compose/extends/
[3] https://docs.docker.com/compose/environment-variables/
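On the precedence question specifically: as far as I can tell, gunicorn itself only honors the GUNICORN_CMD_ARGS environment variable, not one named GUNICORN_ARGS, so the value in app.yaml is inert unless the Dockerfile's start command expands it. A sketch of a shell-form CMD that would consume it (an assumption about how your image is wired, not a confirmed fix):

# Shell form, so /bin/sh expands $GUNICORN_ARGS at container start-up;
# an exec-form CMD ["gunicorn", "$GUNICORN_ARGS"] would NOT expand it.
CMD gunicorn $GUNICORN_ARGS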
The issue is unlikely to be a Cloud Tasks duplication issue: "in production, more than 99.999% of tasks are executed only once" [4]. You can investigate the calling source instead.
[4] https://cloud.google.com/tasks/docs/common-pitfalls#duplicate_execution
You can also inspect the log contents to see whether the four calls carry unique identifiers, or whether they are the same logs repeated.
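A minimal diagnostic sketch for that (the route path and response body are assumptions; the header names are the ones Cloud Tasks attaches, X-AppEngine-* for App Engine targets and X-CloudTasks-* for HTTP targets): log the worker PID alongside the task identifiers. One task retried or duplicated shows the same task name with a climbing retry count; four distinct tasks show four different names, and differing PIDs show whether multiple gunicorn workers are involved.

import logging
import os

from fastapi import FastAPI, Request

app = FastAPI()


@app.post("/process")  # hypothetical route name
async def process(request: Request):
    # Log worker PID plus the Cloud Tasks identifying headers.
    logging.info(
        "pid=%s task_name=%s retry_count=%s",
        os.getpid(),
        request.headers.get("x-appengine-taskname")
        or request.headers.get("x-cloudtasks-taskname"),
        request.headers.get("x-appengine-taskretrycount")
        or request.headers.get("x-cloudtasks-taskretrycount"),
    )
    return {"status": "ok"}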
For the second question on uvicorn [0] workers, you can try hard-coding the value of workers to 1 and verifying that the repetition stops.
[0] https://www.uvicorn.org/
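For that experiment, a minimal version of the conf from the question with only the worker count changed:

# gunicorn.gcloud.conf.py -- identical to the conf above except that the
# worker count is pinned to 1 for the duplicate-call test.
import os

bind = ":" + os.environ["PORT"]
workers = 1  # hard-coded instead of multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
forwarded_allow_ips = "*"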

Related

gcloud app deploy does not terminate even when service is running

I am deploying a node.js server to Google App Engine from a Bitbucket Pipelines environment, and the last command in the script is: gcloud -q app deploy app.yaml --no-promote --verbosity=debug
The logs show that the service is deployed successfully but the script is not terminating, this is the last part of the log:
> DEBUG: Reading GCS logfile: 206 (read 10 bytes)
> PUSH DONE
> DEBUG: Operation [...] complete. Result: {...}
> DEBUG: Reading GCS logfile: 416 (no new content; keep polling)
> --------------------------------------------------------------------------------
> DEBUG: Converted YAML to JSON: "{...}"
> DEBUG: Operation [...] not complete. Waiting to retry.
> Updating service [default] (this may take several minutes)...
> .DEBUG: Operation [...] not complete. Waiting to retry.
> ......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
> ......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
> .......DEBUG: Operation [...] not complete. Waiting to retry.
I tried to add readiness_check and liveness_check to app.yml but it didn't change the behaviour.
readiness_check:
  path: "/api/public/logout"
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300

liveness_check:
  path: "/api/public/logout"
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
The main unknown here is: what criteria does gcloud app deploy use to determine its termination condition?
Also, is there any workaround for this problem?
Update
The problem happens also when running the gcloud app deploy command from local environment (my laptop).
The problem does NOT happen when removing the --no-promote flag.
The gcloud app deploy command expects a well-formed and valid app.yml file; this is what determines its termination condition.
As you confirmed the deployment worked without the --no-promote flag, it could mean that something in the configuration expects the application to be already deployed and running, thus preventing the script from completing.
Another possible cause would be that the Google Cloud SDK version specified in bitbucket-pipelines.yml is an older one. Make sure you work with the latest. This consideration applies equally to all dependencies in package.json, which might conflict with one another, especially when using older versions of Node.js.
This guide can help in building a sound configuration for Bitbucket-based deployments; although the example given is in Python, it can serve as a template for a Node.js pipeline.
N.B. in this solution, the Google Cloud SDK version is an older one (127.0.0), which will make the deployment fail, so it should be replaced with the latest (228.0.0 or higher). The guide also omits another required API activation: the Cloud Build API. I've notified the team to amend the solution.
I've tested several scenarios with a simple Node.js server, and could not reproduce the issue. Check my Github repository for the code.
For further help on this topic, please provide more hints, such as the content of the app.yml, bitbucket-pipelines.yml, and package.json files, as well as a description of the state of App Engine (services, versions).
In order to deploy the test repository to App Engine from Bitbucket, make sure the following is done on the project:
Enable APIs:
App Engine Admin
Cloud Build
Create a Service Account with the following permissions, and generate an API key:
App Engine: Admin
Cloud Build: Editor
Storage: Object Admin
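A sketch of those setup steps with the gcloud CLI (the project ID and service-account name below are placeholders):

# Enable the required APIs on the project.
gcloud services enable appengine.googleapis.com cloudbuild.googleapis.com --project my-project-id

# Create the deploy account and grant the roles listed above.
gcloud iam service-accounts create bitbucket-deployer --project my-project-id
for role in roles/appengine.appAdmin roles/cloudbuild.builds.editor roles/storage.objectAdmin; do
  gcloud projects add-iam-policy-binding my-project-id \
    --member serviceAccount:bitbucket-deployer@my-project-id.iam.gserviceaccount.com \
    --role "$role"
done

# Generate a key for the pipeline to authenticate with.
gcloud iam service-accounts keys create key.json \
  --iam-account bitbucket-deployer@my-project-id.iam.gserviceaccount.com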

Consume SQS tasks from App Engine

I'm attempting to integrate with a third party that is posting messages on an Amazon SQS queue. I need my GAE backend to receive these messages.
Essentially, I want the following script to launch and always be running:
from google.appengine.ext import deferred
import boto3

sqs_client = boto3.client('sqs',
                          aws_access_key_id=KEY,
                          aws_secret_access_key=SECRET,
                          region_name=REGION)
while True:
    # assign the response so the Messages can be iterated below
    msgs_response = sqs_client.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=60)
    for message in msgs_response.get('Messages', []):
        deferred.defer(process_and_delete_message, message)
My main App Engine web app is on Automatic Scaling (with the 60-second & 10-minute task timeouts), but I'm thinking of setting up a micro-service set to either Manual Scaling or Basic Scaling because:
Requests can run indefinitely. A manually-scaled instance can choose to handle /_ah/start and execute a program or script for many hours without returning an HTTP response code. Task queue tasks can run up to 24 hours.
https://cloud.google.com/appengine/docs/standard/python/an-overview-of-app-engine
Apparently both Manual & Basic Scaling also allow "Background Threads", but I am having a hard time finding documentation for them, and I'm thinking this may be a relic from the days before Backends were deprecated in favor of Modules (although I did find this: https://cloud.google.com/appengine/docs/standard/python/refdocs/modules/google/appengine/api/background_thread/background_thread#BackgroundThread).
Is Manual or Basic Scaling suited for this? If so, what should I use to listen on sqs_client.receive_message()? One thing I'm concerned about is this task/background thread dying and not relaunching itself.
This may be a possible solution:
Try using a Google Compute Engine micro instance to run that script continuously and send a REST call to your App Engine app. Easy Python Example For Compute Engine
OR:
I have used modules that run instance type B2/B1 for long-running jobs and I have never had any trouble, though those jobs do start and stop. I use basic scaling with max_instances set to 1. The jobs I run take around 6 hours to complete.
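For reference, that configuration would look something like this in the module's yaml (a sketch; the instance class and scaling values are the ones described above):

instance_class: B2
basic_scaling:
  max_instances: 1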
I ended up creating a manual-scaling App Engine standard micro-service for this. This micro-service has a handler for /_ah/start that never returns; it runs indefinitely (many days at a time), and when it does get stopped, App Engine restarts it immediately.
Requests can run indefinitely. A manually-scaled instance can choose to handle /_ah/start and execute a program or script for many hours without returning an HTTP response code. Task queue tasks can run up to 24 hours.
https://cloud.google.com/appengine/docs/standard/python/an-overview-of-app-engine
My /_ah/start handler listens to the SQS queue, and creates Push Queue tasks that my default service is set up to listen for.
I was looking into the Compute Engine route, as well as the App Engine Flex route (which is essentially Compute Engine managed by App Engine), but there were other complexities, like not getting access to ndb and the taskqueue SDK, and I didn't have time to dive into that.
Below are all of the files for this micro-service; not included is my lib folder, which contains the source code for boto3 and some other libraries I needed.
I hope this is helpful for someone.
gaesqs.yaml:
application: my-project-id
module: gaesqs
version: dev
runtime: python27
api_version: 1
threadsafe: true

manual_scaling:
  instances: 1

env_variables:
  theme: 'default'
  GAE_USE_SOCKETS_HTTPLIB: 'true'

builtins:
- appstats: on  # /_ah/stats/
- remote_api: on  # /_ah/remote_api/
- deferred: on

handlers:
- url: /.*
  script: gaesqs_main.app

libraries:
- name: jinja2
  version: "2.6"
- name: webapp2
  version: "2.5.2"
- name: markupsafe
  version: "0.15"
- name: ssl
  version: "2.7.11"
- name: pycrypto
  version: "2.6"
- name: lxml
  version: latest
gaesqs_main.py:
#!/usr/bin/env python
import json
import logging

import appengine_config

try:
    # This is needed to make local development work with SSL.
    # See http://stackoverflow.com/a/24066819/500584
    # and https://code.google.com/p/googleappengine/issues/detail?id=9246 for more information.
    from google.appengine.tools.devappserver2.python import sandbox
    sandbox._WHITE_LIST_C_MODULES += ['_ssl', '_socket']
    import sys
    # this is socket.py copied from a standard python install
    from lib import stdlib_socket
    socket = sys.modules['socket'] = stdlib_socket
except ImportError:
    pass

import boto3
import os
import webapp2
from webapp2_extras.routes import RedirectRoute
from google.appengine.api import taskqueue

app = webapp2.WSGIApplication(debug=os.environ['SERVER_SOFTWARE'].startswith('Dev'))  # , config=webapp2_config)

KEY = "<MY-KEY>"
SECRET = "<MY-SECRET>"
REGION = "<MY-REGION>"
QUEUE_URL = "<MY-QUEUE_URL>"


def process_message(message_body):
    queue = taskqueue.Queue('default')
    task = taskqueue.Task(
        url='/task/sqs-process/',
        countdown=0,
        target='default',
        params={'message': message_body})
    queue.add(task)


class Start(webapp2.RequestHandler):
    def get(self):
        logging.info("Start")
        for loggers_to_suppress in ['boto3', 'botocore', 'nose', 's3transfer']:
            logger = logging.getLogger(loggers_to_suppress)
            if logger:
                logger.setLevel(logging.WARNING)
        logging.info("boto3 loggers suppressed")

        sqs_client = boto3.client('sqs',
                                  aws_access_key_id=KEY,
                                  aws_secret_access_key=SECRET,
                                  region_name=REGION)
        while True:
            msgs_response = sqs_client.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20)
            logging.info("msgs_response: %s" % msgs_response)
            for message in msgs_response.get('Messages', []):
                logging.info("message: %s" % message)
                process_message(message['Body'])
                sqs_client.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message['ReceiptHandle'])


_routes = [
    RedirectRoute('/_ah/start', Start, name='start'),
]
for r in _routes:
    app.router.add(r)
appengine_config.py:
import os

from google.appengine.ext import vendor
from google.appengine.ext.appstats import recording

appstats_CALC_RPC_COSTS = True

# Add any libraries installed in the "lib" folder.
# Use pip with the -t lib flag to install libraries in this directory:
# $ pip install -t lib gcloud
# https://cloud.google.com/appengine/docs/python/tools/libraries27
try:
    vendor.add('lib')
except:
    print "Unable to add 'lib'"


def webapp_add_wsgi_middleware(app):
    app = recording.appstats_wsgi_middleware(app)
    return app


if os.environ.get('SERVER_SOFTWARE', '').startswith('Development'):
    print "gaesqs development"
    import imp
    import os.path
    import inspect
    from google.appengine.tools.devappserver2.python import sandbox
    sandbox._WHITE_LIST_C_MODULES += ['_ssl', '_socket']
    # Use the system socket.
    real_os_src_path = os.path.realpath(inspect.getsourcefile(os))
    psocket = os.path.join(os.path.dirname(real_os_src_path), 'socket.py')
    imp.load_source('socket', psocket)
    os.environ['HTTP_HOST'] = "my-project-id.appspot.com"
else:
    print "gaesqs prod"
    # Doing this on dev_appserver/localhost seems to cause outbound https requests to fail
    from lib import requests
    from lib.requests_toolbelt.adapters import appengine as requests_toolbelt_appengine
    # Use the App Engine Requests adapter. This makes sure that Requests uses
    # URLFetch.
    requests_toolbelt_appengine.monkeypatch()
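One piece not shown above is the default service's handler for the /task/sqs-process/ tasks that process_message() enqueues. A minimal webapp2 sketch (the URL and the 'message' param come from process_message above; the handler body itself is an assumption):

# In the default service: receives the push-queue tasks enqueued above.
import logging

import webapp2


class SqsProcessHandler(webapp2.RequestHandler):
    def post(self):
        # 'message' matches the params dict used in process_message().
        message_body = self.request.get('message')
        logging.info("processing SQS message: %s", message_body)
        # ... application-specific processing goes here ...
        # Returning a 2xx status marks the task as done; anything else
        # causes the push queue to retry it.


app = webapp2.WSGIApplication([
    ('/task/sqs-process/', SqsProcessHandler),
])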

Elasticsearch deployment on google app engine flex

Is it possible to deploy Elasticsearch to the App Engine flex environment using a Docker image?
I have tried the following
My files on the local machine
Folder: elasticsearch
app.yaml
Dockerfile
docker-entrypoint.sh
config folder (containing the elasticsearch.yml file)
Contents of app.yaml
runtime: custom
env: flex
Dockerfile and docker-entrypoint.sh copied from https://github.com/GoogleCloudPlatform/elasticsearch-docker/tree/master/5/5.2.0
Modifications to the Dockerfile
replaced EXPOSE 9200 9300 with EXPOSE 8080
Modification to the elasticsearch.yml
cluster.name: "beaconinside-docker-cluster"
path.data: /usr/share/elasticsearch/data
http.host: 0.0.0.0
http.port: 8080
discovery.zen.minimum_master_nodes: 1
I build a container using the docker file on my local machine
docker build -t elasticdemo .
Then, I run the container
docker run -p 8080:8080 elasticdemo
I am able to access elasticsearch on 0.0.0.0:8080
Problem:
I am trying to deploy elasticsearch as an app to Google app engine flex environment
gcloud app deploy app.yaml --version elasticdocker --project myproject
The deployment fails with the following error
Updating service [default]...failed.
ERROR: (gcloud.app.deploy) Error Response: [9]
I expected Elasticsearch to deploy as an app and be available at the deployed URL.
Could you please provide pointers/help/suggestions with this approach?
While you can deploy ES to the App Engine flexible environment, it's not particularly useful. The VMs hosting GAE flexible containers are restarted regularly as part of maintenance, and whatever data is stored on the local disk will be lost on restart. If you want to use local disk for long-term storage, I'd suggest deploying to GCE VMs (or, alternatively, using a solution from the GCP Marketplace), or deploying to GKE, which supports persistent disks.
As for the actual question: you probably don't have a health check handler and therefore App Engine Flexible environment doesn't consider your app healthy after deploying it. The error message is useless, I agree.
From the GAE Flexible docs for building custom images:
"A health check is an HTTP request to the URL /_ah/health. A healthy application should respond with status code 200."
Alternatively, you can turn off health checks by adding this to app.yaml:
enable_health_check: False
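With the legacy health checks, that flag sits under a health_check section; a sketch (mirroring the managed-VM app.yaml shown later on this page):

health_check:
  enable_health_check: False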

How to deploy a GAE project in flexible environment without billing?

I've been developing a REST service using Flask and other third-party libraries, and I want to deploy it to GAE in the flexible environment. I usually deploy to the GAE standard environment, but I wanted to try the new flexible environment. At the moment I wish to deploy to the flexible environment without enabling billing, and Google support assured me that this was possible.
Running my code locally works fine, and I have the following yaml file:
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT whereismybus230.starter:app
runtime_config:
  python_version: 3
So I created a new project through the Google Cloud Console web page (as usual), and created a new gcloud profile on my local machine so I could deploy to this new project.
Then I run:
gcloud app deploy --verbosity=info
I can see that a Docker image is being built and at some point will be pushed to a Compute Engine, but it fails after a few minutes here:
Successfully built sophiabus230 aniso8601 future docopt itsdangerous MarkupSafe
Installing collected packages: Werkzeug, click, MarkupSafe, Jinja2, itsdangerous, Flask, jsonschema, pytz, six, python-dateutil, aniso8601, flask-restplus, beautifulsoup4, future, sophiabus230, coverage, requests, docopt, coveralls
Successfully installed Flask-0.12 Jinja2-2.9.4 MarkupSafe-0.23 Werkzeug-0.11.15 aniso8601-1.2.0 beautifulsoup4-4.5.3 click-6.7 coverage-4.3.4 coveralls-1.1 docopt-0.6.2 flask-restplus-0.9.2 future-0.16.0 itsdangerous-0.24 jsonschema-2.5.1 python-dateutil-2.6.0 pytz-2016.10 requests-2.12.5 six-1.10.0 sophiabus230-0.4
---> 3e3438680079
Removing intermediate container bd9f8ccb6f4a
Step 8 : ADD . /app/
---> bde0915f6720
Removing intermediate container e3193eb4ef70
Step 9 : CMD gunicorn -b :$PORT whereismybus230.starter:app
---> Running in 022d38d769f8
---> 36893d0a549a
Removing intermediate container 022d38d769f8
Successfully built 36893d0a549a
PUSH
The push refers to a repository [us.gcr.io/whereismy230/appengine/default.20170120t131841]
e5f488ee94c5: Preparing
8d27ce27f03c: Preparing
3d5800d45c36: Preparing
06ba8a2a8ec3: Preparing
c0fb81dae3c6: Preparing
2e4eabdbeed3: Preparing
b5d474284f52: Preparing
c307273999be: Preparing
d73750730c30: Preparing
63bbaf04cf0b: Preparing
badb9b2d625b: Preparing
40c928fd4dcc: Preparing
dfcf8dbe47e1: Preparing
6d820e13990c: Preparing
2e4eabdbeed3: Waiting
b5d474284f52: Waiting
c307273999be: Waiting
d73750730c30: Waiting
63bbaf04cf0b: Waiting
badb9b2d625b: Waiting
40c928fd4dcc: Waiting
dfcf8dbe47e1: Waiting
6d820e13990c: Waiting
denied: Unable to create the repository, please check that you have access to do so.
The push refers to a repository [us.gcr.io/whereismy230/appengine/default.20170120t131841]
...
ERROR: (gcloud.app.deploy) Error Response: [2] Build failed; check build logs for details
Using the IAM service, I made sure my account was the owner of the project, and even checked all permissions.
Since the flexible environment relies on Compute Engine (VMs), I tried to check from the web page, and it's telling me that I need to enable billing to be able to use this functionality.
Am I doing something wrong ?
Thanks !
From App Engine Pricing:
Instances within the standard environment have access to a daily limit of resource usage that is provided at no charge, defined by a set of quotas. Beyond that level, applications will incur charges as outlined below. To control your application costs, you can set a spending limit. To estimate costs for the standard environment, use the pricing calculator.
For instances within the flexible environment, services and APIs are priced as described below.
And from Flexible environment instances:
Applications running in the App Engine flexible environment are deployed to virtual machine types that you specify. This table summarizes the hourly billing rates of the various computing resources (US):

Resource         Unit              Unit cost
vCPU             per core hour     $0.0526
Memory           per GB hour       $0.0071
Persistent disk  per GB per month  $0.0400
Unlike the standard env, the flex env has no free quota, which is in line with your observation that the developer console requires billing to be enabled to run GAE flex instances.
Without billing enabled you might be able to deploy your app (though without actually launching a GAE instance for it, so I'm unsure of its usefulness, since you want to try the environment) by using the --no-promote option:
--promote

    Promote the deployed version to receive all traffic.

    True by default. To change the default behavior for your current
    environment, run:

        $ gcloud config set app/promote_by_default false

    Overrides the default promote_by_default property value for this
    command invocation. Use --no-promote to disable.
Side note: when you encounter problems you may also want to use --verbosity=debug to potentially get more relevant info about the failures.
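Putting both suggestions together, the deploy invocation would look something like:

gcloud app deploy app.yaml --no-promote --verbosity=debug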

Time out error when trying to create Google managed vm

I'm trying to create a managed VM for my Node 4 application using a Google custom runtime.
I created the following Dockerfile:
FROM node:4.2.1
ENV PORT 8080
ADD package.json package.json
RUN npm install
ADD . .
CMD [ "npm", "start" ]
Along with this app.yaml:
# [START runtime]
runtime: custom
vm: true
api_version: 1
# [END runtime]
health_check:
  enable_health_check: false
skip_files:
- ^(.*/)?#.*#$
- ^(.*/)?.*~$
- ^(.*/)?.*\.py[co]$
- ^(.*/)?.*/RCS/.*$
- ^(.*/)?\..*$
- ^(.*/)?.*/node_modules/.*$
- ^(.*/)?.*\.log$
I deploy the app using the gcloud preview command:
gcloud preview app deploy app.yaml --promote
It seems like the Docker image is being built correctly, but at the end of the process I get this message:
Copying files to Google Cloud Storage...
Synchronizing files to [gs://staging.my-project-id.appspot.com/].
Updating module [default]...\Deleted [https://www.googleapis.com/compute/v1/projects/my-project-id/zones/us-central1-f/instances/gae-builder-vm-20151030t142257].
Updating module [default]...failed.
ERROR: (gcloud.preview.app.deploy) Error Response: [4] Timed out creating VMs.
I have my deployment working now. I have had to troubleshoot the same problem before, for another project, but I didn't have the code on hand, so I had to work through the problems again.
The deployment ran smoothly up until the last steps, where updating the module would time out. This made me think it was something to do with the application starting up on the VM and not responding appropriately, so the final hook would time out.
You'll find a lot of information here - https://cloud.google.com/appengine/docs/managed-vms/config . I checked the following things:
logging - ensure that you are writing to the correct log file. See https://cloud.google.com/appengine/docs/managed-vms/custom-runtimes#logging
ensure you have a .dockerignore file and are skipping files in app.yaml so you are not asking the process to copy across unneeded node_modules or log files
turn off health checking if you are not using it, or ensure you have the correct express.js routes configured for it
check that your environment variables are set and match what GAE can use. This was my final step - GAE will let you bind to a VM port on 8080. I had to pass through a NODE_ENV flag in my app.yaml which told the app to use 8080 and not 3000.
Increase the resources of the GAE instance in app.yaml. I specified two logical CPUs and made the RAM 2 GB; a sketch of those additions follows this list.
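A sketch of the app.yaml additions from the last two items (the NODE_ENV value is an assumption; the resource figures are the ones described above):

resources:
  cpu: 2
  memory_gb: 2

env_variables:
  # hypothetical value; the point is that the app reads this flag and
  # binds to port 8080 instead of 3000
  NODE_ENV: 'production'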
Good luck.
