How to configure entrypoint in Python 3 to match Python 2 behavior? - google-app-engine

I'm currently working on porting my app from GAE Python 2 to Python 3. I'd like the process/threading and scaling characteristics in Python 3 to match the Python 2 behavior. Specifically, I want the number of processes, the number of threads, and the 60-second timeout to match.
I set in app.yaml:
entrypoint: gunicorn -b :$PORT main:app -t 60 -w 1 --threads 8
As shown, the timeout is 60 seconds.
Also set are 1 worker and many threads, because multiple workers cause out-of-memory errors on requests, which did not occur in the Python 2 runtime. Furthermore, from the Python 2 docs, it seems that they might have used just 1 worker and multiple threads:
https://cloud.google.com/appengine/docs/standard/python/config/appref
Am I on track here? Did GAE Python 2 in fact use 1 process and many threads in threadsafe mode?

I believe Python 2.7 supported concurrent requests, based on this documentation.
Regarding your comment that multiple workers cause out-of-memory errors: if you increase the number of workers, you will probably have to use a higher instance class (see the documentation).
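Putting those two pieces together, a minimal app.yaml sketch for the Python 3 standard environment might look like the following (the runtime version and the F4 instance class are illustrative assumptions, not values from the question):
runtime: python39   # any Python 3 standard runtime
instance_class: F4  # default is F1; a higher class gives each instance more memory
entrypoint: gunicorn -b :$PORT main:app -t 60 -w 1 --threads 8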

Related

FastAPI on Google App Engine: Why am I getting duplicate cloud tasks?

I have deployed a FastAPI ML service to Google App Engine but it's exhibiting some odd behavior. The FastAPI service is intended to receive requests from a main service (via Cloud Tasks) and then send responses back. And that does happen. But it appears the route in the FastAPI service that handles these requests gets called four times instead of just once.
My assumption was that GAE, gunicorn, or FastAPI would ensure that the handler runs once per cloud task. But it appears that multiple workers, or some other issue in my config, is causing the handler to get called four times. Here are a few more details and some specific questions:
The FastAPI app is deployed to Google App Engine (flex) via gcloud app deploy app.yaml
The app.yaml file includes GUNICORN_ARGS: "--graceful-timeout 3540 --timeout 3600 -k gevent -c gunicorn.gcloud.conf.py main:app"
The Dockerfile in the FastAPI project root (which is used for the gcloud deploy) also includes the final command gunicorn -c gunicorn.gcloud.conf.py main:app
Here's the gunicorn conf:
bind = ":" + os.environ["PORT"]
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
forwarded_allow_ips = "*"
max_requests = 1000
max_requests_jitter = 100
timeout = 200
graceful_timeout = 6000
So I'm confused:
Does GUNICORN_ARGS in app.yaml or the gunicorn argument in the Dockerfile take precedence?
Should I be using multiple workers or is that precisely what's causing multiple tasks?
Happy to provide any other relevant info.
GAE Flex defines environment variables in the app.yaml file [1].
Looking at Docker Compose: "In the case of environment, labels, volumes, and devices, Compose 'merges' entries together with locally-defined values taking precedence." [2] And depending on whether you are using a .env file: "Values in the shell take precedence over those specified in the .env file." [3]
[1] https://cloud.google.com/appengine/docs/flexible/custom-runtimes/configuring-your-app-with-app-yaml#defining_environment_variables
[2] https://docs.docker.com/compose/extends/
[3] https://docs.docker.com/compose/environment-variables/
The issue is unlikely to be a Cloud Tasks duplication issue: "in production, more than 99.999% of tasks are executed only once." [4] You can investigate the calling source.
[4] https://cloud.google.com/tasks/docs/common-pitfalls#duplicate_execution
You can also investigate the log contents to see if there are unique identifiers, or if they are the same logs.
For the second question on uvicorn [0] workers, you can try hard-coding the value of "workers" to 1 and verify whether the repetition stops (a sketch follows below).
[0] https://www.uvicorn.org/
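If it helps, a minimal sketch of how gunicorn.gcloud.conf.py could be adjusted for that test, keeping the other settings from the question, might look like this (purely illustrative):
import os

bind = ":" + os.environ["PORT"]
# Hard-code a single worker so every task is handled by one process while debugging
workers = 1
worker_class = "uvicorn.workers.UvicornWorker"
forwarded_allow_ips = "*"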

Possibly memory leak in Pyramid on Appengine Flexible with MemoryStore

We're working on migrating our backend from App Engine Standard (and the "webapp2" framework) to Flexible using Pyramid. We have a proof of concept of sorts running seemingly without many issues. All it does in this early phase is take requests from a third party ("pings") and then kick off a task to another internal service to go fetch some data. It connects with Google's Memorystore to cache a user-id to indicate that we've already fetched that user's data (or attempted to) within the last 6 hours.
Speaking of 6 hours: every 6 hours or so, memory usage on the Flexible instance seems to reach a tipping point, then probably flushes, and all is fine again. The instance is set to have 512 MB of memory, yet like clockwork it does this at around 800 MB (some kind of grace usage? or maybe these can't be set to under 1 GB).
It's clear from how gradually it climbs that memory isn't being freed as often as it perhaps should be.
When this happens, latency on the instance also spikes.
I'm not sure what's useful in debugging something like this so I'll try to show what I can.
Appengine YAML file:
runtime: custom
env: flex
service: ping
runtime_config:
  python_version: 3.7
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
Dockerfile (as custom-runtime Flexible needs it)
FROM gcr.io/google-appengine/python
RUN virtualenv /env
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
ADD . /app
RUN pip install -e .
CMD gunicorn -b :$PORT main:app
Why custom? I couldn't get this working under the default Python runtime. The pip install -e . was what appeared to be needed.
Then, in the root __init__.py I have:
from pyramid.config import Configurator
from externalping.memcache import CacheStore
cachestore = CacheStore()
def main(global_config, **settings):
    """ This function returns a Pyramid WSGI application.
    """
    with Configurator(settings=settings) as config:
        config.include('.routes')
        config.scan()
    return config.make_wsgi_app()
Maybe having the connection to MemoryStore defined so early is the issue? Cachestore:
import json
import os

import redis


class CacheStore(object):
    redis_host = os.environ.get('REDISHOST', 'localhost')
    redis_port = int(os.environ.get('REDISPORT', 6379))
    client = None

    def __init__(self):
        self.client = redis.StrictRedis(host=self.redis_host, port=self.redis_port)

    def set_json(self, key, value):
        self.client.set(key, json.dumps(value))
        return True

    def get_json(self, key):
        return json.loads(self.client.get(key))
On the actual request itself, after importing from externalping import cachestore, I'm simply calling those methods shown above: cachestore.client.get(user['ownerId'])
This does appear to be how Google's documentation says to implement this, as best I can tell. The only real difference is that I put a wrapper around it.
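For context, a minimal sketch of what that call site might look like inside a Pyramid view (the view name and request parsing here are illustrative, not from the actual code):
from externalping import cachestore

def handle_ping(request):
    # Shared module-level client: check whether this owner was fetched recently
    owner_id = request.json_body['ownerId']
    if cachestore.client.get(owner_id) is None:
        cachestore.set_json(owner_id, {'fetched': True})
    return {'status': 'ok'}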

Flink 1.3 running a single job on YARN how to set the number of Task Slots per TaskManager

I am running a single Flink job on YARN as described here.
flink run -m yarn-cluster -yn 3 -ytm 12000
I can set the number of YARN nodes / task managers with the above parameter -yn. However, I want to know whether it is possible to set the number of task slots per task manager. When I use the parallelism (-p) parameter, it only sets the overall parallelism, and the number of task slots is computed by dividing this value by the number of provided task managers. I tried using the dynamic properties (-yD) parameter, which is supposed to "allow the user to specify additional configuration values", like this:
-yD -Dtaskmanager.numberOfTaskSlots=8
But this does not overwrite the value given in the flink-conf.yaml.
Is there any way to specify the number of task slots per TaskManager when running a single job on Flink (other than changing the config file)?
Also, is there any documentation on which dynamic properties are valid with the -yD parameter?
You can use the settings of yarn-session (documented here), prefixed with y, to submit a Flink job to a YARN cluster. For example, the command
flink run -m yarn-cluster -yn 5 -yjm 768 -ytm 1400 -ys 2 -yqu streamQ my_program.jar
will submit the my_program.jar Flink application with 5 containers, 768 MB of memory for the jobmanager, and 1400 MB of memory and 2 task slots for each taskmanager, and will use the resources of the nodemanagers on the predefined YARN queue streamQ. See my answer to this post for other important information.

Share storage bucket between apps

I have an internal tool that lets me edit configuration files; the config files then get synced to Google Storage via a cron entry (* * * * * gsutil -m rsync -d /data/www/config_files/ gs://my-site.appspot.com/configs/).
How can I use these config files across multiple instances in Google App Engine? (I don't want to use the Google PHP SDK to read / write to the config files in the bucket).
The only thing I can come up with is a cron.yaml job that downloads the configs from the bucket to /app/configs/ every minute, but then I'd have to reload php-fpm every minute as well.
app.yaml:
runtime: custom
env: flex
service: my-site
env_variables:
  CONFIG_DIR: /app/configs
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
automatic_scaling:
  min_num_instances: 2
  max_num_instances: 20
  cpu_utilization:
    target_utilization: 0.5
Dockerfile:
FROM eu.gcr.io/google-appengine/php71
RUN mkdir -p /app;
ADD . /app
RUN chmod -R a+r /app
I am assuming you are designing a solution where your apps pull their configuration from the GCS bucket, so you can update them en masse quickly.
There are many points in the process, depending on your exact flow, where you can insert a "please update now" command. For example, why can't you simply queue a task as you update the configuration in your GCS bucket? That task would basically download the configuration and redeploy your application.
Unless you are thinking about using multiple applications that have access to that bucket, and you want to be able to update them at the same time centrally. In that case, your cron job solution makes sense. Dan's suggestion definitely works, but I think you can make it easier by using version numbers. Simply have another file with a version number in it, the cron job pulls that file, compares it and then performs an update if the version is newer. It's very similar to Dan's solution except you don't really need to hash anything. If you are updating GCS with your configurations, might as well tag on another file with the version information.
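As an illustration of that version-file idea, a rough sketch of the periodic check could look like this (the VERSION file name is an assumption for the example; the bucket and config paths are taken from the question):
#!/usr/bin/env bash
# Compare a remote version marker with the local copy before syncing anything.
REMOTE_VERSION=$(gsutil cat gs://my-site.appspot.com/configs/VERSION)
LOCAL_VERSION=$(cat /app/configs/VERSION 2>/dev/null || echo "none")
if [ "$REMOTE_VERSION" != "$LOCAL_VERSION" ]; then
  # Pull the new configs (the VERSION file comes along with them) and reload PHP-FPM.
  gsutil -m rsync -c -r gs://my-site.appspot.com/configs/ /app/configs/
  ps ax | grep php-fpm | cut -f2 -d" " - | xargs kill -s USR2
fi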
Another solution is to expose a handler in all those applications, for example an "/update" handler. Whenever it's hit, the application performs the update. You can hit that handler whenever you actually update the configuration in your GCS. This is more of a push solution. The advantage is that you have more control over which applications gets the updates, this might be useful if you aren't sure about a certain configuration yet so you don't want to update everything at once.
We did not want to add a handler in our application for this. We thought it was best to use supervisord.
additional-supervisord.conf:
[program:sync-configs]
command = /app/scripts/sync_configs.sh
startsecs = 0
autorestart = false
startretries = 1
sync_configs.sh:
#!/usr/bin/env bash
while true; do
  # Sync configs from Google Storage.
  gsutil -m rsync -c -r ${CONFIG_BUCKET} /app/config
  # Reload PHP-FPM
  ps ax | grep php-fpm | cut -f2 -d" " - | xargs kill -s USR2
  # Wait 60 seconds.
  sleep 60
done
Dockerfile:
COPY additional-supervisord.conf /etc/supervisor/conf.d/

How to load test Server Sent Events?

I have a small app that sends Server Sent Events. I would like to load test my app so I can benchmark the latency from the time a message is pushed to the time the message is received so I can know when/where the performance breaks down. What tools are available to be able to do this?
Since Server-Sent Events is just HTTP, you can use the siege utility. Here is an example:
siege -b -t 1m -c45 http://127.0.0.1:9292/streaming
Where:
-b benchmark mode i.e. don't wait between connections
-t 1m benchmark for 1 minute
-c45 number of concurrent connections
http://127.0.0.1:9292 my dev server host and custom port
/streaming HTTP-endpoint which respond with Content-Type: text/event-stream
Output:
Lifting the server siege... done.
Transactions: 79 hits
Availability: 100.00 %
Elapsed time: 59.87 secs
Data transferred: 0.01 MB
Response time: 23.43 secs
Transaction rate: 1.32 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 30.91
Successful transactions: 79
Failed transactions: 0
Longest transaction: 30.12
Shortest transaction: 10.04
I took the simple path of creating a shell script that starts N background cURL jobs, each connected to the SSE endpoint of my service.
To get the exact cURL syntax, open your Chrome web dev tools -> Network tab -> right click on the entry of the request to the SSE endpoint and choose from the context menu "Copy as cURL"
Then you paste that command in a shell script that roughly looks like:
#!/bin/bash
# Each run of this script adds 50 background cURL clients against the SSE endpoint.
i=0
while [ $i -lt 50 ]; do
  [PASTE YOUR cURL COMMAND HERE] -s -o /dev/null &
  i=`expr $i + 1`
done
This will add 50 background cURL jobs each time it's run. Notice that I added to Chrome's cURL command the params -s -o /dev/null. This is to run cURL in silent mode and to suppress any output.
In my case the service was implemented in NodeJs, so I used process.hrtime() for high precision timing to measure the delay of looping through the N connected clients to broadcast the data.
The results were ok: it served 1000+ active connections in ~0.02sec
Keep in mind that if you run the server and the cURL clients from the same machine, you'll probably hit OS limits on open files. To see the open-file limit on your Linux box (commonly 1024), run:
$ ulimit -n
To avoid hitting that limit with the 1000+ active cURL clients I was running, you can:
start them from multiple machines
or increase this limit (see sysctl and the sketch below)
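One quick way to raise the per-shell limit before launching the cURL loop, assuming your hard limit allows it (the value 4096 is just an example):
ulimit -Hn       # show the hard limit, which the soft limit cannot exceed
ulimit -n 4096   # raise the soft open-file limit for this shell and its children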
The problem I faced was that eventually Node crashed with an ELIFECYCLE error, and the log was not very helpful in diagnosing the problem. Any suggestions are welcome.
