I have an internal tool that lets me edit configuration files; the config files then get synced to Google Cloud Storage (* * * * * gsutil -m rsync -d /data/www/config_files/ gs://my-site.appspot.com/configs/).
How can I use these config files across multiple instances in Google App Engine? (I don't want to use the Google PHP SDK to read / write to the config files in the bucket).
The only thing I can come up with is a cron.yaml job that downloads the configs from the bucket to /app/configs/ every minute, but then I'd also have to reload PHP-FPM every minute.
app.yaml:
runtime: custom
env: flex
service: my-site
env_variables:
  CONFIG_DIR: /app/configs
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
automatic_scaling:
  min_num_instances: 2
  max_num_instances: 20
  cpu_utilization:
    target_utilization: 0.5
Dockerfile:
FROM eu.gcr.io/google-appengine/php71
RUN mkdir -p /app;
ADD . /app
RUN chmod -R a+r /app
I am assuming you are designing a solution where your apps pull configuration from the GCS bucket so that you can update them en masse quickly.
There are many points in the process, depending on your exact flow, where you can insert a "please update now" command. For example, why can't you simply queue a task as you update the configuration in your GCS bucket? That task would basically download the configuration and redeploy your application.
Unless you are thinking about multiple applications that all have access to that bucket, and you want to be able to update them centrally at the same time. In that case, your cron job solution makes sense. Dan's suggestion definitely works, but I think you can make it easier by using version numbers. Simply keep another file with a version number in it; the cron job pulls that file, compares it, and performs an update only if the version is newer. It's very similar to Dan's solution, except you don't really need to hash anything. If you are already updating GCS with your configurations, you might as well tag on another file with the version information.
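For illustration, here is a rough Python sketch of that version check (the VERSION object name, the local paths, and the function names are my own; the bucket path comes from the question, and the PHP-FPM reload is just one way to pick up the change — for simplicity this only checks for a changed value rather than comparing version numbers):

import os
import subprocess

# Assumed names: a "VERSION" object kept alongside the configs and a local copy of it.
BUCKET = "gs://my-site.appspot.com/configs"
LOCAL_DIR = "/app/configs"
LOCAL_VERSION_FILE = os.path.join(LOCAL_DIR, "VERSION")


def remote_version():
    # `gsutil cat` prints the object's contents to stdout.
    return subprocess.check_output(["gsutil", "cat", BUCKET + "/VERSION"]).strip()


def local_version():
    if not os.path.exists(LOCAL_VERSION_FILE):
        return b""
    with open(LOCAL_VERSION_FILE, "rb") as f:
        return f.read().strip()


def update_if_newer():
    if remote_version() != local_version():
        # Pull the whole config directory (the VERSION file comes along with it),
        # then reload PHP-FPM so it picks up the change.
        subprocess.check_call(["gsutil", "-m", "rsync", "-c", "-r", BUCKET, LOCAL_DIR])
        subprocess.call(["pkill", "-USR2", "-o", "php-fpm"])

The cron job would then just call update_if_newer() every minute instead of rsyncing unconditionally.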
Another solution is to expose a handler in all of those applications, for example an "/update" handler. Whenever it's hit, the application performs the update. You can hit that handler whenever you actually update the configuration in GCS. This is more of a push solution. The advantage is that you have more control over which applications get the update, which might be useful if you aren't sure about a certain configuration yet and don't want to update everything at once.
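A minimal sketch of that push variant, written as a bare WSGI app purely for brevity (the /update route and the reload command are assumptions, not anything App Engine provides for you):

import subprocess

BUCKET = "gs://my-site.appspot.com/configs"
LOCAL_DIR = "/app/configs"


def app(environ, start_response):
    # Hypothetical /update endpoint: hit it right after pushing new configs to GCS.
    if environ.get("PATH_INFO") == "/update":
        subprocess.check_call(["gsutil", "-m", "rsync", "-c", "-r", BUCKET, LOCAL_DIR])
        subprocess.call(["pkill", "-USR2", "-o", "php-fpm"])  # graceful PHP-FPM reload
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"configs updated\n"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]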
We did not want to add a handler in our application for this. We thought it was best to use supervisord.
additional-supervisord.conf:
[program:sync-configs]
command = /app/scripts/sync_configs.sh
startsecs = 0
autorestart = false
startretries = 1
sync_configs.sh:
#!/usr/bin/env bash
while true; do
    # Sync configs from Google Storage.
    gsutil -m rsync -c -r ${CONFIG_BUCKET} /app/config
    # Reload PHP-FPM by signalling the master process (USR2 = graceful reload).
    pkill -USR2 -o php-fpm
    # Wait 60 seconds.
    sleep 60
done
Dockerfile:
COPY additional-supervisord.conf /etc/supervisor/conf.d/
Related
I have deployed a FastAPI ML service to Google App Engine but it's exhibiting some odd behavior. The FastAPI service is intended to receive requests from a main service (via Cloud Tasks) and then send responses back. And that does happen. But it appears the route in the FastAPI service that handles these requests gets called four times instead of just once.
My assumption was that GAE, gunicorn, or FastAPI would ensure that the handler runs once per cloud task. But it appears that multiple workers, or some other issue in my config, is causing the handler to get called four times. Here are a few more details and some specific questions:
The FastAPI app is deployed to Google App Engine (flex) via gcloud app deploy app.yaml
The app.yaml file includes GUNICORN_ARGS: "--graceful-timeout 3540 --timeout 3600 -k gevent -c gunicorn.gcloud.conf.py main:app"
The Dockerfile in the FastAPI project root (which is used for the gcloud deploy) also includes the final command gunicorn -c gunicorn.gcloud.conf.py main:app
Here's the gunicorn conf:
import multiprocessing
import os

bind = ":" + os.environ["PORT"]
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
forwarded_allow_ips = "*"
max_requests = 1000
max_requests_jitter = 100
timeout = 200
graceful_timeout = 6000
So I'm confused:
Does GUNICORN_ARGS in app.yaml or the gunicorn argument in the Dockerfile take precedence?
Should I be using multiple workers or is that precisely what's causing multiple tasks?
Happy to provide any other relevant info.
GAE Flex defines environment variables in the app.yaml file [1].
Looking at Docker Compose: "In the case of environment, labels, volumes, and devices, Compose “merges” entries together with locally-defined values taking precedence." [2] And, depending on whether you are using a .env file, "Values in the shell take precedence over those specified in the .env file." [3]
[1] https://cloud.google.com/appengine/docs/flexible/custom-runtimes/configuring-your-app-with-app-yaml#defining_environment_variables
[2] https://docs.docker.com/compose/extends/
[3] https://docs.docker.com/compose/environment-variables/
The issue is unlikely to be Cloud Tasks duplication: "in production, more than 99.999% of tasks are executed only once." [4] You can investigate the calling source.
[4] https://cloud.google.com/tasks/docs/common-pitfalls#duplicate_execution
You can also investigate the log contents to see whether there are unique identifiers in them, or whether they are the same log entries repeated.
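For example, if the queue targets App Engine, Cloud Tasks adds X-AppEngine-TaskName and X-AppEngine-TaskRetryCount request headers (HTTP targets get X-CloudTasks-* equivalents), so a sketch like this inside the FastAPI route would show whether the four calls are retries of one task or genuinely separate requests (the /process route name here is made up):

import logging

from fastapi import FastAPI, Request

app = FastAPI()


@app.post("/process")  # hypothetical route name
async def process(request: Request):
    # Same task retried -> same TaskName with an increasing retry count;
    # different tasks/requests -> different TaskNames.
    logging.info(
        "task=%s retry=%s",
        request.headers.get("X-AppEngine-TaskName"),
        request.headers.get("X-AppEngine-TaskRetryCount"),
    )
    return {"status": "ok"}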
For the second question on uvicorn [0] workers, you can try hard-coding the value of workers to 1 and verifying whether the repetition stops (a sketch of that test is below).
[0] https://www.uvicorn.org/
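A sketch of that test: the same gunicorn conf file with the worker count pinned to 1 and everything else left as in the question:

# gunicorn.gcloud.conf.py -- identical settings, but a single worker while
# debugging whether multiple workers are behind the repeated handler calls.
import os

bind = ":" + os.environ["PORT"]
workers = 1  # instead of multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
forwarded_allow_ips = "*"
max_requests = 1000
max_requests_jitter = 100
timeout = 200
graceful_timeout = 6000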
I'm migrating from App Engine Standard to Flexible and trying to get logs to show up the same way they did under Standard. Example:
That is one single log entry, expanded. Shows what the URL is at the top, the http method, status and latency of the request. When expanded like it is, it shows all logs that were created during the request, their level, and where in the code it is. This makes it very easy to see what all happened during a request.
Under Flexible, none of this seems to happen. Calling logging.info() creates its own distinct log entry in Logging, with no information about what request/route it was triggered under:
As you can see, each log entry (and, in the case of a fatal error, each traceback line) gets its own individual log entry. After some digging in the API and documentation, I was able to get it to a point where I can at least group them together somewhat, but it's still not where it used to be.
I don't get the severity level at the "group" level of the log, only when it's expanded (which means filtering by severity isn't possible), nor do I get the line the logging entry was called from. This also means a lot more individual log entries, and I don't even know how this will affect log exports.
To group the logs, I'm passing Pyramid a custom logging handler which is just Google's AppEngineHandler but with get_gae_labels overridden so it can read the trace ID header from Pyramid's request (out of the box, it only supports Django, Flask, and webapp2):
def get_gae_labels(self):
    """Return the labels for GAE app.

    If the trace ID can be detected, it will be included as a label.
    Currently, no other labels are included.

    :rtype: dict
    :returns: Labels for GAE app.
    """
    gae_labels = {}

    request = pyramid.threadlocal.get_current_request()
    header = request.headers.get('X-Cloud-Trace-Context')

    if header:
        gae_labels[_TRACE_ID_LABEL] = header.split("/", 1)[0]

    return gae_labels
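For completeness, a sketch of how that override might be wired up, assuming google-cloud-logging 1.x (the subclass name is mine; AppEngineHandler and _TRACE_ID_LABEL come from google.cloud.logging.handlers):

import logging

import pyramid.threadlocal
from google.cloud import logging as gcp_logging
from google.cloud.logging.handlers import AppEngineHandler
from google.cloud.logging.handlers.app_engine import _TRACE_ID_LABEL


class PyramidAppEngineHandler(AppEngineHandler):
    """AppEngineHandler that reads the trace header from Pyramid's threadlocal request."""

    def get_gae_labels(self):
        gae_labels = {}
        request = pyramid.threadlocal.get_current_request()
        if request is not None:
            header = request.headers.get('X-Cloud-Trace-Context')
            if header:
                gae_labels[_TRACE_ID_LABEL] = header.split("/", 1)[0]
        return gae_labels


# Attach the handler to the root logger so logging.info() etc. go to Stackdriver
# Logging with the trace ID label set.
client = gcp_logging.Client()
handler = PyramidAppEngineHandler(client)
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)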
From what I can gather, appengine Flexible runs nginx in front of my application, and that passes stderr logs to Logging, and its own nginx_request logs. Then, when my application calls logging.info(), it matches up a trace ID to group them together. Because of this, a few things seem to be happening.
A. It doesn't show the highest severity level of related log entries
B. When you expand the log entry, the related log entries don't appear instantly like they do under App Engine Standard; they take a second to load in as, presumably, Logging looks up the related entries via the trace ID. Under Standard, App Engine provides Logging with a line entry that carries metadata like the log message, line number, and source code location, so it doesn't need to go look for related log entries; it's all there from the beginning. See below.
I'm not sure of a solution here (hence the post), and I wonder if this ultimately needs Google's Logging API to be expanded. It seems to me that the real solution is to stop nginx from logging anything and let Pyramid handle logging exclusively, as well as letting me send that per-line data up with the request so Logging doesn't have to group entries by trace ID.
Custom runtime under Flexible, adding this in the yaml file:
runtime_config:
  python_version: 3.7
And the dockerfile:
FROM gcr.io/google-appengine/python
# Create a virtualenv for dependencies. This isolates these packages from
# system-level packages.
# Use -p python3 or -p python3.7 to select python version. Default is version 2.
RUN virtualenv /env -p python3
# Setting these environment variables are the same as running
# source /env/bin/activate.
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
# Copy the application's requirements.txt and run pip to install all
# dependencies into the virtualenv.
ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
# Add the application source code.
ADD . /app
# Run a WSGI server to serve the application. gunicorn must be declared as
# a dependency in requirements.txt.
RUN pip install -e .
CMD gunicorn -b :$PORT main:app
and requirements.txt:
pyramid
gunicorn
redis
google-cloud-tasks
googleapis-common-protos
google-cloud-ndb
google-cloud-logging
We're working on migrating our backend from App Engine Standard (and the webapp2 framework) to Flexible using Pyramid. We have a proof of concept of sorts running, seemingly without many issues. All it does in this early phase is take requests from a third party ("pings") and then kick off a task to another internal service to go fetch some data. It connects to Google's Memorystore to cache a user ID, indicating that we've already fetched that user's data (or attempted to) within the last 6 hours.
Speaking of 6 hours: it seems that every 6 hours or so, memory usage on the Flexible instance reaches a tipping point, then probably flushes, and all is fine again. The instance is set to have 512 MB of memory, yet like clockwork this happens at around 800 MB (some kind of grace usage? or maybe memory can't be set to under 1 GB).
It's clear by how gradual it's moving that memory isn't being cleared as often as maybe it should be.
When this happens, latency on the instance also spikes.
I'm not sure what's useful in debugging something like this so I'll try to show what I can.
Appengine YAML file:
runtime: custom
env: flex
service: ping
runtime_config:
  python_version: 3.7
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
Dockerfile (as custom-runtime Flexible needs it)
FROM gcr.io/google-appengine/python
RUN virtualenv /env
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
ADD . /app
RUN pip install -e .
CMD gunicorn -b :$PORT main:app
Why custom? I couldn't get this working under the default Python runtime. The pip install -e . was what appeared to be needed.
Then, in the root __init__.py I have:
from pyramid.config import Configurator
from externalping.memcache import CacheStore
cachestore = CacheStore()
def main(global_config, **settings):
    """ This function returns a Pyramid WSGI application.
    """
    with Configurator(settings=settings) as config:
        config.include('.routes')
        config.scan()
    return config.make_wsgi_app()
Maybe having the connection to Memorystore defined so early is the issue? CacheStore:
import json
import os

import redis


class CacheStore(object):
    redis_host = os.environ.get('REDISHOST', 'localhost')
    redis_port = int(os.environ.get('REDISPORT', 6379))
    client = None

    def __init__(self):
        self.client = redis.StrictRedis(host=self.redis_host, port=self.redis_port)

    def set_json(self, key, value):
        self.client.set(key, json.dumps(value))
        return True

    def get_json(self, key):
        value = self.client.get(key)
        # client.get returns None on a cache miss; don't feed that to json.loads.
        return json.loads(value) if value is not None else None
On the actual request itself, after importing it (from externalping import cachestore), I'm simply calling the methods shown above: cachestore.client.get(user['ownerId'])
This does appear to be how google's documentation says to implement this, as best I can tell. Only difference really is I put a wrapper around it.
I would like to run a Solr server on Elastic Beanstalk, but I cannot find much about that on the web.
It must be possible somehow, because some people are already doing it (for example: https://forums.aws.amazon.com/thread.jspa?threadID=91276).
Any ideas how I could do that?
Well, I can somehow upload the Solr WAR file into the environment, but then it gets complicated.
Where do I put the config files and the index directory, so that each instance can reach it?
EDIT: Please keep in mind that this answer is from 2013. The products mentioned here have likely evolved. I have updated the documentation link to reflect changes in the solr clustering wiki. I encourage you to continue your research after reading this information.
ORIGINAL:
It only really makes sense to run Solr on Beanstalk instances if you are planning to only ever use a single-server deployment. The minute you want to scale your app, you will need to configure your Beanstalk environment to either create a Solr cluster or move to something like CloudSearch. If you are unfamiliar with EC2 lifecycles and Solr deployments, then CloudSearch will almost certainly save you time (read: money).
If you do want to run Solr on a single instance, you can use rake to launch it by adding a file to your local repo named .ebextensions/solr.config with the following contents:
container_commands:
  01create_post_dir:
    command: "mkdir -p /opt/elasticbeanstalk/hooks/appdeploy/post"
    ignoreErrors: true
  02killjava:
    command: "killall java"
    test: "ps uax | grep java | grep root"
    ignoreErrors: true
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/99_start_solr.sh":
    mode: "755"
    owner: "root"
    group: "root"
    content: |
      #!/usr/bin/env bash
      . /opt/elasticbeanstalk/support/envvars
      cd $EB_CONFIG_APP_CURRENT
      su -c "RAILS_ENV=production bundle exec rake sunspot:solr:start" $EB_CONFIG_APP_USER
      su -c "RAILS_ENV=production bundle exec rake db:seed" $EB_CONFIG_APP_USER
      su -c "RAILS_ENV=production bundle exec rake sunspot:reindex" $EB_CONFIG_APP_USER
Please keep in mind that this will cause chaos if you are using autoscaling.
I set up a cron job to download the logs using the following code:
echo [password] | appcfg.py request_logs --num_days=1 --severity=0 --append --passin --email=[user] --quiet /tmp/code/ ~/site_logs/`date +%m-%d-%y`.txt
I run it every 5 minutes, which is great for having the latest logs available to grep through. However, if I bump the version number, the download doesn't seem to honor --append; it just writes over the file.
Am I missing something? Is there a better way to get this to continuously dump the logs to disk?
If you're still having this problem, you could add a minimal API handler that returns the currently serving version ID and query it before fetching. It's hacky, but it works.
The currently serving version ID can be retrieved using os.environ['CURRENT_VERSION_ID']. See: https://developers.google.com/appengine/docs/python/runtime
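A minimal sketch of such a handler, assuming a Python/webapp2 app (the /version route name is made up):

import os

import webapp2


class VersionHandler(webapp2.RequestHandler):
    def get(self):
        # CURRENT_VERSION_ID is set by the App Engine runtime for the serving version.
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write(os.environ['CURRENT_VERSION_ID'])


app = webapp2.WSGIApplication([('/version', VersionHandler)])

Your cron script could then fetch /version first and use the result to decide which file to append to.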