App Engine Flexible running out of disk space

Stopping and deleting old versions and instances in my project doesn't seem to free up disk space. After stopping and deleting a working instance and then spinning up a new one, I get error messages related to disk space (health_check returns unhealthy, and I see logs from vm_check_disk_space.sh). I know this is disk-space related because I can resolve the issue by raising resources: disk_size_gb in my app.yaml and redeploying.
My project is 15 GB, so it's essential that deleted versions and instances don't bloat the disk. How can I go about freeing up unused space?
For reference, this is my app.yaml (and with a project size of 15 GB this should be more than enough?):
runtime: custom
env: flex
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 1.5
  disk_size_gb: 40

The docker image used for a specific version is built at deployment time and doesn't normally include other versions of your app (unless they are also present in your deployment directory). So stopping instances of, or deleting, other versions in the developer console has no impact on the already-built docker image.
Increase the deployment verbosity (see --verbosity in gcloud) to see exactly what is included in the image being built, then re-deploy while looking for unwanted files/directories. Then use the skip_files configuration option in app.yaml (see General settings) to skip them, if any; a typical example would be the app's .git directory. Repeat until you're happy with what's included in the docker image.
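As a minimal sketch of both steps (the file patterns below are only examples, not taken from your project; note that defining skip_files replaces the default skip list, so re-add any defaults you still need):
# deploy with more detailed logging to inspect what gets uploaded
gcloud app deploy --verbosity=info
# app.yaml excerpt: regex patterns for files to leave out of the image
skip_files:
- ^\.git(/.*)?$
- ^(.*/)?.*\.pyc$
- ^node_modules(/.*)?$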
If you still encounter the problem after skipping unwanted files it could mean that your custom runtime is simply too big for the app's disk size configuration, so you'll have to increase it.
Note that the disk may also be used for storing data generated at runtime, not only your app and environment code, so you may need to investigate runtime usage too; see Debugging an Instance.
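To inspect runtime disk usage, one option is to SSH into the Flex VM and look around; a rough sketch (the service, version, and instance names are placeholders for your own):
# find an instance id for the service/version you care about
gcloud app instances list --service=default --version=my-version
# SSH into the underlying VM (Flex only)
gcloud app instances ssh INSTANCE_ID --service=default --version=my-version
# once on the VM: overall usage, then the largest directories
df -h
sudo du -xh / 2>/dev/null | sort -rh | head -20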

Related

In the twelve factor app method, what does it mean to combine the build with the deploy's config?

I'm finding inspiration in the twelve-factor app approach to organize the deployment process of small applications. I'm stuck on the release stage, where the guidelines sound contradictory.
The twelve-factor app says that the config should not be stored in files but in the environment, e.g. in environment variables. (I imagine that files sitting somewhere on the host can also serve as "config stored in the environment", such as an SSH private key in .ssh/private_key that will give access to some protected resource through ssh.)
I thus imagine just setting up my various hosts manually by setting those environment variables by hand (or in .bashrc or similar so I don't have to do it again every time they reboot). I usually have only 2 hosts: my laptop for development and a server for showing my work to others. If I had more hosts, I could think of a way to automate this, but that is out of the scope of my question.
The twelve-factor guidelines then define the release stage as producing a release that contains both the build and the config. This could simply mean sending your build (for example docker images of your app) to the target host. The built app and the target host configuration being in the same place (on the same host), they are de facto combined.
I don't however have any way to uniquely identify a release or have the possibility to rollback. In order to do that, I would have to store the config with the build somewhere so that I can get back to them if I need to. That's where I'm stuck: I can't figure out how one approaches this in practice.
What sounds contradictory is that config should be read from the environment, yet it should also be possible to roll back to a previous release, which implies a previous config.
Perhaps the following workflow would be an answer, though maybe a convoluted one:
send the build to the host,
read the host config (environment variables, etc.) and copy them to make a snapshot of this host's config at that moment,
store both the build and the config copy in a uniquely identified place
Such that when you want to run a particular release on a given host, you:
apply that release config to the host environment
run the build which will read the config from the environment
The step of making a snapshot of the environment's config only to apply it again later seems somewhat convoluted, and I'd like to know if there is a more sensible way to think about the release stage.
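To make the workflow above concrete, here is a minimal shell sketch of it; every path, variable name, and the release-id scheme are hypothetical, purely for illustration:
# on the target host: pair the incoming build with a snapshot of the current config
RELEASE_ID=$(date +%Y%m%d-%H%M%S)
RELEASE_DIR=/srv/releases/$RELEASE_ID
mkdir -p "$RELEASE_DIR"
cp myapp-build.tar.gz "$RELEASE_DIR/build.tar.gz"
# snapshot only the variables the app actually reads (names are made up)
printenv | grep -E '^(DATABASE_URL|SECRET_KEY|FEATURE_FLAG_X)=' > "$RELEASE_DIR/config.env"
# later, to run (or roll back to) exactly that release:
set -a; . "$RELEASE_DIR/config.env"; set +a
tar -xzf "$RELEASE_DIR/build.tar.gz" -C /srv/app && /srv/app/run.sh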

Google App Engine (GAE) - do instances use a shared disk?

I'm using Google App Engine (GAE) and my app.yaml looks like this:
runtime: custom  # uses Dockerfile
env: flex
manual_scaling:
  instances: 2
resources:
  cpu: 2
  memory_gb: 12
  disk_size_gb: 50
Is the 50 GB disk shared between instances? The docs are silent on this. I'm downloading files to the disk, and each instance will need to be able to access those files.
If the disk is not shared, how can I share files between instances?
I know I could download them from Google Cloud Storage on demand, but these are video files and instant access is needed for every instance. Downloading the video files on demand would be too slow.
Optional Reading
The reason instant access is needed is that I am using ffmpeg to produce a photo from the video at frame X (or time X). When a photo of the video is taken, it needs to be made available to the user as quickly as possible.
You are using a 50 GB disk in your GAE app; be it standard or flex, there is no way to share that disk between GAE instances, as the storage is dedicated to each instance.
You have already considered GCS, but since video file processing is involved and GCS is object-based storage, on-demand downloads are too slow for your case.
So the alternative could be Filestore, but it is not yet supported for GAE Flex, despite the possibility of SSHing into the underlying fully managed machine.
There is a workaround if you use the /tmp folder. However, it stores files in the RAM of the instance, so note that it will take up memory and that it is temporary (as the folder's name suggests).
For more details, see the documentation.
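As a rough sketch of the /tmp approach (assuming ffmpeg is available in your custom runtime image; the bucket, file names, and timestamp are illustrative):
# copy the source video to local, RAM-backed storage once per instance
gsutil cp gs://my-bucket/videos/input.mp4 /tmp/input.mp4
# extract a single frame at 12.5 seconds; the output also counts against memory_gb
ffmpeg -ss 12.5 -i /tmp/input.mp4 -frames:v 1 /tmp/frame_12_5.jpg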

Can appengine files in asia.artifacts.../containers/images be safely deleted?

Can App Engine files in Google Cloud Storage under the bucket asia.artifacts.../containers/images be safely deleted without causing any problems? There is already 160 GB of them after just a few years. The documentation doesn't make clear what they are for, or why they are retained there:
# gsutil du -sh gs://asia.artifacts.<project>.appspot.com
158.04 GiB gs://asia.artifacts.<project>.appspot.com
I just want to know if I can delete them, or if I need to keep paying for the storage space.
Originally I thought these files might correspond to what can be seen under "Google Cloud Platform" > "Container Registry" > "Images" > "app-engine-tmp". But even if you delete almost everything through the Container Registry web interface, there are still thousands of really old files sitting in this containers/images folder.
If I had to guess the reason for this ever-growing pile of probably-junk files, I suspect that when versions are deleted through the web interface, the underlying files are not removed. Is that correct?
UPDATE: I did find this clue in the Cloud Build logs that appear when you deploy. I tested deleting the artifacts bucket on a test project. The project still works, and builds still work. An apparently harmless error message appears in the logs. Perhaps it's genuinely safe to delete this artifacts folder. However, it'd be good to have clarity on what these ancient (apparently unused) artifact bucket files are for before deleting them.
2021/01/15 11:27:40 Copying from asia.gcr.io/<project>/app-engine-tmp/build-cache/ttl-7d/default/buildpack-cache:latest to asia.gcr.io/sis-au/app-engine-tmp/build-cache/ttl-7d/default/buildpack-cache:f650fd29-3e4e-4448-a388-c19b1d1b8e04
2021/01/15 11:27:42 failed to copy image: GET https://storage.googleapis.com/asia.artifacts.<project>.appspot.com/containers/images/sha256:ca16b83ba5519122d24ee7d343f1f717f8b90c3152d539800dafa05b7fcc20e9?access_token=REDACTED: unsupported status code 404; body: <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: asia.artifacts.<project>.appspot.com/containers/images/sha256:ca16b83ba5519122d24ee7d343f1f717f8b90c3152d539800dafa05b7fcc20e9</Details></Error>
Unable to tag previous cache image. This is expected for new or infrequent deployments.
It should be safe to delete those. According to Google docs:
Each time you deploy a new version, a container image is created using the Cloud Build service. That container image then runs in the App Engine standard environment.
Built container images are stored in the app-engine folder in Container Registry. You can download these images to keep or run elsewhere. Once deployment is complete, App Engine no longer needs the container images. Note that they are not automatically deleted, so to avoid reaching your storage quota, you can safely delete any images you don't need.
Also, as a suggestion: if you don't want to manually delete the images in case they start piling up again, you can set up Lifecycle Management on your "artifacts" bucket and add a rule to delete old files (for example, after 30 days).
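A minimal sketch of such a rule with gsutil (the bucket name is the placeholder from your question; mind the Flex caveat in the update below):
# cat lifecycle.json
{"rule": [{"action": {"type": "Delete"}, "condition": {"age": 30}}]}
# gsutil lifecycle set lifecycle.json gs://asia.artifacts.<project>.appspot.com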
This thread is similar to your concern and has great answers. Feel free to check it out!
IMPORTANT UPDATE: This answer only applies to the Standard environment. The artifacts bucket is used as the backing storage for Flex app images; it's used when bringing up and autoscaling VMs, so be careful when considering deleting them.

Appengine runs a stale version of the code -- and stack traces don't match source code

I have a python27 appengine application. My application generates a 500 error early in the code initialization, and I can inspect the stack trace in the StackDriver debugger in the GCP console.
I've since patched the code and re-deployed under the same service name and version name (i.e. gcloud app deploy --version=SAME). Unfortunately, the old error still comes up, and the line numbers in the stack traces reflect the files from the buggy deployment. If I use the code viewer to debug the error, I am however brought to the updated, patched code in the online viewer, so there is a mismatch. It behaves as if the app instance is holding on to a previous snapshot of the code.
I'm fuzzy on the freshness and eventual consistency guarantees of GAE. Do I have to wait to get everything to serve the latest deployed version? Can I force it to use the newer code right away?
Things I've tried:
I initially assumed the problem had to do with versioning, i.e. maybe requests being load-balanced between instances with the same version, but each with slightly different code. I'm a bit fuzzy on the actual rules that govern which GAE instance gets chosen for a new request (especially whether GAE tries to reuse previous instances based on a source IP). I'm also fuzzy on whether or not active instances get destroyed right away when different code is redeployed under the same version name.
To take that possibility out of the equation, I tried pushing to a new version name, and then deleting all previous versions (using gcloud app versions list to get the list). But it doesn't help -- I still get stack traces from the old code, despite the source being up to date in the GCP console debugger. Waiting a couple hours doesn't do anything either.
I've tried two things:
disabling and re-enabling the application in GAE->Settings
I'd also noticed that there were some .pyc files uploaded in the snapshot, so I removed those and re-deployed.
I discovered that (1) is a very effective way to stop all running appengine instances. When you deploy a new version of a project, a traffic split is created (i.e. 0% for the old version and 100% for the new), but in my experience old instances might still be running if they've been used recently (despite them being configured to receive 0% of traffic). Toggling kills them all immediately. I unfortunately found that my stale code was still being used after re-enabling.
(2) did the trick. It wasn't obvious that .pyc files were being uploaded. I discovered it by looking at GCP -> StackDriver -> Debug, where I saw .pyc files in the tree snapshot.
I had recently updated my .gitignore to ignore the locally installed pip runtime dependencies for the project (the output of pip install -t lib -r requirements.txt). I don't want those in git, but they do need to ship as part of my appengine project, so I had removed the #!include:.gitignore special include line from .gcloudignore. However, I forgot to re-add *.pyc to my .gcloudignore.
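For reference, the resulting .gcloudignore might look roughly like this (a sketch of the setup described above, not the actual file):
# .gcloudignore (sketch); the #!include:.gitignore line is intentionally
# absent so the pip-installed lib/ directory is uploaded with the app
.git
.gitignore
*.pyc
__pycache__/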
Another way to see the complete set of files included in an app deployment is to increase the verbosity to info on the gcloud app deploy command; you then see a giant JSON manifest with checksums. I don't typically leave that on because it's hard to inspect visually, but it would have let me spot the .pyc files in there.
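For example (assuming the default app.yaml in the current directory):
gcloud app deploy --verbosity=info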

gcloud preview app deploy process takes ~8 minutes, is this normal?

Trying out the new flexible App Engine runtime, in this case a custom Ruby on Rails runtime based on the Google-provided ruby runtime.
When firing off gcloud preview app deploy, the whole process takes ~8 minutes, most of which is "Updating service". Is this normal? And more importantly, how can I speed it up?
Regards,
Ward
Yes, that is totally normal. Most of the deployment steps happen away from your computer and are independent of your codebase size, so there's not a lot you can do to speed up the process.
Various steps that are involved in deploying an app on App Engine can be categorized as follows:
Gather info from app.yaml to understand overall deployment
Collect code and use the docker image specified in app.yaml to build a docker image with your code
Provision Compute Engine instances and networking/firewall rules, install docker-related tools on the instance, push the docker image to the instance and start it
Make sure all deployments were successful, start health checks and, if required, balance out the load.
The part that takes the most time is the last one, where all the necessary checks are done to make sure the deployment was successful and the app starts receiving traffic. Depending upon your code size (uploading code to create the container) and resource requirements (provisioning custom resources), steps 2 and 3 might also take a bit more time.
If you do an analysis you will find that about 70% of the time is consumed by that last step, which we have the least visibility into, yet it is the essential process that gives App Engine the ability to do all the heavy lifting.
Deploying to the same version got me from 6 minutes to 3 minutes in subsequent deploys.
Example:
$ gcloud app deploy app.yaml --version=test
Make sure you check what is in the zip it's uploading (the deploy output tells you where it is), and make sure skip_files in your yaml covers things like your .git directory (if you have one) and node_modules so they aren't uploaded.
Note that the subsequent deploys should be much faster than 8 mins. It's usually 1 minute or less in my tests with Node.js on App Engine Flex.
As suggested above by @ludo, you could in the meantime use Google App Engine Standard instead of Flex, which takes approximately 30-50 seconds to deploy after the first deployment.
You can test GAE Standard by running this tutorial, which doesn't require a billing account:
https://codelabs.developers.google.com/codelabs/cloud-app-engine-springboot/index.html#0
And agreed, this doesn't address GAE Flex, but it gives some options to accelerate things during development.
Just fire the deploy command from the root directory containing app.yaml:
From a shell, change into the directory with app.yaml, then run gcloud app deploy.
The upload will finish within a few seconds.
