Can appengine files in asia.artifacts.../containers/images be safely deleted? - google-app-engine

Can appengine files in google cloud store under the bucket asia.artifacts.../containers/images be safely deleted without causing any problems. There is already 160Gb of them after just a few years. The documentation doesn’t make clear what they are for, or why they are retained there:
# gsutil du -sh gs://asia.artifacts.<project>.appspot.com
158.04 GiB gs://asia.artifacts.<project>.appspot.com
I just want to know if I can delete them, or if I need to keep paying for the storage space.
Originally I thought these files might correspond to what can be seen on the "Google Cloud Platform" "Container Registry" "Images" "app-engine-tmp". But even if you delete almost everything under the container registry web interface, there are still thousands of really old files sit-in in this containers/images folder.
If I had to guess the reason for this ever growing pile of probably junk files. I suspect if versions are deleted through the web interface, the underlying files are not removed. Is that correct?
UPDATE: I did find this clue in the cloud build logs that occur when you deploy. I tested out deleting the artifacts bucket on a test project. The project still works, and builds still works. An apparently harmless error message appears in the logs. Perhaps its genuinely safe to delete this artefacts folder. However, it'd be good to have clarity on what these ancient (apparently unused) artefact bucket files are for before deleting.
2021/01/15 11:27:40 Copying from asia.gcr.io/<project>/app-engine-tmp/build-cache/ttl-7d/default/buildpack-cache:latest to asia.gcr.io/sis-au/app-engine-tmp/build-cache/ttl-7d/default/buildpack-cache:f650fd29-3e4e-4448-a388-c19b1d1b8e04
2021/01/15 11:27:42 failed to copy image: GET https://storage.googleapis.com/asia.artifacts.<project>.appspot.com/containers/images/sha256:ca16b83ba5519122d24ee7d343f1f717f8b90c3152d539800dafa05b7fcc20e9?access_token=REDACTED: unsupported status code 404; body: <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: asia.artifacts.<project>.appspot.com/containers/images/sha256:ca16b83ba5519122d24ee7d343f1f717f8b90c3152d539800dafa05b7fcc20e9</Details></Error>
Unable to tag previous cache image. This is expected for new or infrequent deployments.

It should be safe to delete those. According to Google docs:
Each time you deploy a new version, a container image is created using the Cloud Build service. That container image then runs in the App Engine standard environment.
Built container images are stored in the app-engine folder in Container Registry. You can download these images to keep or run elsewhere. Once deployment is complete, App Engine no longer needs the container images. Note that they are not automatically deleted, so to avoid reaching your storage quota, you can safely delete any images you don't need.
Also as a suggestion, if you don't want to manually delete the images just in case they start piling up again, you can set up Lifecycle Management on your "artifacts" bucket and add a rule to delete old files (for example, 30 days).
This thread is similar to your concern and they have great answers. Feel fee to check it out!
IMPORTANT UPDATE: This answer only applies on Standard environment. The artifacts bucket is used as the backing storage for Flex apps images. It's used when bringing up and autoscaling VMs, so be careful when you consider deleting them.

Related

Appengine runs a stale version of the code -- and stack traces don't match source code

I have a python27 appengine application. My application generates a 500 error early in the code initialization, and I can inspect the stack trace in the StackDriver debugger in the GCP console.
I've since patched the code, and I've re-deployed under the same service name and version name (i.e. gcloud app deploy --version=SAME). Unfortunately, the old error still comes up, and line numbers in the stack traces reflect the files in the buggy deployment. If I use the code viewer to debug the error, I am however brought to the updated patched code in the online viewer -- and there is a mismatch. It behave as if the app instance is holding on to a previous snapshot of the code.
I'm fuzzy on the freshness and eventual consistency guarantees of GAE. Do I have to wait to get everything to serve the latest deployed version? Can I force it to use the newer code right away?
Things I've tried:
I initially assumed the problem had to do with versioning, i.e. maybe requests being load-balanced between instances with the same version, but each with slightly different code. I'm a bit fuzzy on the actual rules that govern which GAE instance gets chosen for a new request (esp whether GAE tries to reuse previous instances based on a source IP). I'm also fuzzy on whether or not active instances get destroyed right away when different code is redeployed under the same version name.
To take that possibility out of the equation, I tried pushing to a new version name, and then deleting all previous versions (using gcloud app versions list to get the list). But it doesn't help -- I still get stack traces from the old code, despite the source being up to date in the GCP console debugger. Waiting a couple hours doesn't do anything either.
I've tried two things:
disabling and re-enabling the application in GAE->Settings
I'd also noticed that there were some .pyc files uploaded in the snapshot, so I removed those and re-deployed.
I discovered that (1) is a very effective way to stop all running appengine instances. When you deploy a new version of a project, a traffic split is created (i.e. 0% for the old version and 100% for the new), but in my experience old instances might still be running if they've been used recently (despite them being configured to receive 0% of traffic). Toggling kills them all immediately. I unfortunately found that my stale code was still being used after re-enabling.
(2) did the trick. It wasn't obvious that .pyc were being uploaded. I discovered it by looking at GCP->StackDriver->Debug and I saw .pyc files in the tree snapshot.
I had recently updated my .gitignore to ignore locally installed pip runtime dependencies for the project (output of pip install -t lib requirements.txt). I don't want those in git, but they do need to ship as part of my appengine project. I had removed the #!.gitignore special include line from .gcloudignore. However, I forgot to re-add *.pyc into my .gcloudignore.
Another way to see the complete set of files included in an app deployment is to increase the verbosity to info on the gcloud app deploy command -- you see a giant json manifest with checksums. I don't typically leave that on because it's hard to visually inspect, but I would have spotted the .pyc in there.

CDN caching strategy for React websites that use chunks

What's the best caching strategy for React websites that use chunks (code splitting)?
Before using chunks I would just cache everything for one year in CloudFront, and I would delete the old files and invalidate the cache after a new version of the website was deployed. Works great.
However, after I started using chunks I started getting problems. One common issue is that after deploying a new version of the site I delete the old files and invalidate the cache. One user is already active on the old version of the site and his version of the site tries to load a chunk that no longer exists, so the site crashes for him.
One potential solution would be to keep all old files for a month or longer, and delete all files that are older than X months during the deployment process.
Is there any better solution to this problem. Am I missing something special from the service worker that CRA (Create React App) provides? If I remember correctly it provides some kind of cache busting.
Thanks.

GAE: What's faster loading an include config file from GCS or from cloud SQL

Based on the subdomain that is accessing my application I need to include a different configuration file that sets some variables used throughout the application (the file is included on every page). I'm in two minds about how to do this
1) Include the file from GCS
2) Store the information in a table on Google Cloud SQL and query the database on every page through an included file.
Or am I better off using one of these options and then Memcache.
I've been looking everywhere for what is the fastest option (loading from GCS or selecting from cloud SQL), but haven't been able to find anything.
NB: I don't want to have the files as normal php includes as I don't want to have to redeploy the app every time I setup a new subdomain (different users get different subdomains) and would rather either just update the database or upload a new config file to cloud storage, leaving the app alone.
I would say the most sane solution would be to store the configuration files in the Cloud SQL as you can easily make changes to them even from within the app and using the memcache since it was build exactly for this kind of stuff.
The problem with the GCS is that you cannot simply edit the file and you will have to delete and add a new version every time which is not going to be optimal in a long run.
GCS is cheaper, although for small text files it does not matter much. Otherwise, I don't see much of a difference.

My GAE python development datastore is never persisted to a file

I have just started using GAE (Python 2.7 SDK 1.6.4) , I have set up a
simple test project using Pydev (latest version) in eclipse (indigo)
on Windows XP (SP3).
It all works fine, my app can record data in the datastore and the blobstore
and then retrieve it, but when I stop the development server and start
it again the data in the datastore is lost. This is not the case for
the blobstore which is retaining blobs fine and I can see the
blobstore folder that gets created in C:\Temp
I did the sensible thing and look back through old posts and found
that most people who have this problem solve it by changing the
location of the datastore file, so I used the following parameters;
--datastore_path="${workspace_loc}/myproject/datastore"
--blobstore_path="${workspace_loc}/myproject/blobstore"
"${workspace_loc}/myproject/src"
I moved the blobstore at the same time as you can see.
The blobstore still works, and now the blobstore folder is created in
myproject folder as expected. The datastore file is still not created
however, and when I stop and restart the development server the data
is still lost.
The dev server startup logs include the following entry
WARNING 2012-04-20 10:49:04,513 datastore_file_stub.py:513] Could not
read datastore data from C:\myworkspace\myproject\datastore
So I know it is trying to create the datastore in the correct place.
Finally I lifted the whole eclipse workspace folder and copied it to
another computer with exactly the same setup except it is running
Windows 7 instead of Windows XP.
Everything works fine there - both the datastore file and blobstore
folder are now created where I expect them to be.
I have set up eclipse, python, gae, my project and my eclipse launch
file in exactly the same way on two computers, it works on one and
not the other. Maybe XP is something to do with it but to be honest I
think that's unlikely.
The only other clue I have come up with is that a recent change to the
GAE development server stopped writing to the datastore file after
every change and only flushes on exit, this problem may be closely related to mine;
App Engine local datastore content does not persist
However adding the following to my code did not help at all.
from google.appengine.tools import dev_appserver
import atexit
atexit.register(dev_appserver.TearDownStubs)
So it's not down to incorrect termination sequence either as far as I
can tell although it may be that I was just added it in the wrong place (I'm am new to python).
Anyway I am stumped and I would be really grateful for suggestions you
guys can come up with.
It's probably http://code.google.com/p/googleappengine/issues/detail?id=7244 and a bug. Hopefully a fix will be available soon.
did you try:
--storage_path=...
Path at which all local files (such as the Datastore, Blobstore files, Google Cloud Storage Files, logs, etc) will be stored, unless overridden by --datastore_path, --blobstore_path, --logs_path, etc.
found at https://developers.google.com/appengine/docs/python/tools/devserver?csw=1

Where can I find out about versioned URL's in Google App Engine?

Versioned URL's are mentioned near the bottom of this section with very little explanation:
http://code.google.com/appengine/docs/python/config/appconfig.html#Secure_URLs
I want to find out more about versioned URL's, but I can't seem to find any more information. My main concern has to do with disabling them. I can see how this could be useful (e.g. for debugging or postmortem), but I don't want users to be able to run previous versions of my app! Those versions might even end up corrupting data, because of data model evolution.
Is there any way to configure versioned URL's in Google App Engine?
On the deployment control page for your application, you can delete prior versions under Administration ➤ Versions if you want.
The idea behind simultaneously running deployment of old versions is to not break things for earlier users who depend on old versions. By choosing the default deployment, you make users of your_app.appspot.com not care. If you want to prevent someone from running deployment 1.your_app.appspot when version 2 is up, just delete deployment 1.

Resources