Google App Engine - how to disable the cache

So some context:
I have a Node.js API running on Google App Engine. All my GET requests are being cached by default by App Engine for 10 minutes.
I am using Cloudflare for my API, as this allows me to remove specific items from the cache when needed.
You can imagine this has caused a bit of an issue, because my Cloudflare cache was correctly cleared but App Engine kept returning old data.
According to the docs, you can set default_expiration in the app.yaml file, but setting this to 0 or 0s has made no difference and Google keeps caching my responses.
Seemingly, there is also no way to fetch an uncached response from Google.
Now my obvious question here is: is there some way I can completely bypass this cache? Preferably without having to set all of my API's responses to private with a 0s cache.
It quite irks me that Google is forcing this cache on me and provides very vague documentation on the whole matter.

You can configure your app.yaml to define a cache period.
If you use default_expiration this will set a global default cache period for all static file handlers for an application. If omitted, the production server sets the expiration to 10 minutes by default.
To set specific expiration times for individual handlers, specify the expiration element within the handler element in your app.yaml file. You can change the duration of time web proxies and browsers should cache a static file served by this handler.
default_expiration: "4d 5h"
handlers:
- url: /stylesheets
  static_dir: stylesheets
  expiration: "0d 0h"
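To see what a given expiration string amounts to, here is a minimal sketch in Python that converts the app.yaml duration format (numbers with d, h, m or s units, separated by spaces) into seconds. The helper name expiration_to_seconds is hypothetical and not part of any App Engine library; App Engine does this parsing itself, and the sketch only illustrates the arithmetic.

```python
import re

# Seconds per unit accepted in app.yaml expiration strings.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def expiration_to_seconds(expiration: str) -> int:
    """Convert a string such as "4d 5h" into a number of seconds.

    Hypothetical helper for illustration only.
    """
    total = 0
    for part in expiration.split():
        match = re.fullmatch(r"(\d+)([dhms])", part)
        if match is None:
            raise ValueError(f"unrecognised expiration part: {part!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total


print(expiration_to_seconds("4d 5h"))  # 363600
print(expiration_to_seconds("0d 0h"))  # 0
```

The resulting number is what you would expect to see as max-age in the Cache-Control header of a static file served by that handler.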

It seems like you are referring to the static cache (per your link). Try cache-busting techniques such as adding a query parameter to the URL, e.g.
https://<url>/?{{APP_VERSION_ID}}
where APP_VERSION_ID is the latest version of your deployed app. This way, each time you redeploy your app, APP_VERSION_ID changes and the latest version of your static files is always loaded.
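The same idea can be sketched in Python with only the standard library. This is an illustration under assumptions: the helper name versioned_url and the query parameter name v are made up here, and reading the version from the GAE_VERSION environment variable assumes a runtime that sets it (with "dev" as a local fallback).

```python
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(url: str, version: str) -> str:
    """Return url with a v=<version> query parameter added or replaced."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous version marker.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "v"
    ]
    query.append(("v", version))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Assumption: GAE_VERSION is provided by the runtime; "dev" is a local fallback.
APP_VERSION_ID = os.environ.get("GAE_VERSION", "dev")

print(versioned_url("https://example.com/static/app.css", "20240101t120000"))
# https://example.com/static/app.css?v=20240101t120000
```

Because caches key on the full URL, a new version ID produces a URL that no cache has seen yet, so the old cached copy is simply never requested again.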

Related

How to solve high latency in App Engine caused by "This request caused a new process to be started for your application..."?

The app runs on the App Engine standard environment with Python 3.7 and Cloud SQL (MySQL).
Checking the logs, some requests show very high latencies (more than 4 seconds) when the expected latency is about 800 ms. All of these log entries are accompanied by this message:
"This request caused a new process to be started for your application,
and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical
request for your application."
I understand that "a new process" refers to a new instance being started (since I use automatic scaling). However, the strange thing is that when I compare these log entries with the instance start times, they match in some cases but not in others.
My question is, how can these latencies be reduced?
The App Engine config is:
runtime: python37
env: standard
instance_class: F1
handlers:
- url: /static/(.*)
  static_files: static/\1
  require_matching_file: false
  upload: static/.*
- url: /.*
  script: auto
  secure: always
- url: .*
  script: auto
automatic_scaling:
  min_idle_instances: automatic
  max_idle_instances: automatic
  min_pending_latency: automatic
  max_pending_latency: automatic
network: {}
As you note, these slower requests happen whenever App Engine needs to start a new instance for your application, as the initial load is slow (these are called "loading requests").
However, App Engine does provide a way to use "warmup" requests: basically, dummy requests to your application that start instances in advance of when they are actually needed. This can reduce, but not eliminate, the user-affecting loading requests.
This can slightly increase your costs, but it should reduce the loading request latency, as these dummy requests will be the ones that eat the cost of starting a new instance.
In the Python 3.7 runtime, you can add a "warmup" element to the inbound_services directive in app.yaml:
inbound_services:
- warmup
This will send a request to /_ah/warmup where, if you want, you can do any other initialization the instance needs (e.g. starting a DB connection pool).
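A handler for that path could look roughly like the following. This is a minimal sketch as a plain WSGI application, assuming it lives in main.py and is exposed as app so the default entrypoint can find it; init_resources is a hypothetical placeholder for whatever startup work your instance needs.

```python
def init_resources() -> dict:
    """Hypothetical placeholder for startup work, e.g. opening a DB connection pool."""
    return {"ready": True}


_resources = None  # filled by the first warmup request or the first real request


def app(environ, start_response):
    """Minimal WSGI application that also answers App Engine warmup requests."""
    global _resources
    if _resources is None:
        _resources = init_resources()

    if environ.get("PATH_INFO") == "/_ah/warmup":
        # An empty 200 response is enough: by now the instance has loaded the code
        # and run the initialization above.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b""]

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
```

Doing the initialization lazily means a request that arrives without a prior warmup still works; it just pays the startup cost itself.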
There are more strategies that may help you decrease latency in your application.
You can tune your automatic_scaling options to better suit your app.
You can make better use of your bandwidth by setting an appropriate Cache-Control header on your responses and reasonable expiration times for static files.
Using public Cache-Control headers in this way allows proxy servers and your clients' browsers to cache responses for the designated period of time.
You can use a bigger instance class, such as F2, so that horizontal scaling happens less often. As I understand from this issue, your latencies increase mostly while new instances are being started.
You can also enable concurrent requests and write your code as asynchronously as you can.

Change to static file doesn't happen immediately after deploy

When I change a static file (here page.html), and then run appcfg.py update, even after deployment is successful and it says the new files are serving, if I curl for the file the change has not actually taken place.
Relevant excerpt from my app.yaml:
default_expiration: "10d"
- url: /
  static_files: static/page.html
  upload: static/page.html
  secure: always
Google's docs say "Static cache expiration - Unless told otherwise, web proxies and browsers retain files they load from a website for a limited period of time." There shouldn't be any browser cache as I am using curl to get the file, and I don't have a proxy set up at home at least.
Possible hints at the answer
Interestingly, if I curl for /static/page.html directly, it has updated, but if I curl for / which should point to the same file, it has not.
Also if I add some dummy GET arg, such as /?foo, then I can also see the updated version. I also tried adding the -H "Cache-Control: no-cache" option to my curl command, but I still got the stale version.
How do I see updates to / immediately after deploy?
As pointed out by Omair, the docs for the standard environment for Python state that "files are likely to be cached by the user's browser, as well as by intermediate caching proxy servers such as Internet Service Providers". But I've found a way to flush static files cached by your app on Google Cloud.
Head to your Google Cloud Console and open your project. Under the left hamburger menu, head to Storage -> Browser. There you should find at least one Bucket: your-project-name.appspot.com. Under the Lifecycle column, click on the link with respect to your-project-name.appspot.com. Delete any existing rules, since they may conflict with the one you will create now.
Create a new rule by clicking on the 'Add rule' button. For the object conditions, choose only the 'Newer version' option and set it to 1. Don't forget to click on the 'Continue' button. For the action, select 'Delete' and click on the 'Continue' button. Save your new rule.
This new rule will take up to 24 hours to take effect, but at least for my project it took only a few minutes. Once it is up and running, the version of the files being served by your app under your-project-name.appspot.com will always be the latest deployed, solving the problem. Also, if you are routinely editing your static files, you should remove any expiration element from handlers related to those static files and the default_expiration element from the app.yaml file, which will help avoid unintended caching by other servers.
According to App Engine's documentation on static cache expiration, this could be due to caching servers between you and your application respecting the caching headers on the responses:
The expiration time will be sent in the Cache-Control and Expires HTTP response headers, and therefore, the files are likely to be cached by the user's browser, as well as by intermediate caching proxy servers such as Internet Service Providers.
Once a file is transmitted with a given cache expiration time, there is generally no way to clear it out of intermediate caches, even if you clear the browser cache or use curl with the no-cache option. Re-deploying a new version of the app will not reset the caches either.
For files that need to be modified, shorter expiration times are recommended.
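To estimate how long a stale copy can linger in such a cache, you can compare the max-age in Cache-Control with the Age header of the response you received. The following is a simplified sketch: the helper name remaining_freshness is hypothetical, header names are looked up case-sensitively, and s-maxage and Expires are ignored.

```python
import re
from typing import Mapping, Optional


def remaining_freshness(headers: Mapping[str, str]) -> Optional[int]:
    """Seconds a cache may still serve this response, or None if no max-age is set.

    Simplified: max-age from Cache-Control minus the Age header.
    """
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match is None:
        return None
    age = int(headers.get("Age", "0"))
    return max(int(match.group(1)) - age, 0)


# A file served with a 10-day expiration that has already sat in a cache for an hour.
print(remaining_freshness({"Cache-Control": "public, max-age=864000", "Age": "3600"}))
# 860400
```

With a 10-day expiration, a copy cached shortly before a deploy can therefore keep being served for most of those 10 days, which is why short expiration times matter for files you intend to change.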

Why does GAE skip changed files when deploying?

Often when I make a local change to a .js or .css file and then I deploy the app the files are skipped. What's going on?
For example, let's say I edit:
public_html/www/account/dashboard/dashboard.css
When I deploy I see this in my log:
Skipping upload of [public_html/www/account/dashboard/dashboard.css]
Here is the skip_files rule in my app.yaml
skip_files:
- ^(.*/)?#.*#$
- ^(.*/)?.*~$
- ^(.*/)?.*\.py[co]$
- ^(.*/)?.*/RCS/.*$
- ^(.*/)?\..*$
- ^.*node_modules(/.*)?
- ^data/.*$
- ^public_html/data/.*$
And not sure if this is related but here is a static_files rule for making my css application_readable:
- url: /(.*\.(gif|png|jpg|jpeg|js|html|css|json|tpl))$
  static_files: public_html/www/\1
  upload: public_html/www/.*\.(gif|png|jpg|js|html|css|json|tpl)$
  application_readable: true
I finally found an answer in the docs. It has to do with static file cache expiration. It sounds like there is no way to immediately clear out static file caches. The best thing to do is to set default_expiration to a short time period; mine had been set to 7 days.
Here is a link to the docs:
https://cloud.google.com/appengine/docs/standard/python/config/appref#static_cache_expiration
Here is an explanation from the docs:
The expiration time will be sent in the Cache-Control and Expires HTTP
response headers, and therefore, the files are likely to be cached by
the user's browser, as well as by intermediate caching proxy servers
such as Internet Service Providers. After a file is transmitted with a
given expiration time, there is generally no way to clear it out of
intermediate caches, even if the user clears their own browser cache.
Re-deploying a new version of the app will not reset any caches.
Therefore, if you ever plan to modify a static file, it should have a
short (less than one hour) expiration time. In most cases, the default
10-minute expiration time is appropriate.
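To rule out the skip_files rules as the cause, the patterns from the question can be checked against the file path directly. This is a minimal sketch that assumes each pattern is applied with re.match to the path relative to the application root; the helper name matching_skip_rules and the second example path are made up for illustration.

```python
import re

# The skip_files patterns from the app.yaml in the question.
SKIP_FILES = [
    r"^(.*/)?#.*#$",
    r"^(.*/)?.*~$",
    r"^(.*/)?.*\.py[co]$",
    r"^(.*/)?.*/RCS/.*$",
    r"^(.*/)?\..*$",
    r"^.*node_modules(/.*)?",
    r"^data/.*$",
    r"^public_html/data/.*$",
]


def matching_skip_rules(path: str) -> list:
    """Return the skip_files patterns that match path (an empty list means not skipped)."""
    return [pattern for pattern in SKIP_FILES if re.match(pattern, path)]


print(matching_skip_rules("public_html/www/account/dashboard/dashboard.css"))
# []
print(matching_skip_rules("public_html/data/cache.json"))  # hypothetical path
# ['^public_html/data/.*$']
```

Under that assumption none of the rules match dashboard.css, which is consistent with the stale file being a caching effect and not a skip_files problem.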

App Engine: Static Files Not Updating on Deploy

I pushed a static HTML file containing an Angular SPA as the catch-all handler for my custom domain, with these settings:
- url: /(api|activate|associate|c|close_fb|combine|import|password|sitemap)($|/.*)
  script: gae.php
- url: /.*
  static_files: public/static/app/v248/es/app.html
  upload: public/static/app/v248/es/app.html
  expiration: "1h"
That worked fine, but if I push a new app.html it doesn't update. I've tried changing the local path, deploying a new app version, and even replacing the catch-all handler with a custom PHP endpoint, but none of it works: the response is still the first version of app.html I uploaded.
Other people have had the same problem (CSS File Not Updating on Deploy (Google AppEngine)), and it looks like it is related to the Google CDN cache but, as far as I know, there isn't any way to flush it.
There is a way to flush static files cached by your app on Google Cloud.
Head to your Google Cloud Console and open your project. Under the left hamburger menu, head to Storage -> Cloud Storage -> Browser. There you should find at least one Bucket: your-project-name.appspot.com. Under the Lifecycle column, click on the link with respect to your-project-name.appspot.com. Delete any existing rules, since they may conflict with the one you will create now.
Create a new rule by clicking on the 'Add A Rule' button. For the action, select "Set storage to nearline". For the object conditions, choose only the 'Number of newer versions' option and set it to 1. Click on the 'Continue' button and then click 'Create'.
This new rule will take up to 24 hours to take effect, but at least for my project it took only a few minutes. Once it is up and running, the version of the files being served by your app under your-project-name.appspot.com will always be the latest deployed, solving the problem. Also, if you are routinely editing your static files, you should remove any expiration element from handlers related to those static files and the default_expiration element from the app.yaml file, which will help avoid unintended caching by other servers.
When you change static files in an App Engine application, the changes will not be available immediately because of caching, as you already suspected. The cache in Google Cloud cannot be flushed manually, so instead I would recommend changing the expiration time to a shorter period (by default it is 10 minutes) while you test how it works, and later setting an appropriate expiration time according to your requirements.
Bear in mind that you can change the static cache expiration time either for all static files or for just the ones you choose, by setting the proper element in the app.yaml file.
2020 Update:
For my application I found that App Engine started failing to detect my latest app deployments once I reached 50 Versions in my Versions list.
See (Burger Menu) -> App Engine -> Versions
After deleting a bunch of old versions on next deploy it picked up my latest changes immediately. Not sure if this is specific to my account or billing settings but that solved it for me.
I had my static files over a service in Google Cloud Platform. My problem was that I didn't execute
gcloud app deploy dispatch.yaml
Once executed, everything was fine. I hope it helps
Another problem that could be causing this is caching in Google's frontend, which depends on the cache header returned by your application. In my case, I opened Firefox's inspector on the Network tab and saw that the stale file had a cache-control setting of 43200 seconds, i.e. 12 hours.
This was for a file returned from Flask, so I fixed this by explicitly specifying max-age in the Flask response from my GAE code:
return flask.send_from_directory(directory, filename, max_age=600)
This causes intermediate caches such as Google's frontend to only cache the file for a period of 600 seconds (10 minutes).
Unfortunately, once a file has been cached there is no way to flush it, so you will have to wait out the 12 hours. But it will solve the problem for the next time.
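If you would rather not pass max_age on every call, the same effect can be approximated at the WSGI layer. This is a sketch under assumptions, not the approach from the answer above: with_default_cache_control is a hypothetical helper that adds an explicit Cache-Control header only when the application has not set one.

```python
def with_default_cache_control(app, max_age=600):
    """Wrap a WSGI app so responses without Cache-Control get an explicit max-age."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            names = {name.lower() for name, _ in headers}
            if "cache-control" not in names:
                # Leave explicit headers alone; only fill in the missing default.
                headers = list(headers) + [
                    ("Cache-Control", f"public, max-age={max_age}")
                ]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped
```

With Flask this could be applied as app.wsgi_app = with_default_cache_control(app.wsgi_app). Responses that already carry a Cache-Control header, such as the one produced by the send_from_directory call above, pass through unchanged.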

Does changing Cloudfront Download Distribution Origin Path result in a cache invalidation?

I am working on a solution to get S3 and Cloudfront in sync when I upload a new version of an angular app.
My approach is to upload the new version to a new folder with an increasing version number (http://awsbucket/v1 ... /v2) and then update the Download Distribution Origin Path to that new folder.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesOriginPath
I am wondering whether this change of the Origin Path automatically results in a complete cache invalidation, or whether I still have to send invalidation requests.
If you keep publishing your web resources (images, scripts, or anything else served over HTTP) under new version paths and make the necessary changes in your app, then by design the app starts using the newer versions' resources. The older versions' cached objects are requested less and less and are eventually evicted from the cache.
Invalidation requests are costly and time-consuming, while versioning is easy and natural. The most common use cases are versioned CSS stylesheets and updated JS scripts. The same approach can be applied to your use case.
Also, you don't need to change the origin; keep adding the new files to S3 and make sure the app references them. That will do.
To answer your question, NO - changing the Origin, including just the path, does not result in cache invalidation.
Information can be found here
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesDomainName
Quoting the specific part:
Changing the origin does not require CloudFront to repopulate edge caches with objects from the new origin. As long as the viewer requests in your application have not changed, CloudFront will continue to serve objects that are already in an edge cache until the TTL on each object expires or until seldom-requested objects are evicted.
