AppEngine: forwarding request to static-content - google-app-engine

In order to optimize caching, I added timestamps to the src values of images, etc, e.g. <img src="/static/img/foo.1456871418309.png"/>.
Then I created a servlet filter which removes the timestamps again and forwards the request to the resource: request.getRequestDispatcher(urlWithoutTimestamp).forward(request, response).
And in appengine-web.xml I specified that files should be cached for a year:
<static-files>
<include path="/static/**" expiration="365d" />
</static-files>
This way, I thought, the user would always get the latest version and can cache it until a newer version exists, without having to test it via HTTP.
On localhost (Run As Web Application) this works fine, the HTTP response has the expected caching headers:
Cache-Control:"public, max-age=31536000"
Expires:"Wed, 01 Mar 2017 22:54:32 GMT"
However when I deploy the whole thing on the app engine server the response has this caching header (and no “Expires”):
Cache-Control:"private"
Instead of forwarding the request, I also tried calling the FilterChain on an HttpServletRequestWrapper that modifies getRequestURI, getRequestURL and getServletPath. This led to the same poor result.
How do I get this right?
Update:
On second thought, I guess forwarding to static content may not be possible, because forwarding always happens locally on the server and the static content may well be served from another machine.
But at least I was able to solve my HTTP header problem with my filter:
Instead of forwarding the request, I use a HttpServletRequestWrapper (see above) to remove the timestamp from the request.
Then I call chain.doFilter on the modified request.
Finally, I set the caching headers on the response.
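The rewriting step of such a filter boils down to a path transformation plus a fixed set of headers. Here is a minimal sketch of that logic in Python (not the Java filter itself), assuming the timestamp is always 13 digits of epoch milliseconds placed right before the extension; the names strip_timestamp and cache_headers are hypothetical.

```python
import re

# Assumption: timestamps are 13-digit epoch milliseconds placed right
# before the file extension, e.g. foo.1456871418309.png.
_TIMESTAMP = re.compile(r"\.\d{13}(?=\.[A-Za-z0-9]+$)")


def strip_timestamp(path: str) -> str:
    """Return the path with the timestamp segment removed, if present."""
    return _TIMESTAMP.sub("", path, count=1)


def cache_headers(max_age_seconds: int = 365 * 24 * 60 * 60) -> dict:
    """Headers the filter would set after passing the request down the chain."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}
```

Paths without a timestamp pass through unchanged, so the same filter mapping can safely cover every file under /static/.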

Instead of changing the filename and using a filter, you can reference the actual static file and add the timestamp as a request parameter.
Instead of:
<img src="/static/img/foo.1456871418309.png"/>
use the following scheme:
<img src="/static/img/foo.png?r=1456871418309"/>
For static files such parameters are ignored. For the browser, however, it is a new URL each time the timestamp changes, so the file will be requested from the server again.
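A small helper can generate such URLs when rendering templates. This is only a sketch: the function name with_version is hypothetical, and the parameter name r is simply the one used in the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version(url: str, version: int, param: str = "r") -> str:
    """Append (or replace) a cache-busting query parameter on a URL."""
    parts = urlsplit(url)
    # Drop any previous value of the parameter, keep everything else.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, with_version("/static/img/foo.png", 1456871418309) yields the URL shown in the second img tag above.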

In production, you will not be able to intercept static file traffic. Google App Engine takes care of static files for you, basically providing a sort of CDN.
I quote the note on https://cloud.google.com/appengine/docs/java/config/webxml#Filters:
Filters are not invoked on static assets, even if the path matches a filter-mapping pattern. Static files are served directly to the browser.

Related

JMeter - embedded requests has Bundle.js which in turn triggers more secondary requests - Recursive retrieving of embedded resource requests

My JMeter version is apache-jmeter-5.4.1.
I am trying to set up an HTTP Request like this on a React-based website:
HTTP Request - GET http://YYY.YYY.YYYY/141719 (With Retrieve Embedded Resources checked)
When I run this, I can see JMeter captures embedded resource requests (Secondary requests) which are like *.css, *.js
Second set of embedded resource requests:
However, one of these secondary requests, bundle.xxxxxxx.js, creates another set of embedded resource requests to the server, which retrieve further *.js files as part of the Request Initiator chain.
The name of this file is randomly generated, e.g. bundle.0787f963ab0ac67dd7d4.js.
The browser of course parses this bundle.xxxxxxx.js immediately and fetches all the embedded resources/requests (including chunk.*.js).
My problem is how to replicate this behaviour in JMeter so that the second set of embedded resource requests is triggered as well. At the moment, I can only capture the first set of embedded resource requests. This does not give me true load test results, as the second set generates more traffic to the server. Is there a way to recursively retrieve all embedded resources?
Our application under test is based on React JS.
As per JMeter project main page:
JMeter is not a browser, it works at protocol level. As far as web-services and remote services are concerned, JMeter looks like a browser (or rather, multiple browsers); however JMeter does not perform all the actions supported by browsers. In particular, JMeter does not execute the Javascript found in HTML pages. Nor does it render the HTML pages as a browser does (it's possible to view the response as HTML etc., but the timings are not included in any samples, and only one sample in one thread is ever displayed at a time).
There are several initiator types:
parser (index, styles, fonts, etc.) - these guys will be captured by the embedded resources downloading functionality. Everything below needs to be handled some other way
redirect
script
other
So if you need to mimic a number of HTTP requests which originate from JavaScript, you will have to replicate the logic of this JavaScript code in JMeter.
There is a minimal chance that you will be able to copy and paste the JavaScript into a JSR223 PostProcessor. However, it most likely relies on objects like navigator, window or XMLHttpRequest, so it's highly likely that you will have to rewrite it in Groovy. Once you have enough data to build the HTTP Request samplers properly, you will most probably need to put them under the Parallel Controller.
Another option is to go for the WebDriver Sampler and use real browsers for your tests, but be aware that browsers are very resource-intensive compared to HTTP Request samplers.
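To illustrate what "replicating the logic" can mean in the simplest case, here is a sketch in Python (in JMeter itself this would be Groovy in a JSR223 PostProcessor). It assumes the bundle mentions its chunk files as quoted string literals ending in .js; real bundlers often assemble chunk names at runtime, in which case a scan like this finds nothing and the name-building code has to be ported instead. The function name is hypothetical.

```python
import re

# Assumption: chunk files appear in the script body as quoted literals,
# e.g. "static/js/2.chunk.js". This is not guaranteed for real bundles.
_JS_LITERAL = re.compile(r"""["']([^"']+\.js)["']""")


def find_script_references(script_body: str) -> list:
    """Return the distinct .js literals found in a script, in order of appearance."""
    seen = []
    for name in _JS_LITERAL.findall(script_body):
        if name not in seen:
            seen.append(name)
    return seen
```

Each returned name would then become one HTTP Request sampler (or one iteration of a loop) under the Parallel Controller.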

Is catch-all handler pointing to "auto" a bad idea?

My instance has little to no traffic, but I have min-idle instances set to 1. What I notice is that whenever a random URL that doesn't exist is accessed (by some bot), it is considered a dynamic request, since my catch-all handler is auto. This is fine, except I see these 404 errors (404 because there are no HTTP handlers associated with these URL patterns, even though the yaml defines a catch-all pattern) resulting in instance restarts. Why should the instance restart if it runs into 404 errors?
All my dynamic handlers follow the "/api" pattern, plus a few that don't. So I can explicitly list all valid patterns and map them to the auto handler. Would these random links then be treated as static but not present, and get a 404 error (which I am fine with)? I want to make sure the instance doesn't keep running just because of some rogue requests.
I just did a local experiment (I don't presently have any quickly deployable play app) and it looks like your quite interesting idea could work.
I replaced the .* pattern previously catching all stragglers and routing them to my default service script (I'm using the python runtime) with specific patterns, then added this handler after all others:
- url: /(.*)$
  static_files: images/\1
  upload: images/.*
My images directory is real, holding static images (but for which I already have another handler with a more specific pattern).
With this in place I made a request to /crap and got, as expected (there is no images/crap file):
INFO 2019-11-08 03:06:02,463 module.py:861] default: "GET /crap HTTP/1.1" 404 -
I added logging calls in my script handler's get() and dispatch() calls to confirm they're not actually getting invoked (the development server request logging casts a bit of doubt).
I also checked on an already deployed GAE app that requesting an image that matches a static handler pattern but which doesn't actually exist gets the 404 answer without causing a service's instance to be started (no instance was running at the time), i.e. it comes directly from the GAE's static content CDN.
So I think it's well worth a try with the go runtime, this could save some significant instance time for an app without a lot of activity faced with random bot traffic.
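The effect of the handler order can be modelled in a few lines. This is only a toy simulation in Python, not App Engine's actual dispatcher: it assumes handlers are tried top to bottom, that a pattern must match the whole path, and that the handler table looks like the hypothetical one below (an /api/ pattern for dynamic requests followed by the static catch-all shown above).

```python
import re

# Hypothetical handler table in app.yaml order: specific dynamic
# patterns first, then the static catch-all from the answer above.
HANDLERS = [
    (r"/api/.*", "script"),
    (r"/(.*)", "static"),
]


def route(path):
    """Return the kind of the first handler whose pattern matches the whole path."""
    for pattern, kind in HANDLERS:
        if re.fullmatch(pattern, path):
            return kind
    return None


def serve(path, static_files):
    """Simulate the outcome as (status, who answered)."""
    kind = route(path)
    if kind == "script":
        return 200, "instance"
    if kind == "static":
        # static_files: images/\1 maps /name to images/name
        target = "images/" + path.lstrip("/")
        return (200, "static") if target in static_files else (404, "static")
    return 404, "none"
```

In this model a request to /crap is answered with a 404 by the static side, and no instance is involved, which matches the behaviour observed in the experiment.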
As for the instance restarts, I suspect what you see is just a symptom of your min-idle instances being set to 1. Unlike a dynamic instance, the idle (aka resident) instance is not normally meant to handle traffic; it's just ready to do so if/when needed. Only when there is no dynamic instance running (and able to handle incoming traffic efficiently) and a new request comes in is that request immediately routed to the idle instance. At that moment:
the idle instance becomes a dynamic one and will continue to serve traffic until it shuts due to inactivity or dies
a fresh idle instance is started to meet the min-idle configuration, it will remain idle until another similar event occurs
Note: your idea will help with the instance hours portion used by the dynamic instances, but not with the idle instance portion.
The documentation states the following:
"When an instance responds to the request /_ah/start with an HTTP status code of 200–299 or 404, it is considered to have started correctly and that it can handle additional requests. Otherwise, App Engine cancels the instance. Instances with manual scale adjustment restart immediately, while instances with basic scale adjustment restart only when necessary to deliver traffic."
You can find more detail about how instances are managed for Standard App Engine environment for Go 1.12 on the link: https://cloud.google.com/appengine/docs/standard/go112/how-instances-are-managed
I also recommend reading the document "How instances are managed", which states the following:
"Secondary routing
If a request matches the part [YOUR_PROJECT_ID].appspot.com of the host name, but includes the name of a service, version, or instance that does not exist, the request is routed to the default service. Secondary routing does not apply to custom domains; requests sent to these domains will show an HTTP status code 404 if the hostname is not valid."
https://cloud.google.com/appengine/docs/standard/go112/how-instances-are-managed

Change to static file doesn't happen immediately after deploy

When I change a static file (here page.html), and then run appcfg.py update, even after deployment is successful and it says the new files are serving, if I curl for the file the change has not actually taken place.
Relevant excerpt from my app.yaml:
default_expiration: "10d"
- url: /
  static_files: static/page.html
  upload: static/page.html
  secure: always
Google's docs say "Static cache expiration - Unless told otherwise, web proxies and browsers retain files they load from a website for a limited period of time." There shouldn't be any browser cache as I am using curl to get the file, and I don't have a proxy set up at home at least.
Possible hints at the answer
Interestingly, if I curl for /static/page.html directly, it has updated, but if I curl for / which should point to the same file, it has not.
Also if I add some dummy GET arg, such as /?foo, then I can also see the updated version. I also tried adding the -H "Cache-Control: no-cache" option to my curl command, but I still got the stale version.
How do I see updates to / immediately after deploy?
As pointed out by Omair, the docs for the standard environment for Python state that "files are likely to be cached by the user's browser, as well as by intermediate caching proxy servers such as Internet Service Providers". But I've found a way to flush static files cached by your app on Google Cloud.
Head to your Google Cloud Console and open your project. Under the left hamburger menu, head to Storage -> Browser. There you should find at least one Bucket: your-project-name.appspot.com. Under the Lifecycle column, click on the link with respect to your-project-name.appspot.com. Delete any existing rules, since they may conflict with the one you will create now.
Create a new rule by clicking on the 'Add rule' button. For the object conditions, choose only the 'Newer version' option and set it to 1. Don't forget to click on the 'Continue' button. For the action, select 'Delete' and click on the 'Continue' button. Save your new rule.
This new rule will take up to 24 hours to take effect, but at least for my project it took only a few minutes. Once it is up and running, the version of the files being served by your app under your-project-name.appspot.com will always be the latest deployed, solving the problem. Also, if you are routinely editing your static files, you should remove any expiration element from handlers related to those static files and the default_expiration element from the app.yaml file, which will help avoid unintended caching by other servers.
According to App Engine's documentation on static cache expiration, this could be due to caching servers between you and your application respecting the caching headers on the responses:
The expiration time will be sent in the Cache-Control and Expires HTTP response headers, and therefore, the files are likely to be cached by the user's browser, as well as by intermediate caching proxy servers such as Internet Service Providers.
Once a file is transmitted with a given cache expiration time, there is generally no way to clear it out of intermediate caches, even if you clear the browser cache or use curl with a no-cache option. Re-deploying a new version of the app will not reset those caches either.
For files that need to be modified, shorter expiration times are recommended.
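To see what a given setting means for caches, it helps to convert the expiration string into the max-age value that ends up in Cache-Control. The sketch below assumes the documented app.yaml format of space-separated numbers with the units d, h, m and s (for example "4d 5h"); the function name is hypothetical.

```python
import re

# Seconds per unit for the d/h/m/s suffixes used in expiration strings.
_UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def expiration_seconds(value: str) -> int:
    """Convert an expiration string such as "10d" or "4d 5h" to seconds."""
    total = 0
    for token in value.split():
        match = re.fullmatch(r"(\d+)([dhms])", token)
        if match is None:
            raise ValueError(f"unrecognised expiration token: {token!r}")
        total += int(match.group(1)) * _UNITS[match.group(2)]
    return total
```

With default_expiration: "10d" from the question, a cache that honours the headers may keep serving the old page for up to 864000 seconds.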

How does Google App Engine modify HTTP request/response encoding?

I'm trying to send a JSON HTTP request from a Google App Engine application and read the response. While this works great locally, it suddenly breaks when I deploy it to GAE.
To be more precise, the HTTP response body that is returned to my application ends up looking like this instead of being simple JSON:
�\bD�[��8��ʖϣ�M�M$ �\\�bA` #!r���~pvk�cR]�_7E�
I did find one set of circumstances where I get a correct response on GAE, which might give some insight into this behavior: if the response doesn't have a content-type header it goes through fine, but as soon as a content-encoding header set to "gzip" is present, I get the garbage above as the response body.
Unfortunately I don't have control over the service I'm calling. So the only choice I have is to fix this somehow on my side, and to do that I'm trying to understand what Google does to the response. Does anyone know?
I understand that Google does some things to HTTP traffic. Is it forcing gzip on my responses as well?
I've also tried playing with encodings, trying to read response as utf-8, and setting utf-8 as default encoding for my GAE application as recommended here, but with no effect. I've ruled out incorrect processing of response in my code or anything I'm using, at least I think so, otherwise I'd have the same problems locally. I'm trying to understand what exactly is happening with hope that this will give me some idea how to prevent it.
EDIT: I figured it out and did a workaround but it's still a workaround, not a solution. So my GAE app calls another web service from outside GAE which sometimes gzips response and sometimes doesn't. If it does, GAE strips away content-type header from response, thus preventing my app from correctly decoding response body. My workaround so far is to get response bytes and test if response is valid JSON, unzipping it manually if it isn't. Would still want to know if stripping content-type can be prevented...
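A variant of this workaround is to look at the first two bytes of the body instead of attempting a JSON parse first: gzip streams start with the magic bytes 0x1f 0x8b. The sketch below is in Python and is not the code from the question; the function name is hypothetical and the body is assumed to be UTF-8 JSON once decompressed.

```python
import gzip
import json

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_json_body(body: bytes):
    """Parse a JSON response body, transparently gunzipping it if needed."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    # Assumption: the service sends UTF-8 encoded JSON.
    return json.loads(body.decode("utf-8"))
```

Checking the magic bytes avoids relying on response headers that may be missing by the time the application sees the response.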
As explained at https://cloud.google.com/appengine/kb/#compression , the application should not supply the content-encoding header: "Google App Engine does its best to serve gzipped content to browsers that support it. Taking advantage of this scheme is automatic and requires no modifications to applications".
I believe this architecture (content encoding being controlled not by the application side of things -- on App Engine, your application code -- but by the server/gateway side) originated with WSGI, the Python standard interface between applications and web servers/gateways (which App Engine uses on the Python runtime). The architecture makes enough sense that we generalized it; as the above page puts it, "This approach avoids some well-known bugs with gzipped content in popular browsers.".
The client is far from powerless, and in fact, if it so chooses, can control the content encoding -- still quoting, "To force gzipped content to be served, clients may supply 'gzip' as the value of both the Accept-Encoding and User-Agent request headers. Content will never be gzipped if no Accept-Encoding header is present.".
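On the client side, the quoted rule translates into two request headers. Here is a sketch using Python's standard urllib; the function name and the URL in the usage note are placeholders, and nothing is sent until the request is actually opened.

```python
import urllib.request


def gzip_forcing_request(url: str) -> urllib.request.Request:
    """Build a request that asks for gzipped content per the quoted rule."""
    return urllib.request.Request(
        url,
        headers={
            # Both headers carry the value 'gzip', as the quote describes.
            "Accept-Encoding": "gzip",
            "User-Agent": "gzip",
        },
    )
```

A caller would pass the result of gzip_forcing_request("https://example.com/") to urllib.request.urlopen and gunzip the body itself, since urllib does not decompress responses automatically.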

Caching & GZip on GAE (Community Wiki)

Why does it seem like Google App Engine isn’t setting appropriate cache-friendly headers (like far-future expiration dates) on my CSS stylesheets and JavaScript files? When does GAE gzip those files? My app.yaml marks the respective directories as static_dirs, so the lack of far-future expiration dates is kind of surprising to me.
This is a community wiki to showcase the best practices regarding static file caching and gzipping on GAE!
How does GAE handle caching?
It seems GAE sets near-future cache expiration times, but does use the etag header. This lets browsers ask, “Has this file changed since it had an etag of X68f0o?” and hear “Nope – 304 Not Modified” back in response.
As opposed to far-future expiration dates, this has the following trade-offs:
Your end users will get the latest copies of your resources, even if they have the same name (unlike far-future expiration). This is good.
Your end users will however still have to make a request to check on the status of that file. This does slow down your site, and is “pure overhead” when the content hasn’t changed. This is not ideal.
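The revalidation round trip described above can be sketched as follows. This is a toy model in Python, not GAE's implementation: how GAE derives its etag values is not stated here, so the sketch simply assumes a truncated content hash, and the function names are hypothetical.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive a quoted etag from the content (assumed scheme: truncated SHA-1)."""
    return '"' + hashlib.sha1(content).hexdigest()[:12] + '"'


def respond(content: bytes, if_none_match=None):
    """Return (status, headers, body) for a possibly conditional GET."""
    etag = make_etag(content)
    if if_none_match == etag:
        # The client's copy is current: send headers only, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, content
```

The 304 branch still costs a full request and response, which is the "pure overhead" mentioned in the trade-offs.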
Opting for far-future cache expiration instead of (just) etag
Using far-future expiration dates takes two steps and a bit of understanding.
You have to manually update your app to request new versions of resources, by e.g. naming files like mysitestyles.2011-02-11T0411.css instead of mysitestyles.css. There are tools to help automate this, but I’m not aware of any that directly relate to GAE.
Configure GAE to set the expiration times you want by using default_expiration and/or expiration in app.yaml. GAE docs on static files
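The renaming in the first step is easy to script. The sketch below uses a content hash in place of the timestamp shown above, so that a file's name changes only when its bytes change; that substitution, the function name and the hash length are my own choices, not something GAE provides.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension of a file name."""
    # MD5 is used only as a fingerprint here, not for security.
    digest = hashlib.md5(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

A build step would write each file under its fingerprinted name and emit the same name into the HTML, after which a long expiration in app.yaml is safe.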
A third option: Application manifests
Cache manifests are an HTML5 feature that overrides cache headers. MDN article, DiveIntoHTML5, W3C. This affects more than just your script and style files' caching, however. Use with care!
When does GAE gzip?
According to Google’s FAQ,
Google App Engine does its best to serve gzipped content to browsers that support it. Taking advantage of this scheme is automatic and requires no modifications to applications.
We use a combination of request headers (Accept-Encoding, User-Agent) and response headers (Content-Type) to determine whether or not the end-user can take advantage of gzipped content. This approach avoids some well-known bugs with gzipped content in popular browsers. To force gzipped content to be served, clients may supply 'gzip' as the value of both the Accept-Encoding and User-Agent request headers. Content will never be gzipped if no Accept-Encoding header is present.
This is covered further in the runtime environment documentation (Java | Python).
Some real-world observations do show this to generally be true. Assuming a gzip-capable browser:
GAE gzips actual pages (if they have proper content-type headers like text/html; charset=utf-8)
GAE gzips scripts and styles in static_dirs (defined in app.yaml).
Note that you should not expect GAE to gzip images like GIFs or JPEGs as they are already compressed.

Resources