app.yaml values are not reflected in Google Cloud Console - google-app-engine

I am using an app.yaml file to configure my App Engine flexible environment app. The file looks like this:
runtime: java
env: flex
service: hello-service

health_check:
  enable_health_check: True
  check_interval_sec: 10
  timeout_sec: 4
  unhealthy_threshold: 2
  healthy_threshold: 2

automatic_scaling:
  min_num_instances: 3
  max_num_instances: 10
  cool_down_period_sec: 120 # default value
  cpu_utilization:
    target_utilization: 0.5
However, when I click the "view" link for the version list in the cloud console, I can only see the following in the popup:
runtime: java
env: flexible
threadsafe: true

automatic_scaling:
  min_num_instances: 3
  max_num_instances: 10

health_check:
  enable_health_check: true
  check_interval_sec: 10
  timeout_sec: 4
  unhealthy_threshold: 2
  healthy_threshold: 2
As you can see, a few "automatic_scaling" properties are missing. I am not sure why. Do I need to stop and start the relevant version to see the changes?

It is most likely that config values matching the default values are simply not displayed.
From the documentation, these are the default values for the missing parameters:
cool_down_period_sec
The number of seconds that the autoscaler should wait before it
starts collecting information from a new instance. This prevents the
autoscaler from collecting information when the instance is
initializing, during which the collected usage would not be reliable.
The cool-down period must be greater than or equal to 60 seconds.
The default is 120 seconds.
target_utilization
Target CPU utilization (default 0.5). CPU use is averaged across
all running instances and is used to decide when to reduce or increase
the number of instances.
The cpu_utilization key is likely not displayed because target_utilization (the only item under it) disappeared.
It should be easy to check: change the values for the missing configs slightly, re-deploy, and see whether the updated values are remembered or not.
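For example, a redeploy with values picked only to differ slightly from the documented defaults (the specific numbers here are arbitrary) would confirm the theory if they then show up in the console view:

```yaml
automatic_scaling:
  min_num_instances: 3
  max_num_instances: 10
  cool_down_period_sec: 180   # no longer the 120s default
  cpu_utilization:
    target_utilization: 0.6   # no longer the 0.5 default
```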

Related

Serve static assets with an efficient cache policy - Nuxt.js + GAE

I get the following from Lighthouse:
How do I change the Cache TTL on a Nuxt.js SSR website? I found some answers but nothing about Nuxt.js...
IMPORTANT: Deployed in Google App Engine
The specific answer for GAE apps is the handlers.expiration parameter in the app.yaml file:
handlers:
- url: /_nuxt
  static_dir: .nuxt/dist/client
  expiration: 4d 5h
  secure: always
Or if you want to configure it globally, set the default_expiration parameter at the root level:
default_expiration: 4d 5h
It accepts d (days), h (hours), m (minutes), and s (seconds). Here are the docs.
You can serve your static folder with a custom cache policy via the render configuration.
As an example:
render: {
  // Cache the 'static' directory for a year, in milliseconds
  // https://web.dev/uses-long-cache-ttl
  static: {
    maxAge: 60 * 60 * 24 * 365 * 1000,
  },
},
In addition to the answer, please note that - at the time of writing - the minimum time required by Lighthouse to pass is > 96.5d (Source : https://github.com/GoogleChrome/lighthouse/issues/11380)
I followed the answer by @lmfresneda and got the solution working with a cache time of 30d, but the Lighthouse test still failed until I changed it to 97d.
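For reference, converting that ~97d threshold into the millisecond maxAge value used by the render config above is plain arithmetic (no Nuxt API involved):

```python
# 97 days expressed in milliseconds, the unit render.static.maxAge expects
max_age_ms = 97 * 24 * 60 * 60 * 1000
print(max_age_ms)  # 8380800000
```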

AppEngine Flexible instances constantly respawning

I am deploying a Go application using AppEngine flexible. Below is my app.yaml. Sometimes after I deploy it stabilizes at 1 instance (it's a very low load application), but most of the time it constantly respawns upwards of 6 instances. My logs are filled with messages showing the new instances being created. There is nearly zero load on this application, why is AppEngine flexible constantly destroying and respawning instances?
Log showing constant respawning:
app.yaml
runtime: go
api_version: go1
env: flex

handlers:
- url: /.*
  script: _go_app

health_check:
  enable_health_check: True
  check_interval_sec: 10
  timeout_sec: 4
  unhealthy_threshold: 2
  healthy_threshold: 2

automatic_scaling:
  min_num_instances: 1
  max_num_instances: 10
  cool_down_period_sec: 120 # default value
  cpu_utilization:
    target_utilization: 0.5
The problem was with my health check function. It originally looked like this:
func healthCheckHandler(w http.ResponseWriter, r *http.Request) {
    return
}
I then discovered this sentence in the documentation on how instances are managed:
You can write your own custom health-checking code. It should reply to /_ah/health requests with a HTTP status code 200. The response must include a message body, however, the value of the body is ignored (it can be empty).
So I changed the health check function to write a simple "ok" in response:
func healthCheckHandler(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("ok"))
}
The instances now behave according to my autoscale settings! The respawning is gone.
I obviously should have read the documentation more closely, but there was zero indication of a problem in the health check logs; all health checks looked like they were passing. Hopefully this info is helpful to others.

WebPageTest complaining about not caching static resources even though I have caching enabled

I am testing my website on webpagetest.org. It gives me a "D" grade for caching of static assets, and then goes on to give this list:
Leverage browser caching of static assets: 63/100
WARNING - (2.0 hours) - http://stats.g.doubleclick.net/dc.js
WARNING - (5.5 days) - http://www.bookmine.net/css/images/ui-bg_highlight-soft_100_eeeeee_1x100.png
WARNING - (5.5 days) - http://www.bookmine.net/favicon.ico
WARNING - (5.5 days) - http://www.bookmine.net/js/index.min.js
WARNING - (5.5 days) - http://www.bookmine.net/js/jquery-ui-1.8.13.custom.min.js
WARNING - (5.5 days) - http://www.bookmine.net/css/index.css
WARNING - (5.5 days) - http://www.bookmine.net/js/jquery.form.min.js
WARNING - (5.5 days) - http://www.bookmine.net/css/jquery-ui-1.8.13.custom.css
The funny thing is that it does recognize I have caching enabled (set to 5.5 days, as reported above), so what is it complaining about? I have also verified that I have default_expiration: "5d 12h" set in my app.yaml. From this link:
default_expiration
Optional. The length of time a static file served by a static file
handler ought to be cached by web proxies and browsers, if the handler
does not specify its own expiration. The value is a string of numbers
and units, separated by spaces, where units can be d for days, h
for hours, m for minutes, and s for seconds. For example, "4d 5h"
sets cache expiration to 4 days and 5 hours after the file is first
requested. If omitted, the production server sets the expiration to 10
minutes.
For example:
application: myapp
version: alpha-001
runtime: python27
api_version: 1
threadsafe: true

default_expiration: "4d 5h"

handlers:
Important: The expiration time will be sent in the Cache-Control and Expires HTTP response headers, and therefore, the files are likely
to be cached by the user's browser, as well as intermediate caching
proxy servers such as Internet Service Providers. Once a file is
transmitted with a given expiration time, there is generally no way to
clear it out of intermediate caches, even if the user clears their own
browser cache. Re-deploying a new version of the app will not reset
any caches. Therefore, if you ever plan to modify a static file, it
should have a short (less than one hour) expiration time. In most
cases, the default 10-minute expiration time is appropriate.
I even verified the response my website is returning in Fiddler:
HTTP/200 responses are cacheable by default, unless Expires, Pragma,
or Cache-Control headers are present and forbid caching. HTTP/1.0
Expires Header is present: Sat, 26 Sep 2015 08:14:56 GMT
HTTP/1.1 Cache-Control Header is present: public, max-age=475200
public: This response MAY be cached by any cache. max-age: This
resource will expire in 132 hours. [475200 sec]
HTTP/1.1 ETAG Header is present: "74YGeg"
So why am I getting a D?
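As a sanity check, the max-age in the Fiddler output does line up exactly with the app.yaml setting. A small sketch (the helper and its name are mine, not part of App Engine) that converts a default_expiration string into seconds:

```python
# Convert an App Engine expiration string such as "5d 12h" into seconds.
# Hypothetical helper for illustration; App Engine accepts d/h/m/s units.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def expiration_to_seconds(expiration: str) -> int:
    total = 0
    for token in expiration.split():
        total += int(token[:-1]) * UNIT_SECONDS[token[-1]]
    return total

print(expiration_to_seconds("5d 12h"))  # 475200, the max-age seen above
```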
Adding some useful links:
- http://www.learningtechnicalstuff.com/2011/01/static-resources-and-cache-busting-on.html
- http://www.codeproject.com/Articles/203288/Automatic-JS-CSS-versioning-to-update-browser-cach
- https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching#invalidating-and-updating-cached-responses
- https://developers.google.com/speed/docs/insights/LeverageBrowserCaching
- https://stackoverflow.com/a/7671705/147530
- http://www.particletree.com/notebook/automatically-version-your-css-and-javascript-files/
WebPagetest gives a warning if the cache expiration is set to less than 30 days. You can view that detail by clicking on the "D" grade in your test results and reading the glossary entry for "Cache Static". You can also find that info here.
If you need to modify a cached static JavaScript file, you can add a version number to the file path or in a querystring.
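A minimal sketch of the querystring approach, assuming you can hash the file contents at build or deploy time (the helper name and the 8-character token length are my own choices):

```python
import hashlib

def busted_url(path: str, content: bytes) -> str:
    # Derive a short version token from the file contents so the URL
    # changes whenever the file does, forcing caches to refetch it.
    version = hashlib.md5(content).hexdigest()[:8]
    return f"{path}?v={version}"

print(busted_url("/js/index.min.js", b"console.log('hi');"))
```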

Set default settings to 'no-cache' on Google Cloud Storage

Is there a way to set all public links to have 'no-cache' in Google Cloud Storage?
I've seen solutions to use gsutil to set the "Cache-Control" upon file-upload, but I'm looking for a more permanent solution.
There was a conversation about providing a cache invalidation feature but I didn't quite follow the reasoning. Any explanations would be greatly appreciated!
it would be difficult to provide a cache invalidation feature because once served with a non-0 cache TTL any cache on the Internet (not just those under Google's control) is allowed (per HTTP spec) to cache the data
Thanks!
For a more permanent one-time-effort solution, with the current offerings on GCP, you can do this with Cloud Functions.
Create a new Function and set the Event type to "On (finalizing/creating) file in the selected bucket" (google.storage.object.finalize). Make sure to select the bucket you want this on. In the body of the function, set the cacheControl / Cache-Control attribute for the blob; the attribute name depends on the language. Here's my version in Python, using cache_control:
main.py:
# match the function name below to the "Entry point"
from google.cloud import storage

def set_file_uncached(event, context):
    file = event  # auto-generated
    print(f"Processing file: {file=}")  # logging, if you want it
    storage_client = storage.Client()
    # we expect just one with that name
    blob = storage_client.bucket(file["bucket"]).get_blob(file["name"])
    if not blob:
        # in case the blob is deleted before this executes
        print("blob not found")
        return None
    blob.cache_control = "public, max-age=0"  # or whatever you need
    blob.patch()

requirements.txt:
google-cloud-storage
From the logs: Function execution took 1712 ms, finished with status: 'ok'. This could have been faster, but I've set the minimum to 0 instances, so it needs to spin up for each upload. Depending on your usage and cost constraints, you can set it to 1 or something higher.
Other settings:
Retry on failure: No/False
Region: [wherever your bucket is]
Memory allocated: 128 MB (smallest available currently)
Timeout: 5 seconds (smallest available currently, function shouldn't take longer)
Minimum instances: 0
Maximum instances: 1

Nutch 1.4 and Solr 3.6 - Nutch not crawling 301/302 redirects

I am having an issue where the initial page is crawled, but the page it redirects to is not being crawled or indexed.
I have the http.redirect.max property set to 5; I have also attempted the values 0, 1, and 3.
<property>
  <name>http.redirect.max</name>
  <value>5</value>
  <description>The maximum number of redirects the fetcher will follow when
  trying to fetch a page. If set to negative or 0, fetcher won't immediately
  follow redirected URLs, instead it will record them for later fetching.
  </description>
</property>
I have also attempted to clear out the majority of what is in regex-urlfilter.txt and crawl-urlfilter.txt. Other than the website being crawled, these are the only other params in those files:
# skip file: ftp: and mailto: urls
-^(file|ftp|mailto):
# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|PDF|pdf|js|JS|swf|SWF|ashx|css|CSS|wmv|WMV)$
Also, it seems like Nutch is crawling and pushing only pages that have querystring parameters. Looking at the output:
http://example.com/build Version: 7
Status: 4 (db_redir_temp)
Fetch time: Fri Sep 12 00:32:33 EDT 2014
Modified time: Wed Dec 31 19:00:00 EST 1969
Retries since fetch: 0
Retry interval: 2700 seconds (0 days)
Score: 0.04620983
Signature: null
Metadata: _pst_: temp_moved(13), lastModified=0: http://example.com/build/
There is a default IIS redirect occurring that throws a 302 to add the trailing slash. I have made sure this slash is already added on all pages, so I am unsure why this is being redirected.
Just a bit more information, here are some parameters I have tried.
depth=5 (tried 1-10)
threads=30 (tried 1 - 30)
adddays=7 (tried 0, 7)
topN=500 (tried 500, 1000)
Try running Wireshark on the webserver to see exactly what is being served, and on the machine Nutch is on to see what's being requested. If they're on the same server, great. Try that and add HTTP to your filter box after the capture.
