cron job in google app engine not working - google-app-engine

I took the basic Python 3 Flask tutorial website from this Google Cloud tutorial. I was able to set it up, and the website works fine.
In addition, I wanted to run a Python script every day to collect some data, but the cron job just doesn't work. I also added login: admin to prevent anyone from calling that URL directly.
cron.yaml
cron:
- description: test dispatch vs target
  url: /cronapp
  schedule: every 5 hours
app.yaml
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT main:app
runtime_config:
  python_version: 3
handlers:
- url: /cronapp
  script: cronapp.py
  login: admin
Calling it directly as http://myproject.appspot.com/cronapp doesn't work either; it returns a 404.
What am I doing wrong? Any help is appreciated.

Your app.yaml file mixes the standard environment handlers element into a flexible environment configuration, so it is probably ignored. You can probably see the cron requests in the app's logs in the developer console (likely with errors, though).
You need to add a handler for /cronapp inside your app code, not in app.yaml. I'm not entirely sure how to do that (I still use only the standard environment); it depends on your app and its framework. Take a look at the Hello World code review for a Flask example.
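A minimal sketch of routing /cronapp in the application code instead of app.yaml. To stay dependency-free it is written as a plain WSGI callable (gunicorn, as in the entrypoint above, can serve any WSGI app as main:app); in Flask the equivalent is a view decorated with @app.route("/cronapp"). The collect_data function is a hypothetical placeholder for the daily data-collection script.

```python
def collect_data() -> str:
    """Hypothetical placeholder for the daily data-collection script."""
    return "data collected"


def app(environ, start_response):
    """Plain WSGI app; the /cronapp route lives here, not in app.yaml.

    Assumption (per the answer): a handlers: section in a flexible
    environment app.yaml is probably ignored, so routing happens in code.
    """
    path = environ.get("PATH_INFO", "/")
    if path == "/cronapp":
        status, body = "200 OK", collect_data()
    elif path == "/":
        status, body = "200 OK", "Hello World!"
    else:
        status, body = "404 Not Found", "Not Found"
    data = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]
```

Note that routing alone does not restrict access; the cron-header check discussed below is one way to do that.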
Update:
I may not be entirely correct: I based my answer on the documentation, but I just noticed some inconsistencies in it (I sent Google some documentation feedback about them).
The flexible environment Securing URLs for cron (which appears mostly copied from the standard environment equivalent) mentions a couple of solutions:
one is indeed based on the login: admin option for a handler:
You can restrict a URL by adding login: admin to the handler
configuration section in app.yaml. For more information see
Securing URLs
But handler is not mentioned in Configuring your App with app.yaml, and the Securing URLs link points to a nonexistent anchor. So I'm not sure whether this actually works.
the second one is based on the X-Appengine-Cron header (same as in the standard environment):
Requests from the Cron Service will also contain a HTTP header:
X-Appengine-Cron: true
The X-Appengine-Cron header is set internally by Google App Engine.
If your request handler finds this header it can trust that the
request is a cron request. If the header is present in an external
user request to your app, it is stripped, except for requests from
logged in administrators of the application, who are allowed to set
the header for testing purposes.
But in Removed headers it is mentioned that:
In addition, some selected headers that match the following pattern
are removed from the request:
X-Appengine-*
It's unclear if this extends to X-Appengine-Cron or not. It's worth a try. This is my check in the (standard env, webapp2-based) cron handler code:
if self.request.headers.get('X-AppEngine-Cron') is None:
    self.abort(403)  # HTTPForbidden
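For a Flask or other WSGI app in the flexible environment, the same check can be sketched as a wrapper. This assumes the X-Appengine-Cron header actually reaches the app, which the documentation leaves unclear; require_cron_header and cron_job are hypothetical names. Under WSGI (PEP 3333) the header shows up in the environ as HTTP_X_APPENGINE_CRON. In Flask itself the equivalent is request.headers.get("X-Appengine-Cron") followed by abort(403).

```python
def require_cron_header(handler):
    """Wrap a WSGI app so it answers 403 unless X-Appengine-Cron is present."""
    def wrapped(environ, start_response):
        # WSGI maps the X-Appengine-Cron request header to this environ key.
        if environ.get("HTTP_X_APPENGINE_CRON") is None:
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return handler(environ, start_response)
    return wrapped


def cron_job(environ, start_response):
    """Hypothetical cron endpoint that does the actual work."""
    body = b"ok"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Only requests carrying the cron header reach cron_job.
cronapp = require_cron_header(cron_job)
```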

Related

Google app engine - how to disable the cache

So some context:
I have a Node.js API running on Google App Engine. All my GET requests are being cached by App Engine by default for 10 minutes.
I am using cloudflare for my API as this allows me to remove specific items from the cache when needed.
You can imagine this has caused a bit of an issue because my CF cache was correctly cleared but the app engine kept returning old data.
According to the docs, you can set a default_expiration in the app.yaml file but setting this to 0 or 0s has made no difference and google keeps caching my responses.
There also seems to be no way to get an uncached response from Google.
My obvious question is: is there some way to bypass this cache completely? Preferably without having to set all of my API's responses to private with a 0s cache.
It irks me that Google forces this cache on me and provides very vague documentation on the whole matter.
You can configure your app.yaml to define a cache period.
default_expiration sets a global default cache period for all static file handlers of an application. If it is omitted, the production server sets the expiration to 10 minutes by default.
To set expiration times for individual handlers, specify the expiration element within the handler element in your app.yaml file. This sets how long web proxies and browsers should cache a static file served by that handler.
default_expiration: "4d 5h"
handlers:
- url: /stylesheets
  static_dir: stylesheets
  expiration: "0d 0h"
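App Engine documents these expiration strings as space-separated numbers with a unit of d (days), h (hours), m (minutes) or s (seconds). The hypothetical helper below, which is not part of any SDK, converts such a string to seconds to show what the values above mean; it assumes whole-number amounts only.

```python
import re

# Seconds per unit in an app.yaml expiration string.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def expiration_to_seconds(value: str) -> int:
    """Convert an app.yaml-style expiration such as "4d 5h" to seconds."""
    total = 0
    for token in value.split():
        match = re.fullmatch(r"(\d+)([dhms])", token)
        if match is None:
            raise ValueError(f"bad expiration token: {token!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total


print(expiration_to_seconds("4d 5h"))  # 363600
print(expiration_to_seconds("0d 0h"))  # 0
print(expiration_to_seconds("10m"))    # 600, the documented default
```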
It seems you are referring to the static cache (per your link). Try cache-busting techniques such as adding a query parameter to the URL, e.g.
https://<url>/?{{APP_VERSION_ID}}
where APP_VERSION_ID is the latest version of your deployed app. This way, each time you redeploy your app, APP_VERSION_ID changes and the latest version of your static files will always be loaded.
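A sketch of the same cache-busting idea in Python. The parameter name v is a hypothetical choice (the example above appends the bare version string), and reading the version from the GAE_VERSION environment variable is an assumption about the runtime; any value that changes on each deploy works.

```python
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(url: str, version: str) -> str:
    """Append a version query parameter so each deploy gets a new cache key."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("v", version))  # "v" is a hypothetical parameter name
    return urlunsplit(parts._replace(query=urlencode(query)))


# Assumption: the runtime exposes the deployed version as GAE_VERSION;
# "dev" is a local fallback.
APP_VERSION = os.environ.get("GAE_VERSION", "dev")
```

For example, versioned_url("https://example.com/app.js", "42") gives "https://example.com/app.js?v=42", and existing query parameters are kept.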

App Engine: Static Files Not Updating on Deploy

I pushed an HTML static file containing an Angular SPA as the catch-all handler for my custom domain, with these settings:
- url: /(api|activate|associate|c|close_fb|combine|import|password|sitemap)($|/.*)
  script: gae.php
- url: /.*
  static_files: public/static/app/v248/es/app.html
  upload: public/static/app/v248/es/app.html
  expiration: "1h"
That worked fine, but when I push a new app.html it doesn't update. I've tried changing the local path, deploying a new app version, and even replacing the catch-all handler with a custom PHP endpoint, but nothing works: the response is still the first version of app.html I uploaded.
Other people have had the same problem (CSS File Not Updating on Deploy (Google AppEngine)), and it looks like it is related to the Google CDN cache, but as far as I know there isn't any way to flush it.
There is a way to flush static files cached by your app on Google Cloud.
Head to your Google Cloud Console and open your project. In the left hamburger menu, go to Storage -> Cloud Storage -> Browser. There you should find at least one bucket: your-project-name.appspot.com. In the Lifecycle column, click the link for your-project-name.appspot.com. Delete any existing rules, since they may conflict with the one you will create now.
Create a new rule by clicking on the 'Add A Rule' button. For the action, select "Set storage to nearline". For the object conditions, choose only the 'Number of newer versions' option and set it to 1. Click on the 'Continue' button and then click 'Create'.
This new rule will take up to 24 hours to take effect, but at least for my project it took only a few minutes. Once it is up and running, the version of the files being served by your app under your-project-name.appspot.com will always be the latest deployed, solving the problem. Also, if you are routinely editing your static files, you should remove any expiration element from handlers related to those static files and the default_expiration element from the app.yaml file, which will help avoid unintended caching by other servers.
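The same rule can also be expressed as the JSON lifecycle configuration that gsutil accepts. This is a sketch only: the structure below is the standard Cloud Storage lifecycle format, and the bucket name is the placeholder used above.

```python
import json


def lifecycle_json() -> str:
    """Return the lifecycle rule from the console steps as JSON text.

    Rule: set an object's storage class to NEARLINE once one newer
    version of that object exists.
    """
    config = {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"numNewerVersions": 1},
            }
        ]
    }
    return json.dumps(config, indent=2)
```

Saving the output to lifecycle.json and running gsutil lifecycle set lifecycle.json gs://your-project-name.appspot.com applies it; this replaces the bucket's whole lifecycle configuration, which matches the advice to delete existing rules first.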
When you change static files in an App Engine application, the changes will not be available immediately because of caching, as you already suspected. The cache in Google Cloud cannot be flushed manually, so instead I would recommend changing the expiration time to a shorter period (by default it is 10 minutes) while you test, and later setting an appropriate expiration time for your requirements.
Bear in mind that you can change the static cache expiration time either for all static files or only for the ones you choose, by setting the proper element in the app.yaml file.
2020 Update:
For my application I found that App Engine started failing to detect my latest app deployments once I reached 50 Versions in my Versions list.
See (Burger Menu) -> App Engine -> Versions
After deleting a bunch of old versions on next deploy it picked up my latest changes immediately. Not sure if this is specific to my account or billing settings but that solved it for me.
I was serving my static files from a separate service on Google Cloud Platform. My problem was that I hadn't run
gcloud app deploy dispatch.yaml
Once I ran it, everything was fine. I hope it helps.
Another possible cause is caching in Google's frontend, which depends on the cache header returned by your application. In my case, I opened the Network tab of Firefox's inspector and saw that the stale file had a Cache-Control max-age of 43200 seconds, i.e. 12 hours.
This was for a file returned from Flask, so I fixed this by explicitly specifying max-age in the Flask response from my GAE code:
return flask.send_from_directory(directory, filename, max_age=600)
This causes intermediate caches such as Google's frontend to only cache the file for a period of 600 seconds (10 minutes).
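The same idea can be applied to a whole WSGI app. In this sketch with_max_age is a hypothetical middleware, not part of Flask or App Engine, and the "public, max-age=600" value simply mirrors the 10-minute figure above. It only adds the header when the wrapped app has not set Cache-Control itself, so per-route values such as the send_from_directory one still take precedence.

```python
def with_max_age(app, seconds: int = 600):
    """WSGI middleware: add Cache-Control: public, max-age=N if none was set."""
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Respect any Cache-Control header the app already chose.
            if not any(name.lower() == "cache-control" for name, _ in headers):
                headers = list(headers) + [
                    ("Cache-Control", f"public, max-age={seconds}")
                ]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start_response)
    return wrapped
```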
Unfortunately, once a file has been cached there is no way to flush it, so you will have to wait out the 12 hours. But this fix will prevent the problem next time.

App-engine returning 304 even after html page is modified

I have an AngularJS app whose main container page, "index.html", is updated with each version. It is hosted on App Engine, and I built the server in Go. The server directly serves the HTML views and the static content, as well as a RESTful API.
It all works great until I push a new version; then I have to hard-reload the page to avoid getting a 304.
My app.yaml file is really basic right now:
handlers:
- url: /.*
  script: _go_app
I'm not setting any caching policies yet so I understand app engine will default the caching of static files for 10 minutes.
What's happening?
I believe I have found the answer.
Two things were bothering me about this:
Firstly, I knew I wasn't handling static files as efficiently as I could: it was a job I hadn't gotten around to yet, and the static files were being delivered via routing code in the main Go script rather than declared in app.yaml.
Using the latter method would let app engine do the work rather than my script and hopefully save a few precious processor cycles.
Secondly, I wasn't exactly sure how index.html was being served when the path was "/"; it was just working. I know that various web servers (Apache, IIS, etc.) have a default page, and App Engine seemed to work the same way.
So whenever I decided that a request didn't require any dynamic script, I would simply serve it with the following code:
http.ServeFile(w, r, r.URL.Path[1:])
This magically turned "/" into "/index.html" for me but appears to have the caching bug described above. I will take this up with the Go community.
As for the fix, adding the page as a static file to the app.yaml made this work:
- url: /
  static_files: index.html
  upload: index.html
I will make sure I add all the other static folders too.

Custom domain http redirects to https when I don't want it to, why is it doing this?

I am trying to get a custom domain to work with Google App Engine 1.9.7 without SSL
I have done all the prerequisites:
Domain is verified with the proper TXT records.
Domain is configured in the GAE Cloud Console with the proper subdomain www.
Application is deployed to the appspot.com domain and works.
But when I try to go to http://www.customdomain.com it immediately redirects to https://www.customdomain.com and I get the following error:
net::ERR_SSL_PROTOCOL_ERROR
I know that for SSL I need to set up a certificate.
I don't have any of my webapp modules configured to be secure.
I don't want or need SSL right now.
I found this little nugget after reading the instructions again and again:
It's okay for multiple domains and subdomains to point to the same
application. You can design your app to treat them all the same or
handle each one in a different way.
This is exactly what I want to do, but I can't find any information on how to actually do it.
How do I stop it from redirecting HTTP to HTTPS?
I ran into the same problem. You say "I don't have any of my webapp modules configured to be secure." If that's the case, sorry, can't help you.
Otherwise, the most likely cause of your problem is a "secure: always" flag on the respective handler in the handlers section of your app.yaml, like so:
handlers:
- url: /.*
  secure: always
Remove the "secure: always" line. Details are in the official Google docs (table item "secure").
How did I run into this problem? I copied the app.yaml from one of my other apps, which didn't need to run on a custom domain but did need SSL always.
For a Django/Python GAE app, by the way, the same problem is caused like this:
handlers:
- url: /.*
  script: google.appengine.ext.django.main.app
  secure: always
Same answer here: remove or change the "secure" line. I just tested the Python version as described. It always works on the appspot.com domain, but on a custom domain only without the secure flag.
I'm just pointing this out because other people might run into this problem and come to this thread for help.
What I had to do was to shut down all instances and remove all versions, then do a fresh deployment from scratch, then I stopped having this problem.

Google App Engine Cron Requests Using POST

Is it possible to make a cron request to a URL via Google App Engine using method=post? I could not find anything in the documentation allowing methods other than GET.
https://developers.google.com/appengine/docs/python/config/cron#Python_app_yaml_Cron_support_in_the_development_server
The simple answer is no. The docs clearly state that cron jobs use HTTP GET. The best option is to change your method to GET and restrict direct access to the URL in your app.yaml.
Like this:
handlers:
- url: /report/weekly
  script: reports.app
  login: admin
It's not possible. The requests will have a header 'X-AppEngine-Cron' that you can check for, that might help if you want to prevent accidental running from a browser.
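Since cron only issues GET requests, one option is to move the work that used to sit behind POST into a plain function and expose it on a GET endpoint that also checks the cron header. In this sketch run_weekly_report is a hypothetical job and report_app a plain WSGI callable; answering other methods with 405 and an Allow header is a design choice, not App Engine behavior.

```python
def run_weekly_report() -> bytes:
    """Hypothetical job that was previously triggered by a POST request."""
    return b"report generated"


def report_app(environ, start_response):
    """Cron-facing endpoint: cron only sends GET, so accept GET alone."""
    method = environ.get("REQUEST_METHOD", "GET")
    extra = []
    if method != "GET":
        status, body = "405 Method Not Allowed", b"Method Not Allowed"
        extra = [("Allow", "GET")]
    elif environ.get("HTTP_X_APPENGINE_CRON") is None:
        # Header missing: not a cron request (e.g. opened in a browser).
        status, body = "403 Forbidden", b"Forbidden"
    else:
        status, body = "200 OK", run_weekly_report()
    start_response(status, [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ] + extra)
    return [body]
```

Other callers that still need POST could call run_weekly_report from a separate route with its own access control.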

Resources