I've specified a cron job (to test in development) but it doesn't seem to be running. How does one make sure the jobs will work in production?
cron.yaml:
cron:
- description: cron test gathering
  url: /test/cron
  schedule: every 2 minutes from 09:00 to 23:00
app.yaml:
application: cron_test
version: 1
runtime: python
api_version: 1

handlers:
- url: /.*
  script: main.py
main.py:
import wsgiref.handlers
from google.appengine.ext import webapp
import test, err  # this app's own handler modules

url_map = [('/test/cron', test.CronHandler),
           ('/error', err.Err404Handler)]
application = webapp.WSGIApplication(url_map, debug=False)

def main():
    wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
    main()
The CronHandler is defined as:
import logging
from google.appengine.ext import webapp

class CronHandler(webapp.RequestHandler):
    def get(self):
        logging.info("NOTE: CronHandler get request")
        return None
I was expecting to see the line, "NOTE: CronHandler get request", in the App Engine logs. I'm using the GoogleAppEngineLauncher app (version: 1.5.3.1187) to start and stop the app.
D'Oh! Just saw the fine print in the SDK documentation:
When using the Python SDK, the dev_appserver has an admin interface
that allows you to view cron jobs at /_ah/admin/cron.
The development server doesn't automatically run your cron jobs. You
can use your local desktop's cron or scheduled tasks interface to
trigger the URLs of your jobs with curl or a similar tool.
Three years later things have improved.
First, the route to Cron Jobs is: http://localhost:8000/cron
The development server (still) doesn't automatically run your cron jobs. However, using the link above you can do two things:
1. Click the "Run now" button, which actually triggers the URL (hurray!)
2. See the schedule, which should assure you of when the jobs would run in production
I was looking for a way to simulate cron jobs on the local dev server. As a temporary solution, I am running a Python script locally which accesses the cron URL and triggers the scheduled task.
import urllib2
import time

while True:
    print urllib2.urlopen("http://localhost:9080/cron/jobs/")
    time.sleep(60)
In my case the URL is http://localhost:9080/cron/jobs/ and I run it every minute.
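If you are on a newer SDK, a minimal Python 3 equivalent might look like this (just a sketch, assuming the same local admin cron endpoint and port):

import time
import urllib.request

# Poll the local dev server's cron endpoint once a minute;
# the URL/port are whatever your dev_appserver exposes.
while True:
    with urllib.request.urlopen("http://localhost:9080/cron/jobs/") as resp:
        print(resp.status, resp.read()[:200])
    time.sleep(60)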
Hope it can help.
Well, my UI and backend codebases were decoupled, so I whipped up some AJAX code on the UI side to regularly hit the backend cron endpoints.
That simulated the cron jobs for me in the local dev environment.
I have been racking my brains on this for a few weeks now, trying different variations from Google Cloud service offerings but can't seem to find the proper one.
I have a Python script with dependencies etc. that I have containerized, pushed, and deployed to GCR.
The script is a bot that connects to an external websocket, perpetually receiving signals, and then does further processing via an API against another external service.
What would be the best service offering from Google Cloud to run this?
So far, I've tried:
GCR Web Service - requires a listening service (:8080), which I do not provide in this use case, and it scales your service down when there is no traffic, so no go.
GCR Job Service - seems like the next ideal option (no HTTP port requirement). However, since the script (my entry point) doesn't 'return' unless it quits, the job only lets it run for a minute or so before the Jobs API declares it 'failed'. Basically, it launches my entry point, which just executes the script as if it were running locally, and my script isn't meant to return anything.
To try and get around this, I went Google's recommended way and built a main.py from their standard boilerplate, acting as a wrapper that launches the actual script. I did this via a simple subprocess.Popen, using their sample main.py as shown below.
main.py
import json
import os
import sys
import subprocess

# Retrieve Job-defined env vars
TASK_INDEX = os.getenv("CLOUD_RUN_TASK_INDEX", 0)
TASK_ATTEMPT = os.getenv("CLOUD_RUN_TASK_ATTEMPT", 0)

# Define main script
def main():
    print(f"Starting Task #{TASK_INDEX}, Attempt #{TASK_ATTEMPT}...")
    subprocess.Popen(["python3", "myscript.py"],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(f"Completed Task #{TASK_INDEX}.")

# Start script
if __name__ == "__main__":
    try:
        main()
    except Exception as err:
        message = f"Task #{TASK_INDEX}, " \
            + f"Attempt #{TASK_ATTEMPT} failed: {str(err)}"
        print(json.dumps({"message": message, "severity": "ERROR"}))
        sys.exit(1)  # Retry Job Task by exiting the process
My thinking was that this would allow the job to execute my script and mark the job as completed while the actual script remains running. Also, since subprocess.Popen sets its stdout and stderr to PIPE, my thinking was that the output would get caught by Google's logging and I would see it.
The job runs and is marked as succeeded; however, I see no indication of the actual script executing anywhere.
I had a similar issue with Google Cloud Functions. Jobs seemed like an ideal option since I can run them on their scheduler to make sure they launch, say, every hour (my script uses a lock file so it doesn't run again if already running).
Am I just missing the point on how these cloud services run?
Are offerings like Google Cloud Run Jobs/Functions, etc. meant to execute jobs that return and quit until launched again by whatever schedules them?
Do I need to consider Google Compute Engine as an option for this use case, that is, a full running VM instead of the stateless/serverless options?
I am trying to run this in a containerized, scale-as-needed fashion, both to make my project portable and to minimize costs as much as possible given the always-running nature of the job.
Lastly, I know services like PythonAnywhere (and I am sure others) make this kind of thing easier, but I would like to learn how to do this via standard cloud offerings like GCR, AWS, etc.
thanks for any insight / advice!
Cloud Run's best fit is serving stateless HTTP REST APIs. There are also Jobs, currently in beta.
One of the top features of Cloud Run is that it scales to zero when there are no requests to your service (your service instance gets destroyed entirely).
If your bot needs to stay alive forever, Run is not for you (even though you can configure Run to keep at least one instance alive).
I would instead consider App Engine or Compute Engine.
I recently updated my cron.yaml file and now my cron tasks fail with no entries in the logs.
It is acting like the Java servlet at the URL is not being run.
I can paste the URL into a browser and the servlet runs fine.
My cron.yaml file:
cron:
- description: Daily revenues report
  url: /revenues
  schedule: every day 07:35
  timezone: America/Denver
I deploy using the deploycron.sh below:
PROJECT_ID='my-project-id'
gcloud config set project ${PROJECT_ID}
gcloud info
gcloud app deploy cron.yaml
Is there an error in my .yaml?
Is there a special task queue set up required?
Is some other configuration or permissions piece missing?
It was running fine last week. I have tried deleting and starting over to no avail.
https://console.cloud.google.com/cloudscheduler?project=project-id
Shows the job. Result column 'Failed'.
Logs 'View' link shows:
protoPayload.taskName="01661931846119241031" protoPayload.taskQueueName="__cron"
with no log entries.
Is __cron not automatic?
I am at a loss.
App Engine Standard. Java 8.
After installing the latest gcloud update locally and re-running the deploy cron script, the cron jobs now run as before. 02/02/2021.
'Failed' means that the endpoint /revenues is not returning a successful HTTP status code.
Logs 'View' link shows: protoPayload.taskName="01661931846119241031" protoPayload.taskQueueName="__cron" with no log entries
Maybe don't use the premade filter, and just try filtering for /revenues or viewing all the logs at 07:35 am (when it was supposed to have run)
Is there an error in my .yaml?
If there were, then gcloud app deploy cron.yaml would fail.
Is there a special task queue set up required?
You shouldn't need to do anything; I didn't.
I can paste the url into a browser and the servlet runs fine.
When you paste the URL into the browser, is there any redirecting (like from /revenues to /revenues/) or anything else that your browser is handling for you? Maybe /revenues is now expecting cookies to be present.
Are there any special app.yaml or dispatch.yaml rules that /revenues would be hitting?
Is /revenues being handled by a service other than the default service?
I had a similar problem: CRON tasks fail without any logs.
The root cause was that the IP address of App Engine was blocked by the App Engine Firewall. Thus I had to update the allow-list, as described here: https://cloud.google.com/appengine/docs/standard/nodejs/scheduling-jobs-with-cron-yaml#validating_cron_requests
I started having the same problem a few days ago on my existing CRON schedules. I've tried everything including tearing my code down to the bare minimum and creating a new GAE project with the Hello World quick start. It still fails. Nothing useful in the logs and the UI just says 'Failed'. I'm pulling my hair out.
Sorry I don't have an answer to contribute but your post makes me think it's on Google's side. I know they're moving CRON jobs to Cloud Scheduler->App Engine Cron Jobs. My gut tells me it's a permissions issue related to this move and IAM. I'm really at a loss.
Is there a way to schedule a cron job using cron.yaml to trigger an HTTP Cloud Function? I tried to implement it, but passing the entire URL throws an error.
cron:
- description: "Test Call"
  url: https://us-central1-***.cloudfunctions.net/helloGET
  schedule: every 1 mins
I see this error in the console when I try to deploy the cron job
Unable to assign value 'https://us-central1-***.cloudfunctions.net/helloGET' to attribute 'url':
Value 'https://us-central1-***.cloudfunctions.net/helloGET' for url does not match expression '^(?:^/.*$)$'
in "/Users/xyz/Desktop/cron.yaml", line 3, column 8
I know that error is being thrown because I used the full URL. If, instead of the full path, I just pass the following:
cron:
- description: "Test Call"
  url: /helloGET
  schedule: every 1 mins
then it is able to deploy the cron job, but when the job runs it throws a 404 error. By passing just the path and not the full URL, I believe it looks for the URL in App Engine, and since I don't have any code in App Engine and my service lives in the Cloud Function, it cannot find it.
Also, is there a way to set the schedule to run every 1 second instead of every 1 minute?
The url in the cron.yaml needs to be a URL handled by your app, not an arbitrary one - which is why only the relative path works. From Syntax (emphasis mine):
url
Required. The url field specifies a URL in your application that
will be invoked by the Cron Service.
What you can do is have your application cron handler reach out to the arbitrary URL you need in order to trigger your Cloud Function. See Issuing HTTP(S) Requests.
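A minimal sketch of that idea, assuming the Python standard runtime, a webapp2 handler, and a hypothetical /cron/call-function route that your cron.yaml points at (the Cloud Function URL is the one from the question):

import logging
import webapp2
from google.appengine.api import urlfetch

# Hypothetical handler invoked by cron; it forwards the request
# to the externally hosted Cloud Function and relays the status.
class CallFunctionHandler(webapp2.RequestHandler):
    def get(self):
        result = urlfetch.fetch("https://us-central1-***.cloudfunctions.net/helloGET")
        logging.info("Cloud Function returned %s", result.status_code)
        self.response.set_status(result.status_code)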
As for going below 1-minute intervals - that's not supported by cron itself. But there are ways to achieve something almost equivalent; see, for example, High frequency data refresh with Google App Engine.
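One common variant of that approach (just a sketch, assuming the Python taskqueue API and a hypothetical /work handler that does the actual job) is a once-a-minute cron handler that fans out delayed tasks to get sub-minute granularity:

import webapp2
from google.appengine.api import taskqueue

# Cron hits this handler once a minute; it enqueues a task to run
# every 10 seconds within that minute via the countdown parameter.
class FanOutHandler(webapp2.RequestHandler):
    def get(self):
        for offset in range(0, 60, 10):
            taskqueue.add(url='/work', method='GET', countdown=offset)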
I am using google app engine and have 2 applications that use cron jobs to schedule events. I am able to deploy both applications by using gcloud app deploy app.yaml cron.yaml. Even though both apps are deployed and working, only one of the cron jobs actually runs. This is what the files look like.
First cron.yaml
cron:
- description: "GET first group"
  url: /
  schedule: every 5 minutes
  target: pubsubone
Second cron.yaml
cron:
- description: "GET second group"
  url: /
  schedule: every 5 minutes
  target: pubsubtwo
These files are in different folders and associated with different applications.
When you deploy the second service with a new cron.yaml file, the first cron job gets overwritten, since only one cron.yaml is expected per deployment. To deploy both cron jobs at the same time, join them in a single file, as shown in the example here, and then deploy the resulting cron.yaml file as shown here. The cron.yaml should look like this:
cron:
- description: "GET first group"
  url: /
  schedule: every 5 minutes
  target: pubsubone
- description: "GET second group"
  url: /
  schedule: every 5 minutes
  target: pubsubtwo
And the command line to deploy it is this one:
$ gcloud app deploy cron.yaml
There are several reasons this could be failing for you, and sorting out which will be easier if you use something other than / as the url - /cron, perhaps. That will make it much easier to determine, when looking at the logs, whether the URL is being called as intended.
Next, there's the target. If there isn't an active version of your app with the specified value as its version (or service, though I don't have experience with that), then AFAIK the request generated by cron will get dropped on the floor.
In the 1.8.4 release of Google App Engine it states:
A Datastore Admin fix in this release improves security by ensuring that scheduled backups can now only be started by a cron or task queue task. Administrators can still start a backup by going to the Datastore Admin in the Admin Console.
The way to run scheduled backups with cron is documented, but how can we initiate backups from a task queue task?
Are there any other ways to programmatically run backup tasks?
You can create a task queue task with method GET and URL "/_ah/datastore_admin/backup.create" with your parameters specified as arguments to the URL and target the task to run on the 'ah-builtin-python-bundle' version. Example:
url = '/_ah/datastore_admin/backup.create?filesystem=blobstore&name=MyBackup&kind=Kind1&kind=Kind2'
taskqueue.add(
    url=url,
    target='ah-builtin-python-bundle',
    method='GET',
)
I have cron jobs that trigger my own handlers that then look up a config and create a task queue backup based on that config. This lets me change backup settings without having to update cron jobs and lets me have a lot more control.
The options you can specify in the URL are the same as the documentation describes for CRON job backups so you can specify namespace and gs-bucket-name as well.
I believe to do this in Java you have to create a queue with a target in the queue definition and add your tasks to that queue.
I did this by combining Bryce's solution with the code from Google's scheduled backup documentation. This way, I'm still using cron.yaml, but I have the flexibility to handle differences in each environment (e.g. don't run a backup in the dev/stage branch based on config in the datastore, don't specify kinds in the URL that haven't made it out of dev yet).
I also was able to generate the &kind=xxx pairs using this:
from google.appengine.ext.ndb import metadata
backup_types = "".join(["&kind=" + kind for kind in metadata.get_kinds() if not kind.startswith("_")])
The steps were pretty simple in retrospect.
Setup:
a) Enable your default cloud storage bucket
b) Enable datastore admin
Steps:
Add a cron job to kick off the backup (cron.yaml):
cron:
- description: daily backup
  url: /taskqueue-ds-backup/
  schedule: every 24 hours from 21:00 to 21:59
Add a queue to process the tasks (queue.yaml):
queue:
- name: ds-backup-queue
  rate: 1/s
  retry_parameters:
    task_retry_limit: 1
Add a route to the task queue handler:
routes = [...,
          RedirectRoute('/taskqueue-ds-backup/',
                        tasks.BackupDataTaskHandler,
                        name='ds-backup-queue', strict_slash=True),
          ...]
Add the handler to process the enqueued items:
import logging

import webapp2
from google.appengine.api import app_identity
from google.appengine.api import taskqueue
from google.appengine.ext.ndb import metadata

import config


class BackupDataTaskHandler(webapp2.RequestHandler):
    def get(self):
        enable_ds_backup = bool(config.get_config_setting("enable_datastore_backup", default_value="False"))
        if not enable_ds_backup:
            logging.debug("skipping backup. backup is not enabled in config")
            return

        backup_types = "".join(["&kind=" + kind for kind in metadata.get_kinds() if not kind.startswith("_")])
        file_name_prefix = app_identity.get_application_id().replace(" ", "_") + "_"
        bucket_name = app_identity.get_default_gcs_bucket_name()
        backup_url = "/_ah/datastore_admin/backup.create?name={0}&filesystem=gs&gs_bucket_name={1}{2}".format(
            file_name_prefix, bucket_name, backup_types)
        logging.debug("backup_url: " + backup_url)

        # Run the backup as a service in production. Note this is only
        # possible if you're signed up for the managed backups beta.
        if app_identity.get_application_id() == "production-app-id":
            backup_url += "&run_as_a_service=T"

        taskqueue.add(
            url=backup_url,
            target='ah-builtin-python-bundle',
            method='GET',
        )
        logging.debug("BackupDataTaskHandler completed.")