I am writing a wrapper around the Github Issues API to allow managers in my company to set up daily reminder emails to be sent to their devs. I want this to be configurable through an admin console, and give them the flexibility of setting up reminders at any time of day and any number of times a day.
The main App Engine cron system is configured statically through the cron.yaml file and cannot be changed by user action. Looking at the documentation it appears like I can only do this by reimplementing an entire cron infrastructure on top of the basic App Engine cron. Am I missing something? Is there anything like this that is already available elsewhere?
You are correct: you cannot set up the cron configuration programmatically.
You can, however, configure a single cron job that triggers a custom function. This function can read the user-configured schedules (stored, for example, as datastore entities) and then launch different tasks based on your needs.
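As a minimal sketch of that idea: assume the single cron job runs every 15 minutes and each manager's reminder times are stored as "HH:MM" strings. The record layout, the `due_reminders` name, and the 15-minute interval are all assumptions made for illustration; in a real app the records would be loaded from the datastore and each due reminder would be handed to a task that sends the email.

```python
from datetime import datetime, timedelta

# Hypothetical reminder records, standing in for datastore entities.
REMINDERS = [
    {"manager": "alice", "times": ["09:00", "16:30"]},
    {"manager": "bob", "times": ["09:10"]},
]


def due_reminders(reminders, now, interval_minutes=15):
    """Return (manager, "HH:MM") pairs scheduled in the window (now - interval, now]."""
    window_start = now - timedelta(minutes=interval_minutes)
    due = []
    for reminder in reminders:
        for hhmm in reminder["times"]:
            hour, minute = map(int, hhmm.split(":"))
            today = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
            # Also check yesterday's occurrence so a window that crosses
            # midnight still picks up a reminder such as 23:55.
            for scheduled in (today, today - timedelta(days=1)):
                if window_start < scheduled <= now:
                    due.append((reminder["manager"], hhmm))
    return due


if __name__ == "__main__":
    # The cron handler would call this with the current time on every run.
    print(due_reminders(REMINDERS, datetime(2024, 1, 1, 9, 15)))
```

Cron invocations are not guaranteed to fire at the exact minute, so a more robust variant would store the time of the last successful run and use that as the start of the window instead of a fixed interval.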
Related
I'm wondering how to set up logging for Google App Engine cron jobs. I haven't found any information about this specific topic in the App Engine documentation.
There's a page https://console.cloud.google.com/appengine/cronjobs in GCP. Every cron job has a "View" link in the "Log" column, which leads a user to the Logs Viewer with the following filters:
protoPayload.taskName="..."
protoPayload.taskQueueName="__cron"
In my case, no logs for cron jobs are displayed.
The service that serves the endpoints for the cron jobs is a Node.js application that uses Winston logging with the transport provided by the @google-cloud/logging-winston package. This application does more than process cron jobs, and logging works fine for everything else: for instance, I'm able to filter specific queries by Google's trace id.
Is there anything I can provide with the logs payload to be able to filter them by taskName and taskQueueName? And where would I take these values, i.e. are there any request headers I could read them from and write with logs?
It would be great if it's something achievable with @google-cloud/logging-winston. If not, a library/language agnostic answer would also be helpful.
I have been backing up Datastore via cron, using a cron.yaml like the following:
- description: My Daily Backup
  url: /_ah/datastore_admin/backup.create?name=BackupToCloud&kind=LogTitle&kind=EventLog&filesystem=gs&gs_bucket_name=whitsend
  schedule: every 12 hours
  target: ah-builtin-python-bundle
However, according to Google's announcement, datastore-admin will be deprecated:
https://cloud.google.com/datastore/docs/console/datastore-backing-up-restoring
How can I back up Datastore via cron without datastore_admin?
https://cloud.google.com/appengine/articles/scheduled_backups
only covers using gcloud.
Note that just the backup/restore functionality based on the datastore-admin will be deprecated, not the datastore-admin itself.
The deprecation note points to the Managed export and import service as the recommended replacement.
Exports based on this method can also be scheduled, see Scheduling an Export. You'll note in that article that a standard env GAE app with a cron service is exactly what the method is based on.
The article is targeted at those apps using the Datastore outside of GAE. Since you already have a GAE app you can just modify your existing backup cron job handler following the example in the article or, if you want to separate it a bit from your main app, you can add a separate service to your app, dedicated to the backup cron job.
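As a rough sketch of what such a cron handler ends up sending: the managed export is started with a POST to the Datastore Admin `projects.export` REST method. The helper below only builds the URL and JSON body; the function name and the project ID `my-project` are placeholders, while the bucket and kinds are taken from the cron.yaml in the question. Actually sending the request also needs an OAuth2 access token with a Datastore scope, passed as a Bearer token, which is omitted here.

```python
import json


def build_export_request(project_id, bucket, kinds=None, namespace_ids=None):
    """Return (url, json_body) for a Datastore managed export request."""
    url = "https://datastore.googleapis.com/v1/projects/%s:export" % project_id
    body = {"outputUrlPrefix": "gs://%s" % bucket}
    # The entity filter is optional; without it all kinds are exported.
    entity_filter = {}
    if kinds:
        entity_filter["kinds"] = list(kinds)
    if namespace_ids:
        entity_filter["namespaceIds"] = list(namespace_ids)
    if entity_filter:
        body["entityFilter"] = entity_filter
    return url, json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    url, payload = build_export_request(
        "my-project", "whitsend", kinds=["LogTitle", "EventLog"])
    print(url)
    print(payload)
```

The cron.yaml entry would then point at the URL of your own handler instead of /_ah/datastore_admin/backup.create.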
In my app (Google App Engine Standard, Python 2.7) I keep some flags in global variables that are initialized (with values read from memcache/Datastore) when an instance starts, at its first request. These values don't change often: only once a month or in emergencies (i.e. when the App Engine Taskqueue or Memcache services are not working well, which has happened no more than twice a year according to Google Cloud Status, but seriously affected my app and my customers: https://status.cloud.google.com/incident/appengine/15024 https://status.cloud.google.com/incident/appengine/17003).
For efficiency and cost reasons, I don't want to store these flags in memcache or Datastore.
I'm looking for a way to send a message to all instances (see my previous post GAE send requests to all active instances ):
As stated in https://cloud.google.com/appengine/docs/standard/python/how-requests-are-routed
Note: Targeting an instance is not supported in services that are configured for auto scaling or basic scaling. The instance ID must be an integer in the range from 0, up to the total number of instances running. Regardless of your scaling type or instance class, it is not possible to send a request to a specific instance without targeting a service or version within that instance.
but another solution could be:
1) Send a shutdown message/command to all instances of my app or a service
2) Send a restart message/command to all instances of my app or service
I use only automatic scaling, so I can't send a request targeted at a specific instance (I can get the list of active instances using the GAE Admin API).
Is there any way to do this programmatically in Python GAE? Doing it manually in the GCP console is easy with a few instances, but for 50+ instances it's a pain...
One possible solution (actually more of a workaround), inspired by your comment on the related post, is to obtain a restart of all instances by re-deployment of the same version of the app code.
Automated deployments are also possible using the Google App Engine Admin API, see Deploying Your Apps with the Admin API:
To deploy a version of your app with the Admin API:
Upload your app's resources to Google Cloud Storage.
Create a configuration file that defines your deployment.
Create and send the HTTP request for deploying your app.
It should be noted that (re)deploying an app version which handles 100% of the traffic can cause errors and traffic loss due to:
overwriting the app files actually being in use (see note in Deploying an app)
not giving GAE enough time to spin up sufficient instances to handle high incoming traffic rates (more details here)
Using different app versions for the deployments and gradually migrating traffic to the newly deployed apps can completely eliminate such loss. This might not be relevant in your particular case, since the old app version is already impaired.
Automating traffic migration is also possible, see Migrating and Splitting Traffic with the Admin API.
It's possible to use the Google Cloud API to stop all the instances. They would then be automatically scaled back up to the required level. My first attempt at this would be a process where:
The config item was changed
The current list of instances was enumerated from the API
The instances were shut down over a time period chosen to allow new instances to be spun up to replace them, balanced against how time-sensitive the config change is. Perhaps shut down one instance every 60s.
In terms of using the API you can use the gcloud tool (https://cloud.google.com/sdk/gcloud/reference/app/instances/):
gcloud app instances list
Then delete the instances with:
gcloud app instances delete instanceid --service=s1 --version=v1
There is also a REST API (https://cloud.google.com/appengine/docs/admin-api/reference/rest/v1/apps.services.versions.instances/list):
GET https://appengine.googleapis.com/v1/{parent=apps/*/services/*/versions/*}/instances
DELETE https://appengine.googleapis.com/v1/{name=apps/*/services/*/versions/*/instances/*}
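The staged shutdown described above could be sketched roughly like this. `list_instances` and `delete_instance` are hypothetical callables that you would implement on top of the GET and DELETE calls above (or the gcloud commands); they are not part of any Google library. The sleep function is passed in so the loop can be exercised without waiting.

```python
import time


def rolling_shutdown(list_instances, delete_instance,
                     pause_seconds=60, sleep=time.sleep):
    """Delete every currently running instance, one at a time.

    list_instances: hypothetical callable returning an iterable of instance IDs.
    delete_instance: hypothetical callable that deletes one instance by ID.
    """
    instance_ids = list(list_instances())
    for index, instance_id in enumerate(instance_ids):
        delete_instance(instance_id)
        # Pause between deletions so the autoscaler can start replacements;
        # no pause is needed after the last instance.
        if index < len(instance_ids) - 1:
            sleep(pause_seconds)
    return instance_ids


if __name__ == "__main__":
    # Dry run with fake callables and no real waiting.
    deleted = []
    rolling_shutdown(lambda: ["i-1", "i-2"], deleted.append,
                     sleep=lambda seconds: None)
    print(deleted)
```

The list is enumerated once up front, so instances started as replacements during the run (which already have the new config) are left alone.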
So I started to play with Google Mobile Backend Starter.
Now I want to clean this instance and anything that this starter thingy created in the project (e.g. task queues, data store, etc...)
How do we achieve this?
Is this done through some command line somewhere along the lines described in this page?
appengine-java-sdk/bin
EDIT: I should have made it clearer, I don't intend to delete the project. I just want to "clean" it and replace with my own application. Anyway, I ended up using the appengine SDK tools mentioned above (Updating and Managing a Java App). It was a long process, and tedious. It could be improved.
What I did:
Using the appengine SDK tool, downloaded the application first. It prompted me for a password. I had to create a new "App Password" entry in my Google account, since it didn't accept my "usual" Google password (e.g. GMAIL)
To clean the Scheduled Tasks, edit cron.xml so that no entries are left. A sample empty cron.xml file is shown in the documentation page. Run the update procedure of the SDK tool for cron jobs
To clean the Queues, use the same approach (queue.xml)
Clean DataStore by going to DataStore Admin page (if applicable)
Upload your new application (either through the SDK, or through the Android/Eclipse AppEngine plugin)
There will now be 2 (or more) versions of your AppEngine app. If necessary, make your newly uploaded application the default version. This is done in the Developer Console.
Check the instances as well. Remove if necessary
I am currently using Google App Engine as my mobile application back end. I have a few tasks that cannot be performed in the GAE environment (mainly image recognition using OpenCV). My intention is to retain GAE and use AWS to perform these specific tasks.
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
You could either push tasks from GAE towards AWS, or have your AWS instances pull tasks from GAE.
If you push tasks from GAE towards AWS, you could use URLFetch to push your data towards your AWS instances.
If you prefer to have your AWS instances pull tasks from GAE, you could have your GAE instances put their tasks in the GAE Pull Queue, and then have your AWS instances use the Task Queue REST API to lease tasks from the queue.
In either case, the AWS instance could report back the processing result through a simple POST request to your GAE servlets, or by inserting tasks via the abovementioned REST API, which would later be leased by your GAE instances. The latter could be useful if you want to control the rate at which your GAE app processes the results.
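To illustrate the push variant, here is a small sketch that builds the HTTP POST a GAE handler could send to a worker on AWS. The worker URL, the task fields, and the `build_task_request` name are made up for the example, and it uses Python's standard urllib only to show the shape of the request; on App Engine the same request would go out through URLFetch.

```python
import json
import urllib.request


def build_task_request(worker_url, task):
    """Build (but do not send) a JSON POST request describing one task."""
    data = json.dumps(task).encode("utf-8")
    return urllib.request.Request(
        worker_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical worker endpoint and task payload.
    request = build_task_request(
        "https://worker.example.com/tasks",
        {"task_id": "42", "image_url": "https://example.com/photo.jpg"},
    )
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would actually send it.
```

Including a task ID in the payload lets the AWS side echo it back when it POSTs the result to GAE, so the result can be matched to the original task.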
Disclaimer: I'm a lead developer on the AppScale project.
One way that you could go is with AppScale - it's an open source implementation of the App Engine APIs that runs over Amazon EC2 (as well as other clouds). Since it's open source, you could alter the AppServer that we ship with it to enable OpenCV to be used. This would require you to run your App Engine app in AWS, but you could get creative and have a copy of your app running with Google, and have it send Task Queue requests to the version of your app running in AWS only when you need to use the OpenCV libraries.
Have you considered using Amazon Simple Queue Service? http://aws.amazon.com/sqs/
You should be able to add items to the queue from GAE using a standard HTTP client.
Sure. AppEngine has a Task Queue, where you can put in your tasks by simply implementing DeferredTask. In that task you can make requests to AWS.
Your intention to keep the application in GAE and use AWS for the few tasks that cannot be performed in GAE seems to me like a reasonable approach.
I'd like to share a few ideas along with some resources to answer the main part of your question:
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
If you need GAE and AWS to process tasks around the clock (24/7), your application will definitely depend on a batch schedule or a task queue. Both are available in GAE.
However, if you can arrange for the tasks to be collected in GAE and processed by AWS at intervals (say twice a day, for less than an hour each time), you may not need them, as long as GAE can put the data on Google Cloud Storage (GCS) as public objects.
For this scenario, you need to set up an AWS EC2 instance on an on/off schedule and have the instance run a boot script using cloud-init to collect the data through your domain pointing to GCS (c.storage.googleapis.com), like so:
wget -q --read-timeout=0.0 --waitretry=5 --tries=400 \
  --background http://your.domain.com/yourfile?q=XXX...
Once it has the data from GCS, AWS can perform these specific tasks. It can then call GAE to clean up the data and put the result back on GCS, ready to be used by your mobile application back end.
Following are some options to consider:
Note that not all EC2 instance types are suitable for an on/off schedule. I recommend EC2-VPC/EBS if you want to set up an AWS EC2 instance on an on/off schedule.
You may not need EC2 at all if AWS Lambda can perform the task. It is cheaper: a task running twice a day, typically for less than 3 seconds with memory consumption up to 128 MB, typically costs less than $0.0004 USD/month.
Rearranging your application so that GAE hands some of the tasks to AWS may end up raising your bill, so try to optimize the instance class in GAE.
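The Lambda figure in the options above can be sanity-checked with a little arithmetic. The two prices below are assumptions (the commonly listed $0.20 per million requests and $0.0000166667 per GB-second); they vary by region and over time, and the free tier is ignored, so treat the result as an order-of-magnitude check only.

```python
# Assumed prices; check the current AWS Lambda pricing page before relying on them.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # USD per invocation
PRICE_PER_GB_SECOND = 0.0000166667     # USD per GB-second of compute


def monthly_lambda_cost(runs_per_day, seconds_per_run, memory_mb, days=30):
    """Estimate the monthly cost in USD, ignoring the free tier."""
    invocations = runs_per_day * days
    gb_seconds = invocations * seconds_per_run * (memory_mb / 1024.0)
    return invocations * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


if __name__ == "__main__":
    # Twice a day, 3 seconds per run, 128 MB of memory.
    print("%.6f USD/month" % monthly_lambda_cost(2, 3, 128))
```

With these assumed prices, 60 invocations a month use 22.5 GB-seconds, which comes to roughly $0.00039 and is in line with the "less than $0.0004" estimate.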