I'm wondering how to set up logging for Google App Engine cron jobs. I haven't found any information about this specific topic in the App Engine documentation.
There's a page https://console.cloud.google.com/appengine/cronjobs in GCP. Every cron job has a "View" link in the "Log" column, which leads a user to the Logs Viewer with the following filters:
protoPayload.taskName="..."
protoPayload.taskQueueName="__cron"
In my case, no logs for cron jobs are displayed.
The service that serves the endpoints for the cron jobs is a Node.js application that uses Winston logging with the transport provided by the @google-cloud/logging-winston package. This application does more than just process cron jobs, and logging works fine elsewhere: for instance, I'm able to filter specific requests by Google's trace id.
Is there anything I can include in the log payload so that I can filter entries by taskName and taskQueueName? And where would I get these values, i.e. are there any request headers I could read them from and write with the logs?
It would be great if this is achievable with @google-cloud/logging-winston. If not, a library/language-agnostic answer would also be helpful.
I have backed up the Datastore via cron, using a cron.yaml like the following:
- description: My Daily Backup
  url: /_ah/datastore_admin/backup.create?name=BackupToCloud&kind=LogTitle&kind=EventLog&filesystem=gs&gs_bucket_name=whitsend
  schedule: every 12 hours
  target: ah-builtin-python-bundle
But according to Google's announcement, datastore-admin will be "deprecated":
https://cloud.google.com/datastore/docs/console/datastore-backing-up-restoring
How can I back up the Datastore via cron without datastore_admin?
https://cloud.google.com/appengine/articles/scheduled_backups only talks about using gcloud.
Note that just the backup/restore functionality based on the datastore-admin will be deprecated, not the datastore-admin itself.
The deprecation note points to the Managed export and import service as the recommended replacement.
Exports based on this method can also be scheduled; see Scheduling an Export. You'll note in that article that a standard environment GAE app with a cron service is exactly what the method is based on.
The article is targeted at apps using the Datastore from outside GAE. Since you already have a GAE app, you can just modify your existing backup cron job handler following the example in the article or, if you want to keep it a bit separate from your main app, add a separate service dedicated to the backup cron job.
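For illustration, here is a minimal sketch of such a cron handler, loosely following the Scheduling an Export article; the route, bucket name and kind list are placeholders to adapt to your project:

import json

import webapp2
from google.appengine.api import app_identity, urlfetch


class ExportHandler(webapp2.RequestHandler):
    def get(self):
        project_id = app_identity.get_application_id()
        # Access token scoped for the managed export call.
        token, _ = app_identity.get_access_token(
            'https://www.googleapis.com/auth/datastore')
        body = {
            'output_url_prefix': 'gs://YOUR_BACKUP_BUCKET',        # placeholder
            'entity_filter': {'kinds': ['LogTitle', 'EventLog']},  # placeholder
        }
        result = urlfetch.fetch(
            url='https://datastore.googleapis.com/v1/projects/%s:export' % project_id,
            payload=json.dumps(body),
            method=urlfetch.POST,
            headers={'Authorization': 'Bearer ' + token,
                     'Content-Type': 'application/json'})
        self.response.set_status(result.status_code)


app = webapp2.WSGIApplication([('/cloud-datastore-export', ExportHandler)])

A cron.yaml entry pointing at /cloud-datastore-export (much like your existing one, with whatever schedule you need) would then trigger the export.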
I have to maintain a database on the Google Cloud Platform and, along with it, put in a script (preferably in Python) that automatically puts new values from an API into the database on a daily basis.
I'm confused as to how to go about this. Any suggestions?
You can take advantage of the App Engine platform, which allows you to deploy a Python application. It can be set up to simply await instructions from your API or to fetch the information directly. With the help of cron, you can schedule a task that takes care of pushing the objects into your database.
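As a rough sketch of that option (the external API URL, the data model and the route are hypothetical placeholders), a Python App Engine cron handler could look like this:

import json

import webapp2
from google.appengine.api import urlfetch
from google.appengine.ext import ndb


class Reading(ndb.Model):                 # placeholder data model
    value = ndb.FloatProperty()
    fetched_at = ndb.DateTimeProperty(auto_now_add=True)


class DailyUpdateHandler(webapp2.RequestHandler):
    def get(self):
        # Hypothetical external API; replace with the real endpoint.
        result = urlfetch.fetch('https://api.example.com/latest-values')
        if result.status_code == 200:
            for item in json.loads(result.content):
                Reading(value=float(item['value'])).put()
        self.response.set_status(result.status_code)


app = webapp2.WSGIApplication([('/tasks/daily-update', DailyUpdateHandler)])

A cron.yaml entry with url: /tasks/daily-update and schedule: every 24 hours would run it once a day.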
Another option would be Cloud Functions. Currently Cloud Functions only supports the Node.js runtime, but it lets you run backend code that only executes when triggered. With a simple HTTP trigger from your API, your function can handle the data received and organize it before storing it in your database.
Other options are available, like Cloud Endpoints, databases (Spanner, Cloud SQL, Cloud SQL for PostgreSQL, Bigtable), APIs, etc. It all depends on the semantics of your project (will it run only once daily, how fast does the whole operation have to complete, etc.). I would suggest reviewing all of the Google Cloud products in order to find the right solution for you.
I'm attempting to kick off a Dataflow job via an (already existing) App Engine application. The Dataflow job reads data generated by the GAE application and stored in Datastore, and writes the processed data to BigQuery. I'm receiving the following error:
java.lang.SecurityException: Google App Engine does not support Runtime.addShutdownHook
at com.google.appengine.runtime.Request.process-a010d936cef53bc8(Request.java)
at java.lang.Runtime.addShutdownHook(Runtime.java:46)
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.MoreExecutors$Application.addShutdownHook(MoreExecutors.java:232)
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.MoreExecutors$Application.addDelayedShutdownHook(MoreExecutors.java:204)
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.MoreExecutors$Application.getExitingExecutorService(MoreExecutors.java:188)
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.MoreExecutors.getExitingExecutorService(MoreExecutors.java:89)
at com.google.cloud.dataflow.sdk.util.BigQueryTableInserter.<clinit>(BigQueryTableInserter.java:79)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$Bound.verifyTableEmpty(BigQueryIO.java:886)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$Bound.apply(BigQueryIO.java:942)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$Bound.apply(BigQueryIO.java:724)
at com.google.cloud.dataflow.sdk.runners.PipelineRunner.apply(PipelineRunner.java:74)
at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.apply(DataflowPipelineRunner.java:327)
at com.google.cloud.dataflow.sdk.Pipeline.applyInternal(Pipeline.java:367)
at com.google.cloud.dataflow.sdk.Pipeline.applyTransform(Pipeline.java:274)
at com.google.cloud.dataflow.sdk.values.PCollection.apply(PCollection.java:161)
Is there a way to enable writing to a BigQuery table in Dataflow when the job is kicked off via GAE? I am setting the runner to DataflowPipelineRunner, so it shouldn't be attempting to run the pipeline on the GAE instance itself. (Is there a way to verify that?)
DataflowPipelineRunner is attempting to validate your pipeline prior to job submission to the Google Cloud Dataflow service. In this stack trace, we attempt to verify that the destination BigQuery table is empty.
During this process, we initialize an ExecutorService, which is not allowed in Google App Engine. This is unfortunate, as it is not strictly needed in this scenario. The fix is tracked as BEAM-142; please check there for updates.
A workaround would be to disable validation in the App Engine environment. Use BigQueryIO.Write.withoutValidation() in your BigQuery sink.
You can try the App Engine flexible environment instead, which is not as restrictive as App Engine standard in terms of allowed JRE classes.
I am writing a wrapper around the Github Issues API to allow managers in my company to set up daily reminder emails to be sent to their devs. I want this to be configurable through an admin console, and give them the flexibility of setting up reminders at any time of day and any number of times a day.
The main App Engine cron system is configured statically through the cron.yaml file and cannot be changed by user action. Looking at the documentation, it appears that I can only do this by reimplementing an entire cron infrastructure on top of the basic App Engine cron. Am I missing something? Is there anything like this already available elsewhere?
You are correct: you cannot set up the cron configuration programmatically.
You can, however, configure a single cron job that triggers a custom handler. This handler can read the user-configured schedules (e.g. from Datastore entities) and then launch different tasks based on your needs.
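For example (a sketch only; the ReminderConfig model, the cron route and the mail details are assumptions), a single cron job running every minute could read the user-defined schedules from the Datastore and fire the ones that are due:

from datetime import datetime

import webapp2
from google.appengine.api import mail
from google.appengine.ext import ndb


class ReminderConfig(ndb.Model):     # assumed model, edited via the admin console
    recipient = ndb.StringProperty()
    hour = ndb.IntegerProperty()      # UTC hour the reminder should fire
    minute = ndb.IntegerProperty()


class DispatchHandler(webapp2.RequestHandler):
    def get(self):
        # cron.yaml runs this handler with: schedule: every 1 minutes
        now = datetime.utcnow()
        due = ReminderConfig.query(ReminderConfig.hour == now.hour,
                                   ReminderConfig.minute == now.minute).fetch()
        for config in due:
            mail.send_mail(sender='reminders@your-app.appspotmail.com',  # placeholder
                           to=config.recipient,
                           subject='Daily issue reminder',
                           body='Your open GitHub issues are waiting.')
        self.response.write('dispatched %d reminders' % len(due))


app = webapp2.WSGIApplication([('/cron/dispatch-reminders', DispatchHandler)])

The admin console then only needs to create or update ReminderConfig entities; no cron.yaml change is required.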
I am currently using Google App Engine as my mobile application's back end. I have a few tasks that cannot be performed in the GAE environment (mainly image recognition using OpenCV). My intention is to retain GAE and use AWS to perform these specific tasks.
Is there a simple way to pass specific tasks from GAE to AWS? E.g. a task queue?
You could either push tasks from GAE towards AWS, or have your AWS instances pull tasks from GAE.
If you push tasks from GAE towards AWS, you could use URLFetch to push your data towards your AWS instances.
If you prefer to have your AWS instances pull tasks from GAE, you could have your GAE instances put their tasks in the GAE Pull Queue, and then have your AWS instances use the Task Queue REST API to lease tasks from the queue.
In either case, the AWS instance could report the processing result back through a simple POST request to your GAE servlets, or by inserting tasks via the above-mentioned REST API, which would later be leased by your GAE instances. The latter is useful if you want to control the rate at which your GAE app processes the results.
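A minimal sketch of the GAE side of the pull-queue variant (the queue name and payload are placeholders; a queue with mode: pull is assumed in queue.yaml):

import json

from google.appengine.api import taskqueue


def enqueue_for_aws(image_gcs_path):
    """Adds a pull task that an AWS worker can later lease and process."""
    taskqueue.Queue('aws-pull-queue').add(
        taskqueue.Task(payload=json.dumps({'image': image_gcs_path}),
                       method='PULL'))

The AWS workers would then lease tasks with the Task Queue REST API's tasks.lease call and delete them with tasks.delete once processed.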
Disclaimer: I'm a lead developer on the AppScale project.
One way that you could go is with AppScale - it's an open source implementation of the App Engine APIs that runs over Amazon EC2 (as well as other clouds). Since it's open source, you could alter the AppServer that we ship with it to enable OpenCV to be used. This would require you to run your App Engine app in AWS, but you could get creative and have a copy of your app running with Google, and have it send Task Queue requests to the version of your app running in AWS only when you need to use the OpenCV libraries.
Have you considered using Amazon Simple Queue Service (SQS)? http://aws.amazon.com/sqs/
You should be able to add items to the queue from GAE using a standard HTTP client.
Sure. App Engine has a Task Queue, where you can put your tasks by simply implementing DeferredTask. In that task you can make requests to AWS.
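DeferredTask is the Java flavour of this; the rough Python analogue is the deferred library. A minimal sketch, with the AWS endpoint as a placeholder:

from google.appengine.api import urlfetch
from google.appengine.ext import deferred


def call_aws(payload):
    # Runs later on a push task queue; from here you can reach your AWS service.
    urlfetch.fetch('https://aws.example.com/process',   # placeholder endpoint
                   payload=payload,
                   method=urlfetch.POST)


def enqueue(payload):
    # Requires the deferred builtin to be enabled in app.yaml.
    deferred.defer(call_aws, payload)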
Your intention to retain the application in GAE and use AWS to perform the few tasks that cannot be performed in GAE seems like the right approach to me.
I'd like to share a few ideas along with some resources to answer the main part of your question:
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
If you need GAE and AWS to perform the task all the time (24/7), then your application will definitely depend on a batch schedule or a task queue, both of which GAE provides.
However, if you can arrange for GAE to prepare the task and have AWS perform it on an interval basis (say, twice a day for less than an hour each), you may not need them, as long as you can have GAE put the data on Google Cloud Storage (GCS) as a public object.
For this scenario, you need to set up an AWS EC2 instance on an on/off schedule and have the instance run a boot script (via cloud-init) that collects the data through your domain pointed at GCS (c.storage.googleapis.com), like so:
wget -q --read-timeout=0.0 --waitretry=5 --tries=400 \
--background http://your.domain.com/yourfile?q=XXX...
Once AWS has the data from GCS, it can perform these specific tasks. Let it then call back into GAE to clean up the data and put the result back on GCS, ready to be used by your mobile application back end.
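On the GAE side, a minimal sketch of publishing the data to GCS as a publicly readable object with the App Engine cloudstorage client (bucket and object names are placeholders):

import cloudstorage as gcs


def publish_task_data(data):
    # The AWS boot script can then fetch this object over plain HTTP.
    filename = '/your-bucket/tasks/latest.json'      # placeholder
    handle = gcs.open(filename, 'w',
                      content_type='application/json',
                      options={'x-goog-acl': 'public-read'})
    handle.write(data)
    handle.close()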
Following are some options to consider:
Note that not all EC2 types are suitable for an on/off schedule. I recommend using EC2-VPC/EBS instances if you want to set up an EC2 instance on an on/off schedule.
You may not need to set up EC2 at all if AWS Lambda can perform the task without it. The cost is lower: a task running twice a day, typically for less than 3 seconds with memory consumption up to 128 MB, typically costs less than $0.0004 USD/month (see the rough check at the end of this answer).
As an outcome of rearranging your application in GAE and having AWS perform some of the tasks, your bill might eventually rise, so try to optimize the instance class in GAE.
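A rough back-of-the-envelope check of that Lambda figure, assuming the commonly quoted prices of $0.20 per million requests and $0.0000166667 per GB-second (verify against current AWS pricing):

invocations = 2 * 30                              # twice a day for a month
gb_seconds = invocations * 3 * (128 / 1024.0)     # 3 s at 128 MB each
compute_cost = gb_seconds * 0.0000166667
request_cost = invocations * 0.20 / 1000000.0
print(compute_cost + request_cost)                # roughly $0.0004 per month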