GCP Cloud Tasks are defined per project or service? - google-app-engine

I'm creating a new service in App Engine Python 3 using a Cloud Task queue. The project already has a service using python 2 and declaring the queues in a queue.yaml file.
According to the documentation I can't mix queue creation via queue.yaml with the Cloud Tasks API, so I'll create another yaml file for the new service.
My question is whether the new queue.yaml will overwrite the existing queues, or whether I can declare distinct queues for each service.

The queue.yaml file is global to the project; on App Engine you can have only one. If you deploy a new one that omits the previously declared queues, those queues will be paused or deleted.
If you use Cloud Tasks, don't forget to also list the existing queues in the queue.yaml file, to prevent your Cloud Tasks queues from being paused or deleted.
My recommendation: forget the queue.yaml file and switch to the Cloud Tasks API. The queue.yaml file belongs to the old generation and will be deprecated one day. Bet on the future!
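
For illustration, a merged project-level queue.yaml might look like the sketch below; the queue names are invented, and the point is only that the queues of both services must coexist in the one file:

```yaml
# One queue.yaml per project: declare the queues of ALL services here.
queue:
- name: legacy-py2-queue   # queue already used by the Python 2 service
  rate: 5/s
- name: new-py3-queue      # queue for the new Python 3 service
  rate: 10/s
```

Redeploying this file with one of the queues omitted is exactly what causes that queue to be paused or deleted.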

Related

Run default script without accepting request in App Engine Standard Environment

I have a Node.js script which starts a stream with a third party and stores the incoming messages in Firestore.
There is no need to handle incoming requests. But after I deployed my script to App Engine, the script only starts once I call the cloud endpoint. After that, it keeps running (which is what it should do).
There is probably a way to start processes by default, and also to build in something like an auto-restart if it crashes, but I couldn't find it, or I am using the wrong search terms :-)
App Engine is a web-microservice platform: every (micro)service deployed on it has to be triggered by an HTTP request, so you can't run an infinite batch process that streams data.
However, you can set up a Cloud Task that calls an App Engine endpoint; the maximum duration is 24 hours. Link this to Cloud Scheduler to launch your 24-hour-long task every day. (In detail, Cloud Scheduler has to trigger an endpoint such as a Cloud Function or an App Engine handler; that endpoint then creates the task in Cloud Tasks, because Cloud Scheduler can't create a task in Cloud Tasks directly.)
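
To make the chain concrete, here is a minimal sketch (with hypothetical project, queue, and handler names) of the payload the Scheduler-triggered endpoint could send to the Cloud Tasks v2 REST API to create such a task; authentication is deliberately omitted:

```python
import base64
import json

# Cloud Tasks v2 REST endpoint for creating a task.
CREATE_TASK_URL = (
    "https://cloudtasks.googleapis.com/v2/projects/{project}"
    "/locations/{location}/queues/{queue}/tasks"
)

def build_create_task_request(project, location, queue, relative_uri, payload):
    """Return (url, body) for a Cloud Tasks v2 tasks.create call that
    targets an App Engine handler of the same application."""
    url = CREATE_TASK_URL.format(project=project, location=location, queue=queue)
    body = {
        "task": {
            "appEngineHttpRequest": {
                "httpMethod": "POST",
                "relativeUri": relative_uri,
                # The REST API expects the request body base64-encoded.
                "body": base64.b64encode(json.dumps(payload).encode()).decode(),
            }
        }
    }
    return url, body

# The Scheduler-triggered endpoint would POST `body` to `url` with an
# OAuth2-authorized HTTP client (omitted here). All names are placeholders.
url, body = build_create_task_request(
    "my-project", "us-central1", "stream-queue", "/start-stream", {"run": "daily"})
```
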
As Guillaume mentioned, GAE isn't really intended for implementing services like the one you want.
However, it's possible to do something similar simply by configuring a minimum of 1 idle instance:
GAE will start an idle instance for the service automatically, without waiting for a triggering request;
when the idle instance dies accidentally, or is terminated because it reaches the end of its allowed lifespan, GAE will again start a new idle instance;
when the 1st request comes in, GAE will dispatch it to the idle instance (that instance thus becoming active and serving subsequent requests) and will immediately start a new idle instance to have one on standby;
when the only active instance dies, GAE won't start a new instance immediately; it will wait until a new request comes in, which will be handled like the 1st request;
when traffic is high enough, GAE will start dispatching it to the idle instance on standby, activating it, and will again start a new idle instance on standby.
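
A sketch of that configuration, assuming the standard environment with automatic scaling, is a one-line addition to the service's app.yaml:

```yaml
# app.yaml (standard environment, automatic scaling)
automatic_scaling:
  min_idle_instances: 1   # keep one resident instance on standby
```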

deployment of queue.xml to new non-default version does not create queues

I am trying to use task queues in GAE Java 8, but somehow the deployment via the queue.xml file does not seem to work correctly. I also cannot see the task queues in the Cloud Tasks console (which is where I get redirected to from the App Engine console).
I get an error java.lang.IllegalStateException: The specified queue is unknown : xxxxx when running the app.
The app runs fine locally. I can see the task queues appearing locally in the admin page.
Does this mean that I cannot deploy task queues via queue.xml anymore?
You should be aware that the queue configuration is not a per-version config (or even a per-service one!), it is a global, per-application config. Or per-project if you want - considering that there can only be one GAE application per GCP project.
This single queue configuration is shared by all versions of all services of your application, so:
if/when services/versions need different queue configs, all of them need to be merged into a single file for deployment;
pay attention at deployment time not to overwrite or otherwise negatively impact existing services/versions.
While in some cases the queue.xml file might be deployed automatically when you deploy your application code, that is not always the case. The officially recommended deployment method is the command dedicated to the queue configuration, which can be run independently of deploying application/service code. From Deploying the queue configuration file:
To deploy the queue configuration file without otherwise altering the currently serving version, use the command:
appcfg.sh update_queues <application directory>
replacing <application directory> with the path to your application's main directory.
Pay extra attention if you have:
other non-java standard environment services in your app - they use the queue.yaml queue configuration file, managing the same content in 2 different files/formats can be tricky
other services managing queues via Cloud Tasks. See Using Queue Management versus queue.yaml.
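
As an illustration of the single-file merge, a queue.xml serving two different services at once might look like this (queue names invented):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- One queue.xml for the whole application: merge every service's queues. -->
<queue-entries>
  <queue>
    <name>frontend-queue</name>
    <rate>5/s</rate>
  </queue>
  <queue>
    <name>reports-queue</name>
    <rate>1/s</rate>
  </queue>
</queue-entries>
```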

Communication methods between two GAE services in the same project

On Google App Engine, when a project has more than one service and the services in the same project need to communicate with one another, is there any way to send a message to another service to invoke a function, apart from using the URLFetch API?
One option is to use Task Queues.
The queue definitions are an app-level configuration applicable to all services/modules. So tasks can be enqueued into any queue by any service/module and each queue can be targeted to (serviced by) a specific service/module.
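
As a small illustrative fragment (queue and service names invented), the targeting is done with the queue's target element in the shared queue.yaml:

```yaml
queue:
- name: backend-work
  rate: 5/s
  target: backend   # tasks enqueued here are dispatched to the 'backend' service
```

Any service can then enqueue into backend-work, and only the backend service will execute those tasks.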

Kicking off Dataflow Jobs with App Engine errors with a SecurityException on addShutdownHook for BigQueryTableInserter

I'm attempting to kick off a Dataflow job via an (already existing) App Engine application. The Dataflow job reads data generated by the GAE application and stored in Datastore, and writes the processed data to BigQuery. I'm receiving the following error:
java.lang.SecurityException: Google App Engine does not support Runtime.addShutdownHook
at com.google.appengine.runtime.Request.process-a010d936cef53bc8(Request.java)
at java.lang.Runtime.addShutdownHook(Runtime.java:46)
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.MoreExecutors$Application.addShutdownHook(MoreExecutors.java:232)
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.MoreExecutors$Application.addDelayedShutdownHook(MoreExecutors.java:204)
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.MoreExecutors$Application.getExitingExecutorService(MoreExecutors.java:188)
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.util.concurrent.MoreExecutors.getExitingExecutorService(MoreExecutors.java:89)
at com.google.cloud.dataflow.sdk.util.BigQueryTableInserter.<clinit>(BigQueryTableInserter.java:79)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$Bound.verifyTableEmpty(BigQueryIO.java:886)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$Bound.apply(BigQueryIO.java:942)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$Bound.apply(BigQueryIO.java:724)
at com.google.cloud.dataflow.sdk.runners.PipelineRunner.apply(PipelineRunner.java:74)
at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.apply(DataflowPipelineRunner.java:327)
at com.google.cloud.dataflow.sdk.Pipeline.applyInternal(Pipeline.java:367)
at com.google.cloud.dataflow.sdk.Pipeline.applyTransform(Pipeline.java:274)
at com.google.cloud.dataflow.sdk.values.PCollection.apply(PCollection.java:161)
Is there a way to enable writing to a BigQuery table in Dataflow when the job is kicked off via GAE? I am setting the runner to DataflowPipelineRunner, so it shouldn't be attempting to run the pipeline on the GAE instance itself. (Is there a way to verify that?)
DataflowPipelineRunner is attempting to validate your pipeline prior to job submission to the Google Cloud Dataflow service. In this stack trace, we attempt to verify that the destination BigQuery table is empty.
During this process, we initialize an ExecutorService, which is not allowed to run in Google App Engine. This is unfortunate, as this is not strictly needed in this scenario. The fix for this is tracked as BEAM-142, please check there for any updates.
A workaround is to disable validation in the App Engine environment: use BigQueryIO.Write.withoutValidation() on your BigQuery sink.
You can also try the App Engine flexible environment, which is not as restrictive as App Engine standard in terms of allowed JRE classes.

Using amazon web services as google app engine back end

I am currently using Google App Engine as my mobile application back end. I have a few tasks that cannot be performed in the GAE environment (mainly image recognition using OpenCV). My intention is to retain GAE and use AWS to perform these specific tasks.
Is there a simple way to pass specific tasks from GAE to AWS, e.g. a task queue?
You could either push tasks from GAE towards AWS, or have your AWS instances pull tasks from GAE.
If you push tasks from GAE towards AWS, you could use URLFetch to push your data towards your AWS instances.
If you prefer to have your AWS instances pull tasks from GAE, you could have your GAE instances put their tasks in the GAE Pull Queue, and then have your AWS instances use the Task Queue REST API to lease tasks from the queue.
In either case, the AWS instance can report the processing result back through a simple POST request to your GAE servlets, or by inserting tasks via the above-mentioned REST API, to be leased later by your GAE instances. The latter can be useful if you want to control the rate at which your GAE app processes the results.
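
As a rough sketch of the pull side (project and queue names are placeholders, and authentication is omitted), an AWS worker would lease tasks by POSTing to the Task Queue REST API's lease method and then decode each leased task's base64-encoded payload:

```python
import base64

# Task Queue REST API (v1beta2) lease endpoint; the project and queue
# names used below are placeholders for illustration.
LEASE_URL = (
    "https://www.googleapis.com/taskqueue/v1beta2/projects/{project}"
    "/taskqueues/{queue}/tasks/lease?leaseSecs={lease_secs}&numTasks={num}"
)

def build_lease_url(project, queue, lease_secs=60, num=10):
    """URL an AWS worker would POST to (with OAuth2 credentials) to lease
    up to `num` tasks for `lease_secs` seconds."""
    return LEASE_URL.format(project=project, queue=queue,
                            lease_secs=lease_secs, num=num)

def decode_payload(task):
    """Leased tasks carry their payload in the base64-encoded
    `payloadBase64` field of the task resource."""
    return base64.b64decode(task.get("payloadBase64", "")).decode("utf-8")

lease_url = build_lease_url("my-gae-project", "aws-work-queue",
                            lease_secs=120, num=5)
```
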
Disclaimer: I'm a lead developer on the AppScale project.
One way that you could go is with AppScale - it's an open source implementation of the App Engine APIs that runs over Amazon EC2 (as well as other clouds). Since it's open source, you could alter the AppServer that we ship with it to enable OpenCV to be used. This would require you to run your App Engine app in AWS, but you could get creative and have a copy of your app running with Google, and have it send Task Queue requests to the version of your app running in AWS only when you need to use the OpenCV libraries.
Have you considered using Amazon Simple Queue Service? http://aws.amazon.com/sqs/
You should be able to add items to the queue from GAE using a standard HTTP client.
Sure. AppEngine has a Task Queue, where you can put in your tasks by simply implementing DeferredTask. In that task you can make requests to AWS.
Your intention to retain the application in GAE and use AWS to perform the few tasks that cannot be performed in GAE seems to me like the right scenario.
I'd like to share a few ideas along with some resources to answer the main part of your question:
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
If you need GAE and AWS to perform the task all the time (24/7), then your application will definitely depend on a batch schedule or a task queue; both are available in GAE.
However, if you can arrange for GAE to prepare the task and for AWS to perform it on an interval basis (say, twice a day for less than an hour each), you may not need them at all, as long as GAE puts the data on Google Cloud Storage (GCS) as publicly readable.
For this scenario, you need to set up an AWS EC2 instance on an on/off schedule and have the instance run a boot script via cloud-init that collects the data through your domain pointing at GCS (c.storage.googleapis.com), like so:
wget -q --read-timeout=0.0 --waitretry=5 --tries=400 \
--background http://your.domain.com/yourfile?q=XXX...
Once AWS has the data from GCS, it can perform these specific tasks. Then have it call GAE to clean up the data and put the result back on GCS, ready to be used by your mobile application back end.
Following are some options to consider:
You should note that not all EC2 types are suitable for an on/off schedule; I recommend EC2-VPC/EBS if you want to set up an AWS EC2 instance this way.
You may not need to set up EC2 at all if AWS Lambda can perform the task without it. The cost is lower: a task running twice a day, typically for less than 3 seconds with memory consumption up to 128 MB, usually costs less than $0.0004 USD per month.
As an outcome of rearranging your application in GAE and setting AWS to perform some of the tasks, your bill might eventually rise, so try to optimize the instance class in GAE.

Resources