I need to run some specific code that can't run on Google App Engine (because of its restrictions).
Since these workers are asynchronous, I thought about launching a Compute Engine instance every time I need one and feeding it work via the App Engine Task Queue, but I can't find any documentation on whether this is possible.
TL;DR: Is it possible to specify a Compute Engine instance as the target of a Task Queue?
No, there is no way to specify a Compute Engine instance as the target of a Task Queue.
But did you consider using the Flexible environment instead (possibly with a custom runtime to work around the restrictions)? Or the alternatives suggested for the Flexible environment, which only has limited Task Queue support? From Task Queue:
The Task Queue service has limited availability outside of the
standard environment. If you want to use the service outside of the
standard environment, you can sign up for the Cloud Tasks alpha.
Outside of the standard environment, you can't add tasks to push
queues, but a service running in the flexible environment can be
the target of a push task. You can specify this using the
target parameter when adding a task to queue or by specifying
the default target for the queue in queue.yaml.
In many cases where you might use pull queues, such as queuing up
tasks or messages that will be pulled and processed by separate
workers, Cloud Pub/Sub can be a good alternative as it offers
similar functionality and delivery guarantees.
Related
As part of migrating my Google App Engine Standard project from python2 to python3, it looks like I also need to switch from using the Taskqueue API & Library to google-cloud-tasks.
In the taskqueue library I could enqueue up to 100 tasks at a time like this
taskqueue.Queue('default').add([...task objects...])
as well as enqueue tasks asynchronously.
In the new library, as well as in the new API, it looks like you can only enqueue tasks one at a time:
https://cloud.google.com/tasks/docs/reference/rest/v2/projects.locations.queues.tasks/create
https://googleapis.dev/python/cloudtasks/latest/gapic/v2/api.html#google.cloud.tasks_v2.CloudTasksClient.create_task
I have an endpoint where it receives a batch with thousands of elements, each of which need to get processed in an individual task. How should I go about this?
According to the official documentation (reference 1, reference 2), adding tasks to queues asynchronously (as this post suggests for adding tasks to a queue in bulk) is NOT available via the Cloud Tasks API. It is available to users of the App Engine SDK, though.
However, the documentation does mention a workaround for adding a large number of Cloud Tasks to a queue: the double-injection pattern (this post might be useful too).
To implement this scenario, you create a new injector queue; a single task on it contains the information needed to add multiple (e.g. 100) tasks to the original queue you're using. On the receiving end of this injector queue is a service that does the actual addition of the intended tasks to your original queue. Although that service adds tasks synchronously and one by one, it gives your main application an asynchronous interface for bulk-adding tasks. This way you can overcome the limit of synchronous, one-by-one task addition in your main application.
Note that the 500/50/5 pattern of task addition (start at 500 operations per second, then increase the rate by 50% every 5 minutes) is the suggested ramp-up method, in order to avoid overloading the queue or its target.
As I did not find any examples of this implementation, I will edit the answer as soon as I find one.
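Until then, here is a minimal sketch of the idea, assuming the v2 google-cloud-tasks client; every name (project, location, queue, handler URL) is a placeholder, not something from the original question:

```python
BATCH_SIZE = 100  # payloads carried by one injector task (assumption)

def chunk_payloads(payloads, size=BATCH_SIZE):
    """Split a large list of task payloads into injector-sized batches."""
    return [payloads[i:i + size] for i in range(0, len(payloads), size)]

# Main-app side: enqueue ONE injector task per batch of payloads.
# (Sketch only -- assumes a queue named "injector" exists, and that an
# /inject handler re-enqueues each payload one by one into the real queue.)
#
# import json
# from google.cloud import tasks_v2
# client = tasks_v2.CloudTasksClient()
# parent = client.queue_path("my-project", "us-central1", "injector")
# for batch in chunk_payloads(all_payloads):
#     client.create_task(parent=parent, task={
#         "app_engine_http_request": {
#             "http_method": tasks_v2.HttpMethod.POST,
#             "relative_uri": "/inject",
#             "body": json.dumps(batch).encode(),
#         }
#     })
```

The injector task count is 100x smaller than the payload count, so the main application only pays for a fraction of the synchronous create_task calls.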
Since you are in the middle of a migration, I figured this link would be useful, as it covers migrating from Task Queue to Cloud Tasks (which you said you are considering).
You can find additional information on migrating your code here and here, covering pull queues to Cloud Pub/Sub migration and push queues to Cloud Tasks migration respectively.
In order to recreate a batch pull mechanism, you would have to switch to Pub/Sub. Cloud Tasks does not have pull queues. With Pub/Sub you can batch push and batch pull messages.
If you are using a push queue architecture, I would recommend passing those elements as the task payload; note, however, that the maximum task size is limited to 100 KB.
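For illustration, a hedged sketch of what batch publishing with Pub/Sub might look like (the project, topic name, and batch settings are assumptions, not part of the original question):

```python
from concurrent import futures

# Client setup sketch -- assumes google-cloud-pubsub is installed and a
# topic named "work-items" exists:
#
# from google.cloud import pubsub_v1
# batch_settings = pubsub_v1.types.BatchSettings(
#     max_messages=100,   # flush a batch once 100 messages are buffered...
#     max_latency=0.05,   # ...or after 50 ms, whichever comes first
# )
# publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)
# topic_path = publisher.topic_path("my-project", "work-items")

def publish_all(publisher, topic_path, elements):
    """Publish every element; the client batches the RPCs transparently."""
    pending = [publisher.publish(topic_path, str(e).encode()) for e in elements]
    futures.wait(pending)  # block until all batches are flushed
    return len(pending)
```

Each publish call returns a future immediately, so thousands of elements can be handed to the client at once and sent over the wire in batches.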
We have a project where 2 datasets (kinds) holding 1.1 million records between them are stored in Google Datastore, and we are planning to add more datasets going forward. We are now thinking of moving to App Engine Flex so that statistical libraries such as numpy and pandas, and the ML framework scikit-learn, can be used to build predictive models. As part of the data transformation/computation, pandas and numpy will be used to extract new features from the datasets stored in Datastore.
Question: what is an effective approach to executing computation logic that involves data aggregation and transformation on large datasets in the App Engine Flex environment? Initially I was thinking of using a task queue for this heavy-duty transformation, given its 10-minute timeout, but I'm not sure that is feasible in the Flex environment.
The trouble is that task queues have limited support in the flex environment. From Migrating Services from the Standard Environment to the Flexible Environment:
Task Queue
The Task Queue service has limited availability outside of the
standard environment. If you want to use the service outside of the
standard environment, you can sign up for the Cloud Tasks alpha.
Outside of the standard environment, you can't add tasks to push
queues, but a service running in the flexible environment can be
the target of a push task. You can specify this using the
target parameter when adding a task to queue or by specifying
the default target for the queue in queue.yaml.
In many cases where you might use pull queues, such as queuing up
tasks or messages that will be pulled and processed by separate
workers, Cloud Pub/Sub can be a good alternative as it offers
similar functionality and delivery guarantees.
One approach is already mentioned in the above quote: using Cloud Pub/Sub.
Another approach is also hinted at in the quote:
keep part of the existing app as a standard environment service/module, populating the datasets and pushing processing tasks onto push task queues;
use the flexible environment for the processing service(s)/module(s) where you need those libraries, specifying them as the targets of the pushed tasks.
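As a sketch, the queue definition for that second approach might look like this (the queue and service names are made up; `target` is the queue.yaml parameter mentioned in the quote above):

```yaml
queue:
- name: processing-queue
  rate: 5/s
  # route pushed tasks to the flex service running numpy/pandas/scikit-learn
  target: flex-worker
```

The standard-env part of the app then simply enqueues tasks onto processing-queue, and App Engine delivers them to the flex service.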
The Google App Engine developer console makes it easy to monitor a queue's instantaneous size. How can you simply view queue size over time?
For context: the backend process of our application runs through a fairly restrictive queue, as front-end availability is the priority (and it's currently a free app). What I'd like to monitor is the size of the task queue over time, which would give me a good proxy for the backlog of work.
I could set up a process just to log this directly, and then a separate page to graph it, but this seems a little involved for something that may already be easily available, either as a graph or at least as a queryable data series straight from App Engine.
Thanks to @tx802 for help with this answer:
It's not currently simple to view these metrics. The process for setting them up, however, is:
Set up a simple cron job to read the QueueStatistics object for the given queue on whatever interval is interesting (I chose every 5 minutes).
Use the Custom Metrics feature to store the value as a custom metric, which you can then pull up in the Cloud Monitoring dashboard.
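The cron job from step 1 can be wired up with a cron.yaml entry along these lines (the handler URL is hypothetical; its code would call QueueStatistics and write the value as a custom metric):

```yaml
cron:
- description: sample task queue depth for monitoring
  url: /tasks/sample_queue_stats
  schedule: every 5 minutes
```

Once the custom metric is being written, Cloud Monitoring charts it over time with no extra graphing code on your side.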
I want to process user data in a queue with some external program. Is that possible? First of all, I would have to upload and install this software somewhere, then run it within the queue environment. How can I do that?
If it is impossible on GAE, can you recommend another cloud platform with the ability to run programs behind a queue interface, for example OGE or something similar?
I guess you need both access to the GAE Datastore (to load the user data) and the ability to execute an external program (to analyse it)?
At the moment GAE does not allow executing arbitrary programs, so the answer is no.
However, there is an upcoming feature called VM-based backends, which will let you start Compute Engine instances (with the ability to run arbitrary programs) that can access the GAE Datastore. This is currently a trusted-tester feature (a limited beta); I guess it'll be generally available in a couple of months.
We have a GAE application that notifies users of sports goal changes, and we want to notify them as fast as possible. Currently we create one entry in a GAE queue for each notification, and this works fairly well, but lately it struggles a bit when sending 2000-3000 notifications for a single score change.
What is the optimal queue configuration to send these events as fast as possible?
Currently we have:
<queue>
<name>c2dm</name>
<rate>10/s</rate>
</queue>
and in appengine-web.xml
<threadsafe>true</threadsafe>
You can increase the rate and bucket-size to make the push queue execute tasks faster, and you can leave max-concurrent-requests unset (unlimited). This will drive tasks through at up to the maximum number of instances you have configured times the throughput of a single instance of your app. You can also raise your maximum instance count.
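As an illustration only, a more aggressive version of the queue definition above might look like this (the numbers are arbitrary starting points, not recommendations):

```xml
<queue>
    <name>c2dm</name>
    <rate>100/s</rate>
    <bucket-size>100</bucket-size>
    <!-- max-concurrent-requests deliberately left unset: unlimited -->
</queue>
```

The rate caps sustained throughput, while bucket-size controls how large a burst the queue will dispatch at once, so both need to be raised together.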
Note that "as fast as possible" might not be ideal. You'll want to tweak these values based on burst patterns, instance warm-up costs, datastore contention error rates, etc., until you get comfortable performance. If you need even more control, consider using pull queues, either with an App Engine backend instance (or several) or with a notification service running on another platform that calls the pull queue REST API.