How to scale pull queues with Google Cloud Tasks


I have a GAE/P/Standard/FirstGen app that sends a lot of email with Sendgrid. Sendgrid sends my app a lot of notifications for when email is delivered, opened, etc.
This is how I process the Sendgrid notifications:
My handler processes the Sendgrid notification and adds a task to a pull queue
About once every minute I lease a batch of tasks from the pull queue to process them.
This works great except when I am sending more emails than usual. When I am adding tasks to the pull queue at a high rate, the pull queue refuses to lease tasks (it responds with TransientError) so the pull queue keeps filling up.
What is the best way to scale this procedure?
If I create a second pull queue and split the tasks between the two of them, will that double my capacity? Or is there something else I should consider?
====
This is how I add tasks:
from google.appengine.api import taskqueue

q = taskqueue.Queue("pull-queue-name")
q.add(taskqueue.Task(data, method="PULL", tag=tag_name))  # tagged PULL task

I found some information about this in the Google documentation. According to it, the solution for TransientError is to:
catch these exceptions, back off from calling lease_tasks(), and then try again later.
etc.
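The "back off and try again later" advice can be sketched as a small retry wrapper. This is a minimal sketch, not the documented implementation: the helper name `lease_with_backoff` and its parameters are hypothetical, and it assumes exponential backoff with random jitter is an acceptable way to "back off". It avoids f-strings so it also runs on the first-generation Python 2.7 runtime.

```python
import random
import time


def lease_with_backoff(lease, transient_error, max_attempts=6,
                       base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call lease() and retry while it raises transient_error.

    Hypothetical helper: waits a random time up to base_delay * 2**attempt
    (capped at max_delay) between attempts, and re-raises the last
    transient_error once max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return lease()
        except transient_error:
            if attempt == max_attempts - 1:
                raise  # give up; let the caller (e.g. the next cron run) retry
            # Exponential backoff with "full jitter" so many workers
            # do not all retry at the same moment.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

On App Engine this might be called as `lease_with_backoff(lambda: q.lease_tasks(60, 100), taskqueue.TransientError)`, where the 60-second lease and 100-task batch size are example values.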
Note that this is actually the App Engine Task Queue, not Cloud Tasks, which is a different product.
As I understand it, there is no option to scale this further. One possible solution is to migrate to Cloud Tasks and Pub/Sub, which are a better way to manage queues in GAE.
I hope it will help somehow... :)

Related

Status of the topic

I have set up a watch on (subscribed to) the topic using the following code.
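# gmail is a service object from googleapiclient.discovery.build('gmail', 'v1', ...)
# Ask Gmail to publish INBOX changes for this user to the Pub/Sub topic.
request = {
    'labelIds': ['INBOX'],
    'topicName': 'projects/myproject/topics/mytopic'
}
# The response contains 'historyId' and 'expiration'; notifications stop
# after the watch expires unless watch() is called again.
gmail.users().watch(userId='me', body=request).execute()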
How can I get the status of the topic at any given point in time? The problem is, sometimes I am not getting the push from Gmail for any incoming emails.

From the Cloud Pub/Sub perspective, if you want to check on the status of messages, you can look at metrics via Stackdriver. Many Cloud Pub/Sub metrics are available. You can graph any of the metrics mentioned below by going to Stackdriver, creating a new dashboard, clicking "Add Chart," and typing the name of the metric in the "Find resource type and metric" box.
The first thing you have to determine is whether the issue is on the publish side (from Gmail into your topic) or on the subscribe side (from the subscription to your push endpoint). To determine whether the topic is receiving messages, look at the topic/send_message_operation_count metric. It should be non-zero at points where Gmail sent messages to the topic. If it is always zero, the connection from Gmail to Cloud Pub/Sub is likely not set up properly, e.g., you need to grant publish rights on the topic. Note that metrics are delayed: up to 5 minutes may pass between the time you expect a message to have been sent and the time it shows up on the graph.
If the messages are successfully being sent to Pub/Sub, then you'll want to see the status of attempts to receive those messages. If your subscription is a push subscription, then you'll want to look at subscription/push_request_count for the subscription. Results are grouped by response code. If the responses are in the 400 or 500 ranges, then Cloud Pub/Sub is attempting to deliver messages to your subscriber, but the subscriber is returning errors. In this case, it is likely an issue with your subscriber itself.
If you are using the Cloud Pub/Sub client libraries, look at metrics such as subscription/streaming_pull_message_operation_count to determine whether your subscriber is attempting to fetch messages for a subscription. If you are calling the pull method directly in your subscriber, look at subscription/pull_message_operation_count to see whether pull requests are returning successfully to your subscriber.
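Outside the dashboard, the same metrics can be queried with a Cloud Monitoring filter string (for example, the `filter` parameter of the Monitoring API's `projects.timeSeries.list` method). This is a minimal sketch: the function names are hypothetical, it assumes the `pubsub_topic` and `pubsub_subscription` resource types with their `topic_id` and `subscription_id` labels, and it uses the metric names exactly as this answer gives them. Some Pub/Sub metrics have been renamed over time, so check them against the current metrics list.

```python
METRIC_PREFIX = "pubsub.googleapis.com/"

# Metric to inspect for each subscriber style, as described in the answer.
SUBSCRIPTION_METRICS = {
    "push": "subscription/push_request_count",
    "pull": "subscription/pull_message_operation_count",
    "streaming_pull": "subscription/streaming_pull_message_operation_count",
}


def topic_filter(topic_id):
    """Filter for publishes into one topic (the Gmail -> Pub/Sub side)."""
    return ('metric.type = "%stopic/send_message_operation_count" '
            'AND resource.type = "pubsub_topic" '
            'AND resource.labels.topic_id = "%s"' % (METRIC_PREFIX, topic_id))


def subscription_filter(subscription_id, mode):
    """Filter for delivery attempts on one subscription.

    mode is "push", "pull" or "streaming_pull"; anything else raises KeyError.
    """
    metric = SUBSCRIPTION_METRICS[mode]
    return ('metric.type = "%s%s" '
            'AND resource.type = "pubsub_subscription" '
            'AND resource.labels.subscription_id = "%s"'
            % (METRIC_PREFIX, metric, subscription_id))
```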
If the metrics for push, pull, or streaming pull indicate errors, that should help narrow down the problem. If there are no requests at all, the subscribers may not be running. There could also be permission problems, e.g., the subscriber is running as a user that doesn't have permission to read from subscriptions.
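The troubleshooting order in this answer (publish side first, then whether the subscriber makes requests at all, then whether those requests fail) can be written as a small decision function. This is a sketch under stated assumptions: `diagnose` is a hypothetical name, and its second argument is a hypothetical summary that maps an HTTP-style response code to a request count, not the raw label format Monitoring returns.

```python
def diagnose(topic_publish_count, subscriber_requests_by_code):
    """Rough triage following the publish-side / subscribe-side split.

    topic_publish_count: total of topic/send_message_operation_count over
        the window of interest (allow ~5 minutes for metric delay).
    subscriber_requests_by_code: hypothetical summary of the subscription
        metric, e.g. {200: 10, 500: 3}.
    """
    if topic_publish_count == 0:
        # Nothing reached the topic: look at the Gmail -> Pub/Sub setup.
        return "publish side: check publish rights on the topic"
    total = sum(subscriber_requests_by_code.values())
    if total == 0:
        # Messages arrive but nobody asks for them.
        return "no subscriber requests: subscriber not running or lacks permission"
    errors = sum(n for code, n in subscriber_requests_by_code.items()
                 if 400 <= code < 600)
    if errors:
        return "subscriber errors: %d of %d requests failed" % (errors, total)
    return "delivery looks healthy"
```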

Channel API overkill?

Hi, I am currently using the Channel API for my project. My client is a signage player that receives data from the App Engine server only when a user changes media content. App Engine sends data to the client only once or twice a day. Do you think the Channel API is overkill for this? What are some other alternatives?

Overall, I'd think not. How many clients will be connected?
Per https://cloud.google.com/appengine/docs/quotas?hl=en#Channel the free quota is 200 channel-hours/day, so if you have no more than 8 clients connected you'll be within the free quota -- no "overkill".
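The 8-client figure is simply the free quota divided by 24 hours, assuming every client stays connected around the clock. The tiny sketch below (the function name is hypothetical) makes that arithmetic explicit and shows how the number changes if clients connect for fewer hours per day.

```python
def max_free_clients(free_channel_hours_per_day=200, hours_connected_per_day=24):
    """Clients that fit in the free channel-hour quota.

    Each client held open for hours_connected_per_day uses that many
    channel-hours, so the answer is an integer division.
    """
    return free_channel_hours_per_day // hours_connected_per_day
```

With the default values this gives 8 (8 x 24 = 192 channel-hours); clients connected only 12 hours a day would allow 16.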
Even beyond that, per https://cloud.google.com/appengine/pricing, there's "no additional charge" beyond whatever computational resources keeping the channel open entails. I don't have exact numbers, but I don't think those resources would be "overkill" compared with alternatives such as reasonably frequent polling by the clients.

According to the Channel API documentation (https://cloud.google.com/appengine/features/#channel), "The Channel API creates a persistent connection between an application and its users, allowing the application to send real time messages without the use of polling." IMHO, yours might not be the best use case for it.
You may want to take a look at the TaskQueue API (https://cloud.google.com/appengine/features/#taskqueue) as an alternative for sending data from App Engine to the client.

How to auto-start an App-Engine backend when a pull-queue has tasks?

It looks like I can create a push-queue that will start backends to process tasks and that I can limit the number of workers to 1. However, is there a way to do this with a pull-queue?
Can App-Engine auto-start a named backend when a pull-queue has tasks and then let it expire when idle and the queue is empty?
It looks like I just need some way to call an arbitrary URL to "notify" it that there are tasks to process but I'm unable to find any documentation on how this can be done.

Use a cron task or a push queue to periodically start the backend. The backend can loop through the tasks (if any) in the pull queue, and then expire.
There isn't a notification system for pull queues, only inspection through queue statistics and empty/non-empty lease results.
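The "loop through the tasks, then expire" part can be sketched as a drain loop that stops on the first empty lease result. This is a minimal sketch: `drain_pull_queue` and the callables it takes are hypothetical, injected so the loop can run outside App Engine.

```python
def drain_pull_queue(lease, process, delete, max_batches=None):
    """Lease and handle batches until the queue returns an empty lease.

    lease() returns a list of tasks (empty when nothing is available),
    process(task) handles one task, delete(tasks) removes a finished batch.
    Returns the number of tasks handled. If process() raises, the batch is
    not deleted, so the tasks become leasable again once the lease expires.
    """
    handled = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        tasks = lease()
        if not tasks:
            break  # queue is empty: return and let the backend go idle
        for task in tasks:
            process(task)
        delete(tasks)
        handled += len(tasks)
        batches += 1
    return handled
```

On App Engine the callables might be `lambda: q.lease_tasks(60, 100)`, a handler of your own, and `q.delete_tasks`, which accepts a list of tasks.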

First of all, you need to decide which scaling type to use for your module. I think you should take a look at Basic Scaling (https://developers.google.com/appengine/docs/java/modules/).
Next, to process tasks from the pull queue, you can use cron to check the queue every few minutes. It is important that cron requests the frontend module rather than the basic scaling module, because cron requests start instances. The downside is that you will still pay for at least one frontend instance, because the cron job will not let it shut down.
So the solution could be the following:
Run cron every 1 or 5 minutes to call the frontend
Check the queue in the frontend and issue a URLFetch request to the basic scaling module if there are tasks in the pull queue
Process the tasks in the queue using the basic scaling module
If you use F1 instances for the frontend and B2 or greater for the other modules, this could save you some money.
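The frontend part of this scheme (check the queue, wake the basic scaling module only when there is work) can be sketched as a cron handler body. This is a sketch under stated assumptions: `on_cron_tick` is a hypothetical name, and both callables are injected so the logic runs anywhere.

```python
def on_cron_tick(count_pending_tasks, wake_backend):
    """Frontend cron handler body.

    count_pending_tasks(): number of tasks currently in the pull queue.
    wake_backend(): request to the basic scaling module (e.g. via URLFetch).
    Returns True if the backend was woken, False if the queue was empty.
    """
    if count_pending_tasks() > 0:
        wake_backend()
        return True
    # Empty queue: do nothing, so the basic scaling instance can stay idle.
    return False
```

On App Engine, `count_pending_tasks` might be `lambda: taskqueue.Queue("pull-queue-name").fetch_statistics().tasks` and `wake_backend` might be `lambda: urlfetch.fetch(BACKEND_URL)`, where `BACKEND_URL` is a hypothetical address for the basic scaling module.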

Google App Engine - Cron or Task Queue?

I'm building a simple "play against a random opponent" back-end using Google App Engine. So far I'm adding each user who wants to play to a "table" in the Datastore. As soon as there is more than one player in the Datastore, I can start to match them.
The Scheduled Tasks with Cron feature looked promising for this work until I saw that the finest resolution seems to be one minute. If plenty of players are signing up, I want them to be matched quickly and not have to wait a whole minute (worst case).
I thought about having the servlet that receives the "play against random opponent" request POST to a Task Queue that would do the matchmaking, but won't this lead to a lot of contention when reading from the Datastore and deleting the entities from the "random" table after they have been matched?
Basically I want a single worker that will do the matching, and I want to signal this worker from time to time that now is a good time to try to match opponents.
Any suggestions on what would be the right course of action here?

You can guarantee exclusive access via transactions:
Receive a request to play via REST. Check (within a transaction) whether there is any pending request in the database.
If there is, notify both users to start the game and delete the request (transactionally) from the database.
If there isn't, add it to the database and wait for the next request.
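The check-then-match-or-wait steps above can be sketched in memory. This is an illustration, not Datastore code: the `WaitingRoom` class is hypothetical, and a `threading.Lock` stands in for the Datastore transaction, making the "check, delete, or insert" sequence atomic just as the transaction would.

```python
import threading


class WaitingRoom(object):
    """In-memory stand-in for the Datastore table of waiting players.

    The lock plays the role of the transaction: checking for a waiting
    player and removing or adding one happen as a single atomic step.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._waiting = []

    def request_play(self, player):
        """Return (opponent, player) if matched, or None if now waiting."""
        with self._lock:
            if self._waiting:
                opponent = self._waiting.pop(0)  # delete the pending request
                return (opponent, player)        # notify both users
            self._waiting.append(player)         # wait for the next request
            return None
```

Because every match happens inside the lock, two concurrent requests can never both claim the same waiting player.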
Update:
Alternatively, you can achieve what you need with a pull queue. The scenario is the same as above, except that instead of the Datastore you check whether there is a task in the pull queue: lease it if there is, or create a new one if there isn't.

bulk email with GAE and per minute quota

I'm developing a voting application with GAE, which involves sending email to each voter. In my initial tests, I went over the per-minute email quota, and this exception was raised:
OverQuotaError: The API call mail.Send() required more quota than is available.
I was able to solve this short term by enabling billing, which greatly increases the per minute email quota, but what is the right way to prevent such an exception from being raised in the future? If my app becomes wildly successful and I exceed the larger quota, it would be a big problem to have this exception raised.
I don't want to put the call to send emails in a try, except block, since this is being done after processing a form, and I don't want the user to wait around for the response to the POST.
Is this a good use case for a task queue? If so, would I put a request to send a batch of emails in the task queue, or would each request to send an email go in the task queue? The former seems better in that processing the POST would be faster. Regardless of which way I do it, should I add a delay between sending each email to ensure they are not sent too fast and I go over quota?

Yes, this is ideally suited to the task queue, as you can limit the rate at which your emails are sent by changing the queue's properties in queue.yaml.
One email per task would be best, so that if a task fails and is retried, only the failed email is retried rather than the whole batch.
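The "one email per task" advice means the POST handler only enqueues work, one task per recipient, and the queue's rate settings in queue.yaml pace the actual sends. This is a minimal sketch: `enqueue_one_task_per_email` is hypothetical, and the task-adding function is injected so the logic can run outside App Engine.

```python
def enqueue_one_task_per_email(recipients, add_task, subject):
    """Enqueue a separate task for each recipient.

    add_task(params=...) adds one task. Keeping one email per task means
    a failed send is retried on its own instead of re-sending the batch.
    Returns the number of tasks enqueued.
    """
    count = 0
    for address in recipients:
        add_task(params={"to": address, "subject": subject})
        count += 1
    return count
```

On App Engine, `add_task` might be `functools.partial(taskqueue.add, url="/tasks/send-vote-email", queue_name="email")`, where the URL and queue name are hypothetical; the task handler at that URL would then call the Mail API for a single recipient.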

Yes, use a task queue. If each task sends one email, you can decide how many tasks should run per minute, and if a task fails, it will be retried.
