How to scale Bulk Scheduled tasks on a Timeline in GCP? - google-app-engine

We have a microservice that delivers notifications on a timeline over multiple channels (push notification, email, SMS).
We persist every scheduled notification in a DB and run a cron job every 3 hours that queries the notifications scheduled for delivery within the next 3 hours. We then push the results into a push queue (Cloud Tasks) so that each one runs on time.
This solution works well, but when the number of tasks grows, e.g. 5k tasks scheduled for the same time, the push queue dispatches them with a delay.
I have tried tweaking the push queue's dispatch rate and concurrency, but the HTTP handlers do not scale instantly to that level of traffic (GAE scales gradually), so the notifications are delivered late.
We do it this way to support cancellation or content updates on a scheduled task at any time before it is delivered.
So what's the optimal way to run all the scheduled tasks (say 5k to 20k) on the timeline without delay?
Thanks in advance.
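To make the setup in the question concrete, the 3-hour windowed scheduling can be sketched in Java. This is a minimal, hypothetical sketch: the Notification and PlannedTask records, the notif-<id>-v<version> task ID scheme, and the in-memory list standing in for the DB query are assumptions, and no Cloud Tasks client calls are made. It selects the rows due in the next window and gives each one a deterministic task ID, so a cancellation can delete the task by name and a content update (a new version) gets a fresh task.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class WindowPlanner {
    // Hypothetical persisted notification row; version is bumped on every content update.
    record Notification(String id, int version, Instant deliverAt, boolean cancelled) {}

    // Hypothetical description of one push task to create: its ID and when it should fire.
    record PlannedTask(String taskId, Instant scheduleTime) {}

    // The cron job runs every 3 hours and looks 3 hours ahead.
    static final Duration WINDOW = Duration.ofHours(3);

    // Pick the notifications due in [runAt, runAt + WINDOW) and plan one task for each.
    static List<PlannedTask> plan(List<Notification> rows, Instant runAt) {
        Instant windowEnd = runAt.plus(WINDOW);
        List<PlannedTask> tasks = new ArrayList<>();
        for (Notification n : rows) {
            boolean inWindow = !n.deliverAt().isBefore(runAt) && n.deliverAt().isBefore(windowEnd);
            if (inWindow && !n.cancelled()) {
                tasks.add(new PlannedTask(taskId(n), n.deliverAt()));
            }
        }
        return tasks;
    }

    // Deterministic task ID: cancelling means deleting the task with this ID, and a content
    // update bumps the version so its replacement task gets a new ID.
    static String taskId(Notification n) {
        return "notif-" + n.id() + "-v" + n.version();
    }
}
```

Including the version in the ID matters because Cloud Tasks does not let a deleted or executed task's name be reused right away, so an updated notification needs a new name.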

Related

Continuously running service in Google Cloud Engine

I am trying to figure out how to run a service (1) even when it does not receive any calls.
I want to use a microservices architecture.
Basically, I want to run service (1) while the other service (2) is receiving calls and all the data.
Since service (1) is not receiving calls, it would not have to spawn new instances; I want only service (2) to scale.
I have looked at scheduling jobs with cron.yaml, but the number of calls is limited.
I need service (1) to be active every minute while service (2) is active.
It's hard to give a good answer without knowing more about what service (1) has to do when it is 'active'. It sounds like you want cron to launch a task every minute.
You can use cron in conjunction with push queues: https://cloud.google.com/appengine/docs/standard/go/taskqueue/push/
When creating a push queue task, you can set the Delay property before adding it to the queue: https://cloud.google.com/appengine/docs/standard/go/taskqueue/reference#Task
(In Python it is called countdown: https://cloud.google.com/appengine/docs/standard/python/refdocs/google.appengine.api.taskqueue.taskqueue#google.appengine.api.taskqueue.taskqueue.add)
You could have a cron job that fires every 24 hours. That cron job would load up your push queue with tasks whose delays are staggered: the delay of the first one is 1 min, the delay of the second one is 2 min, and so on.
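The staggering described above comes down to computing one delay per task. A minimal Java sketch, assuming the cron job fires once a day and one task per minute is wanted; the class and method names are hypothetical:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class StaggeredDelays {
    // One delay per task: spacing, 2 * spacing, ... up to the length of the period
    // (with a 24-hour period and 1-minute spacing: 1 min, 2 min, ..., 1440 min).
    static List<Duration> forPeriod(Duration period, Duration spacing) {
        if (spacing.isZero() || spacing.isNegative()) {
            throw new IllegalArgumentException("spacing must be positive");
        }
        long count = period.toMillis() / spacing.toMillis();
        List<Duration> delays = new ArrayList<>();
        for (long i = 1; i <= count; i++) {
            delays.add(spacing.multipliedBy(i));
        }
        return delays;
    }
}
```

Each resulting Duration would become one task's delay (Delay in Go, countdown in Python; the App Engine Java TaskOptions has a countdownMillis option for the same purpose), pointing at whatever handler service (1) exposes.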

Google Appengine: How to schedule a task to run once at a time of the day

I am faced with a situation where I want a user to perform an action and have the option to revert it within the next 24 hours; otherwise the action is executed. The only solution I have come up with is to use a cron job scheduled for a particular time of day, which checks for all actions whose scheduled time has passed and executes them. But the action does not happen very often, so having a cron job running does not seem like a good solution to me, and I am not even sure of the cost implications.
What I want is that whenever a user clicks on the action, a job is scheduled, and once that action is executed the schedule is cancelled. Is this possible with a cron job? If not, what alternative does GAE provide?
From a quota/billing perspective, a request triggered by a cron job is handled like any normal request. I suppose your application gets many more requests per day, so one extra request shouldn't matter unless your application is quite heavyweight.
I'd prefer cron over deferred tasks, because the latter are more convoluted. A cron job would most likely query the datastore and then either act or not. It's easier to track and manage state in the datastore than to keep track of deferred tasks.
You have two options:
(1) Run a cron job, for example once per hour, and execute all actions that were created more than 24 hours ago.
(2) When an action is stored, create a task using the DeferredTask API. Give this task a name (e.g. the ID of the action) in case you need to cancel it. Add the task to a queue with a delay of 24 hours. Java example:
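import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;
Queue queue = QueueFactory.getDefaultQueue();
// Wait 24 hours to run; MyTask implements DeferredTask, and taskName (e.g. the action ID) lets you cancel it later
queue.add(TaskOptions.Builder.withPayload(new MyTask())
    .taskName(taskName).etaMillis(System.currentTimeMillis() + 24L * 60 * 60 * 1000));
// To cancel before it runs: queue.deleteTask(taskName);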
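Option (1) boils down to a filter over the stored actions that the hourly cron handler runs. This is a hypothetical Java sketch: the Action record, its reverted/executed flags, and the in-memory list standing in for a datastore query are assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

class DueActions {
    // Hypothetical stored action: reverted actions must never run, executed ones already ran.
    record Action(String id, Instant createdAt, boolean reverted, boolean executed) {}

    // Users may revert an action within this period.
    static final Duration GRACE = Duration.ofHours(24);

    // Called from the cron handler: pending actions created more than 24 hours before 'now'.
    static List<Action> due(List<Action> actions, Instant now) {
        Instant cutoff = now.minus(GRACE);
        return actions.stream()
                .filter(a -> !a.reverted() && !a.executed())
                .filter(a -> a.createdAt().isBefore(cutoff))
                .collect(Collectors.toList());
    }
}
```

With an hourly cron, an action runs somewhere between 24 and roughly 25 hours after it was created; a shorter cron interval narrows that window at the cost of more cron requests.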

Custom Metrics cron job Datastore timeout

I have written code that writes data to custom metrics in Cloud Monitoring from Google App Engine.
To do that, I store the data for some amount of time (say, 15 minutes) in the datastore, and then a cron job runs, fetches the data from there, and plots it on the Cloud Monitoring dashboard.
My problem is that when fetching a large amount of data from the datastore to plot, the cron job may time out. I also want to know what happens when a cron job fails.
Can it fail if the number of records is high? If so, what alternatives do we have? How many records can a cron job safely process within the 10-minute timeout?
Please let me know if any other info is needed.
Thanks!
You can run your cron job on an instance with basic or manual scaling. Then it can run for as long as you need it.
A failed cron job is not retried by default (unless you configure retry parameters for it in cron.yaml), so otherwise you need to implement this mechanism yourself.
A better option is to use deferred tasks. Your cron job should create as many tasks to process data as necessary and add them to the queue. In this case you don't have to redo the whole job - or remember a spot from which to resume, because tasks are automatically retried if they fail.
Note that with tasks you may not need to create basic/manual scaling instances if each task takes less than 10 minutes to execute.
NB: If possible, it's better to create a large number of tasks that execute quickly as opposed to one or few tasks that take minutes. This way you minimize wasted resources if a task fails, and have smaller impact on other processes running on the same instance.
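The fan-out described above mostly comes down to splitting the work into small batches, one per task. A minimal Java sketch, assuming the cron job has already collected the keys of the records to process; the class name and the batch size are illustrative assumptions, and the batch size would need tuning so that each batch finishes well within the task deadline:

```java
import java.util.ArrayList;
import java.util.List;

class FanOut {
    // Split the record keys found by the cron job into fixed-size batches;
    // each batch becomes the payload of one short-lived task.
    static <T> List<List<T>> batches(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            out.add(new ArrayList<>(keys.subList(start, end)));
        }
        return out;
    }
}
```

Each batch would then be enqueued as its own deferred task; if one fails, only that batch is retried rather than the whole job.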

Scheduling cron jobs

I want to develop an app where a user can register for (multiple) alerts, so that whenever a fare drops below some threshold, they get a notification. Fares are fetched from a third-party website. I want to do this on Google App Engine.
From what I understand, I need a process running 24/7 that checks the fares at intervals of, say, 30 minutes and sends out a notification whenever a fare drops below the threshold. Could App Engine's cron jobs be used for this? Since at most 100 cron jobs can be scheduled, what would be a better way to do this? Also, having a process for each user would waste resources; what scheduling algorithms would be more efficient?
You want to schedule a single cron that runs every 30 minutes and throws an item onto a task queue. That single item on the task queue would then be able to go through all your users, and generate tasks to fetch whatever you need in the background again. Two important things:
You want the initial cron call to return as quickly as possible, as URLs have a 60 second deadline.
Split up the work into separate task queue tasks to achieve the above, including iterating through data sources and/or users.
Based on what you're explaining, you can use push task queues: https://cloud.google.com/appengine/docs/python/taskqueue/overview-push
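As an illustration of iterating through data sources, here is a hypothetical Java sketch: alerts are grouped by route so that the task spawned for each route fetches that fare once and then checks every alert registered for it. The Alert record, the route key, and the strict "below threshold" comparison are assumptions; fetching the fare and sending notifications are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FareAlerts {
    // Hypothetical alert: a user wants to know when the fare for a route drops below a threshold.
    record Alert(String userId, String route, double threshold) {}

    // Group alerts by route so each fare source is fetched once per run (one task per route).
    static Map<String, List<Alert>> byRoute(List<Alert> alerts) {
        Map<String, List<Alert>> groups = new LinkedHashMap<>();
        for (Alert a : alerts) {
            groups.computeIfAbsent(a.route(), r -> new ArrayList<>()).add(a);
        }
        return groups;
    }

    // Inside a route's task: after fetching the current fare, select the alerts that should fire.
    static List<Alert> triggered(List<Alert> routeAlerts, double currentFare) {
        List<Alert> fire = new ArrayList<>();
        for (Alert a : routeAlerts) {
            if (currentFare < a.threshold()) {
                fire.add(a);
            }
        }
        return fire;
    }
}
```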

How Google App Engine Java Task Queues can be used for mass scheduling for users?

I am focusing on GAE/J for developing a Java web application.
I have a scenario where a user creates a schedule for a set of reminders, and I have to send emails at each particular date/time.
I cannot create threads on GAE, so I am considering Task Queues as the solution.
Can I achieve this functionality with Task Queues? The user creates tasks, and App Engine executes each one at a specific date and time.
Thanks
Although using the task queue directly, as Chris suggests, will work, a more indirect approach is probably wise for longer reminder periods (e.g., 30+ days) and in cases where the reminder might be modified.
What I would recommend is storing reminders in the datastore, and then taking one of a few approaches, depending on your requirements:
Run a regular cron job (say, hourly) that fetches a list of reminders coming up in the next interval, and schedules task queue tasks for each.
Have a single task that you schedule to be run at the time the next reminder (system-wide) is due, which sends out the reminder(s) and then enqueues a new task for the next reminder that's due.
Run a backend, as Chris suggests, which regularly scans the datastore for upcoming reminders.
In all the above cases, you'll probably need some special case code for when a user sets a reminder in less than the minimum polling interval you've set - probably enqueuing a task directly. You'll also want to consider batching up the sending of reminders, to minimize tasks and wallclock time consumed.
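The first approach, together with the special case for reminders due before the next poll, can be sketched in Java. This is a hypothetical sketch: the Reminder record, the one-hour polling interval, and the window boundaries (a cron run at time T handles reminders due in [T, T + interval)) are assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class ReminderPolling {
    // Hypothetical stored reminder.
    record Reminder(String id, Instant dueAt) {}

    // Assumed cron schedule: the job runs once per hour.
    static final Duration POLL_INTERVAL = Duration.ofHours(1);

    // Cron handler at time runAt: reminders due in [runAt, runAt + POLL_INTERVAL) get a task now,
    // with its ETA set to dueAt. Later reminders are left for future runs.
    static List<Reminder> toEnqueue(List<Reminder> stored, Instant runAt) {
        Instant nextRun = runAt.plus(POLL_INTERVAL);
        List<Reminder> out = new ArrayList<>();
        for (Reminder r : stored) {
            if (!r.dueAt().isBefore(runAt) && r.dueAt().isBefore(nextRun)) {
                out.add(r);
            }
        }
        return out;
    }

    // Special case on create/update: if the reminder is due before the next cron run,
    // no cron run will pick it up in time, so its task must be enqueued directly.
    static boolean enqueueDirectly(Instant dueAt, Instant lastRunAt) {
        return dueAt.isBefore(lastRunAt.plus(POLL_INTERVAL));
    }
}
```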
You can do this with Task Queues - basically when you receive the request 'remind me at date/time X by sending an email', you create a new task with the following basic structure:
if the current time is close to or past the given date/time X:
    send the email
else:
    fail this task
If the reminder time is far in the future, the first few times the task is scheduled, it will fail and be scheduled for later. The downside of this approach is that it doesn't guarantee that the task will run exactly when the reminder is supposed to be sent - it may be a little while before or afterwards. You could slim down this window by taking into account that your task can run for 10 minutes, so if you're within 10 minutes of the reminder time, sleep until the right time and then send the e-mail.
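The decision such a task makes on each attempt can be sketched in Java. This is a hypothetical sketch: the 10-minute deadline comes from the answer, the one-minute safety margin is an assumption, and FAIL_FOR_RETRY stands for returning an error from the handler so that the queue retries the task later.

```java
import java.time.Duration;
import java.time.Instant;

class ReminderTaskDecision {
    enum Action { SEND_NOW, SLEEP_THEN_SEND, FAIL_FOR_RETRY }

    // Task deadline quoted in the answer.
    static final Duration TASK_DEADLINE = Duration.ofMinutes(10);
    // Assumed headroom so that sending still fits inside the deadline after sleeping.
    static final Duration SAFETY_MARGIN = Duration.ofMinutes(1);

    // Decide what one execution of the reminder task should do at time 'now'.
    static Action decide(Instant now, Instant remindAt) {
        if (!now.isBefore(remindAt)) {
            return Action.SEND_NOW; // at or past the reminder time
        }
        Duration wait = Duration.between(now, remindAt);
        if (wait.compareTo(TASK_DEADLINE.minus(SAFETY_MARGIN)) <= 0) {
            return Action.SLEEP_THEN_SEND; // sleep for 'wait', then send
        }
        return Action.FAIL_FOR_RETRY; // return an error status so the queue retries later
    }
}
```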
If the reminders have to be sent out as close in time as possible then just use a Backend - keep an instance running forever and dispatch all reminders to it, and it can continuously look at all reminders it has to send out and send them out at exactly the right time.
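An always-running instance could keep its pending reminders in an in-memory java.util.concurrent.DelayQueue, which only hands out an element once its delay has expired. This is a hypothetical sketch of that idea, not a complete backend: the Reminder record and the dispatcher class are assumptions, and because the queue lives in memory, reminders would still need to be persisted (for example in the datastore) and reloaded when the instance restarts.

```java
import java.time.Instant;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class ReminderDispatcher {
    // Hypothetical reminder; it becomes available from the DelayQueue once sendAt has passed.
    record Reminder(String id, Instant sendAt) implements Delayed {
        @Override
        public long getDelay(TimeUnit unit) {
            long remainingMillis = sendAt.toEpochMilli() - System.currentTimeMillis();
            return unit.convert(remainingMillis, TimeUnit.MILLISECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    private final DelayQueue<Reminder> pending = new DelayQueue<>();

    // Called whenever a reminder is dispatched to this instance.
    void schedule(Reminder r) {
        pending.put(r);
    }

    // Cancels a pending reminder; records compare by value, so an equal Reminder is enough.
    boolean cancel(Reminder r) {
        return pending.remove(r);
    }

    // Worker loop body: blocks until the earliest reminder is due, then returns it for sending.
    Reminder takeDue() throws InterruptedException {
        return pending.take();
    }

    // Non-blocking variant: a due reminder, or null if none is due yet.
    Reminder pollDue() {
        return pending.poll();
    }
}
```

A single worker thread looping on takeDue() and sending each returned reminder sends them close to their exact times, as long as the instance stays up.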
