Far in Future GAE Task Queue ETA - google-app-engine

Are there any risks or cautions around creating tasks for a GAE Push Task Queue with an ETA of, say, 1 month or even 1 year from now?

According to the documentation the maximum ETA for a task is 30 days.
The biggest risk for long-in-the-future tasks is that when this future finally arrives, you may no longer need this task. For example, a customer can close his account or you can release a new version of your software that is not compatible with the scheduled task. In fact, I cannot think of a use case where nothing can go wrong when a task is scheduled for 1 year into the future.
A better approach is to create entities that represent your events, and then have a cron job that checks once a day (or once a week) which entities are coming "due" in the next period and schedules tasks for them. This way you have only one day/week worth of scheduled tasks to deal with if you make changes in the code. It is also easy to delete these entities if a customer cancels an action or closes an account, for example.
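The entity-plus-cron approach described above can be sketched in plain Java. This is only an illustration under stated assumptions: `ScheduledEvent` and `DueEventSweep` are hypothetical names, an in-memory list stands in for a datastore query, and the `enqueued` flag is one possible way to avoid creating a second task for the same event on the next run.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a datastore entity describing a future event.
final class ScheduledEvent {
    final String id;
    final Instant dueAt;
    boolean enqueued; // set once a task has been created for this event

    ScheduledEvent(String id, Instant dueAt) {
        this.id = id;
        this.dueAt = dueAt;
    }
}

final class DueEventSweep {
    // Returns the events due before now + window that have no task yet,
    // and marks them so the next cron run does not pick them up again.
    static List<ScheduledEvent> sweep(List<ScheduledEvent> events, Instant now, Duration window) {
        Instant horizon = now.plus(window);
        List<ScheduledEvent> due = new ArrayList<>();
        for (ScheduledEvent e : events) {
            if (!e.enqueued && e.dueAt.isBefore(horizon)) {
                e.enqueued = true;
                due.add(e);
            }
        }
        return due;
    }
}
```

In a real handler the returned events would each be turned into a push task with an ETA of `dueAt`; cancelling an event is then just deleting (or flagging) its entity before the sweep reaches it.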

Related

Google Appengine: How to schedule a task to run once at a time of the day

I am faced with a situation where I want a user to perform an action and have the option to revert it within the next 24 hours; otherwise the action is executed. The only solution I have been able to come up with is to use a cron job scheduled for a particular time of day, and in the job check for all actions whose scheduled time has passed and execute them. But the action does not happen very often, so having a cron job running does not seem like a good solution to me, and I am not even sure of the cost implications.
What I want is that whenever a user clicks on the action a job is scheduled, and once that action is executed the schedule is cancelled. Is it possible to do this with a cron job? If not, what alternative does GAE provide?
When a request is scheduled by a cron job, it is handled as one normal request from the quota/billing perspective. I suppose your application gets many more requests per day than that, so one extra request shouldn't matter, unless your application is quite heavyweight.
I'd prefer cron over deferred tasks, because the latter are more convoluted. A cron job would most likely query the datastore and then either do something or not. It's easier to keep track of and manage state in the datastore than to keep track of deferred tasks.
You have two options:
(1) Run a cron job once per hour (for example). Execute all actions that were created more than 24 hours ago.
(2) When an action is stored, create a task using a DeferredTask API. Give this task a name (e.g. an ID of an action), in case you need to cancel it. Add this task to a queue with a delay of 24 hours. Java example:
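import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

Queue queue = QueueFactory.getDefaultQueue();
// Wait 24 hours to run; MyTask must implement DeferredTask
queue.add(TaskOptions.Builder.withPayload(new MyTask())
    .taskName(taskName).etaMillis(System.currentTimeMillis() + 24L * 60 * 60 * 1000));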
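For comparison, option (1) can be sketched without any App Engine API. The sketch below is an assumption-laden illustration, not the answerer's code: `PendingActions` is a hypothetical class, and a map stands in for the datastore kind that the hourly cron handler would query.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PendingActions {
    static final Duration GRACE = Duration.ofHours(24);

    // Action id -> creation time; stands in for a datastore kind.
    private final Map<String, Instant> createdAt = new LinkedHashMap<>();

    void record(String actionId, Instant now) {
        createdAt.put(actionId, now);
    }

    // Reverting deletes the record; returns false if it already ran or never existed.
    boolean revert(String actionId) {
        return createdAt.remove(actionId) != null;
    }

    // Called by the cron handler: removes and returns ids of actions at least 24 hours old.
    List<String> takeExpired(Instant now) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Instant>> it = createdAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Instant> e = it.next();
            if (!e.getValue().plus(GRACE).isAfter(now)) {
                expired.add(e.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

With an hourly cron, an action runs between 24 and 25 hours after it was created; a shorter cron interval narrows that gap at the cost of more handler invocations.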

Google App Engine - Push Taskqueue - Limit to Countdown?

I am setting up a push task queue on my Google App Engine app with a countdown parameter so that a task will execute at some point in the future.
However, my countdown parameter can be very large in seconds, for instance months or even a year in the future. Just want to make sure this will not cause any problems or overhead cost? Maybe there is a more efficient way to do this?
It probably would work, but it seems like a bad idea. What do you do if you change your task processing code? You can't modify a task in the queue. You'd somehow have to keep track of the tasks, delete the old ones and replace them with new ones that work with your updated code.
Instead, store information about the tasks in the data store. Run a cron job once a day or once a week, process the info in the data store, and launch the tasks as needed. You can still use a countdown if you need a precise execution date and time.
The current limit in Task Queues is 30 days, and we don't have plans to raise that substantially.
Writing scheduled operations to datastore and running a daily cron job to inject that day's tasks is a good strategy. That would allow you to update the semantics as your product evolves.
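A small decision helper shows how the two answers fit together: enqueue directly when the run time falls inside the next cron interval, otherwise store the operation for a later cron run. This is a sketch only. `EtaPolicy` is a hypothetical name, the one-day interval is an assumption, and the 30-day figure is the limit quoted above.

```java
import java.time.Duration;
import java.time.Instant;

final class EtaPolicy {
    // 30 days is the queue limit quoted above; one day is an assumed cron interval.
    static final Duration MAX_TASK_ETA = Duration.ofDays(30);
    static final Duration CRON_INTERVAL = Duration.ofDays(1);

    enum Decision { ENQUEUE_NOW, STORE_FOR_CRON }

    // Enqueue directly only if the task is due before the next cron run would see it.
    static Decision decide(Instant now, Instant runAt) {
        Duration delay = Duration.between(now, runAt);
        return delay.compareTo(CRON_INTERVAL) <= 0 ? Decision.ENQUEUE_NOW : Decision.STORE_FOR_CRON;
    }

    // True if the queue itself could hold the task until runAt.
    static boolean withinQueueLimit(Instant now, Instant runAt) {
        return Duration.between(now, runAt).compareTo(MAX_TASK_ETA) <= 0;
    }

    // Countdown in whole seconds for a task enqueued now; never negative.
    static long countdownSeconds(Instant now, Instant runAt) {
        return Math.max(0, Duration.between(now, runAt).getSeconds());
    }
}
```

Keeping the direct-enqueue horizon equal to the cron interval means that at most one interval's worth of tasks is ever sitting in the queue when the processing code changes.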

how do you deploy a cron script in production?

I would like to write a script that schedules various things throughout the day. Unfortunately it will run more than 100 different tasks a day, closer to 500, and could reach 10,000 in the future.
All the tasks are independent: you can think of my script as a service for end users who sign up and want me to schedule a task for them. So if 5 people sign up and person A wants me to send them an email at 9 am, this will be different from person B, who might want me to query an API at 10:30 pm, etc.
Conceptually, I plan to have a database that tells me what each person's task is, what time they asked to schedule it, and its frequency. Once a day I will get this data from my database so I have an up-to-date record of all the tasks that need to be executed that day.
Running them through a loop, I can create channels that execute timers or tickers for each task.
The question I have is how this gets deployed in production to, for example, Google App Engine. Since those platforms are for web servers, I'm not sure how this would work. Or am I supposed to use Google Compute Engine and have it run the computation for 24 hours? Can Google Compute Engine even make HTTP calls?
Also, if I have to keep, say, 500 channels in Go open 24 hours a day, does that count as 500 containers in Google App Engine? I imagine that would get very costly quickly, despite what is essentially a very low-cost product.
So again the question comes back to: how does a cron script get deployed in production?
Any help or guidance will be greatly appreciated, as I have done a lot of googling and unfortunately everything leads back to a cron scheduler that has a limit of 100 tasks in Google App Engine...
Details about cron operation on GAE can be found in the App Engine cron documentation.
The tricky part from your perspective is that updating the cron configuration is done from outside the application, so it's at least difficult (if not impossible) to customize the cron jobs based on your app users' actions.
It is however possible to just run a generic cron job (once a minute, for example) and have that job's handler read the users' custom job configs and generate tasks accordingly to handle them. Running ~10K tasks per day is usually not an issue; they might even fit inside the free app quotas (depending on what the tasks are actually doing).
The same technique can be applied on a regular Linux OS (including on a GCE VM). I haven't used GCE yet, so I can't tell exactly whether or how a dynamically updated cron would be possible with it.
You only need one cron job for your requirements. This cron job can run every 30 minutes - or once per day. It will see what has to be done over the next period of time, create tasks to do it, and add these tasks to the queue.
It can all be done by a single App Engine instance. The number of instances you need to execute your tasks depends, of course, on how long each task runs. You have a lot of control over running the task queue.
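The "one generic cron job" idea from both answers could look roughly like this in Java. It is a sketch under assumptions: `UserJob` and `JobPlanner` are hypothetical names, jobs are assumed to repeat daily at a fixed time of day, and the cron handler is assumed to know the start and length of the window it is responsible for.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-user job config, as it might be read from the database.
final class UserJob {
    final String userId;
    final LocalTime runAt; // daily time of day at which the job should fire

    UserJob(String userId, LocalTime runAt) {
        this.userId = userId;
        this.runAt = runAt;
    }
}

final class JobPlanner {
    private static final int SECONDS_PER_DAY = 24 * 60 * 60;

    // Returns the jobs whose time of day falls in [windowStart, windowStart + window).
    // The modulo keeps the check correct when the window crosses midnight.
    // Assumes window is at most 24 hours.
    static List<UserJob> dueInWindow(List<UserJob> jobs, LocalTime windowStart, Duration window) {
        List<UserJob> due = new ArrayList<>();
        for (UserJob job : jobs) {
            int offset = Math.floorMod(
                    job.runAt.toSecondOfDay() - windowStart.toSecondOfDay(), SECONDS_PER_DAY);
            if (offset < window.getSeconds()) {
                due.add(job);
            }
        }
        return due;
    }
}
```

A cron handler running every 30 minutes would call `dueInWindow` with its own start time and a 30-minute window, then add one queue task per returned job, so no long-lived timers or channels are needed.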

GAE w/ Java, Scheduling User Notifications

I'm creating an app on GAE with Java, and looking for advice on how to handle scheduling user notifications (which will be email, text, push, whatever). There are a couple of ways notifications will be generated: when a producer creates content, and on a consumer's schedule. The latter is the tricky part, because a consumer can change its schedule at any time. Here are the options I have considered and my concerns so far:
(1) Keep an entry in the datastore for each consumer, indexed by the time until the next notification. My concern is over the lag for an eventually-consistent index. The longest lag I've seen reported is about 4 hours, which would be unacceptable for this use case. A user should not delay their schedule by a week, then 4 hours later receive a notification from the old schedule.
(2) The same as above, but with each entry sharing a common parent so that I can use an ancestor query to eliminate its eventual-ness. My concern is that there could be enough consumers to cause a problem with contention. In my wildest dreams I could foresee something like 10,000 schedule changes per minute at peak usage.
(3) Schedule a task for each consumer. When changing the schedule, it could delete the old task and create a new one at the new time. My concern has to do with the interaction of tasks and datastore transactions, since the schedule will be stored in the datastore. The documentation notes that enqueuing a task plays nicely with transactions, but what about deleting one? I would not want a task to be deleted only to have the add fail as part of its transaction.
Edit: I experimented with deleting tasks (for option 3), and unfortunately a delete that is part of a failed transaction still succeeds. That is a disappointing asymmetry. I will probably end up going that route anyway, but adding some extra logic and datastore flags to ensure rogue tasks that didn't get deleted properly simply do nothing when they execute.
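The "extra logic and datastore flags" mentioned in the edit could take the form of a schedule version that each task carries and checks when it runs. The sketch below is one hypothetical way to do that, not the asker's actual code: `ScheduleGuard` is an invented name, and a map stands in for a version field stored on the consumer's datastore entity.

```java
import java.util.HashMap;
import java.util.Map;

final class ScheduleGuard {
    // userId -> current schedule version; stands in for a datastore field.
    private final Map<String, Long> currentVersion = new HashMap<>();

    // Called when the user changes their schedule.
    // Returns the version to embed in the payload of the newly created task.
    long bumpVersion(String userId) {
        return currentVersion.merge(userId, 1L, Long::sum);
    }

    // Called at the top of the task handler.
    // A task created for an older schedule sees a mismatch and should do nothing.
    boolean shouldRun(String userId, long taskVersion) {
        Long current = currentVersion.get(userId);
        return current != null && current == taskVersion;
    }
}
```

With this guard a task that failed to be deleted still executes, but exits immediately, so the delete no longer has to be transactional.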
Eventual consistency in the Datastore is typically measured in seconds. As Google states:
"the time delay is typically small, but may be longer (even minutes or more in exceptional circumstances)."
(1) Save the time of the next notification for each user. Run a cron job periodically (e.g. once per hour), and send notifications to all users who have to be notified at this time (i.e. now >= next notification).
(2) Create a task for each user, with the countdown value, when the user's schedule is created. When a task executes, it creates the next task for this user.
The first approach is probably more efficient, especially if you choose a large enough window for your cron job.
As for transactions, I don't see why you need them. You can design your system so that in the very rare failure case a user will receive two notifications instead of one (old schedule and new schedule). This is not such a bad thing that you need to design around it.

How Google App Engine Java Task Queues can be used for mass scheduling for users?

I am focusing on GAE-J for developing a Java web application.
I have a scenario where a user will create a schedule for a set of reminders, and I have to send emails at those particular dates/times.
I cannot create threads on GAE, so the solution I have is Task Queues.
Can I achieve this functionality with Task Queues? The user will create tasks, and App Engine will execute them at a specific date and time.
Thanks
Although using the task queue directly, as Chris suggests, will work, for longer reminder periods (eg, 30+ days) and in cases where the reminder might be modified, a more indirect approach is probably wise.
What I would recommend is storing reminders in the datastore, and then taking one of a few approaches, depending on your requirements:
(1) Run a regular cron job (say, hourly) that fetches a list of reminders coming up in the next interval, and schedules task queue tasks for each.
(2) Have a single task that you schedule to be run at the time the next reminder (system-wide) is due, which sends out the reminder(s) and then enqueues a new task for the next reminder that's due.
(3) Run a backend, as Chris suggests, which regularly scans the datastore for upcoming reminders.
In all the above cases, you'll probably need some special case code for when a user sets a reminder in less than the minimum polling interval you've set - probably enqueuing a task directly. You'll also want to consider batching up the sending of reminders, to minimize tasks and wallclock time consumed.
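The single self-rescheduling task (the second approach above) can be sketched with a priority queue ordered by due time. This is an illustration only: `Reminder` and `ReminderChain` are hypothetical names, and the in-memory queue stands in for a datastore query sorted by due time.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;

final class Reminder {
    final String userId;
    final Instant dueAt;

    Reminder(String userId, Instant dueAt) {
        this.userId = userId;
        this.dueAt = dueAt;
    }
}

final class ReminderChain {
    // Earliest reminder first; stands in for a datastore query ordered by dueAt.
    private final PriorityQueue<Reminder> pending =
            new PriorityQueue<>(Comparator.comparing((Reminder r) -> r.dueAt));

    void add(Reminder r) {
        pending.add(r);
    }

    // Removes and returns every reminder due at or before now: the batch for this task run.
    List<Reminder> takeDue(Instant now) {
        List<Reminder> due = new ArrayList<>();
        while (!pending.isEmpty() && !pending.peek().dueAt.isAfter(now)) {
            due.add(pending.poll());
        }
        return due;
    }

    // ETA for the next task in the chain, or empty when nothing is pending.
    Optional<Instant> nextEta() {
        return pending.isEmpty() ? Optional.empty() : Optional.of(pending.peek().dueAt);
    }
}
```

Each task run would send the batch from `takeDue` and then enqueue one new task at `nextEta()`. A reminder added with a due time earlier than the currently scheduled task is the special case noted above and would need its own directly enqueued task.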
You can do this with Task Queues - basically when you receive the request 'remind me at date/time X by sending an email', you create a new task with the following basic structure:
if current time is close to or past the given date/time X:
    send the email
else:
    fail this task
If the reminder time is far in the future, the first few times the task is scheduled, it will fail and be scheduled for later. The downside of this approach is that it doesn't guarantee that the task will run exactly when the reminder is supposed to be sent - it may be a little while before or afterwards. You could slim down this window by taking into account that your task can run for 10 minutes, so if you're within 10 minutes of the reminder time, sleep until the right time and then send the e-mail.
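The fail-until-due structure, including the 10-minute refinement, might be expressed as the following decision logic. It is a sketch with assumed names (`ReminderTaskHandler`, `Outcome`), and it relies on the push-queue convention that a handler response outside the 200-299 range marks the task as failed so that the queue retries it later.

```java
import java.time.Duration;
import java.time.Instant;

final class ReminderTaskHandler {
    // Longest time the handler is willing to sleep, taken from the 10-minute figure above.
    static final Duration MAX_WAIT = Duration.ofMinutes(10);

    enum Outcome { SEND_NOW, WAIT_THEN_SEND, FAIL_AND_RETRY }

    static Outcome decide(Instant now, Instant remindAt) {
        if (!now.isBefore(remindAt)) {
            return Outcome.SEND_NOW; // due or overdue
        }
        Duration remaining = Duration.between(now, remindAt);
        return remaining.compareTo(MAX_WAIT) <= 0
                ? Outcome.WAIT_THEN_SEND   // close enough to sleep until the exact time
                : Outcome.FAIL_AND_RETRY;  // too early: let the queue retry later
    }

    // Status the task handler would return; a non-2xx status causes a retry.
    static int httpStatus(Outcome outcome) {
        return outcome == Outcome.FAIL_AND_RETRY ? 503 : 200;
    }
}
```

In practice the wait threshold would probably be set somewhat below the request deadline to leave time for actually sending the email, and the queue's retry backoff settings determine how often the task is re-attempted while it is still too early.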
If the reminders have to be sent out as close in time as possible then just use a Backend - keep an instance running forever and dispatch all reminders to it, and it can continuously look at all reminders it has to send out and send them out at exactly the right time.
