How Google App Engine Java Task Queues can be used for mass scheduling for users? - google-app-engine

I am focusing GAE-J for developing a Java web application.
I have a scenario where user will create his schedule for set of reminders. And I have to send emails on that particular date/time.
I can not create thread on GAE. So I have the solution of Task Queues.
So can I achieve this functionality with Task Queues. User will create tasks. And App Engine will execute it on specific date and time.
Thanks

Although using the task queue directly, as Chris suggests, will work, for longer reminder periods (eg, 30+ days) and in cases where the reminder might be modified, a more indirect approach is probably wise.
What I would recommend is storing reminders in the datastore, and then taking one of a few approaches, depending on your requirements:
Run a regular cron job (say, hourly) that fetches a list of reminders coming up in the next interval, and schedules task queue tasks for each.
Have a single task that you schedule to be run at the time the next reminder (system-wide) is due, which sends out the reminder(s) and then enqueues a new task for the next reminder that's due.
Run a backend, as Chris suggests, which regularly scans the datastore for upcoming reminders.
In all the above cases, you'll probably need some special case code for when a user sets a reminder in less than the minimum polling interval you've set - probably enqueuing a task directly. You'll also want to consider batching up the sending of reminders, to minimize tasks and wallclock time consumed.

You can do this with Task Queues - basically when you receive the request 'remind me at date/time X by sending an email', you create a new task with the following basic structure:
if current time is close to or past the given date/time X:
send the email
else
fail this task
If the reminder time is far in the future, the first few times the task is scheduled, it will fail and be scheduled for later. The downside of this approach is that it doesn't guarantee that the task will run exactly when the reminder is supposed to be sent - it may be a little while before or afterwards. You could slim down this window by taking into account that your task can run for 10 minutes, so if you're within 10 minutes of the reminder time, sleep until the right time and then send the e-mail.
If the reminders have to be sent out as close in time as possible then just use a Backend - keep an instance running forever and dispatch all reminders to it, and it can continuously look at all reminders it has to send out and send them out at exactly the right time.

Related

Scheduling cron jobs

I want to develop an app on which a user can register for alerts( multiple) so that whenever the fare hits below some threshold, he gets a notification. Fares are fetched from a third party website.I want to do this on google app-engine.
Now from what i understand , i need a process running 24/7 which checks the fares at say intervals of 30 mins and send out a notification whenever it hits below the threshold. Probably the cron job of app-engine can be used for this task ? But at max 100 cron jobs can be scheduled, what would be the better way to this. Also having a process for each user would be wastage of resources, what would be better scheduling algorithms for higher efficiency ?
You want to schedule a single cron that runs every 30 minutes and throws an item onto a task queue. That single item on the task queue would then be able to go through all your users, and generate tasks to fetch whatever you need in the background again. Two important things:
You want the initial cron call to return as quickly as possible, as URLs have a 60 second deadline.
Split up any work into separate task queues to achieve above and also iterate through data sources and/or users.
Based on what you're explaining, you can use push task queues: https://cloud.google.com/appengine/docs/python/taskqueue/overview-push

GAE w/ Java, Scheduling User Notifications

I'm creating an app on GAE with Java, and looking for advice on how to handle scheduling user notifications (which will be email, text, push, whatever). There are a couple ways notifications will generated: when a producer creates content, and on a consumer's schedule. The later is the tricky part, because a consumer can change its schedule at any time. Here are the options I have considered and my concerns so far:
Keep an entry in the datastore for each consumer, indexed by the time until the next notification. My concern is over the lag for an eventually-consistent index. The longest lag I've seen reported is about 4 hours, which would be unacceptable for this use-case. A user should not delay their schedule by a week, then 4 hours later receive a notification from the old schedule.
The same as above, but with each entry sharing a common parent so that I can use an ancestor query to eliminate its eventual-ness. My concern is that there could be enough consumers to cause a problem with contention. In my wildest dreams I could foresee something like 10,000 schedule changes per minute at peak usage.
Schedule a task for each consumer. When changing the schedule, it could delete the old task and create a new one at the new time. My concern has to do with the interaction of tasks and datastore transactions, since the schedule will be stored in the datastore. The documentation notes that enqueing a task plays nicely with transactions, but what about deleting one? I would not want a task to be deleted only to have the add fail as part of its transaction.
Edit: I experimented with deleting tasks (for option 3), and unfortunately a delete that is part of a failed transaction still succeeds. That is a disappointing asymmetry. I will probably end up going that route anyway, but adding some extra logic and datastore flags to ensure rogue tasks that didn't get deleted properly simply do nothing when they execute.
Eventual consistency in the Datastore typically measures in seconds. As Google states:
the time delay is typically small, but may be longer (even minutes or
more in exceptional circumstances).
Save a time of next notification for each user. Run a cron job periodically (e.g. once per hour), and send notifications to all users who have to be notified at this time (i.e. now >= next notification).
Create a task for each user when a user's schedule is created with the countdown value. When a task executes, it creates the next task for this user.
The first approach is probably more efficient, especially if you choose a large enough window for your cron job.
As for transactions, I don't see why you need them. You can design your system that in the very rare fail situation a user will receive two notifications instead of one (old schedule and new schedule). This is not such a bad thing that you need to design around it.

which queue should I use (App Engine)

I have app engine application. I have users refresh tokens (In order to have access to google drive) in my database.
Now, I want to create this:
Every week (I mean every 7th day), I want to temorary download users PDF documents from google drive and work with them. I send mails to each user about their pdf documents.
The main problem is that , there might be many users. and each user may have a lot of document too. I should do that work for each users, once a week. But each users data needs much time too.
QUESTION:
So, now I think, which time service should I use? Cron or Task Queue. and why? and if Task Queue, which one. which will be faster, and flexible? I can send mail to the user later too (it's not necessary to send mails immediately when she/he requests)
QUESTION2: can I run Task Queue , for instance, once a week?
For example If I want to run it every day I can use something like that:
<rate>1/d</rate>
but how can I do that once a week?
QUESTION3: because of there might be many user (and because of each user needs much time), can I use something like that?
CRON job will be weekly (once a week). And CRON to call TASK QUEUSE, for each user. Each user data will be downloaded in the app engine server temporary (I think , If I save it in memory, it will be very hard for server). Then I will see PDF documents and send mails to each user. Is this good way? or should I use only CRON? did I have limitation here? On server storage or at queues or something like that.
Use both. Create a cron job to run every 7 days. Have the cron job fire a task (in the push queue) to process your PDFs. I'd use a separate task for each PDF to process, and configure your queue.yaml so that it processes them at the correct rate (depending on budget / rate limiting factors etc).
If you need to send mail, you can do this from the task request, via the mail api.
As a side note, if you have many users, a better approach may be to have the cron job run more frequently than every 7 days (say, once a day, or even more). You can use logic to determine which users need to be processed each time the cron runs. This may even the load, and ultimately save you money.

GAE - Execute many small tasks after a fixed time

I'd like to make a Google App Engine app that sends a Facebook message to a user a fixed time (e.g. one day) after they click a button in the app. It's not scalable to use cron or the task queue for potentially millions of tiny jobs. I've also considered implementing my own queue using a background thread, but that's only available using the Backends API as far as I know, which is designed for much larger usage and is not free.
Is there a scalable way for a free Google App Engine app to execute a large number of small tasks after a fixed period of time?
For starters, if you're looking to do millions of tiny jobs, you're going to blow past the free quota very quickly, any way you look at it. The free quota's meant for testing.
It depends on the granularity of your tasks. If you're executing a lot of tasks once per day, cron hooked up to a mapreduce operation (which essentially sends out a bunch of tasks on task queues) works fine. You'll basically issue a datastore query to find the tasks that need to be run, and send them out on the mapreduce.
If you execute this task thousands of times a day (every minute), it may start getting expensive because you're issuing many queries. Note that if most of those queries return nothing, the cost is still minimal.
The other option is to store your tasks in memory rather than in the datastore, that's where you'd want to start using backends. But backends are expensive to maintain. Look into using Google Compute Engine, which gives much cheaper VMs.
EDIT:
If you go the cron/datastore route, you'd store a new entity whenever a user wants to send a deferred message. Most importantly, it'd have a queryable timestamp for when the message should be sent, probably rounded to the nearest minute or the nearest 5 minutes, whatever you decide your granularity should be.
You would then have a cron job that runs at the set interval, say every minute. On each run it would build a query for all the cron jobs it needs to send for the given minute.
If you really do have hundreds of thousands of messages to send each minute, you're not going to want to do it from the cron task. You'd want the cron task to spawn a mapreduce job that will fan out the query and spawn tasks to send your messages.

How to create X tasks as fast as possible on Google App Engine

We push out alerts from GAE, and let's say we need to push out 50 000 alerts to CD2M (Cloud 2 Device Messaging). For this we:
Read all who wants alerts from the datastore
Loop through and create a "push task" for each notification
The problem is that the creation of the task takes some time so this doesn't scale when the user base grows. In my experience we are getting 20-30 seconds just creating the tasks when there is a lot of them. The reason for one task pr. push message is so that we can retry the task if something fails and it will only affect a single subscriber. Also C2DM only supports sending to one user at a time.
Will it be faster if we:
Read all who wants alerts from the datastore
Loop through and create a "pool task" for each 100 subscribers
Each "Pool task" will generate 100 "push tasks" when they execute
The task execution is very fast so in our scenario it seems like the creation of the tasks is the bottleneck and not the execution of the tasks. That's why I thought about this scenario to be able to increase the parallelism of the application. I would guess this would lead to faster execution but then again I may be all wrong :-)
We do something similar with APNS (Apple Push Notification Server): we create a task for a batch of notifications at a time (= pool task as you call it). When task executes, we iterate over a batch and send it to push server.
The difference with your setup is that we have a separate server for communicating with push, as APNS only supports socket communication.
The only downside is if there is an error, then whole task will be repeated and some users might get two notifications.
This sounds like it varies based on the number of alerts you need to send out, how long it takes to send each alert, and the number of active instances you have running.
My guess is that it takes a few milliseconds to tens of milliseconds to send out a CD2M alert, while it takes a few seconds for an instance to spin up, so you can probably issue a few hundred or a few thousand alerts before justifying another task instance. The ratio of the amount of time it takes to send each CD2M message vs the time it takes to launch an instance will dictate how many messages you'd want to send per task.
If you already have a fair number of instances running though, you don't have the delay of waiting for instances to spin up.
BTW, this seems almost like a perfect application of the MapReduce API. It mostly does what you describe in the second version, except it takes your initial query, and breaks that up into subqueries that each return a "page" of the result set. A task is launched for each subquery which processes all the items in its "page". This is an improvement from what you describe, because you don't need to spend the time looping through your initial result set.
I believe the default implementation for the MapReduce API just queries for all entities of a particular kind (ie all User objects), but you can change the filter used.

Resources