How to avoid execution delays in Task Queues - google-app-engine

How can we get Push Queue Tasks scheduled for execution ASAP after enqueuing?
Do we need to resort to cron jobs with Pull Queues instead?
We periodically see very long delays (20 minutes) in executing Tasks waiting in our Push Queues. We'll see 6,000+ Tasks in the queue with none executing and none executed in the last minute.
Then the Tasks finally get scheduled to execute and we get a big burst as the queue is drained at a fast rate.
As an example, a queue definition looks like this:
<queue>
  <name>example</name>
  <target>1</target>
  <rate>20/s</rate>
  <bucket-size>40</bucket-size>
  <max-concurrent-requests>10</max-concurrent-requests>
  <retry-parameters>
    <min-backoff-seconds>10</min-backoff-seconds>
    <max-backoff-seconds>60</max-backoff-seconds>
    <max-doublings>2</max-doublings>
  </retry-parameters>
</queue>

Delays can occur with both pull queues and push queues. Task queues provide reliable task execution, never earlier than requested but sometimes later, for example when failing over from one data centre to another. The scheduling is best-effort, not real-time.

Related

Effective software scheduling

For example, in code like the following:
while (1) {
    task1();
    task2();
}
there should be cooperation between task1() and task2(), which are executed in round-robin fashion. However, if task1() is implemented as follows:
task1() {
    while (1);  /* spins forever, never yields */
}
Is there a way to build a scheduler that avoids monopolization of resources by task1() by relying only on software (for example, switching tasks every 500 ms)?
Assume only plain C/assembly is available, without relying on an external scheduler/OS.
Is there a way to build a scheduler that avoids monopolization of resources by task1() by relying only on software (for example, switching tasks every 500 ms)?
Yes, it's possible; but it probably isn't possible in plain C because (at a minimum) you'd need to switch between different stacks during task switches.
However, you should know that just switching tasks every 500 ms is very inefficient. Specifically, when one task has to wait for anything (a time delay, data received from the network, user input, data to be fetched from disk, a mutex, ...), you want to keep the CPU busy by switching to a different task (if there are any other tasks).
To do that, you either need fully asynchronous interfaces for everything (which C does not have), or you need to control all of the code (e.g. write an OS).
Of course, the majority of task switches are caused by "task has to wait for something" or "something the task was waiting for occurred"; switching tasks every 500 ms is mostly irrelevant (it only matters for rare tasks that don't do any I/O), and even when it is relevant it's a bad idea (in a "10 half-finished jobs vs. 5 finished jobs and 5 unstarted jobs" way).
One easy way is to use:
- a hardware timer,
- a queue of tasks to run,
- a scheduler,
- a dispatcher, and
- a pre-allocated stack for each task.
The timer interrupt handler triggers the scheduler. The scheduler determines which task is to run next and triggers the dispatcher.
The dispatcher performs a 'context' switch between the prior running task and the next task to run: it places the prior running task back into the queue, restores the 'context' of the next task to run, and then transfers execution control to the 'next' task.
The queue is composed of control blocks; each control block contains a copy of all the registers and the address of the entry point for the task.
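Below is a minimal sketch of the control-block/dispatcher idea in portable C, using the POSIX ucontext API (deprecated in POSIX.1-2008 but still available on Linux) in place of hand-written assembly. The names (task, yield, NTASKS) are invented for the example, and the switching here is cooperative: a genuinely preemptive version would invoke the scheduler from the hardware timer's interrupt handler, which cannot be expressed in plain C.

/* Each task has a control block (ucontext_t) holding its saved
 * registers plus a pre-allocated stack; the dispatcher (swapcontext)
 * switches between tasks round-robin. */
#include <stdio.h>
#include <ucontext.h>

#define NTASKS 2
#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx;              /* the scheduler's context     */
static ucontext_t task_ctx[NTASKS];      /* one control block per task  */
static char stacks[NTASKS][STACK_SIZE];  /* pre-allocated stacks        */
static int current;

static void yield(void) {                /* hand the CPU back           */
    swapcontext(&task_ctx[current], &main_ctx);
}

static void task(void) {
    for (int i = 0; i < 3; i++) {
        printf("task %d, slice %d\n", current, i);
        yield();
    }
}

int main(void) {
    for (int t = 0; t < NTASKS; t++) {
        getcontext(&task_ctx[t]);
        task_ctx[t].uc_stack.ss_sp = stacks[t];
        task_ctx[t].uc_stack.ss_size = STACK_SIZE;
        task_ctx[t].uc_link = &main_ctx; /* where a finished task returns */
        makecontext(&task_ctx[t], task, 0);
    }
    for (int slice = 0; slice < 3; slice++)          /* three time slices */
        for (current = 0; current < NTASKS; current++)
            swapcontext(&main_ctx, &task_ctx[current]);  /* dispatch */
    return 0;
}

With this structure, replacing the dispatch loop with a timer-driven scheduler is "only" a matter of saving and restoring the register context from the interrupt handler, which is where the assembly comes in.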

Why is GAE task on queue being run more often than specified?

I have a queue defined as follows:
....
<queue>
  <name>sendMailsBatch</name>
  <rate>1/m</rate>
  <max-concurrent-requests>1</max-concurrent-requests>
  <retry-parameters>
    <min-backoff-seconds>70</min-backoff-seconds>
    <max-doublings>1</max-doublings>
  </retry-parameters>
</queue>
....
I want there to be a time gap of at least 60 seconds between each time a task runs. This must hold whether it is the same task being re-run because it failed, or different tasks being run.
The process starts with one task being put onto the queue, and this task will at the end, if all datastore operations are successful, add another task to the queue (as it uses a cursor from the datastore operation executed by the task).
When I look at the log, though, the tasks are executed far more often than that.
Why are the tasks executed this frequently, when I have configured that at most one task can run at a time, at most one task per minute, and that if a task fails, there should be at least 70 s between runs?
Thanks,
-Louise
When processing the queue, App Engine uses all of the concurrent requests specified to process what is already in its bucket. Once it finishes those tasks, it won't perform any additional work until a new token appears in the bucket. The rate at which tokens are added to the bucket is defined by <rate>.
In your case, you set the <rate> correctly, but since you didn't explicitly set the <bucket-size> parameter, it defaulted to 5 as mentioned here: https://cloud.google.com/appengine/docs/standard/java/config/queueref. Once you explicitly set <bucket-size> to 1, you should no longer run into this issue.
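For reference, the queue definition from the question with the bucket size made explicit would look like this (only the <bucket-size> line is new; everything else is copied from the question):

<queue>
  <name>sendMailsBatch</name>
  <rate>1/m</rate>
  <bucket-size>1</bucket-size>
  <max-concurrent-requests>1</max-concurrent-requests>
  <retry-parameters>
    <min-backoff-seconds>70</min-backoff-seconds>
    <max-doublings>1</max-doublings>
  </retry-parameters>
</queue>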

Task queue of size 1 for serial processing

I want a serial queue where only one task may process at a time. This is my declaration in queue.xml:
<queue>
  <name>pumpkin</name>
  <rate>20/s</rate>
  <bucket-size>1</bucket-size>
  <max-concurrent-requests>1</max-concurrent-requests>
</queue>
Does the "rate" parameter have any effect in this setup?
I want tasks to queue up and only process one at a time.
Thanks
The queue rate and bucket size do not limit the number of tasks processed simultaneously. See this answer, which captured a (now gone) official explanation of these configs better than the (current) official docs do (you may want to revisit your configs accordingly): https://stackoverflow.com/a/3740846/4495081
However, your max-concurrent-requests setting should ensure only one task is executed at a time, according to the docs:
You can avoid this possibility by setting max_concurrent_requests to a lower value. For example, if you set max_concurrent_requests to 10, our example queue maintains about 20 tasks/second when latency is 0.3 seconds. However, when the latency increases over 0.5 seconds, this setting throttles the processing rate to ensure that no more than 10 tasks run simultaneously.

Need suggestions for handling a large number of timers/timeouts

I am working on redesigning existing L2TP (Layer 2 Tunneling Protocol) code.
For L2TP, the number of tunnels we support is 96K. The L2TP protocol has a keep-alive mechanism where it needs to send HELLO messages.
Say we have 96,000 tunnels for which L2TPd needs to send a HELLO message after a configured timeout value; what is the best way to implement that?
Right now, we have a timer thread that iterates every 1 s and sends the HELLO messages. This old design is no longer scaling.
Please suggest a design to handle a large number of timers.
There are a couple of ways to implement timers:
1) select: this system call allows you to wait on file descriptors with a timeout and then wake up. You can call it with no file descriptors at all and use it purely as a timeout.
2) POSIX condition variables: similar to select, they have a timeout mechanism built in (pthread_cond_timedwait).
3) If you are using UNIX, you can set up a UNIX signal to wake you up.
Those are basic ideas. You can see how well they scale to multiple timers; I would guess you'd need multiple condvars/selects, each handling some handful of the timers.
Depending on the behaviour you want, you would probably have a thread for every 100 timers or so, and use one of the mechanisms above to wake up when the soonest of them falls due. Each thread sits in a loop, keeps track of its 100 timeouts, and wakes up as they expire. Once you exceed 100 timers, you simply create a new thread and have it manage the next 100 timers, and so on. I don't know if 100 is the right granularity, but it's something you'd play with.
Hopefully that's what you are looking for.
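To make option 1 concrete, here is a minimal sketch (the deadline array and on_timeout handler are invented for illustration) of a single thread using select() with no file descriptors purely as its timeout mechanism, always sleeping until the soonest of its pending deadlines:

#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

#define NTIMERS 4

static struct timeval deadline[NTIMERS];   /* absolute expiry times   */

static void on_timeout(int i) {            /* placeholder handler     */
    printf("timer %d fired\n", i);
}

static long usec_until(const struct timeval *t, const struct timeval *now) {
    return (t->tv_sec - now->tv_sec) * 1000000L +
           (t->tv_usec - now->tv_usec);
}

int main(void) {
    struct timeval now;
    gettimeofday(&now, NULL);
    for (int i = 0; i < NTIMERS; i++) {    /* expire 1..4 s from now  */
        deadline[i] = now;
        deadline[i].tv_sec += i + 1;
    }
    for (int fired = 0; fired < NTIMERS; fired++) {
        int next = -1;                     /* find soonest pending    */
        gettimeofday(&now, NULL);
        for (int i = 0; i < NTIMERS; i++)
            if (deadline[i].tv_sec != 0 &&
                (next < 0 || usec_until(&deadline[i], &now) <
                             usec_until(&deadline[next], &now)))
                next = i;
        long us = usec_until(&deadline[next], &now);
        if (us > 0) {
            struct timeval tv = { us / 1000000L, us % 1000000L };
            select(0, NULL, NULL, NULL, &tv);  /* no fds: pure sleep  */
        }
        on_timeout(next);
        deadline[next].tv_sec = 0;         /* mark fired              */
    }
    return 0;
}

The linear scan over deadlines is fine for a handful of timers per thread; at 96K timers you would want the sorted structure described in the next answer.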
Typically, such requirements are met with a delta-queue. When a timeout is required, get the system tick count and add the timeout interval to it. This gives the Timeout Expiry Tick Count (TETC). Insert the socket object into a queue that is sorted by TETC, soonest first, and have the thread wait for the TETC of the item at the head of the queue.
Typically, with socket timeouts, queue insertion is cheap because there are many timeouts with the same interval, so a new timeout's insertion will normally take place at the queue tail.
Management of the queue (actually, since insertion into the sorted queue could take place anywhere, it's more like a list than a queue, but whatever :) is best kept to one timeout thread that normally performs a timed wait on a condvar or semaphore until the lowest TETC. New timeout objects can then be queued to the thread on a thread-safe concurrent queue, with the timeout-handler thread signalled via the sema/condvar.
When the timeout thread becomes ready on TETC timeout, it could call some 'OnTimeout' method of the object itself, or it might put the timed-out object onto a threadpool input queue.
Such a delta-queue is much more efficient for handling large numbers of timeouts than any polling scheme, especially for requirements with longish intervals. No polling is required, no CPU/memory bandwidth wasted on continual iterations and the typical latency is going to be a system clock-tick or two.
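Here is a minimal sketch of that structure, with invented names (timeout_entry, timeout_insert, timeout_tick): the list is kept sorted by TETC so the timeout thread only ever has to wait for the head. A real implementation would add the concurrent input queue and condvar signalling described above.

#include <stdio.h>
#include <stdlib.h>

typedef struct timeout_entry {             /* one pending timeout        */
    unsigned long tetc;                    /* Timeout Expiry Tick Count  */
    void (*on_timeout)(void *arg);         /* fired when tetc is reached */
    void *arg;
    struct timeout_entry *next;
} timeout_entry;

static timeout_entry *head;                /* sorted, soonest TETC first */

/* Insert keeping the list sorted by TETC; with many identical
 * intervals, new entries usually land at the tail, as noted above. */
static void timeout_insert(timeout_entry *e) {
    timeout_entry **p = &head;
    while (*p && (*p)->tetc <= e->tetc)
        p = &(*p)->next;
    e->next = *p;
    *p = e;
}

/* Run by the timeout thread when its timed wait expires:
 * pop and fire everything that is due. */
static void timeout_tick(unsigned long now) {
    while (head && head->tetc <= now) {
        timeout_entry *e = head;
        head = e->next;
        e->on_timeout(e->arg);             /* or push to a threadpool    */
        free(e);
    }
}

static void hello(void *arg) {             /* e.g. an L2TP HELLO         */
    printf("HELLO to tunnel %ld\n", (long)arg);
}

int main(void) {
    for (long t = 0; t < 3; t++) {         /* three tunnels, due at tick 5 */
        timeout_entry *e = malloc(sizeof *e);
        e->tetc = 5;
        e->on_timeout = hello;
        e->arg = (void *)t;
        timeout_insert(e);
    }
    for (unsigned long tick = 0; tick <= 5; tick++)
        timeout_tick(tick);
    return 0;
}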
Which approach is best depends on the processor/OS, kernel version, and architecture.
In Linux, one option is to use the kernel's timer functionality to run multiple timers. A timer is defined with a struct timer_list, its internal values are initialized with init_timer, and it is registered with add_timer after filling in the timer_list fields for that timer: expires (the timeout), function (what to execute on timeout), and data (the parameter passed to function). When jiffies becomes greater than or equal to expires, the respective timer handler (function) is triggered.
Some processors have provision for timer wheels (a set of queues spaced equally in time across slots), which can be configured for a wide range of timers and timeouts as the requirement dictates.
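For illustration, here is a sketch of that legacy kernel timer API as a minimal module. Note that init_timer and the data field were removed in kernel 4.15 in favour of timer_setup, so this only builds against older kernels; the handler, interval, and data value are invented for the example.

#include <linux/module.h>
#include <linux/timer.h>
#include <linux/jiffies.h>

static struct timer_list my_timer;

static void my_handler(unsigned long data)  /* 'function' in timer_list */
{
    pr_info("timer fired, data=%lu\n", data);
    mod_timer(&my_timer, jiffies + 5 * HZ); /* re-arm: periodic keep-alive */
}

static int __init my_init(void)
{
    init_timer(&my_timer);
    my_timer.expires  = jiffies + 5 * HZ;   /* fire in ~5 seconds        */
    my_timer.function = my_handler;
    my_timer.data     = 42;                 /* parameter to the handler  */
    add_timer(&my_timer);
    return 0;
}

static void __exit my_exit(void)
{
    del_timer_sync(&my_timer);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");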

app engine task queue wait limit

How long can a task sit in the task queue waiting to be processed before something happens? If it's not forever, what are those somethings that might happen?
Can I add a very large number of tasks to a queue that has a very low processing rate and have them be processed over the course of days/weeks/months?
Are tasks ejected from the queue if they're waiting for their turn too long?
Task Queue Quota and Limits says:
maximum countdown/ETA for a task: 30 days from the current date and time
I think that's talking about intentionally/programmatically setting an ETA in the future, not about how long a task is allowed to wait for its turn.
There's no limit on how many tasks you can have in your queue, other than the amount of storage you have allocated to storing tasks. There's likewise no limit how long they can wait to execute, though as you point out, you can't schedule a task with an ETA more than 30 days in the future.
As far as I know they last forever. I have had some in there for days. Right now I have some that are 9 days old, although the queue is paused. The only limits are the queue size and count (which are not currently enforced).
