Task Queue completion callback - google-app-engine

I am using the Google Cloud task queue to run some long-running tasks.
Once all the tasks have completed, I want to send a notification.
I am using the code below to get the number of pending tasks in my thread:
QueueStatistics stats = taskQueue.fetchStatistics();
stats.getNumTasks();
But here I am continuously polling the value returned by the getNumTasks() method.
If it is zero, then I notify the others.
Is there any callback available which could notify me once all the tasks in my queue have completed?

If concurrently running tasks are not a must for your application, you can set up the queue with max-concurrent-requests set to 1, so that tasks run one by one:
<queue-entries>
  <queue>
    <name>my-queue</name>
    <rate>1/s</rate>
    <max-concurrent-requests>1</max-concurrent-requests>
  </queue>
</queue-entries>
Then, after pushing all your tasks onto the queue, push a notification task to the same queue. The notification task will be the last one on the queue and will execute after all the other tasks have completed.
Note: be careful with automatic retries when one of your tasks fails; a retry means the notification task is no longer the last one on the queue. You could purge the queue on failure and then retry.
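For completeness, here is a minimal sketch of the enqueue side using the App Engine Task Queue Java API (the /worker and /notify handler URLs and the task count are placeholders, not from the question):
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class EnqueueWithNotification {
    public static void enqueueAll(int taskCount) {
        Queue queue = QueueFactory.getQueue("my-queue");
        // push the real work first
        for (int i = 0; i < taskCount; i++) {
            queue.add(TaskOptions.Builder.withUrl("/worker")
                    .param("index", Integer.toString(i)));
        }
        // with max-concurrent-requests set to 1, this task only runs
        // after everything ahead of it in the queue has completed
        queue.add(TaskOptions.Builder.withUrl("/notify"));
    }
}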

Related

how to perform parallel processing of gcp pubsub messages in apache camel

I have the code below that takes messages from a Pub/Sub source topic, transforms each one according to a template, and then publishes the transformed message to a target topic.
But to improve performance I need to do this in parallel: poll 500 messages, transform them in parallel, and then publish them to the target topic.
From the Camel GCP component documentation I believe the maxMessagesPerPoll and concurrentConsumers parameters will do the job, but due to the lack of documentation I am not sure how they work internally. I mean:
a) if I poll, say, 500 messages, will Camel create 500 parallel routes that process the messages and publish them to the target topic?
b) what about the ordering of the messages?
c) should I be looking at parallel-processing EIPs as an alternative?
The concept is not clear to me.
// my route
private void addRouteToContext(final PubSub pubSub) throws Exception {
    this.camelContext.addRoutes(new RouteBuilder() {
        @Override
        public void configure() throws Exception {
            errorHandler(deadLetterChannel("google-pubsub:{{gcp_project_id}}:{{pubsub.dead.letter.topic}}")
                    .useOriginalMessage().onPrepareFailure(new FailureProcessor()));
            /*
             * from topic
             */
            from("google-pubsub:{{gcp_project_id}}:" + pubSub.getFromSubscription() + "?"
                    + "maxMessagesPerPoll={{consumer.maxMessagesPerPoll}}&"
                    + "concurrentConsumers={{consumer.concurrentConsumers}}").
            /*
             * transform using the velocity template
             */
            to("velocity:" + pubSub.getToTemplate() + "?contentCache=true").
            /*
             * attach a header to the transformed message
             */
            setHeader("Header", simple("${date:now:yyyyMMdd}")).routeId(pubSub.getRouteId()).
            /*
             * log the transformed event
             */
            log("${body}").
            /*
             * publish the transformed event to the target topic
             */
            to("google-pubsub:{{gcp_project_id}}:" + pubSub.getToTopic());
        }
    });
}
a) if I poll, say, 500 messages, will Camel create 500 parallel routes that process the messages and publish them to the target topic?
No, Camel does not create 500 parallel threads in this case. As you suspected, the number of concurrent consumer threads is set with concurrentConsumers. So if you define 5 concurrentConsumers with a maxMessagesPerPoll of 500, every consumer will fetch up to 500 messages and process them one after the other in a single thread. That is, at most 5 messages are processed in parallel.
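As an illustration only (the subscription name and the direct endpoint are assumptions, not part of the question's code), a route with those concrete values would look like this:
from("google-pubsub:{{gcp_project_id}}:my-subscription"
        + "?concurrentConsumers=5&maxMessagesPerPoll=500")
    // 5 consumer threads, each pulling up to 500 messages per poll,
    // so at most 5 messages are processed in parallel at any moment
    .to("direct:process");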
b) what about the ordering of the messages?
As soon as you process messages in parallel, the order of messages is no longer guaranteed. But this can already happen with one consumer, when failed messages are detoured to your deadLetterChannel and reprocessed later.
c) should I be looking at parallel-processing EIPs as an alternative?
Only if the concurrentConsumers option is not sufficient.
When you use the concurrentConsumers option (let's say concurrentConsumers=10), you are asking Camel to create a thread pool of 10 threads, and each of those 10 threads will pick up a different message from the Pub/Sub queue and process it.
The thing to note here is that when you specify the concurrentConsumers option, the thread pool has a fixed size, which means a fixed number of active threads are waiting at all times to process incoming messages. So 10 threads (since I specified concurrentConsumers=10) will be waiting to process my messages, even if there aren't 10 messages coming in simultaneously.
Obviously, this is not going to guarantee that the incoming messages are processed in their original order. If you need the messages in the same order, have a look at the Resequencer EIP, sketched below.
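As a rough sketch of what that could look like (the seqNum header and the endpoint names are assumptions, not from the question's code):
from("google-pubsub:{{gcp_project_id}}:my-subscription")
    // reorder by a numeric sequence header before publishing;
    // the stream resequencer emits messages in seqNum order
    .resequence(header("seqNum")).stream()
    .to("google-pubsub:{{gcp_project_id}}:my-topic");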
As for your third question, I don't think the google-pubsub component offers a parallel-processing option, but you can build your own using the Threads EIP. This would definitely give you more control over your concurrency.
Using Threads, your code would look something like this:
from("google-pubsub:project-id:destinationName?maxMessagesPerPoll=20")
// the 2 parameters are 'pool size' and 'max pool size'
.threads(5, 20)
.to("direct:out");

blocking Inter task communication in RTOS

I'm writing a module that contains a task with the highest priority; it should block until it receives a message from another task, and then start doing its duty as the highest-priority task. It uses a mailbox mechanism for signaling.
My problem is that I want the task which sends the signal to the higher-priority task to get a message back in blocking mode.
Here is my question:
should I post through mailbox 1 and then fetch from mailbox 2, or is there a better solution?
I use FreeRTOS, if that helps.
EDIT
I think I described the problem badly.
What I mean is: do I need two mailboxes in order to communicate task-to-task or ISR-to-task, or can I use just one mailbox with a different implementation?
For your edited question: you have to use two message queues, one for each task, otherwise you won't be able to wait correctly.
So for your blocking message transfer, the code looks like this:
High priority task:
while (1) {
    xQueueReceive(high_prio_queue, &msg, portMAX_DELAY);
    [your complex code]
    xQueueSend(low_prio_queue, &return_msg, timeout);
}
Low priority task:
xQueueSend(high_prio_queue, &msg, timeout);
// will only wait here until the high-priority task sends the reply back
xQueueReceive(low_prio_queue, &return_msg, portMAX_DELAY);
From ISR:
xQueueSendFromISR(high_prio_queue, &msg, &unblocked);
It is very simple, using for example FreeRTOS queues.
The task waits on the queue in the blocked state:
while (1)
{
    xQueueReceive(queue, &object, portMAX_DELAY);
    ....
}
Another task sends data to the queue:
xQueueSend(queue, &object, timeout);
When the data is received, the waiting task is given control. It processes the data, then checks whether anything else is in the queue; if not, it waits in the blocked state.

Job Queue using Google PubSub

I want to have a simple task queue. There will be multiple consumers running on different machines, but I only want each task to be consumed once.
If I have multiple subscribers pulling messages from a topic using the same subscription ID, is there a chance that a message will be read twice?
I've tested something along these lines successfully, but I'm concerned that there could be synchronization issues.
SubscriberClient client = SubscriberClient.create(SubscriberSettings.defaultBuilder().build());
SubscriptionName subName = SubscriptionName.create(projectId, "Queue");
client.createSubscription(subName, topicName, PushConfig.getDefaultInstance(), 0);

Thread subscriber = new Thread() {
    public void run() {
        while (!interrupted()) {
            PullResponse response = client.pull(subName, false, 1);
            List<ReceivedMessage> messages = response.getReceivedMessagesList();
            ReceivedMessage mess = messages.get(0);
            client.acknowledge(subName, ImmutableList.of(mess.getAckId()));
            doSomethingWith(mess.getMessage().getData().toStringUtf8());
        }
    }
};
subscriber.start();
In short, yes, there is a chance that some messages will be duplicated: GCP promises at-least-once delivery, and exactly-once delivery is theoretically impossible in any distributed system. You should design your doSomethingWith code to be idempotent if possible, so that duplicate messages are not a problem.
You should also only acknowledge a message once you have finished processing it: what would happen if your machine died after acknowledge but before doSomethingWith returned? Your message would be lost! (This fundamental problem is why exactly-once delivery is impossible.)
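A minimal sketch of the corrected ordering, reusing the names from the snippet in the question:
PullResponse response = client.pull(subName, false, 1);
for (ReceivedMessage mess : response.getReceivedMessagesList()) {
    // process first...
    doSomethingWith(mess.getMessage().getData().toStringUtf8());
    // ...acknowledge last: if the machine dies mid-processing,
    // Pub/Sub simply redelivers the message instead of losing it
    client.acknowledge(subName, ImmutableList.of(mess.getAckId()));
}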
If losing messages is preferable to processing them twice, you could add a locking step (write a "processed" token to a consistent database), but this can still fail if the write happens before the message is fully processed. At that point you might instead look for a messaging technology that is designed for at-most-once delivery rather than optimised for reliability.

Design of multi-threaded server in c

While trying to implement a simple echo server with concurrency support on Linux, I am using the following approach:
Use pthread functions to create a pool of threads, maintained in a linked list. The pool is created at process start and destroyed at process termination.
The main thread accepts requests and uses a POSIX message queue to store the accepted socket file descriptors.
Threads in the pool loop to read from the message queue and handle the requests they get; when there is no request, they block.
The program seems to work now.
The questions are:
Is it suitable to use a message queue in the middle, and is it efficient enough?
What is the general approach to build a thread pool that handles concurrent requests from multiple clients?
If it's not proper for the pooled threads to loop and block on the message queue, how should requests be delivered to the threads?
This seems unnecessarily complicated to me. The usual approach for a multithreaded server is:
Create a listen socket
Accept the client connections in a thread
For each accepted client connection, create a new thread, which receives the corresponding file descriptor and does the work
The worker thread closes the client connection when it is fully handled
I do not see much benefit in pre-populating a thread pool here.
If you really want a thread pool:
I would just use a linked list for accepted connections and a pthread_mutex to synchronize access to it:
The listener thread enqueues client fds at the tail of the list.
The workers dequeue them at the head.
If the queue is empty, the worker threads can wait on a condition variable (pthread_cond_wait) and are notified by the listener thread (pthread_cond_signal) when connections become available.
Another alternative
Depending on the complexity of handling requests, it might be an option to make the server single-threaded, i.e. handle all connections in one thread. This eliminates context switches altogether and can thus be very performant.
One drawback is that only one CPU core is used. To improve on that, a hybrid model can be used:
Create one worker thread per core.
Each thread handles n connections simultaneously.
You would, however, have to implement a mechanism to distribute the work fairly amongst the workers.
In addition to a pthread_mutex, you will want to use a pthread_cond_t (pthread condition variable); this allows the threads in the thread pool to sleep while they are not actually doing work. Otherwise, they waste compute cycles sitting in a loop polling the work queue.
I would definitely consider using C++ instead of pure C, because C++ lets you use templates. Using a pure virtual base class (let's call it vtask), you can create templated derived classes that accept arguments and insert those arguments when the overloaded operator() is called, allowing for much, much more functionality in your tasks:
//============================================================================//
void* thread_pool::execute_thread()
{
vtask* task = NULL;
while(true)
{
//--------------------------------------------------------------------//
// Try to pick a task
m_task_lock.lock();
//--------------------------------------------------------------------//
// We need to put condition.wait() in a loop for two reasons:
// 1. There can be spurious wake-ups (due to signal/EINTR)
// 2. When the mutex is released for waiting, another thread can be woken
//    up by a signal/broadcast, and that thread can invalidate the condition.
//    So when the current thread wakes up, the condition may no longer
//    actually be true!
while ((m_pool_state != state::STOPPED) && (m_main_tasks.empty()))
{
// Wait until there is a task in the queue
// Unlock mutex while wait, then lock it back when signaled
m_task_cond.wait(m_task_lock.base_mutex_ptr());
}
// If the thread was woken to handle process shutdown, return from here
if (m_pool_state == state::STOPPED)
{
//m_has_exited.
m_task_lock.unlock();
//----------------------------------------------------------------//
if(mad::details::allocator_list_tl::get_allocator_list_if_exists() &&
tids.find(CORETHREADSELF()) != tids.end())
mad::details::allocator_list_tl::get_allocator_list()
->Destroy(tids.find(CORETHREADSELF())->second, 1);
//----------------------------------------------------------------//
CORETHREADEXIT(NULL);
}
task = m_main_tasks.front();
m_main_tasks.pop_front();
//--------------------------------------------------------------------//
//run(task);
// Unlock
m_task_lock.unlock();
//--------------------------------------------------------------------//
// execute the task
run(task);
m_task_count -= 1;
m_join_lock.lock();
m_join_cond.signal();
m_join_lock.unlock();
//--------------------------------------------------------------------//
}
return NULL;
}
//============================================================================//
int thread_pool::add_task(vtask* task)
{
#ifndef ENABLE_THREADING
run(task);
return 0;
#endif
if(!is_alive_flag)
{
run(task);
return 0;
}
// do this outside of the lock because m_task_count is thread-safe (atomic)
// and needs to be updated as soon as possible
m_task_count += 1;
m_task_lock.lock();
// if the thread pool hasn't been initialized, initialize it
if(m_pool_state == state::NONINIT)
initialize_threadpool();
// TODO: put a limit on how many tasks can be added at most
m_main_tasks.push_back(task);
// wake up one thread that is waiting for a task to be available
m_task_cond.signal();
m_task_lock.unlock();
return 0;
}
//============================================================================//
void thread_pool::run(vtask*& task)
{
(*task)();
if(task->force_delete())
{
delete task;
task = 0;
} else {
if(task->get() && !task->is_stored_elsewhere())
save_task(task);
else if(!task->is_stored_elsewhere())
{
delete task;
task = 0;
}
}
}
In the above, each created thread runs execute_thread() until m_pool_state is set to state::STOPPED. You lock m_task_lock, and if the state is not STOPPED and the task list is empty, you pass m_task_lock to the condition variable, which puts the thread to sleep and releases the lock. You create the tasks (not shown) and call add_task (m_task_count is an atomic, by the way, which is why updating it outside the lock is thread-safe). In add_task, the condition is signaled to wake up a thread, which then resumes from the m_task_cond.wait(m_task_lock.base_mutex_ptr()) line in execute_thread() once m_task_lock has been re-acquired and locked.
NOTE: this is a highly customized implementation that wraps most of the pthread functions/objects in C++ classes, so copy-and-pasting will not work whatsoever... sorry. And with respect to thread_pool::run(): unless you care about return values, the (*task)() line is all you need.
I hope this helps.
EDIT: the m_join_* references are for checking whether all the tasks have been completed. The main thread sits in a similar condition wait that checks whether all tasks have completed, since in the applications I use this implementation in, this is necessary before proceeding.

how to manually set a task to run in a gae queue for the second time

I have a task that runs in a GAE queue.
According to my logic, I want to determine whether the task will run again or not.
I don't want it to be executed normally by the queue and then put back in the queue, because I want the ability to check the X-AppEngine-TaskRetryCount header and quit trying after several attempts.
To my understanding, the only cases in which a task is re-executed are when an internal GAE error happens, or when my code takes too long (DeadlineExceededException cases... and I don't want to hold the code "hostage" for that long :) ).
How can I re-enter a task into the queue in a manner that makes GAE increment X-AppEngine-TaskRetryCount?
You can programmatically retry/restart a task by calling self.error() in Python.
From the docs: App Engine retries a task when the handler returns any HTTP status code outside of the range 200-299.
And at the beginning of the task you can check the number of retries so far using:
retries = int(self.request.headers['X-AppEngine-TaskRetryCount'])
if retries < 10:
    self.error(409)
    return
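For a Java handler, the equivalent would look roughly like this (a sketch mirroring the Python snippet above; the servlet name and the limit of 10 retries are assumptions):
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class WorkerServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        int retries = Integer.parseInt(req.getHeader("X-AppEngine-TaskRetryCount"));
        if (retries < 10) {
            // any status outside 200-299 makes App Engine retry the task
            // and increment X-AppEngine-TaskRetryCount
            resp.setStatus(409);
            return;
        }
        // give up: a 2xx response tells the queue the task is done
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}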
