Camel need Non-blocking Queue - Analogous to not processing events on graphics thread - apache-camel

Sorry I answered my own question - it actually IS just SEDA, I assumed when I saw 'BlockingQueue' that SEDA would block until the queue had been read ... which of course is nonsense. SEDA is completely all I need. Question answered
I've got a problem that's compeletely screwing me, I've been provided a custom Endpoint by company we connect to, but the endpoint maintains a heart-beat to a feed, and when it sends messages above a certain size they take so long to process on the route that its blocking and the heartbeat gets lost and the connection goes down
Obviously this is analogous to processing events on a non-graphics thread to keep a smooth operation going. But I'm unsure how I'd achieve this in camel. Essentially I want to queue the results and have them on a separate thread.
from( "custom:endpoint" )
.process( MyProcesor )
.to( "some-endpoint")

as suggested camel-seda is a simple way to perform async/mult-threaded processing, beware that the blocking queues are in-memory only (lost if VM is stopped, etc). if you need guaranteed messaging support, use camel-jms

Related

Aggregate results of batch consumer in Camel (for example from SQS)

I'm consuming messages from SQS FIFO queue with maxMessagesPerPoll=5 set.
Currently I'm processing each message individually which is a total waste of resources.
In my case, as we are using FIFO queue and all of those 5 messages are related to the same object, I could process them all toghether.
I though this might be done by using aggregate pattern but I wasn't able to get any results.
My consumer route looks like this:
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
.process(exchange -> {
// process the message
})
I believe it should be possible to do something like this
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
.aggregate(const(true), new GroupedExchangeAggregationStrategy())
.completionFromBatchConsumer()
.process(exchange -> {
// process ALL messages together as I now have a list of all exchanges
})
but the processor is never invoked.
Second thing:
If I'm able to make this work, when does ACK is sent to SQS? When each individual message is processed or when the aggregate process finishes? I hope the latter
When the processor is not called, the aggregator probably still waits for new messages to aggregate.
You could try to use completionSize(5) instead of completionFromBatchConsumer() for a test. If this works, the batch completion definition is the problem.
For the ACK against the broker: unfortunately no. I think the message is commited when it arrives at the aggregator.
The Camel aggregator component is a "stateful" component and therefore it must end the current transaction.
For this reason you can equip such components with persistent repositories to avoid data loss when the process is killed. In such a scenario the already aggregated messages would obviously be lost if you don't have a persistent repository attached.
The problem lies in GroupedExchangeAggregationStrategy
When I use this strategy, the output is an "array" of all exchanges. This means that the exchange that comes to the completion predicate no longer has the initial properties. Instead it has CamelGroupedExchange and CamelAggregatedSize which makes no use for the completionFromBatchConsumer()
As I don't actually need all exchanges being aggregated, it's enough to use GroupedBodyAggregationStrategy. Then exchange properties will remain as in the original exchange and just the body will contain an "array"
Another solution would be to use completionSize(Predicate predicate) and use a custom predicate that extracts necessary value from groupped exchanges.

How to mark a message as "in progress" so other workers don't work on it

I'm attempting to use a pull queue to create a queue of image processing tasks that could take longer that the acktimeout limit of 10 minutes. I'm using node.js api and I'm wondering how I could have a worker grab a message off the pull queue, mark it as in progress so no other workers attempt to grab it, do its work and acknowledge the message after the processing is done. This processing could take up to an hour per worker. If an exception occurs, I'd like to remove the "in progress" status and allow other workers to pick up this message and attempt to work on it.
I was hoping there was something in pubsub that would allow me to do this. My alternative is to, before processing, store an entity (inProgressMessage) with the message id, ack id, status=pending, timestamp=now() into datastore, have the worker immediately return the ackid after receiving the message (this will allow other workers to attempt other messages), then the worker can work on the lengthy task. If successful, mark the entity status as complete, if failed in a non permanent way, requeue the task into pubsub, if failed in a permanent way that won't allow reqeueing, I can have cron that checks datastore for pending tasks older than several hours and have them either be deleted or requeued.
My alternative feels like i'm re-implementing alot of what pub sub is supposed to help with.
Let me know if you can think of a better way.
To take longer than the ack deadline to process a message, you'll want to use modifyAckDeadline. You can extend the deadline as many times as you need up to 10 minutes per call. Your workflow would be as follows:
Pull the message.
Start to process the message.
While you are not done with the message, if you are close to the 10 minute ack deadline, call modifyAckDeadline to extend the deadline.
Once done processing the message, ack it.
Please note that calling modifyAckDeadline does not guarantee that the message won't be delivered to another task. In certain circumstances like server restarts, the message may end up being delivered to another of your subscribers. However, in most normal circumstances, as long as you call modifyAckDeadline before the current ack deadline, you can prevent a message's redelivered as long as necessary.
When creating a topic (only), you can configure the acknowledge time to be whatever up to 10 minutes (https://cloud.google.com/pubsub/subscriber). Once a message has been pulled from the queue, no other worker (of the same subscriber) will be able to take it for processing, unless the ack ttl was reached, and then the message is automatically returned to queue.
Since you need a longer period, you will have to implement something on your own, or seek another queuing solution. I think the design you suggested is fairly simple to implement, and is not really a re-implementation of what pubsub does.

x number of threads sending data to Server for displaying output on GUI

I have developed a single server/multiple client TCP Application.
The client consists of x number of threads each thread doing processing on its own data and then sending the data over TCP socket to the Server for displaying.
The Server is basically a GUI having a window. Server receves data from the client and displays it.
Now, the problem is that since there are 40 threads inside the client and each thread wants to send data, how can I achieve this using one connected socket?
My Suggestion:
My approach was to create a data structure inside each of the 40 threads in which data to be sent will be maintained. A separate Send Thread with one connected socket on client side is then created. This thread will read data from data structure of first thread, send it over the socket and then read the data from second thread and so on.
Confusions:
but I am not sure how would this be implemented as I am new to all this? :( What if a thread is writing to data structure and the Send Thread tries to read the data at the same time. I am familiar with mutex, critical section etc but that sounds too complex for my simple application.
Any other suggestions/comments other than my own suggestion are welcome.
If you think my own approach is correct then please help me solving my confusions that I mentioned above.
Thanks a lot in advance :)
Edit:
Can I put I timer on Send Thread and after a specific time the Send Thread suspends thread#1(so that it can access its data structure without any synchronization issues), reads data from its data structure, sends it over the tcp Socket, and resumes Thread#1 back, then it suspends Thread#2, reads data from its data structure, sends it over the tcp Socket, and resumes Thread#2 back and so on.
A common approach is to have one thread dedicated to sending the data. The other threads post their data into a shared container (list, deque, etc) and signal the sender thread that data is available. The sender then wakes up and processes whatever data is available.
EDIT:
The gist of it is as follows:
HANDLE data_available_event; // manual reset event; set when queue has data, clear when queue is empty
CRITICAL_SECTION cs; // protect access to data queue
std::deque<std::string> data_to_send;
WorkerThread()
{
while(do_work)
{
std::string data = generate_data()
EnterCriticalSection(&cs);
data_to_send.push_back(data);
SetEvent(data_available_event); // signal sender thread that data is available
LeaveCriticalSection(&cs);
}
}
SenderThread()
{
while(do_work)
{
WaitForSingleObject(data_available_event);
EnterCriticalSection(&cs);
std::string data = data_to_send.front();
data_to_send.pop_front();
if(data_to_send.empty())
{
ResetEvent(data_available_event); // queue is empty; reset event and wait until more data is available
}
LeaveCriticalSection(&cs);
send_data(data);
}
}
This is of course assuming the data can be sent in any order. I use strings only for illustrative purposes; you probably want some kind of custom object that knows how to serialize the data it holds.
Suspending thread#1 so you can access its data strcuture does not avoid synchronization issues. When you suspend it thread#1 could be in the midst of an update to the data, so the socket thread gets part of old data, part of new. That is data corruption.
You need a shared data structure such as a FIFO queue. The worker threads add to the queue, the socket thread removes the oldest item from the queue. All access to this shared queue must be protected with a critical section unless you implement a lock-free queue. (A circular buffer.)
Depending on your application needs, if you implement this queue you might not need the socket thread at all. Just do the dequeueing in the display thread.
There are a couple of ways to achieving it; Luke's idea suffers from race conditions that will still create data corruption
You avoid that by using UDP instead of TCP as the transport protocol. It'd be especially a good choice if you don't mind missing an occasional packet (which is okay for displaying rapidly changing data); it's fantastic for ensuring real-time updates on data where exact history doesn't matter (missing a point in a relatively smooth curve while plotting graphs is okay);
If the data packets are are small and sort of represent a stream then UDP is a great choice. Its benefit increases if you have multiple senders on different systems all displaying on a single screen.

Is there an elegant way to post messages to AWS SQS with visibility delay of longer than 15 minutes?

In Amazon Web Services, their queues allow you to post messages with a visibility delay up to 15 minutes. What if I don't want messages visible for 6 months?
I'm trying to come up with an elegant solution to the poll/push problem. I can write code to poll the SQS (or a database) every few seconds, check for messages that are ready to be visible, then move them to a "visible queue", or something like that. I wish there was a simpler, more reliable method to have messages become visible in queues far into the future without me having to worry about my polling application working perfectly all the time.
I'm not married to AWS, SQS or any of that, but I'd prefer to find a cloud-friendly solution that is stable, reliable and will trigger an event far into the future without me having to worry about checking on its status every day.
Any thoughts or alternate trees for me to explore barking up are welcome.
Thanks!
It sounds like you might be misunderstanding the visibility delay. Its purpose is to make sure that the polling application doesn't pull the same item off the queue more than once.
In other words, when the item is pulled off the queue it becomes invisible for a predetermined period of time (default is 30 seconds, max is 15 minutes) in case the polling system has a cluster of machines reading from the queue all at once.
Here's the relevant documentation:
http://docs.amazonwebservices.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/IntroductionArticle.html#AboutVT
...and the sentence in particular that relates to my comment is:
"Immediately after the component receives the message, the message is still in the queue. However, you don't want other components in the system receiving and processing the message again. Therefore, Amazon SQS blocks them with a visibility timeout, which is a period of time during which Amazon SQS prevents other consuming components from receiving and processing that message."
You should be able to use SQS for your purpose since you can leave an item in the queue for as long as you want.
7 years later, and Amazon still doesn't support the feature you need!
The two ways you can sort of get it to work are:
have messages contain a delivery target datetime in their message_attributes, and have the workers that consume the queue's messages just delete and recreate any message that is consumed before its target, with delay = max(0, min(secs_until_target_datetime, 900)) ; that would allow you to effectively schedule a message for any arbitrary future time;
or,
(slightly less frequent and costly:) similarly, if a message isn't due to be handled yet, recreate it and change its visibility timeout to be timeout = max(0, min(secs_until_target_datetime, 43200))
The disadvantage of using visibility timeout is that any read will re-trigger it.
There has been a direct AWS solution possible since 2016-12-01: AWS Step Functions
Each execution can last/idle up to one year, persists the state between transitions, and doesn't cost you any money while it waits.

Many-to-one gatekeeper task synchronization

I'm working on a design that uses a gatekeeper task to access a shared resource. The basic design I have right now is a single queue that the gatekeeper task is receiving from and multiple tasks putting requests into it.
This is a memory limited system, and I'm using FreeRTOS (Cortex M3 port).
The problem is as follows: To handle these requests asynchronously is fairly simple. The requesting task queues its request and goes about its business, polling, processing, or waiting for other events. To handle these requests synchronously, I need a mechanism for the requesting task to block on such that once the request has been handled, the gatekeeper can wake up the task that called that request.
The easiest design I can think of would be to include a semaphore in each request, but given the memory limitations and the rather large size of a semaphore in FreeRTOS, this isn't practical.
What I've come up with is using the task suspend and task resume feature to manually block the task, passing a handle to the gatekeeper with which it can resume the task when the request is completed. There are some issues with suspend/resume, though, and I'd really like to avoid them. A single resume call will wake up a task no matter how many times it has been suspended by other calls and this can create an undesired behavior.
Some simple pseudo-C to demonstrate the suspend/resume method.
void gatekeeper_blocking_request(void)
{
put_request_in_queue(request);
task_suspend(this_task);
}
void gatekeeper_request_complete_callback(request)
{
task_resume(request->task);
}
A workaround that I plan to use in the meantime is to use the asynchronous calls and implement the blocking entirely in each requesting task. The gatekeeper will execute a supplied callback when the operation completes, and that can then post to the task's main queue or a specific semaphore, or whatever is needed. Having the blocking calls for requests is essentially a convenience feature so each requesting task doesn't need to implement this.
Pseudo-C to demonstrate the task-specific blocking, but this needs to be implemented in each task.
void requesting_task(void)
{
while(1)
{
gatekeeper_async_request(callback);
pend_on_sempahore(sem);
}
}
void callback(request)
{
post_to_semaphore(sem);
}
Maybe the best solution is just to not implement blocking in the gatekeeper and API, and force each task to handle it. That will increase the complexity of each task's flow, though, and I was hoping I could avoid it. For the most part, all calls will want to block until the operation is finished.
Is there some construct that I'm missing, or even just a better term for this type of problem that I can google? I haven't come across anything like this in my searches.
Additional remarks - Two reasons for the gatekeeper task:
Large stack space required. Rather than adding this requirement to each task, the gatekeeper can have a single stack with all the memory required.
The resource is not always accessible in the CPU. It is synchronizing not only tasks in the CPU, but tasks outside the CPU as well.
Use a mutex and make the gatekeeper a subroutine instead of a task.
It's been six years since I posted this question, and I struggled with getting the synchronization working how I needed it to. There were some terrible abuses of OS constructs used. I've considered updating this code, even though it works, to be less abusive, and so I've looked at more elegant ways to handle this. FreeRTOS has also added a number of features in the last six years, one of which I believe provides a lightweight method to accomplish the same thing.
Direct-to-Task Notifications
Revisiting my original proposed method:
void gatekeeper_blocking_request(void)
{
put_request_in_queue(request);
task_suspend(this_task);
}
void gatekeeper_request_complete_callback(request)
{
task_resume(request->task);
}
The reason this method was avoided was because the FreeRTOS task suspend/resume calls do not keep count, so several suspend calls will be negated by a single resume call. At the time, the suspend/resume feature was being used by the application, and so this was a real possibility.
Beginning with FreeRTOS 8.2.0, Direct-to-task notifications essentially provide a lightweight built-into-the-task binary semaphore. When a notification is sent to a task, the notification value may be set. This notification will lie dormant until the notified task calls some variant of xTaskNotifyWait() or it will be woken if it had already made such a call.
The above code, can be slightly reworked to be the following:
void gatekeeper_blocking_request(void)
{
put_request_in_queue(request);
xTaskNotifyWait( ... );
}
void gatekeeper_request_complete_callback(request)
{
xTaskNotify( ... );
}
This is still not an ideal method, as if the task notifications are used elsewhere, you may run into the same problem with suspend/resume, where the task is woken by a different source than the one it is expecting. Given that, for me, it was a new feature, it may work out in the revised code.

Resources