How to make Camel AWSSQS Consumer component synchronous - apache-camel

I am using 2.17.6 version of camel AWSSQS component ( consumer) to consume messages from a FIFO queue.
When the exchange processing takes longer than visibility timeout, camel is spinning up a new thread with the same message ( but a new exchange). This is happening because I am using long polling, and looks like this component is built to have asynchronous behavior by default.
In my case, I want to move on to the next message only when the current message gets deleted after processing. ( as my requirement is FIFO)
If I increase the visibility timeout to a high value(ex: 15 mins), that solves the issue. but I do have cases where I will go for infinite retry, and in those cases I don't want to cross the set high Visibility timeout and have multiple threads starting thus changing the order of execution.
Please suggest, if there is a way not to receive the next message in the queue until I am done processing the current one.
Thanks,
Sowjanya.

Related

Aggregate results of batch consumer in Camel (for example from SQS)

I'm consuming messages from SQS FIFO queue with maxMessagesPerPoll=5 set.
Currently I'm processing each message individually which is a total waste of resources.
In my case, as we are using FIFO queue and all of those 5 messages are related to the same object, I could process them all toghether.
I though this might be done by using aggregate pattern but I wasn't able to get any results.
My consumer route looks like this:
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
.process(exchange -> {
// process the message
})
I believe it should be possible to do something like this
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
.aggregate(const(true), new GroupedExchangeAggregationStrategy())
.completionFromBatchConsumer()
.process(exchange -> {
// process ALL messages together as I now have a list of all exchanges
})
but the processor is never invoked.
Second thing:
If I'm able to make this work, when does ACK is sent to SQS? When each individual message is processed or when the aggregate process finishes? I hope the latter
When the processor is not called, the aggregator probably still waits for new messages to aggregate.
You could try to use completionSize(5) instead of completionFromBatchConsumer() for a test. If this works, the batch completion definition is the problem.
For the ACK against the broker: unfortunately no. I think the message is commited when it arrives at the aggregator.
The Camel aggregator component is a "stateful" component and therefore it must end the current transaction.
For this reason you can equip such components with persistent repositories to avoid data loss when the process is killed. In such a scenario the already aggregated messages would obviously be lost if you don't have a persistent repository attached.
The problem lies in GroupedExchangeAggregationStrategy
When I use this strategy, the output is an "array" of all exchanges. This means that the exchange that comes to the completion predicate no longer has the initial properties. Instead it has CamelGroupedExchange and CamelAggregatedSize which makes no use for the completionFromBatchConsumer()
As I don't actually need all exchanges being aggregated, it's enough to use GroupedBodyAggregationStrategy. Then exchange properties will remain as in the original exchange and just the body will contain an "array"
Another solution would be to use completionSize(Predicate predicate) and use a custom predicate that extracts necessary value from groupped exchanges.

How do I avoid using the dead letter queue using Camel

I have a in/out producer in Camel that only hangs around for a limited time before getting back to the caller. Some times this naturally results in a dead letter item and an exception being caught by the caller when the response is late.
What I would like to do is have the caller receive a timeout message instead of an exception and the item to never end up in the DLQ. Naturally I could put a listener on the DLQ but as the item has a home to go to it shouldn't really ever get to the DLQ.
Does anyone have a pattern for this? How would it be done? There are redundant consumer patterns (see Camel in Action link) but this is kind of a combined producer/consumer problem generated by the in/out pattern.
Sounds like you are using the Dead Letter Channel error handler, try using the noErrorHandler - http://camel.apache.org/error-handler

Camel need Non-blocking Queue - Analogous to not processing events on graphics thread

Sorry I answered my own question - it actually IS just SEDA, I assumed when I saw 'BlockingQueue' that SEDA would block until the queue had been read ... which of course is nonsense. SEDA is completely all I need. Question answered
I've got a problem that's compeletely screwing me, I've been provided a custom Endpoint by company we connect to, but the endpoint maintains a heart-beat to a feed, and when it sends messages above a certain size they take so long to process on the route that its blocking and the heartbeat gets lost and the connection goes down
Obviously this is analogous to processing events on a non-graphics thread to keep a smooth operation going. But I'm unsure how I'd achieve this in camel. Essentially I want to queue the results and have them on a separate thread.
from( "custom:endpoint" )
.process( MyProcesor )
.to( "some-endpoint")
as suggested camel-seda is a simple way to perform async/mult-threaded processing, beware that the blocking queues are in-memory only (lost if VM is stopped, etc). if you need guaranteed messaging support, use camel-jms

DeadlineExceededException and DataStore/Task Queue Operations

I'm doing some operations that should complete under 60 seconds but there may be some rare cases where it takes longer (but will never take longer than 10 minutes). It says in the app engine docs if you catch a DeadlineExceededException you have less than a second to do operations before it permanently fails. Would this be enough time to add a task to a queue and/or do a datastore write? I assume the safest way would be to add a task async/write a datastore entity (async) at the beginning of an operation and remove it from the queue if the operation completes. The latter method would use up twice as many api calls but is it worth it?
I would suggest to use the queue as default for all operations so you won't have to implement the fallback to it if you catch a dead line exceed error. It is more clean and easier to maintain along with the fact that the user doesn't have to wait for the operation to complete. In order to achieve this you can trigger your queue with an ajax call and get the result in the background, so the user will not wait for the operation to complete. Yes it worth's it, since it can "guarantee" the window of time you might need.
The runtime environment gives the request handler a little bit more time (less than a second) after raising the exception to prepare a custom response. so it would be sufficient to add that it into task queue.
If you do not want the client to keep polling for a task queue result, I suggest you have a look at the Channel API. It will enable you to implement push notifications to the client.
At the end of your task queue, you'll just have to send a notification to the client to let him now that is task has been processed.

Is there an elegant way to post messages to AWS SQS with visibility delay of longer than 15 minutes?

In Amazon Web Services, their queues allow you to post messages with a visibility delay up to 15 minutes. What if I don't want messages visible for 6 months?
I'm trying to come up with an elegant solution to the poll/push problem. I can write code to poll the SQS (or a database) every few seconds, check for messages that are ready to be visible, then move them to a "visible queue", or something like that. I wish there was a simpler, more reliable method to have messages become visible in queues far into the future without me having to worry about my polling application working perfectly all the time.
I'm not married to AWS, SQS or any of that, but I'd prefer to find a cloud-friendly solution that is stable, reliable and will trigger an event far into the future without me having to worry about checking on its status every day.
Any thoughts or alternate trees for me to explore barking up are welcome.
Thanks!
It sounds like you might be misunderstanding the visibility delay. Its purpose is to make sure that the polling application doesn't pull the same item off the queue more than once.
In other words, when the item is pulled off the queue it becomes invisible for a predetermined period of time (default is 30 seconds, max is 15 minutes) in case the polling system has a cluster of machines reading from the queue all at once.
Here's the relevant documentation:
http://docs.amazonwebservices.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/IntroductionArticle.html#AboutVT
...and the sentence in particular that relates to my comment is:
"Immediately after the component receives the message, the message is still in the queue. However, you don't want other components in the system receiving and processing the message again. Therefore, Amazon SQS blocks them with a visibility timeout, which is a period of time during which Amazon SQS prevents other consuming components from receiving and processing that message."
You should be able to use SQS for your purpose since you can leave an item in the queue for as long as you want.
7 years later, and Amazon still doesn't support the feature you need!
The two ways you can sort of get it to work are:
have messages contain a delivery target datetime in their message_attributes, and have the workers that consume the queue's messages just delete and recreate any message that is consumed before its target, with delay = max(0, min(secs_until_target_datetime, 900)) ; that would allow you to effectively schedule a message for any arbitrary future time;
or,
(slightly less frequent and costly:) similarly, if a message isn't due to be handled yet, recreate it and change its visibility timeout to be timeout = max(0, min(secs_until_target_datetime, 43200))
The disadvantage of using visibility timeout is that any read will re-trigger it.
There has been a direct AWS solution possible since 2016-12-01: AWS Step Functions
Each execution can last/idle up to one year, persists the state between transitions, and doesn't cost you any money while it waits.

Resources