Aggregate results of batch consumer in Camel (for example from SQS) - apache-camel

I'm consuming messages from an SQS FIFO queue with maxMessagesPerPoll=5 set.
Currently I'm processing each message individually, which is a waste of resources.
In my case, since we are using a FIFO queue and all five of those messages relate to the same object, I could process them all together.
I thought this could be done with the Aggregate EIP, but I wasn't able to get any results.
My consumer route looks like this:
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
.process(exchange -> {
// process the message
})
I believe it should be possible to do something like this:
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
    .aggregate(constant(true), new GroupedExchangeAggregationStrategy())
        .completionFromBatchConsumer()
    .process(exchange -> {
        // process ALL messages together as I now have a list of all exchanges
    });
but the processor is never invoked.
Second thing:
If I'm able to make this work, when is the ACK sent to SQS? When each individual message is processed, or when the aggregated exchange finishes processing? I hope the latter.

If the processor is never called, the aggregator is probably still waiting for more messages to aggregate.
As a test, you could try completionSize(5) instead of completionFromBatchConsumer(). If that works, the batch completion definition is the problem.
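A minimal version of that test could look like this (it is just your original route with the completion condition swapped; the completionTimeout is an extra safety net I'd add in case fewer than 5 messages arrive, not something strictly required):
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
    .aggregate(constant(true), new GroupedExchangeAggregationStrategy())
        .completionSize(5)          // complete once 5 exchanges have been aggregated
        .completionTimeout(10000)   // safety net if fewer than 5 messages arrive
    .process(exchange -> {
        // if this is invoked, the batch completion definition was the problem
    });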
Regarding the ACK against the broker: unfortunately, no. I believe the message is committed (acknowledged) as soon as it arrives at the aggregator.
The Camel aggregator is a "stateful" component and therefore it must end the current transaction.
For this reason you can equip such components with a persistent repository to avoid data loss when the process is killed. Without a persistent repository attached, messages that have already been aggregated but not yet completed would obviously be lost in that scenario.
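As a sketch of that idea, assuming the camel-leveldb component is on the classpath and the aggregated state is something serializable (here only the message bodies are grouped via GroupedBodyAggregationStrategy; the repository name and file path are made up for illustration):
// persistent repository so pending aggregations survive a crash/restart
LevelDBAggregationRepository repo =
        new LevelDBAggregationRepository("sqsBatches", "data/sqs-batches.dat");

from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
    .aggregate(constant(true), new GroupedBodyAggregationStrategy())
        .aggregationRepository(repo)
        .completionSize(5)
    .process(exchange -> {
        // process the aggregated batch
    });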

The problem lies in GroupedExchangeAggregationStrategy.
With this strategy, the output is a list of all exchanges. This means the exchange that reaches the completion predicate no longer has the initial properties; instead it carries CamelGroupedExchange and CamelAggregatedSize, which is of no use to completionFromBatchConsumer().
Since I don't actually need the whole exchanges aggregated, it's enough to use GroupedBodyAggregationStrategy. The exchange properties then remain as in the original exchange, and only the body contains a list.
Another solution would be to use completionSize(Predicate predicate) with a custom predicate that extracts the necessary value from the grouped exchanges.
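A minimal sketch of the fixed route, assuming GroupedBodyAggregationStrategy is available in your Camel version:
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
    .aggregate(constant(true), new GroupedBodyAggregationStrategy())
        // works now because the batch consumer properties stay on the exchange
        .completionFromBatchConsumer()
    .process(exchange -> {
        // the body is a List containing the bodies of all 5 messages
        List<?> bodies = exchange.getIn().getBody(List.class);
        // process all messages of the group together
    });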

Related

Is there a way to broadcast configuration into all task managers or all FlatMapFunctions?

We currently have a Flink-based streaming job (composed of a complex DAG of FlatMapFunctions) and an HTTP interface for fetching configuration.
I would like to read the configuration from the HTTP interface through a source function every 5 minutes with a parallelism of 1, and then distribute it to all task managers or all FlatMapFunctions of the job. In the FlatMapFunctions, the configuration is only read and never changed.
I have read the documentation on The Broadcast State Pattern, but the approach described there seems to apply only to the first function that receives the broadcast; subsequent downstream FlatMapFunctions cannot read the broadcast state. As shown in the figure below, only Co-Process-Broadcast can access the broadcast, but map func 1 and map func 2 cannot.
[Figure: Broadcast state graph]
Similar to QUESTION but different: I have many downstream FlatMapFunctions and expect them all to get the broadcast configuration.
You can send the broadcast stream to multiple functions, so if your config state isn't big then that's likely what I'd do.
If the config state is very small (relative to the size of records being processed) then you could attach it to every incoming record in your BroadcastProcessFunction, so downstream operators have it in hand when processing each of their records.
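A rough sketch of that second suggestion, with the config arriving as a String on the broadcast side; Event (your record type), the descriptor name and the "latest" key are hypothetical placeholders:
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class AttachConfigFunction
        extends BroadcastProcessFunction<Event, String, Tuple2<Event, String>> {

    static final MapStateDescriptor<String, String> CONFIG_DESCRIPTOR =
            new MapStateDescriptor<>("config", Types.STRING, Types.STRING);

    @Override
    public void processElement(Event event, ReadOnlyContext ctx,
                               Collector<Tuple2<Event, String>> out) throws Exception {
        // attach the latest config to every record so downstream
        // FlatMapFunctions get it without being connected to the broadcast
        String config = ctx.getBroadcastState(CONFIG_DESCRIPTOR).get("latest");
        out.collect(Tuple2.of(event, config));
    }

    @Override
    public void processBroadcastElement(String config, Context ctx,
                                        Collector<Tuple2<Event, String>> out) throws Exception {
        // store the newest config whenever the HTTP-polling source emits one
        ctx.getBroadcastState(CONFIG_DESCRIPTOR).put("latest", config);
    }
}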

Apache camel - EIP design query

I have an EIP design related query. I have a requirement to process a CSV file in chunks and call a REST API for each chunk. After the whole file has been processed, I need to call another REST API to signal that processing is complete. I wanted the route to be transacted, so I have a queue in between: if the end system is not available, the retry will happen at the broker level.
My flow is as below.
First flow:
CSV file -> split into chunks of 100 records -> place message on queue
The second flow (transacted route):
pick message from queue -> call the REST API
The second flow is transacted. Since I am breaking the flow and it is asynchronous, I am not sure how to make the final completion call. I do not have a persistent store for the status of each chunk.
Is there any way I can achieve this using JMS functionality or Camel?
What you can use for your first flow is the Camel Splitter EIP:
http://camel.apache.org/splitter.html
And closely looking at the doc, you will find that there are three exchange properties available for each split exchange:
CamelSplitIndex: A split counter that increases for each Exchange being split. The counter starts from 0.
CamelSplitSize: The total number of Exchanges that were split. This property is not set for stream-based splitting; from Camel 2.9 onwards it is also set for stream-based splitting, but only on the completed Exchange.
CamelSplitComplete: Whether or not this Exchange is the last.
As these are exchange properties, you should copy them into JMS headers before sending the messages to the queue. The second flow can then use that information to know which message is the last one; see the sketch below.
Keep in mind, though, that it's all asynchronous, so the CamelSplitComplete flag doesn't necessarily arrive last at the second flow. You may need a stateful counter or the Resequencer EIP (http://camel.apache.org/resequencer.html) to deal with the asynchronicity.
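A sketch of both flows: copy the split properties into headers in the first flow and react to the last chunk in the second. Endpoint URIs, queue and header names are placeholders, and the caveat above about the flag not necessarily arriving last still applies:
// first flow: split the CSV into chunks of 100 lines and forward the split metadata
from("file:inbox?fileName=data.csv")
    .split().tokenize("\n", 100).streaming()
        .setHeader("chunkIndex", exchangeProperty("CamelSplitIndex"))
        .setHeader("chunkComplete", exchangeProperty("CamelSplitComplete"))
        .to("jms:queue:chunks")
    .end();

// second flow (transacted): call the REST API, then signal completion after the last chunk
from("jms:queue:chunks")
    .transacted()
    .to("http://rest-host/processChunk")
    .filter(header("chunkComplete").isEqualTo(true))
        .to("http://rest-host/processingComplete")
    .end();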

Camel Multicast Subroutes out of order

I have a scenario where I get Message A as input. Message A must then be split into 3 different types of message and forwarded to other routes. It is important that the messages arrive in a precise order, i.e. A-1 must be sent before A-2, which must be sent before A-3.
To do this I have done the following (outline):
from("activemq:queue:somequeue-local")
.multicast().to("direct:a1","direct:a2","direct:a3");
from("direct:a1)
//split incoming message and prepare output document for A-1
.to("activemq:queue:otherqueue")
.from("direct:a2)
//split incoming message and prepare output document for A-2
.to("activemq:queue:otherqueue")
.from("direct:a3)
//split incoming message and prepare output document for A-3
.to("activemq:queue:otherqueue")
And in another context, responsible for sending out the info to the external system, I have
.from("activemq:queue:otherqueue?maxMessagesPerTask=1&concurrentConsumers=1&maxConcurrentConsumers=1")
// do different stuff based on which type we are called with then end with
.beanref("somebean","writeToFileAndCallImportbat");
Now, my problem is that when I get to the receiver, the messages arrive in random order. Sometimes A-1, A-3, A-2; sometimes the right order, A-1, A-2, A-3.
I have tried adding JMSXGroupID and JMSXGroupSeq headers to the messages, but without any luck.
I have also tried skipping the MQ part entirely and using direct-vm: to call the shared receiver, but then I appear to get three simultaneous invocations of the receiver at once, still in random execution order.
I was under the impression that multicast runs its destinations sequentially unless told otherwise?
Is there something fundamentally wrong with the approach taken?
I am using Camel version 2.12.
Or, said more plainly:
I would like a route that creates three different output messages, and executes a batch file on them, in order. How do I go about that?
If you use the Splitter pattern, have you checked whether the streaming property is set to false?
If streaming is enabled, Camel splits the input message in chunks as it reads it, which reduces memory overhead; it is recommended when splitting big messages. However, with streaming enabled the sub-message replies are aggregated out of order, i.e. in whatever order they come back. If streaming is disabled, Camel processes the sub-message replies in the same order as they were split.
So, it turned out to not be a problem with multicast after all.
Rather, in each of my sub-routes, I did this:
.split(..stax(SpecialClass)).streaming()
    .beanRef("transformationBean", "somefunction")
    .aggregate(constant("1"), new MyAggregator())
        .completionTimeout(5000)
        .completionSize(1000)
        .to(writeToFileAndRunBat)
Which, I assumed meant "Process all elements in the split, and if you aren't finished in 5 seconds or after 1000 elements, break out".
I changed it to
.split(..stax(SpecialClass), new MyAggregator()).streaming()
    .beanRef("transformationBean", "somefunction")
    .end()
    .to(writeToFileAndRunBat)
Come to think of it, it makes perfect sense: the first version couldn't really know when we were done, while the second (I assume) just iterates over all elements in the split and calls the aggregation strategy for each.
Also, I was missing an .end() in the first version, so I guess the whole thing was just behaving randomly.

How do I avoid using the dead letter queue using Camel

I have an in/out producer in Camel that only hangs around for a limited time before getting back to the caller. Sometimes this naturally results in a dead letter item and an exception being caught by the caller when the response is late.
What I would like is for the caller to receive a timeout message instead of an exception, and for the item never to end up in the DLQ. Naturally I could put a listener on the DLQ, but since the item has a home to go to, it shouldn't really ever get to the DLQ.
Does anyone have a pattern for this? How would it be done? There are redundant consumer patterns (see the Camel in Action link), but this is kind of a combined producer/consumer problem generated by the in/out pattern.
Sounds like you are using the Dead Letter Channel error handler; try using the noErrorHandler instead - http://camel.apache.org/error-handler
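A minimal sketch of what that could look like inside a RouteBuilder (the endpoints here are placeholders):
import org.apache.camel.builder.RouteBuilder;

public class NoDlqRoute extends RouteBuilder {
    @Override
    public void configure() {
        // no Dead Letter Channel: failures and late replies surface to the caller
        // instead of the message being moved to the DLQ
        errorHandler(noErrorHandler());

        from("jms:queue:requests")
            .to("bean:slowService");
    }
}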

Camel need Non-blocking Queue - Analogous to not processing events on graphics thread

Sorry, I answered my own question - it actually IS just SEDA. When I saw 'BlockingQueue' I assumed that SEDA would block until the queue had been read... which of course is nonsense. SEDA is all I need. Question answered.
I've got a problem that's completely screwing me. I've been provided a custom endpoint by a company we connect to, but the endpoint maintains a heartbeat to a feed, and when it sends messages above a certain size they take so long to process on the route that it blocks, the heartbeat gets lost, and the connection goes down.
Obviously this is analogous to keeping work off the graphics thread so the UI stays smooth, but I'm unsure how I'd achieve this in Camel. Essentially I want to queue the results and have them processed on a separate thread.
from("custom:endpoint")
    .process(new MyProcessor())
    .to("some-endpoint");
As suggested, camel-seda is a simple way to perform async/multi-threaded processing. Beware that the blocking queues are in-memory only (messages are lost if the JVM is stopped, etc.). If you need guaranteed messaging, use camel-jms.
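A minimal sketch of the SEDA hand-off (the endpoint URIs, queue name and thread count are placeholders):
// the consumer thread only hands the message off, so the heartbeat keeps flowing
from("custom:endpoint")
    .to("seda:heavyWork?waitForTaskToComplete=Never");

// the heavy processing runs on separate threads
from("seda:heavyWork?concurrentConsumers=4")
    .process(new MyProcessor())
    .to("some-endpoint");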
