Splitter behaviour with parallel processing - apache-camel

I have this route which exhibits a behaviour that I can't understand.
from("sftp://...")
    .autoStartup(isReadyToStart())
    .routeId(model.getName())
    .filter().method(startTrigger)
        .bean(RouteHelperIn.class, "start")
        .bean(initialize)
        .bean(RouteHelperIn.class, "startChunk")
        .convertBodyTo(InputStream.class)
        .split().tokenize(newLine)
            .streaming()
            .stopOnException()
            .executorServiceRef(splitterExecutorService)
            .filter().method(RouteHelperIn.class, "isContentRow") // Skip header rows using the removeFirstLines route parameter
                .unmarshal(jdf)
                .bean(mapper)
                .bean(RouteHelperIn.class, "generateAndUpdateLog")
                .bean(writeToDb)
            .end()
        .end()
        .bean(RouteHelperIn.class, "endChunk")
        .bean(postProcessing).to("direct://" + sub.getCopy())
        .bean(RouteHelperIn.class, "writeLastImportLog")
        .bean(RouteHelperIn.class, "end")
    .end();
Let's suppose that the SFTP consumer finds two files, and that splitterExecutorService is set to null.
In this case, the split runs without parallelism and the two exchanges are processed sequentially. I can tell because the second call to RouteHelperIn#start happens after the first call to RouteHelperIn#end, as expected.
When splitterExecutorService is set, then parallelism in the splitter is enabled. According to the documentation:
If enabled then processing each split messages occurs concurrently. Note the caller thread will still wait until all messages has been fully processed, before it continues. It’s only processing the sub messages from the splitter which happens concurrently.
What I see is that the second RouteHelperIn#start is called before the first RouteHelperIn#end. That is not what I expected, and it causes a serious bug in my scenario. Also, the thread ID always seems to be the same, which does not match the documentation. I expected the same behaviour as in the previous case, the only difference being that the steps inside the splitter block would run in parallel.
Could it be a bug in Apache Camel or am I making some wrong assumption here? I'm using version 3.11.3 now.
UPDATE
I've found that adding synchronous=true to the consumer URI solves the problem; in other words, the route then behaves as expected. What I don't understand is why the route consumer is synchronous when the splitter is not parallelized and asynchronous when it is, given what the documentation states.
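The contract quoted from the documentation (sub-messages run concurrently, but the caller waits for all of them) can be modelled in plain Java. This is only a sketch of the expected ordering, not Camel code: ParallelSplitModel, splitAndWait and the upper-casing step are hypothetical stand-ins for the splitter and its inner processors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSplitModel {

    /** Processes every line on the pool and returns only after all lines are done. */
    static List<String> splitAndWait(List<String> lines, ExecutorService pool) {
        List<Callable<String>> tasks = new ArrayList<>();
        for (String line : lines) {
            Callable<String> task = () -> line.toUpperCase(); // stand-in for unmarshal/map/write
            tasks.add(task);
        }
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks the calling thread until every task has completed
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // "start" and "end" bracket each file; the split in between is parallel
            for (String file : List.of("file1", "file2")) {
                System.out.println("start " + file);
                splitAndWait(List.of("a", "b", "c"), pool);
                System.out.println("end " + file);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

In this model the second "start" can never be printed before the first "end", because the caller blocks inside splitAndWait. That is the ordering the question expects from the route.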

Related

How is SourceFunction#run supposed to work in Flink?

I have implemented a Source by extending RichSourceFunction for our Message Queue that Flink doesn't support.
I implemented the run method, whose signature is:
override def run(sc: SourceFunction.SourceContext[String]): Unit = {
  val msg = read_from_mq
  sc.collect(msg)
}
When the run method is called and there is no new message in the message queue, should I:
1. return without calling sc.collect, or
2. wait until new data arrives (in which case the run method blocks)?
I would prefer the second option, but I'm not sure it is the correct usage.
The run method of a Flink source should loop, endlessly producing output until its cancel method is called. When there's nothing to produce, then it's best if you can find a way to do a blocking wait.
The Apache NiFi source connector is another reasonable example to use as a model. You will note that it sleeps for a configurable interval when there's nothing for it to do.
As you probably know, both options are functionally correct and will yield correct results.
That said, the second one is preferred because you're not holding the thread. In fact, if you take a look at the RabbitMQ connector implementation, you'll notice that this is exactly how it is implemented: inside its run method it indirectly waits for messages to be placed on a BlockingQueue.
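The loop-until-cancelled pattern with a blocking wait, as described in the answers above, can be sketched in plain Java. This is not the Flink API: BlockingSourceModel and its Consumer-based collector are hypothetical stand-ins for a RichSourceFunction and its SourceContext, and the 100 ms poll timeout is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class BlockingSourceModel {
    private final BlockingQueue<String> queue;
    private volatile boolean running = true; // flipped by cancel() from another thread

    BlockingSourceModel(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    /** Loops until cancel() is called; waits with a timeout when the queue is empty. */
    void run(Consumer<String> collector) {
        while (running) {
            try {
                // Blocks for up to 100 ms, so the running flag is re-checked regularly
                String msg = queue.poll(100, TimeUnit.MILLISECONDS);
                if (msg != null) {
                    collector.accept(msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void cancel() {
        running = false;
    }

    /** Small demo: emits two queued messages on a worker thread, then cancels the loop. */
    static List<String> demo() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(List.of("a", "b"));
        BlockingSourceModel source = new BlockingSourceModel(queue);
        List<String> collected = new CopyOnWriteArrayList<>();
        Thread worker = new Thread(() -> source.run(collected::add));
        worker.start();
        try {
            long deadline = System.currentTimeMillis() + 5000;
            while (collected.size() < 2 && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            source.cancel();
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(collected);
    }
}
```

Polling with a timeout rather than blocking indefinitely is a design choice here: it lets the loop notice cancel() even when no message ever arrives.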

Aggregate results of batch consumer in Camel (for example from SQS)

I'm consuming messages from SQS FIFO queue with maxMessagesPerPoll=5 set.
Currently I'm processing each message individually, which is a total waste of resources.
In my case, as we are using a FIFO queue and all 5 of those messages relate to the same object, I could process them all together.
I thought this might be done with the aggregate pattern, but I wasn't able to get any results.
My consumer route looks like this:
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
.process(exchange -> {
// process the message
})
I believe it should be possible to do something like this
from("aws-sqs://my-queue?maxMessagesPerPoll=5&messageGroupIdStrategy=usePropertyValue")
    .aggregate(constant(true), new GroupedExchangeAggregationStrategy())
    .completionFromBatchConsumer()
    .process(exchange -> {
        // process ALL messages together as I now have a list of all exchanges
    });
but the processor is never invoked.
Second thing:
If I'm able to make this work, when is the ACK sent to SQS? When each individual message is processed, or when the aggregate process finishes? I hope the latter.
When the processor is not called, the aggregator probably still waits for new messages to aggregate.
You could try to use completionSize(5) instead of completionFromBatchConsumer() for a test. If this works, the batch completion definition is the problem.
For the ACK against the broker: unfortunately no. I think the message is committed when it arrives at the aggregator.
The Camel aggregator component is a "stateful" component and therefore it must end the current transaction.
For this reason you can equip such components with persistent repositories to avoid data loss when the process is killed. In such a scenario the already aggregated messages would obviously be lost if you don't have a persistent repository attached.
The problem lies in GroupedExchangeAggregationStrategy.
When I use this strategy, the output is an "array" of all exchanges. This means that the exchange that reaches the completion predicate no longer has the initial properties. Instead it has CamelGroupedExchange and CamelAggregatedSize, which are of no use to completionFromBatchConsumer().
As I don't actually need all exchanges to be aggregated, it's enough to use GroupedBodyAggregationStrategy. Then the exchange properties remain as in the original exchange and only the body contains an "array".
Another solution would be to use completionPredicate(Predicate predicate) with a custom predicate that extracts the necessary value from the grouped exchanges.
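The idea of grouping message bodies and releasing them once a fixed batch size is reached, as in the completionSize(5) test suggested above, can be sketched in plain Java. This is a toy model, not Camel's aggregator: BatchBodyAggregator and its callback are hypothetical names, and it ignores timeouts, persistence and acknowledgement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchBodyAggregator<T> {
    private final int completionSize;
    private final Consumer<List<T>> onComplete;
    private List<T> current = new ArrayList<>();

    BatchBodyAggregator(int completionSize, Consumer<List<T>> onComplete) {
        if (completionSize < 1) {
            throw new IllegalArgumentException("completionSize must be at least 1");
        }
        this.completionSize = completionSize;
        this.onComplete = onComplete;
    }

    /** Adds one body; hands the whole group to the callback once it is full. */
    void add(T body) {
        current.add(body);
        if (current.size() >= completionSize) {
            List<T> completed = current;
            current = new ArrayList<>(); // start a fresh group before invoking the callback
            onComplete.accept(completed);
        }
    }

    /** Number of bodies still waiting for the group to fill up. */
    int pending() {
        return current.size();
    }
}
```

Note that a group which never fills up stays pending forever in this model, which mirrors the symptom in the question: if the completion condition is never met, the downstream processor is never invoked.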

Trying to stop Camel exchange processing based on filter

My question is very similar to this one but the solution there does not work for me here - I am trying to use the filter EIP to discard selected exchanges. My routes look like (edited down for clarity):
from("{{fromSource}}")
.convertBodyTo(RequestInterface.class)
.enrich(INVOKE_BACKEND_URI, combiner)
.to("{{toDestination}}");
from(INVOKE_BACKEND_URI)
.to(backendUri)
.filter().method(DiscardResponse.class).log(LoggingLevel.INFO, "Discarding undesired response").stop().end()
.convertBodyTo(BodyInterface.class);
When the filter does NOT select the message, all is well - the log() is not displayed and the message goes to the convertBodyTo() and then back to the main route.
However, when the filter DOES select the message, the log() text is displayed but the exchange still continues on to the convertBodyTo(), where it throws an exception because it's a message that shouldn't be there. The stop() appears either not to be executed or to have no effect.
Can anyone suggest a solution to this?
It is possible from within a Processor to do this in order to stop the exchange:
exchange.setProperty(Exchange.ROUTE_STOP, Boolean.TRUE);
Since I'm not used to writing my routes using Java DSL I don't know if that option is available directly on the exchange within the route, but it probably is.
I guess one way could be:
from(INVOKE_BACKEND_URI)
    .to(backendUri)
    .filter().method(DiscardResponse.class)
        .log(LoggingLevel.INFO, "Discarding undesired response")
    .end()
    .choice()
        // Exchange.FILTER_MATCHED is the property named "CamelFilterMatched"
        .when(simple("${property.CamelFilterMatched} == true"))
            .stop()
    .end()
    .convertBodyTo(BodyInterface.class);
Take a look at the bottom of the doc here:
http://camel.apache.org/message-filter.html

Camel Multicast Subroutes out of order

I have a scenario where I get Message A as input. Message A must then be split into 3 different types of message and forwarded to other routes. It is important that the messages arrive in a precise order, i.e. A-1 must be sent before A-2, which must be sent before A-3.
To do this I have done the following (outline):
from("activemq:queue:somequeue-local")
    .multicast().to("direct:a1", "direct:a2", "direct:a3");
from("direct:a1")
    // split incoming message and prepare output document for A-1
    .to("activemq:queue:otherqueue");
from("direct:a2")
    // split incoming message and prepare output document for A-2
    .to("activemq:queue:otherqueue");
from("direct:a3")
    // split incoming message and prepare output document for A-3
    .to("activemq:queue:otherqueue");
And in another context, responsible for sending out the info to the external system, I have
from("activemq:queue:otherqueue?maxMessagesPerTask=1&concurrentConsumers=1&maxConcurrentConsumers=1")
    // do different stuff based on which type we are called with, then end with
    .beanRef("somebean", "writeToFileAndCallImportbat");
Now, my problem is that when I get to the receiver, I get the messages in random order: sometimes A-1, A-3, A-2, sometimes the correct A-1, A-2, A-3.
I have tried adding JMSXGroupID and JMSXGroupSeq to the messages, but without any luck.
I have also tried skipping the MQ part entirely and using direct-vm: to call the shared receiver, but then it looks like I get three simultaneous invocations of the receiver, still in random execution order.
I was under the impression that multicast would run sequentially unless told otherwise?
Is there something fundamentally wrong with the approach taken?
I am using Camel version 2.12.
Or, said more plainly:
I would like a route that creates three different output messages, and executes a batch file on them, in order. How do I go about that?
If you use the Splitter pattern, have you checked whether the streaming property is set to false?
If enabled then Camel will split in a streaming fashion, which means it will split the input message in chunks. This reduces the memory overhead. For example if you split big messages its recommended to enable streaming. If streaming is enabled then the sub-message replies will be aggregated out-of-order, eg in the order they come back. If disabled, Camel will process sub-message replies in the same order as they where splitted.
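The difference between aggregating replies "in the order they come back" and in the order the sub-messages were split can be illustrated with plain Java executors. This is a model of the two orderings only, not Camel's splitter: ReplyOrderModel, work and the per-task delays are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ReplyOrderModel {

    // Hypothetical sub-message work: sleep for the given delay, then return the index
    private static Integer work(int index, int delayMs) throws InterruptedException {
        Thread.sleep(delayMs);
        return index;
    }

    /** Collects replies in the order the sub-messages were submitted. */
    static List<Integer> inSubmissionOrder(List<Integer> delaysMs) {
        ExecutorService pool = Executors.newFixedThreadPool(delaysMs.size());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < delaysMs.size(); i++) {
                final int index = i;
                Callable<Integer> task = () -> work(index, delaysMs.get(index));
                futures.add(pool.submit(task));
            }
            List<Integer> replies = new ArrayList<>();
            for (Future<Integer> future : futures) {
                replies.add(future.get()); // waits for each reply in turn
            }
            return replies;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Collects replies in the order they finish, whichever sub-message that is. */
    static List<Integer> inCompletionOrder(List<Integer> delaysMs) {
        ExecutorService pool = Executors.newFixedThreadPool(delaysMs.size());
        try {
            CompletionService<Integer> completed = new ExecutorCompletionService<>(pool);
            for (int i = 0; i < delaysMs.size(); i++) {
                final int index = i;
                Callable<Integer> task = () -> work(index, delaysMs.get(index));
                completed.submit(task);
            }
            List<Integer> replies = new ArrayList<>();
            for (int i = 0; i < delaysMs.size(); i++) {
                replies.add(completed.take().get()); // next finished task, any index
            }
            return replies;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With delays such as 200, 0 and 100 ms, inSubmissionOrder always yields the indices in their original order, while inCompletionOrder depends on timing and will typically put the slowest sub-message last.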
So, it turned out to not be a problem with multicast after all.
Rather, in each of my sub-routes, I did this:
.split(..stax(SpecialClass)).streaming()
.beanRef("transformationBean","somefunction")
.aggregate(constant("1"), new MyAggregator())
.completionTimeout(5000)
.completionSize(1000)
.to(writeToFileAndRunBat)
I assumed this meant "Process all elements in the split, and if you aren't finished within 5 seconds or after 1000 elements, break out".
I changed it to
.split(..stax(SpecialClass), new MyAggregator()).streaming()
.beanRef("transformationBean","somefunction")
.end()
.to(writeToFileAndRunBat)
Come to think of it, this makes perfect sense: the first version couldn't really know when we were done, while the last one (I assume) just iterates over all elements in the split and calls the aggregator for each.
Also, I had to .end() in the first version. So I guess the whole thing was just acting random.

How do I avoid using the dead letter queue using Camel

I have an in/out producer in Camel that only waits for a limited time before returning to the caller. Sometimes this naturally results in a dead letter item and an exception being caught by the caller when the response is late.
What I would like to do is have the caller receive a timeout message instead of an exception and the item to never end up in the DLQ. Naturally I could put a listener on the DLQ but as the item has a home to go to it shouldn't really ever get to the DLQ.
Does anyone have a pattern for this? How would it be done? There are redundant consumer patterns (see Camel in Action link) but this is kind of a combined producer/consumer problem generated by the in/out pattern.
It sounds like you are using the Dead Letter Channel error handler. Try using the noErrorHandler instead - http://camel.apache.org/error-handler
