Apache Camel ProducerTemplate - apache-camel

Hi, I am using Apache Camel with Spring and have defined a route like this:
public class MyOrderConsumerRouterBuilder extends RouteBuilder implements InitializingBean, ApplicationContextAware {
    @Override
    public void configure() throws Exception {
        from("seda:asyncChannel?concurrentConsumers=20").id("asyncProcessChannelFromId")
            .to("bean:OrderProcessManager?method=processOrders").id("asyncProcessChannelToId");
    }
}
Is this producer multithreaded? I can see that there are multiple consumers; in my case concurrentConsumers=20.
I checked the question below:
How do I configure the default maximum cache size for ProducerCache or ProducerTemplate
According to the source code, DefaultCamelContext.createProducerTemplate() creates a DefaultProducerTemplate with a maximumCacheSize (default 1000).
From this I understand that there can be multiple producers, held in an LRU cache bounded by maximumCacheSize. In my case I have only one endpoint, i.e. SEDA, so there will be only one producer.
So I think there will always be a single, single-threaded producer. Please help me understand this better.

The producer is not multithreaded, but there are multiple producers.
In your case 20 consumers (threads) are waiting for messages. When a message arrives, it is processed according to the route definition by one of these threads.
If another message arrives, the thread that is processing the first message is probably still occupied, but one of the other 19 free threads can process the new message.
As long as there are no Splitters, Aggregators and similar EIPs, a single thread "walks" the message through your route and in your case finally sends the message to the OrderProcessManager bean. So this producing step (calling the bean method) is obviously done by a single thread for a single message.
BUT since you can have up to 20 threads processing messages in parallel, the OrderProcessManager bean can be called by up to 20 producers (threads) in parallel.
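The behaviour described above can be modelled in plain Java without Camel. The following is only a sketch: `OrderProcessManagerStub`, `run` and the 5 ms of simulated work are made up for illustration, and unlike real SEDA consumers these worker threads exit when the queue is empty instead of waiting for more messages. It shows that each message is handled by exactly one thread, while the bean itself can be called by several threads at once and therefore has to be thread-safe.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java model of one queue drained by N consumer threads (not Camel code).
final class ConcurrentConsumersSketch {

    // Hypothetical stand-in for the OrderProcessManager bean. It only uses
    // atomic counters because several consumer threads may call it at once.
    static final class OrderProcessManagerStub {
        final AtomicInteger active = new AtomicInteger();
        final AtomicInteger maxActive = new AtomicInteger();
        final AtomicInteger processed = new AtomicInteger();

        void processOrders(String order) {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(5); // simulate some work per order
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                active.decrementAndGet();
                processed.incrementAndGet();
            }
        }
    }

    // Returns {messages processed, highest number of simultaneous bean calls}.
    static int[] run(int consumers, int messages) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add("order-" + i);
        }
        OrderProcessManagerStub bean = new OrderProcessManagerStub();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int c = 0; c < consumers; c++) {
            pool.execute(() -> {
                String order;
                // One thread walks one message at a time through the "route".
                while ((order = queue.poll()) != null) {
                    bean.processOrders(order);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("consumers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { bean.processed.get(), bean.maxActive.get() };
    }

    public static void main(String[] args) {
        int[] result = run(20, 200);
        System.out.println("processed=" + result[0] + ", max parallel bean calls=" + result[1]);
    }
}
```

The second number never exceeds the number of consumer threads, which mirrors the "up to 20 in parallel" statement above.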

Related

How to use a PipedInputStream in a Producer Endpoint in Apache Camel

I am using Camel to consume a huge (GB) file, process/modify some data in the file and finally forward the modified file via an AWS-S3, FTP or SFTP producer endpoint to its target. In the actual usage scenario, using an intermediate (temporary) file to hold the processed data is not allowed.
In case of the AWS producer, the configure method of the corresponding RouteBuilder specifies the route as follows:
from("file:/...")
.streamCaching()
.process(new CustomFileProcessor())
.to("aws-s3://...");
In its process(Exchange exchange) method the CustomFileProcessor reads the input data from exchange.getIn().getBody(InputStream.class) and writes the processed and modified data into a PipedOutputStream.
The PipedInputStream connected to this PipedOutputStream should then be used as the source for the producer sending the data to AWS-S3.
I tried exchange.getOut().setBody(thePipedInputStream) in the process method, but this doesn't work and seems to create a deadlock.
So what is the correct way, if it is possible at all, of piping the processed output data of the CustomFileProcessor to the producer endpoint so that the entire data is sent over?
Many thanks in advance.
After further digging into this the solution was quite simple. I only needed to place the pipe reader into a separate thread and the problem was solved.
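The underlying rule is that the two ends of a pipe must not be driven by the same thread: once the pipe buffer is full, the writer blocks until someone reads. The following plain-Java sketch illustrates this outside of Camel. Note that it moves the writing side onto its own thread, whereas the poster moved the reader; either way both ends run on different threads. The class name, the `transformAsync` and `demo` helpers and the ASCII upper-casing "transformation" are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class PipedTransformSketch {

    // Returns a stream of the transformed data. The transformation runs on a
    // separate writer thread, so the caller can read the result while the
    // writer is still producing it and no temporary file is needed.
    static InputStream transformAsync(InputStream source) throws IOException {
        PipedOutputStream sink = new PipedOutputStream();
        PipedInputStream result = new PipedInputStream(sink, 8192);
        Thread writer = new Thread(() -> {
            // Closing the output stream signals end-of-stream to the reader.
            try (InputStream in = source; PipedOutputStream out = sink) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    for (int i = 0; i < n; i++) {
                        if (buf[i] >= 'a' && buf[i] <= 'z') {
                            buf[i] -= 32; // placeholder transformation: ASCII upper case
                        }
                    }
                    out.write(buf, 0, n);
                }
            } catch (IOException e) {
                // Sketch only: the reader just sees the stream end early here.
                // Real code would have to report this failure to the reader.
                e.printStackTrace();
            }
        }, "pipe-writer");
        writer.setDaemon(true);
        writer.start();
        return result;
    }

    // Small helper that drains the transformed stream into a String.
    static String demo(String text) {
        byte[] input = text.getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = transformAsync(new ByteArrayInputStream(input))) {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                collected.write(buf, 0, n);
            }
            return new String(collected.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("hello, camel"));
    }
}
```

In a processor, the stream returned by such a helper would presumably be the value to set as the message body, so that the producer endpoint reads it on the route's thread while the writer thread keeps feeding the pipe.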

how to perform parallel processing of gcp pubsub messages in apache camel

The code below takes a message from a Pub/Sub source topic, transforms it according to a template, and then publishes the transformed message to a target topic.
To improve performance I need to do this in parallel. That is, I need to poll 500 messages, transform them in parallel, and then publish them to the target topic.
From the Camel GCP component documentation I believe the maxMessagesPerPoll and concurrentConsumers parameters will do the job, but due to the lack of documentation I am not sure how they work internally.
Specifically: a) if I poll, say, 500 messages, will Camel then create 500 parallel routes that process the messages and publish them to the target topic? b) What about the ordering of the messages? c) Should I be looking at parallel-processing EIPs as an alternative?
The concept is not clear to me.
// my route
private void addRouteToContext(final PubSub pubSub) throws Exception {
this.camelContext.addRoutes(new RouteBuilder() {
@Override
public void configure() throws Exception {
errorHandler(deadLetterChannel("google-pubsub:{{gcp_project_id}}:{{pubsub.dead.letter.topic}}")
.useOriginalMessage().onPrepareFailure(new FailureProcessor()));
/*
* from topic
*/
from("google-pubsub:{{gcp_project_id}}:" + pubSub.getFromSubscription() + "?"
+ "maxMessagesPerPoll={{consumer.maxMessagesPerPoll}}&"
+ "concurrentConsumers={{consumer.concurrentConsumers}}").
/*
* transform using the velocity
*/
to("velocity:" + pubSub.getToTemplate() + "?contentCache=true").
/*
* attach header to the transform message
*/
setHeader("Header ", simple("${date:now:yyyyMMdd}")).routeId(pubSub.getRouteId()).
/*
* log the transformed event
*/
log("${body}").
/*
* publish the transformed event to the target topic
*/
to("google-pubsub:{{gcp_project_id}}:" + pubSub.getToTopic());
}
});
}
a) if I poll say 500 message ,will then it create 500 parallel route that will process the messages and publish it to the target topic
No, Camel does not create 500 parallel threads in this case. As you suspect, the number of concurrent consumer threads is set with concurrentConsumers. So if you define 5 concurrentConsumers with a maxMessagesPerPoll of 500, every consumer will fetch up to 500 messages and process them one after the other in a single thread. That is, you have 5 messages processed in parallel.
what about ordering of the messages
As soon as you process messages in parallel, their order is no longer guaranteed. But this already happens with a single consumer when messages that fail processing are diverted to your deadLetterChannel and reprocessed later.
should I be looking at parallel processing EIPs as an alternative
Only if the concurrentConsumers option is not sufficient.
When you set the concurrentConsumers option (say, concurrentConsumers=10), you are asking Camel to create a thread pool of 10 threads, each of which picks up a different message from the Pub/Sub queue and processes it.
Note that when you specify the concurrentConsumers option, the thread pool has a fixed size, which means that a fixed number of active threads are waiting at all times to process incoming messages. So 10 threads (since I specified concurrentConsumers=10) will be waiting to process my messages, even if 10 messages never arrive simultaneously.
Obviously, this is not going to guarantee that the incoming messages will be processed in the same order. If you are looking to have the messages in the same order, you can have a look at the Resequencer EIP to order your messages.
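To make the idea of resequencing concrete, here is a small plain-Java sketch of the principle. It is not Camel's Resequencer and does not use its API: it assumes, purely for illustration, that every message carries a gapless sequence number (a hypothetical field; Pub/Sub messages do not necessarily have one) and holds back messages that arrive early until all of their predecessors have been seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory resequencer keyed by a gapless sequence number.
final class ResequencerSketch {
    private final Map<Long, String> pending = new HashMap<>();
    private long next;

    ResequencerSketch(long firstSequenceNumber) {
        this.next = firstSequenceNumber;
    }

    // Accepts one message and returns every message that can now be
    // released in order (possibly none, possibly several).
    synchronized List<String> accept(long sequenceNumber, String body) {
        pending.put(sequenceNumber, body);
        List<String> ready = new ArrayList<>();
        String released;
        while ((released = pending.remove(next)) != null) {
            ready.add(released);
            next++;
        }
        return ready;
    }

    public static void main(String[] args) {
        ResequencerSketch resequencer = new ResequencerSketch(0);
        long[] arrivalOrder = { 2, 0, 1, 3 };
        for (long seq : arrivalOrder) {
            System.out.println("arrived " + seq + " -> released " + resequencer.accept(seq, "m" + seq));
        }
    }
}
```

The trade-off is visible even in this toy version: a message that never arrives blocks everything behind it, so a real setup needs some timeout or capacity limit.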
As for your third question, I don't think the google-pubsub component offers a parallel processing option. You can build your own using the Threads EIP, which would give you more control over concurrency.
Using Threads, your code would look something like this:
from("google-pubsub:project-id:destinationName?maxMessagesPerPoll=20")
// the 2 parameters are 'pool size' and 'max pool size'
.threads(5, 20)
.to("direct:out");

.threads and parallelProcessing in camel

Route :
ExecutorService executorService = new ThreadPoolBuilder(context).poolSize(10).maxQueueSize(100).build("myCustom-Thread");
from("direct:myROute")
.aggregate(myAggregationStrategy).constant(true)
.threads().executorService(executorService)
//.parallelProcessing().executorService(executorService)
.process(myProcessor)
.end();
I am trying to run the processor logic on multiple threads, but I am confused by the behavior of the options Camel offers:
Using .threads with executorService
I get 10 threads as per the executorService configuration, but they run one at a time. This gives the same behavior as a single thread and also degrades performance.
Using .parallelProcessing with executorService
All 10 threads run in parallel, but the caller thread (the main thread of the route) continues while the parallel processing threads are still doing their work,
which is not what I want.
How can I handle this scenario so that the caller continues only once all threads have completed their work?

Camel ApacheMQ -> AHC behaviour (blocking?)

I just started using Apache Camel and I'm curious about the seemingly counter-intuitive default behaviour of asynchronous http client (AHC). While consuming messages from ActiveMQ, I can't get it to act in a non-blocking fashion.
My route looks like this:
@Component
public class Broadcaster extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        errorHandler(deadLetterChannel("activemq:failed.messages"));
        from("activemq:outbound.messages")
            .setExchangePattern(ExchangePattern.InOnly)
            .recipientList(simple("ahc:${in.header[PublishDestination]}"))
            .end();
    }
}
I enqueued several messages, half of which I sent to a delayed web server, and the other half to a normal one. I expected to see all the normal messages consumed immediately by the fast server, and the slow messages gradually over time. However, this was the behaviour observed on the fast web server:
00:24:02.585, <hello>World</hello>
00:24:03.622, <hello>World</hello>
00:24:04.640, <hello>World</hello>
00:24:05.658, <hello>World</hello>
As you can see there is exactly one second between each logged request that corresponds to the artificial 1 second delay on the slow server. Based on the route timings, it looks like the JMS consumer is waiting for AHC to complete before it consumes the next message off the queue:
Processor Elapsed (ms)
[activemq://outbound.messages ] [ 1020]
[setExchangePattern[InOnly] ] [ 0]
[ahc:${in.header[PublishDestination]}} ] [ 1018]
Am I supposed to explicitly use async producers and write callback handlers in these cases, or is there something else I'm missing? Thank you!
Well, a case of RTFM I guess, although the ActiveMQ page leaves a lot to be desired in terms of documenting the properties available for endpoint configuration. There should probably be a note saying that most (all?) JMS config options are also available for the ActiveMQ component. In any case, the solution is to define the consumer as follows:
from("activemq:outbound.messages?asyncConsumer=true")

How can I improve the performance of a Seda queue?

Take this example:
from("seda:data").log("data added to queue")
.setHeader("CamelHttpMethod", constant("POST"))
.setHeader(Exchange.CONTENT_TYPE, constant("application/json"))
.process(new Processor() {
public void process(Exchange exchange) throws Exception {
exchange.setProperty(Exchange.CHARSET_NAME, "UTF-8");
}
})
.recipientList(header(RECIPIENT_LIST))
.ignoreInvalidEndpoints().parallelProcessing();
Assume the RECIPIENT_LIST header contains only one HTTP endpoint. For a given HTTP endpoint, messages should be processed in order, but two messages for different endpoints can be processed in parallel.
Basically, I want to know if there is anything that can be done to improve performance. For example, would using concurrentConsumers help?
SEDA with concurrentConsumers > 1 would absolutely help with throughput because it would allow multiple threads to run in parallel, but you'll need to implement your own locking mechanism to make sure only a single thread is hitting a given HTTP endpoint at a given time.
Otherwise, here is an overview of your options: http://camel.apache.org/parallel-processing-and-ordering.html
In short, if you can use JMS, then consider using ActiveMQ message groups, as they are trivial to use and designed for exactly this use case (parallel processing, but single-threaded per group of messages).
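The "parallel across groups, single-threaded within a group" idea can be sketched in plain Java as follows. This is only an analogue of the concept, not ActiveMQ message groups and not a Camel API: the class `KeyedLanesSketch`, the lane count and the example endpoint URLs are assumptions for illustration. Each endpoint key is hashed to one single-threaded executor, so work for the same endpoint stays in submission order while different endpoints can proceed in parallel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Dispatches tasks to single-threaded "lanes" chosen by key.
final class KeyedLanesSketch {
    private final ExecutorService[] lanes;

    KeyedLanesSketch(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // The same key always maps to the same lane, so per-key order is kept.
    void submit(String key, Runnable task) {
        int lane = Math.floorMod(key.hashCode(), lanes.length);
        lanes[lane].execute(task);
    }

    void shutdownAndWait() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                if (!lane.awaitTermination(30, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("lane did not finish in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Sends numbered messages to three hypothetical endpoints and records
    // the order in which each endpoint "received" them.
    static Map<String, List<Integer>> demo(int messagesPerEndpoint) {
        KeyedLanesSketch dispatcher = new KeyedLanesSketch(4);
        Map<String, List<Integer>> seen = new ConcurrentHashMap<>();
        String[] endpoints = { "http://a.example", "http://b.example", "http://c.example" };
        for (int i = 0; i < messagesPerEndpoint; i++) {
            final int n = i;
            for (String endpoint : endpoints) {
                dispatcher.submit(endpoint, () -> seen
                        .computeIfAbsent(endpoint, k -> Collections.synchronizedList(new ArrayList<>()))
                        .add(n));
            }
        }
        dispatcher.shutdownAndWait();
        return seen;
    }

    public static void main(String[] args) {
        demo(5).forEach((endpoint, order) -> System.out.println(endpoint + " -> " + order));
    }
}
```

Two keys can hash to the same lane, in which case they share a thread; that costs some parallelism but never breaks per-endpoint ordering.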

Resources