.threads and parallelProcessing in camel - apache-camel

Route :
ExecutorService executorService = new ThreadPoolBuilder(context).poolSize(10).maxQueueSize(100).build("myCustom-Thread");
from("direct:myROute")
.aggregate(myAggregationStrategy).constant(true)
.threads().executorService(executorService)
//.parallelProcessing().executorService(executorService)
.process(myProcessor)
.end();
I am trying to process the processor logic to run in multithreaded, but in the available options from camel I am getting confused on below behavior
Using .threads with executorService
though it is getting me 10 threads as per the executorService configuration, but all threads are running one by one. which gives me the same behavior as of single thread and also degrading the performance
Using .parallelProcessing with executorService
all threads(10) are running in parallel, but caller thread(main thread of route) get called while threads(parallel processing threads) are still doing their job.
which is not correct.
how can I handle this scenario, in which I can call the caller once all threads are complete their job?

Related

How to build a async rest endpoint that calls blocking action in worker thread and replies instantly (Quarkus)

I checked the docs and stackoverflow but didn't find exactly a suiting approach.
E.g. this post seems very close: Dispatch a blocking service in a Reactive REST GET endpoint with Quarkus/Mutiny
However, I don't want so much unneccessary boilerplate code in my service, at best, no service code change at all.
I generally just want to call a service method which uses entity manager and thus is a blocking action, however, want to return a string to the caller immidiately like "query started" or something. I don't need a callback object, it's just a fire and forget approach.
I tried something like this
#NonBlocking
#POST
#Produces(MediaType.TEXT_PLAIN)
#Path("/query")
public Uni<String> triggerQuery() {
return Uni.createFrom()
.item("query started")
.call(() -> service.startLongRunningQuery());
}
But it's not working -> Error message returned to the caller:
You have attempted to perform a blocking operation on a IO thread. This is not allowed, as blocking the IO thread will cause major performance issues with your application. If you want to perform blocking EntityManager operations make sure you are doing it from a worker thread.",
I actually expected quarkus takes care to distribute the tasks accordingly, that is, rest call to io thread and blocking entity manager operations to worker thread.
So I must using it wrong.
UPDATE:
Also tried an proposed workaround that I found in https://github.com/quarkusio/quarkus/issues/11535 changing the method body to
return Uni.createFrom()
.item("query started")
.emitOn(Infrastructure.getDefaultWorkerPool())
.invoke(()-> service.startLongRunningQuery());
Now I don't get an error, but service.startLongRunningQuery() is not invoked, thus no logs and no query is actually sent to db.
Same with (How to call long running blocking void returning method with Mutiny reactive programming?):
return Uni.createFrom()
.item(() ->service.startLongRunningQuery())
.runSubscriptionOn(Infrastructure.getDefaultWorkerPool())
Same with (How to run blocking codes on another thread and make http request return immediately):
ExecutorService executor = Executors.newFixedThreadPool(10, r -> new Thread(r, "CUSTOM_THREAD"));
return Uni.createFrom()
.item(() -> service.startLongRunningQuery())
.runSubscriptionOn(executor);
Any idea why service.startLongRunningQuery() is not called at all and how to achieve fire and forget behaviour, assuming rest call handled via IO thread and service call handled by worker thread?
It depends if you want to return immediately (before your startLongRunningQuery operation is effectively executed), or if you want to wait until the operation completes.
If the first case, use something like:
#Inject EventBus bus;
#NonBlocking
#POST
#Produces(MediaType.TEXT_PLAIN)
#Path("/query")
public void triggerQuery() {
bus.send("some-address", "my payload");
}
#Blocking // Will be called on a worker thread
#ConsumeEvent("some-address")
public void executeQuery(String payload) {
service.startLongRunningQuery();
}
In the second case, you need to execute the query on a worker thread.
#POST
#Produces(MediaType.TEXT_PLAIN)
#Path("/query")
public Uni<String> triggerQuery() {
return Uni.createFrom(() -> service.startLongRunningQuery())
.runSubscriptionOn(Infrastructure.getDefaultWorkerPool());
}
Note that you need RESTEasy Reactive for this to work (and not classic RESTEasy). If you use classic RESTEasy, you would need the quarkus-resteasy-mutiny extension (but I would recommend using RESTEasy Reactive, it will be way more efficient).
Use the EventBus for that https://quarkus.io/guides/reactive-event-bus
Send and forget is the way to go.

how to perform parallel processing of gcp pubsub messages in apache camel

I have this code below that takes message from pubsub source topic -> transform it as per a template -> then publish the transformed message to a target topic.
But to improve performance I need to do this task in parallel.That is i need to poll 500 messages,and then transform it in parallel and then publish them to the target topic.
From the camel gcp component documentation I believe maxMessagesPerPoll and concurrentConsumers parameter will do the job.Due to lack of documentation I am not sure how does it internally works.
I mean a) if I poll say 500 message ,will then it create 500 parallel route that will process the messages and publish it to the target topic b)what about ordering of the messages c) should I be looking at parallel processing EIPs as an alternative
etc.
The concept is not clear to me
Was go
// my route
private void addRouteToContext(final PubSub pubSub) throws Exception {
this.camelContext.addRoutes(new RouteBuilder() {
#Override
public void configure() throws Exception {
errorHandler(deadLetterChannel("google-pubsub:{{gcp_project_id}}:{{pubsub.dead.letter.topic}}")
.useOriginalMessage().onPrepareFailure(new FailureProcessor()));
/*
* from topic
*/
from("google-pubsub:{{gcp_project_id}}:" + pubSub.getFromSubscription() + "?"
+ "maxMessagesPerPoll={{consumer.maxMessagesPerPoll}}&"
+ "concurrentConsumers={{consumer.concurrentConsumers}}").
/*
* transform using the velocity
*/
to("velocity:" + pubSub.getToTemplate() + "?contentCache=true").
/*
* attach header to the transform message
*/
setHeader("Header ", simple("${date:now:yyyyMMdd}")).routeId(pubSub.getRouteId()).
/*
* log the transformed event
*/
log("${body}").
/*
* publish the transformed event to the target topic
*/
to("google-pubsub:{{gcp_project_id}}:" + pubSub.getToTopic());
}
});
}
a) if I poll say 500 message ,will then it create 500 parallel route that will process the messages and publish it to the target topic
No, Camel does not create 500 parallel threads in this case. As you suspect, the number of concurrent consumer threads is set with concurrentConsumers. So if you define 5 concurrentConsumers with a maxMessagesPerPoll of 500, every consumer will fetch up to 500 messages and process them one after the other in a single thread. That is, you have 5 messages processed in parallel.
what about ordering of the messages
As soon as you process messages in parallel, the order of messages is messed up. But this already happens with 1 Consumer when you got processing errors and they are detoured to your deadLetterChannel and reprocessed later.
should I be looking at parallel processing EIPs as an alternative
Only if the concurrentConsumers option is not sufficient.
When you mention the concurrentConsumers option(let's say concurrentConsumers=10), you are asking Camel to create a thread pool of 10 threads, and each of those 10 threads will pick up a different message from the pub-sub queue and process them.
The thing to note here is that when you are specifying the concurrentConsumers option, the thread pool uses a fixed size, which means that a fixed number of active threads are waiting at all times to process incoming messages. So 10 threads(since I specified concurrentConsumers=10) will be waiting to process my messages, even if there aren't 10 messages coming in simultaneously.
Obviously, this is not going to guarantee that the incoming messages will be processed in the same order. If you are looking to have the messages in the same order, you can have a look at the Resequencer EIP to order your messages.
As for your third question, I don't think google-pubsub component allows a parallel processing option. You can make your own using the Threads EIP. This would definitely give more control over your concurrency.
Using Threads, your code would look something like this:
from("google-pubsub:project-id:destinationName?maxMessagesPerPoll=20")
// the 2 parameters are 'pool size' and 'max pool size'
.threads(5, 20)
.to("direct:out");

Apache Camel ProducerTemplate

Hi I am using apache camel + Spring and defined a configure like
public class MyOrderConsumerRouterBuilder extends RouteBuilder implements InitializingBean, ApplicationContextAware{
#Override
public void configure() throws Exception {
from("seda:asyncChannel?concurrentConsumers=20").id("asyncProcessChannelFromId")
.to("bean:OrderProcessManager?method=processOrders").id("asyncProcessChannelToId");
}
}
Is this Producer multithread? I see that consumers are multiple. In my case it is : concurrentConsumers=20
I checked below URL
How do I configure the default maximum cache size for ProducerCache or ProducerTemplate
As per source code DefaultCamelContext.createProducerTemplate() DefaultCamelContext DefaultProducerTemplate is being created with maximumCacheSize (default 1000)
As per this I understand this there can be multiple producers which are being defined using maximumCacheSize as LRU. In my case I have only one endpoint i.e SEDA so there will be only one producer.
So I think there will always be one single threaded producer. Please help me to understand it better.
The producer is not multithreaded, but there are multiple producers.
In your case 20 consumers (threads) are waiting for messages. If a message arrives, it is processed according to the route definition by one of these threads.
If another message arrives the thread who processes the first message is probably still occupied, but one of the other 19 free threads can process the message.
As long as there are no Splitters, Aggregators and similar EIPs, a single thread "walks" the message through your route and in your case finally sends the message to the OrderProcessManager bean. So this producing step (calling the bean method) is obviously done by a single thread for a single message.
BUT since you can have up to 20 threads processing messages in parallel, the OrderProcessManager bean can be called by up to 20 producers (threads) in parallel.

Camel ApacheMQ -> AHC behaviour (blocking?)

I just started using Apache Camel and I'm curious about the seemingly counter-intuitive default behaviour of asynchronous http client (AHC). While consuming messages from ActiveMQ, I can't get it to act in a non-blocking fashion.
My route looks like this:
#Component
public class Broadcaster extends RouteBuilder {
#Override
public void configure() throws Exception {
errorHandler(deadLetterChannel("activemq:failed.messages"));
from("activemq:outbound.messages")
.setExchangePattern(ExchangePattern.InOnly)
.recipientList(simple("ahc:${in.header[PublishDestination]}"))
.end();
}
}
I enqueued several messages, half of which I sent to a delayed web server, and the other half to a normal one. I expected to see all the normal messages consumed immediately by the fast server, and the slow messages gradually over time. However, this was the behaviour observed on the fast web server:
00:24:02.585, <hello>World</hello>
00:24:03.622, <hello>World</hello>
00:24:04.640, <hello>World</hello>
00:24:05.658, <hello>World</hello>
As you can see there is exactly one second between each logged request that corresponds to the artificial 1 second delay on the slow server. Based on the route timings, it looks like the JMS consumer is waiting for AHC to complete before it consumes the next message off the queue:
Processor Elapsed (ms)
[activemq://outbound.messages ] [ 1020]
[setExchangePattern[InOnly] ] [ 0]
[ahc:${in.header[PublishDestination]}} ] [ 1018]
Am I supposed to explicitly use async producers and write callback handlers in these cases, or is there something else I'm missing? Thank you!
Well, case of a RTFM I guess, although I ActiveMQ page leaves a lot to be desired in terms of properties available for endpoint configuration. There should probably be a note to say most (all?) JMS config options are also available for ActiveMQ component. In any case, the solution is to define the consumer as follows:
from("activemq:outbound.messages?asyncConsumer=true")

How can I improve the performance of a Seda queue?

Take this example:
from("seda:data").log("data added to queue")
.setHeader("CamelHttpMethod", constant("POST"))
.setHeader(Exchange.CONTENT_TYPE, constant("application/json"))
.process(new Processor() {
public void process(Exchange exchange) throws Exception {
exchange.setProperty(Exchange.CHARSET_NAME, "UTF-8");
}
})
.recipientList(header(RECIPIENT_LIST))
.ignoreInvalidEndpoints().parallelProcessing();
Assume the RECIPENT_LIST header contains only one http endpoint. For a given http endpoint, messages should be processed in order, but two messages for different end points can be processed in parallel.
Basically, I want to know if there is anything be done to improve performance. For example, would using concurrentConsumers help?
SEDA with concurrentConsumers > 1 would absolutely help with throughput because it would allow multiple threads to run in parallel...but you'll need to implement your own locking mechanism to make sure only a single thread is hitting a given http endpoint at a given time
otherwise, here is an overview of your options: http://camel.apache.org/parallel-processing-and-ordering.html
in short, if you can use JMS, then consider using ActiveMQ message groups as its trivial to use and is designed for exactly this use case (parallel processing, but single threaded by groups of messages, etc).

Resources