How to use multithreading inside a JPA Camel route - apache-camel

We have an importer running on a powerful, multi-core server. However, our Apache Camel routes are single threaded, which is a shame.
Our [camel] importer is a single-instance program. How can I make a specific route process the messages using multiple threads? The messages are atomic and are processed by a bean, which already does this in a thread-safe way.
I would already be happy if I could process each batch (maxMessagesPerPoll) in threads and have idle time until the next poll takes place (after all, that's still better than sequential processing).
Here is one of the routes I would like to make multithreaded:
public void onConfigure() throws Exception {
    // This is a JPA query which selects all unprocessed modules
    String query = RouteQueryHelper.selectNextUnprocessedStaged(ImportAction.IMPORT_MODULES);
    from("jpa:com.so.importer.entity.ModuleStageEntity" +
            "?consumer.query=" + query +
            "&maxMessagesPerPoll=2000" +
            "&consumeLockEntity=false" +
            "&consumer.delay=1000" +
            "&consumeDelete=false")
        .transacted().policy("CAMEL_DEFAULT_POLICY")
        .bean(moduleImportService) // processes the entity and updates its status flag
        .to("log:import-module?groupInterval=10000")
        .routeId("so.route.import-module");
}
The route has consumeDelete=false, because we use a status property on the entity instead (which is modified and saved). The status property is also respected in the consumer.query.
We use Camel version 2.17.1 in Spring Boot (1.3.8.RELEASE) on Java 8.
EDIT 2019-Jan-21: The entities have a method annotated with @Consumed, which pushes the entity into the next route after it has been processed:
@Consumed
public void gotoNextStatus() {
    switch (stageStatus) {
        case STAGED: setStageStatus(StageStatus.IMPORTED); break;
        case IMPORTED: setStageStatus(StageStatus.RENDERED); break;
        case RENDERED: setStageStatus(StageStatus.DONE); break;
    }
}

You could introduce some asynchronous processing by sending your messages to an intermediate SEDA endpoint:
from("jpa:")
...
.to("seda:intermediateStage")
And then put the real processing inside a new route with N concurrent SEDA consumers (the default is one):
from("seda:intermediateStage?concurrentConsumers=5")
.process(...)
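
To make the hand-off idea concrete, here is a minimal plain-JDK sketch of what the two routes above amount to: one producer puts messages on an in-memory queue, and N worker threads drain it. This is an analogy only, not Camel code; the class name SedaHandoffSketch, the stop marker, and the 5 ms sleep standing in for the real work are all made up for illustration.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Plain-JDK analogy of a queue with N concurrent consumers; not the Camel SEDA API.
class SedaHandoffSketch {

    private static final String STOP = "__stop__"; // hypothetical end-of-work marker

    // Pushes `messages` items through a queue drained by `consumers` threads and
    // returns the names of the threads that processed at least one item.
    static Set<String> run(int messages, int consumers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        CountDownLatch processed = new CountDownLatch(messages);
        ExecutorService pool = Executors.newFixedThreadPool(consumers);

        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String msg = queue.take();
                        if (msg.equals(STOP)) {
                            return;
                        }
                        Thread.sleep(5); // stand-in for the real, thread-safe processing
                        workerNames.add(Thread.currentThread().getName());
                        processed.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // The "first route": hand every message to the intermediate queue.
            for (int i = 0; i < messages; i++) {
                queue.put("module-" + i);
            }
            processed.await();
            // One stop marker per worker so that each of them terminates.
            for (int i = 0; i < consumers; i++) {
                queue.put(STOP);
            }
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return workerNames;
    }

    public static void main(String[] args) {
        System.out.println("threads used: " + run(40, 5).size());
    }
}
```

Note that an in-memory queue like this decouples the producer from the workers, so in the Camel case it is worth checking how the hand-off interacts with the transacted() policy on the JPA route.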

Related

How to perform parallel processing of GCP Pub/Sub messages in Apache Camel

I have the code below, which takes a message from a Pub/Sub source topic, transforms it according to a template, and then publishes the transformed message to a target topic.
To improve performance I need to do this in parallel. That is, I need to poll 500 messages, transform them in parallel, and then publish them to the target topic.
From the Camel GCP component documentation I believe the maxMessagesPerPoll and concurrentConsumers parameters will do the job. Because the documentation is sparse, I am not sure how this works internally.
Specifically: a) if I poll, say, 500 messages, will Camel create 500 parallel routes that process the messages and publish them to the target topic? b) What about the ordering of the messages? c) Should I be looking at the parallel-processing EIPs as an alternative?
The concept is not clear to me.
// my route
private void addRouteToContext(final PubSub pubSub) throws Exception {
    this.camelContext.addRoutes(new RouteBuilder() {
        @Override
        public void configure() throws Exception {
            errorHandler(deadLetterChannel("google-pubsub:{{gcp_project_id}}:{{pubsub.dead.letter.topic}}")
                    .useOriginalMessage().onPrepareFailure(new FailureProcessor()));
            /*
             * from topic
             */
            from("google-pubsub:{{gcp_project_id}}:" + pubSub.getFromSubscription() + "?"
                    + "maxMessagesPerPoll={{consumer.maxMessagesPerPoll}}&"
                    + "concurrentConsumers={{consumer.concurrentConsumers}}").
            /*
             * transform using the velocity
             */
            to("velocity:" + pubSub.getToTemplate() + "?contentCache=true").
            /*
             * attach header to the transform message
             */
            setHeader("Header ", simple("${date:now:yyyyMMdd}")).routeId(pubSub.getRouteId()).
            /*
             * log the transformed event
             */
            log("${body}").
            /*
             * publish the transformed event to the target topic
             */
            to("google-pubsub:{{gcp_project_id}}:" + pubSub.getToTopic());
        }
    });
}
a) if I poll, say, 500 messages, will Camel create 500 parallel routes that process the messages and publish them to the target topic?
No, Camel does not create 500 parallel threads in this case. As you suspect, the number of concurrent consumer threads is set with concurrentConsumers. So if you define 5 concurrentConsumers with a maxMessagesPerPoll of 500, every consumer fetches up to 500 messages and processes them one after the other in a single thread. That is, at most 5 messages are being processed in parallel at any moment.
what about ordering of the messages
As soon as you process messages in parallel, the original order of the messages is lost. But this already happens with a single consumer when processing errors occur and the failed messages are detoured to your deadLetterChannel and reprocessed later.
should I be looking at parallel processing EIPs as an alternative
Only if the concurrentConsumers option is not sufficient.
When you specify the concurrentConsumers option (say, concurrentConsumers=10), you are asking Camel to create a thread pool of 10 threads; each of those 10 threads picks up a different message from the Pub/Sub subscription and processes it.
The thing to note here is that with concurrentConsumers the thread pool has a fixed size, which means that a fixed number of threads are waiting at all times to process incoming messages. So 10 threads (since I specified concurrentConsumers=10) will be waiting for messages, even if 10 messages never arrive simultaneously.
Obviously, this does not guarantee that the incoming messages are processed in their original order. If you need the messages in order, have a look at the Resequencer EIP.
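
The idea behind resequencing can be sketched in a few lines of plain Java. This is only an illustration of the principle, not the Camel Resequencer API, and it assumes every message carries a gap-free sequence number, which your Pub/Sub messages may or may not have. The class name ResequencerSketch is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration of the resequencing principle; not the Camel Resequencer API.
class ResequencerSketch {

    // Messages that arrived early and are waiting for their predecessors.
    private final Map<Long, String> pending = new HashMap<>();
    private long nextExpected;

    ResequencerSketch(long firstSequenceNumber) {
        this.nextExpected = firstSequenceNumber;
    }

    // Accepts a message in any order and returns the messages that can now be
    // released in sequence (possibly none).
    List<String> accept(long sequenceNumber, String body) {
        pending.put(sequenceNumber, body);
        List<String> released = new ArrayList<>();
        while (pending.containsKey(nextExpected)) {
            released.add(pending.remove(nextExpected));
            nextExpected++;
        }
        return released;
    }

    public static void main(String[] args) {
        ResequencerSketch resequencer = new ResequencerSketch(1);
        System.out.println(resequencer.accept(2, "second")); // held back until 1 arrives
        System.out.println(resequencer.accept(1, "first"));  // releases 1 and 2 together
    }
}
```

The cost is visible in the sketch: an early message is held until every predecessor has arrived, so one slow or lost message stalls everything behind it.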
As for your third question, I don't think the google-pubsub component offers a parallel-processing option of its own. You can build your own using the Threads EIP, which gives you more control over concurrency.
Using Threads, your code would look something like this:
from("google-pubsub:project-id:destinationName?maxMessagesPerPoll=20")
    // the 2 parameters are 'pool size' and 'max pool size'
    .threads(5, 20)
    .to("direct:out");
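
One thing to be aware of with the two pool-size arguments: assuming the Threads EIP is backed by a standard java.util.concurrent.ThreadPoolExecutor (an assumption worth verifying for your Camel version), the pool only grows beyond its core size once its work queue is full. The sketch below shows that JDK behaviour with deliberately small, made-up numbers (core 2, max 4, queue capacity 2); it is not Camel code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Demonstrates how a JDK ThreadPoolExecutor grows; the sizes are illustrative only.
class PoolSizingSketch {

    // Submits `tasks` blocking jobs (at most 6, otherwise the pool rejects them)
    // to a pool with core=2, max=4 and a queue of 2, then reports the pool size.
    static int threadsCreated(int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the thread busy until we have measured
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("4 tasks -> " + threadsCreated(4) + " threads");
        System.out.println("6 tasks -> " + threadsCreated(6) + " threads");
    }
}
```

With four tasks, two run and two wait in the queue, so the pool stays at its core size; only the fifth and sixth task force extra threads. If the same holds for your route, threads(5, 20) will mostly run 5 threads unless the queue in front of the pool fills up.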

How to schedule JMS consuming in Apache Camel?

I need to consume JMS messages with Camel every day at 9pm (or from 9pm to 10pm, to give it time to consume all the messages).
I can't see any "scheduler" option for "cMQConnectionFactory:queue:myQueue" URIs, although one exists for "file://" and "ftp://" URIs.
If I put a cTimer in front of it, it sends an empty message to the queue instead of scheduling the consumer.
You can use a route policy, where you can set up, for example, a cron expression that tells Camel when the route is started and when it is stopped.
http://camel.apache.org/scheduledroutepolicy.html
Another alternative is to start and stop the route via the Java API, JMX, etc., and have some other logic that knows when to do that according to the clock.
This is something that has caused me a significant amount of trouble. There are a number of ways of skinning this cat, and none of them are great as far as I can see.
One is to set the route not to start automatically, and use a schedule to start the route and then stop it again after a short time using the controlbus EIP. http://camel.apache.org/controlbus.html
I didn't like this approach because I didn't trust that it would drain the queue completely once and only once per trigger.
Another is to use a pollEnrich to query the queue, but that only seems to pick up one item from the queue, whereas I wanted to drain it completely (and only once).
I wrote a custom bean that uses consumer and producer templates to read all the entries in a queue with a specified time-out.
I found an example on the internet somewhere, but it took me a long time to find, and quickly searching again I can't find it now.
So what I have is:
from("timer:myTimer...")
    .beanRef("myConsumerBean", "pollConsumer");
from("direct:myProcessingRoute")
    .to("whatever");
And a simple pollConsumer method:
public void pollConsumer() throws Exception {
    if (consumerEndpoint == null) {
        consumerEndpoint = consumer.getCamelContext().getEndpoint(endpointUri);
    }
    consumer.start();
    producer.start();
    // keep receiving until the queue stays empty for one second
    while (true) {
        Exchange exchange = consumer.receive(consumerEndpoint, 1000);
        if (exchange == null) break;
        producer.send(exchange);
        consumer.doneUoW(exchange);
    }
    producer.stop();
    consumer.stop();
}
where the producer is a DefaultProducerTemplate, consumer is a DefaultConsumerTemplate, and these are configured in the bean configuration.
This seems to work for me, but if anyone gives you a better answer I'll be very interested to see how it can be done better.
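
The core of the pollConsumer method above is a "receive with timeout until nothing comes back" loop. Stripped of Camel, the same pattern looks like this on a plain java.util.concurrent.BlockingQueue; the class name DrainOnceSketch and the 50 ms timeout in the usage example are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Plain-JDK version of the drain loop; the Camel templates are replaced by a BlockingQueue.
class DrainOnceSketch {

    // Takes items until the queue stays empty for `timeoutMillis`, then returns them.
    static <T> List<T> drain(BlockingQueue<T> queue, long timeoutMillis) {
        List<T> drained = new ArrayList<>();
        try {
            while (true) {
                T item = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (item == null) {
                    break; // nothing arrived within the timeout: treat the queue as drained
                }
                drained.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop early but keep what we have
        }
        return drained;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("order-1");
        queue.add("order-2");
        System.out.println(drain(queue, 50));
    }
}
```

The timeout is the weak point of this approach: "empty for one second" is only a heuristic for "completely drained", so a producer that pauses for longer than the timeout ends the run early.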

Apache Camel ProducerTemplate

Hi, I am using Apache Camel + Spring and have defined a route configuration like this:
public class MyOrderConsumerRouterBuilder extends RouteBuilder implements InitializingBean, ApplicationContextAware {
    @Override
    public void configure() throws Exception {
        from("seda:asyncChannel?concurrentConsumers=20").id("asyncProcessChannelFromId")
            .to("bean:OrderProcessManager?method=processOrders").id("asyncProcessChannelToId");
    }
}
Is this producer multithreaded? I can see that there are multiple consumers; in my case, concurrentConsumers=20.
I checked the question below:
How do I configure the default maximum cache size for ProducerCache or ProducerTemplate
According to the source code, DefaultCamelContext.createProducerTemplate() creates a DefaultProducerTemplate with a maximumCacheSize (default 1000).
From this I understand that there can be multiple producers, held in an LRU cache bounded by maximumCacheSize. In my case I have only one endpoint, i.e. SEDA, so there will be only one producer.
So I think there will always be one single-threaded producer. Please help me understand this better.
The producer is not multithreaded, but there are multiple producers.
In your case 20 consumers (threads) are waiting for messages. If a message arrives, it is processed according to the route definition by one of these threads.
If another message arrives, the thread that is processing the first message is probably still busy, but one of the other 19 free threads can process the new message.
As long as there are no Splitters, Aggregators and similar EIPs, a single thread "walks" the message through your route and in your case finally sends the message to the OrderProcessManager bean. So this producing step (calling the bean method) is obviously done by a single thread for a single message.
BUT since you can have up to 20 threads processing messages in parallel, the OrderProcessManager bean can be called by up to 20 producers (threads) in parallel.

Apache Camel splitter with hazelcast seda queue

I'm trying to build a file import process where a file is picked up in a subdirectory of a given folder (the subdirectory identifies the client the file is for), then the records are parsed, split, and sent on Hazelcast SEDA queues. I want to process each record as it's read off the Hazelcast SEDA queue; the processing then returns a status code (created, updated, or errored) which can be aggregated.
I'm also creating a job record when the file is first picked up and I want to update the job record with the final count of created, updated, and errors.
The JobProcessor below creates this record and sets the client Organization and Job objects in headers on the message. The CensusExcelDataFormat reads an Excel file and creates an Employee object for each line, then returns a Collection.
from("file:" + censusDirectory + "?recursive=true").idempotentConsumer(new SimpleExpression("file:name"), idempotentRepository)
    .process(new JobProcessor(organizationService, jobService, Job.JobType.CENSUS))
    .unmarshal(censusExcelDataFormat)
    .split(body(), new ListAggregationStrategy()).parallelProcessing()
        .to(ExchangePattern.InOut, "hazelcast:seda:process-employee-import").end()
    .process(new JobCompletionProcessor(jobService))
    .end();
from("hazelcast:seda:process-employee-import")
    .idempotentConsumer(simple("${body.entityId}"), idempotentRepository)
    .bean(employeeImporterJob, "importOrUpdate");
The problem I'm having is that the list aggregation happens immediately and instead of getting a list of statuses I'm getting the same list of Employee objects. I want the Employee objects to be sent on the SEDA queue and the return value from the processing on the queue to be aggregated then run through the JobCompletionProcessor to update the Job record.
The behaviour you are seeing is the default behaviour. The Apache Camel Splitter documentation states this in the "What the Splitter returns" section:
Camel 2.2 or older: The Splitter will by default return the last
splitted message.
Camel 2.3 and newer: The Splitter will by default return the
original input message.
For all versions: You can override this by supplying your own
strategy as an AggregationStrategy. There is a sample on this page
(Split aggregate request/reply sample). Notice it's the same
strategy as the Aggregator supports. This Splitter can be viewed as
having a build in light weight Aggregator.
So, as you can see, you are required to implement your own splitter aggregation strategy. To do this, create a new class that implements AggregationStrategy, something like the code below:
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy; // package as of Camel 2.x

public class MyAggregationStrategy implements AggregationStrategy
{
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        if (oldExchange == null) // this is null for the first exchange
        {
            // do some work the first time if needed
            return newExchange;
        }
        /*
           Here you put your code to calculate failed, updated, created.
         */
        return oldExchange; // the returned exchange is passed in as oldExchange next time
    }
}
You can then use your custom aggregation strategy by specifying it like the following examples:
.split(body(), new MyAggregationStrategy()) //Java DSL
<split strategyRef="myAggregationStrategy"/> //XML Blueprint
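
As a rough idea of what the "calculate failed, updated, created" part could look like, here is the counting logic on its own, written without any Camel types so it can be tried in isolation. It follows the same contract as aggregate(oldExchange, newExchange): the accumulated value is null on the first call. The Status enum and the class name StatusCountSketch are assumptions, not types from the question.

```java
import java.util.EnumMap;
import java.util.Map;

// Counting logic for a custom aggregation strategy, kept free of Camel types.
class StatusCountSketch {

    enum Status { CREATED, UPDATED, ERRORED } // hypothetical status codes

    // `counts` plays the role of oldExchange (null on the first call) and
    // `latest` the role of the reply carried by newExchange.
    static Map<Status, Integer> aggregate(Map<Status, Integer> counts, Status latest) {
        Map<Status, Integer> result =
                (counts == null) ? new EnumMap<Status, Integer>(Status.class) : counts;
        result.merge(latest, 1, Integer::sum);
        return result;
    }

    public static void main(String[] args) {
        Map<Status, Integer> counts = null;
        for (Status s : new Status[] {Status.CREATED, Status.UPDATED, Status.CREATED}) {
            counts = aggregate(counts, s);
        }
        System.out.println(counts);
    }
}
```

In a real strategy the map would presumably live in the body or a header of the exchange that aggregate returns, so that JobCompletionProcessor can read the final counts.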

How to broadcast a message within the same JVM process with Camel?

I am consuming a directory and I would like to broadcast a message to listeners within the same JVM process. I don't know who the interested parties are because they register themselves when they come up: the set of services within my JVM process depends on configuration.
Multicast does not seem to be what I want because I don't know at route build time where to send messages.
Besides using a queuing solution (ActiveMQ, RabbitMQ), are there other solutions?
Besides the queuing solutions (JMS/ActiveMQ and RabbitMQ), you could use the VM component for intra-JVM communication. VM is an extension of the SEDA component. In contrast to SEDA, which can only be used for communication between different routes in a single Camel context, VM can be used for communication between routes running in different contexts.
Sending a message:
final ProducerTemplate template = context.createProducerTemplate();
template.sendBody("vm:start", "World!");
With multipleConsumers=true it is possible to simulate Publish-Subscribe messaging, i.e. it is possible to configure more than one consumer:
from("vm:start?multipleConsumers=true")
.log("********** Hello: 1 ************");
from("vm:start?multipleConsumers=true")
.log("********** Hello: 2 ************");
This prints:
route1 INFO ********** Hello: 1 ************
route2 INFO ********** Hello: 2 ************
However, in contrast to JMS/ActiveMQ and RabbitMQ, the messages cannot leave the JVM, and they are not persisted. That means that messages are lost a) if no consumer has been started when the message is sent, or b) if the JVM crashes before the messages are consumed.
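
The publish-subscribe behaviour described above, including the "lost if nobody is listening yet" caveat, can be modelled in a few lines of plain Java. This is a conceptual sketch only, not how the VM component is implemented; the class name BroadcastSketch and its methods are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Conceptual model of in-JVM publish-subscribe with listeners that register at runtime.
class BroadcastSketch {

    // CopyOnWriteArrayList lets listeners register while a publish is in progress.
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    void register(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Delivers the message to every registered listener and returns how many
    // received it; with no listeners the message is simply dropped.
    int publish(String message) {
        int delivered = 0;
        for (Consumer<String> listener : listeners) {
            listener.accept(message);
            delivered++;
        }
        return delivered;
    }

    public static void main(String[] args) {
        BroadcastSketch bus = new BroadcastSketch();
        List<String> log = new ArrayList<>();
        bus.publish("too early");                 // nobody registered yet: dropped
        bus.register(m -> log.add("Hello 1: " + m));
        bus.register(m -> log.add("Hello 2: " + m));
        bus.publish("World!");
        System.out.println(log);
    }
}
```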
Use the Recipient List pattern, as it resolves the destination endpoints at runtime.
For example, you could implement a method that determines the recipients dynamically:
from("direct:test").recipientList().method(MessageRouter.class, "routeTo");

public class MessageRouter {
    public String[] routeTo() {
        return new String[] {
            "direct:a", "direct:b"
        };
    }
}
The Camel SEDA component can give you this. However, it's only valid inside the current Camel context. If that restriction works for you, it's the way to go. It can handle either queue-style messaging or pub/sub.
Camel SEDA Component
