I have a Camel route configuration like below:
from("seda:logCall?concurrentConsumers=50")
.aggregate(header("EXCHANGE_ID"), new CustomAggregator())
.completionSize(2)
.parallelProcessing()
.to("bean:someAdapter?method=someMethod");
What I want to achieve is parallel processing all the way down: messages should be processed in parallel by the aggregator and by the bean (after aggregation). However, while debugging I saw that the aggregator blocks (runs in a single thread). The bean processes messages in parallel, so that part is fine.
How should I configure the aggregator to aggregate incoming messages in parallel?
You could try:
from("seda:logCall?concurrentConsumers=50")
// hand incoming exchanges to a thread pool before the aggregator
// (Executors is java.util.concurrent.Executors)
.threads().executorService(Executors.newCachedThreadPool())
.aggregate(header("EXCHANGE_ID"), new CustomAggregator())
.completionSize(2)
.parallelProcessing()
.to("bean:someAdapter?method=someMethod");
I have a series of Camel routes that retrieve, transform, split, and combine XML documents. This all works fine.
These routes are linked by ActiveMQ topics and queues.
All good.
However, in some cases I have a large number of documents to process, and because Camel's JMS component renders XML documents to text for the message, each pass through a queue means the XML is rendered to a string and re-parsed into a document more than once, which is a significant processing overhead.
I've tried setting the JMS producer's jmsMessageType to Object, but when the consumer retrieves the message and I output exchange.getIn().getBody().getClass().getCanonicalName(), I get java.lang.String.
What settings would I need to put on the producer and the consumer for the XML Document objects to be passed directly through the ActiveMQ topic/queue without being rendered to String and re-parsed?
Thanks for looking.
Xerces supports Java serialization of its DOMs, and Camel supports Java serialization. It's questionable, though, whether it's really more efficient; quoting from the Xerces documentation:
Some rough measurements have shown that XML serialization performs better than Java object serialization and that XML instance documents require less storage space than object serialized DOMs.
And there's another catch: Camel's Java serialization data format is deprecated, and there's a risk that it will be removed in an upcoming Camel version. The implementation is very straightforward, though, so in case it gets removed you could add a custom data format replicating the current Camel SerializationDataFormat.
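In case you need it, a minimal sketch of such a replacement, assuming Camel 2.x (where DataFormat declares only marshal and unmarshal) and a body that implements java.io.Serializable:
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import org.apache.camel.Exchange;
import org.apache.camel.spi.DataFormat;

// Replicates Camel's SerializationDataFormat with plain Java serialization.
public class JavaSerializationDataFormat implements DataFormat {

    public void marshal(Exchange exchange, Object body, OutputStream stream) throws Exception {
        ObjectOutputStream out = new ObjectOutputStream(stream);
        out.writeObject(body); // the body must be Serializable
        out.flush();
    }

    public Object unmarshal(Exchange exchange, InputStream stream) throws Exception {
        ObjectInputStream in = new ObjectInputStream(stream);
        return in.readObject();
    }
}
You would then plug it in with .marshal(new JavaSerializationDataFormat()) and .unmarshal(new JavaSerializationDataFormat()) instead of the built-in .serialization() calls.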
If you want to try it out though, the producer could look like:
from(...)
// you need to have a Xerces DOM object in the exchange body at this point
.marshal().serialization()
.to("jms:myqueue");
...and consumer:
from("jms:myqueue")
.unmarshal().serialization()
// you should have your Xerces DOM again
...
The problem: the Flink application does not receive and process the events that arrived on the Kinesis stream while the application was down (due to a restart).
We have the following Flink env settings:
env.enableCheckpointing(1000); // checkpoint every 1,000 ms
env.setStateBackend(new RocksDBStateBackend("file:///<filelocation>", true));
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(pause);
env.getCheckpointConfig().setCheckpointTimeout(timeOut);
env.getCheckpointConfig().setMaxConcurrentCheckpoints(concurrency);
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
and the Kinesis consumer has the following initial configuration:
kinesisConsumerConfig.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION,
"LATEST");
Interestingly, when I change the Kinesis configuration to replay the events, i.e.
kinesisConsumerConfig.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION,
"TRIM_HORIZON");
Flink receives all the buffered records from Kinesis (including events generated before, during, and after the Flink application was down) and processes them. This behavior thus violates the "exactly once" property of the Flink application.
Can someone point out some obvious things I am missing?
The Flink Kinesis connector does store the shard sequence numbers in the state for exactly-once processing.
From your description, it seems like on your job "restart", the checkpointed state is not respected.
Just to first eliminate the obvious:
How is your job resuming from the restart?
Are you resuming from a savepoint, or is this restart automatically done from a previous checkpoint?
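If the restart is a manual cancel/resubmit, one way to make the checkpointed state survive it is to retain externalized checkpoints and resume from them explicitly. A sketch, assuming Flink 1.x (the path and jar name are placeholders):
// keep checkpoints around when the job is cancelled
env.getCheckpointConfig().enableExternalizedCheckpoints(
    CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

// then resume explicitly from the retained checkpoint or a savepoint:
// flink run -s <checkpoint-or-savepoint-path> your-job.jar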
The previous answer is a good option if you want to use checkpointing to track your consumer's position in the stream.
Here is an alternative with even more control. You can try using AT_TIMESTAMP as the STREAM_INITIAL_POSITION configuration option in your Flink Kinesis Connector.
This setting will need a configuration option STREAM_INITIAL_TIMESTAMP, which is the timestamp after which you need to read the messages from Kinesis.
The timestamp value can be maintained in several ways - a sink to update a text file, a sink to update in an external DB like DynamoDB, manually provided by the start up script, etc.
When the Flink application is restarted, provide the last processed timestamp as a runtime parameter and use it in the Kinesis consumer's configuration.
Your configuration will look like this:
Properties consumerConfig = new Properties();
consumerConfig.put(AWSConfigConstants.AWS_REGION, "us-east-1");
consumerConfig.put(AWSConfigConstants.AWS_ACCESS_KEY_ID, "aws_access_key_id");
consumerConfig.put(AWSConfigConstants.AWS_SECRET_ACCESS_KEY, "aws_secret_access_key");
consumerConfig.put(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "AT_TIMESTAMP");
// Properties values should be strings; seconds since the epoch. Parameterize this!
String startTimeStamp = "1459799926.480";
consumerConfig.put(ConsumerConfigConstants.STREAM_INITIAL_TIMESTAMP, startTimeStamp);
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> kinesis = env.addSource(new FlinkKinesisConsumer<>(
    "kinesis_stream_name", new SimpleStringSchema(), consumerConfig));
I am able to read messages from ActiveMQ using a Camel context [XML], but I would like to read only a limited number of messages. For example, if the queue contains 10,000 messages, we want to read only the first 1,000; the rest shouldn't be touched.
I am new to Camel.
It is not quite clear how you want your program to work. Do you want to stop the route after 1000 messages? Or your program? Or just finish them before processing the rest?
Anyway, the JMS component has a maxMessagesPerTask parameter, which is the number of messages a task can receive before it is terminated. That might do what you want.
"jms:queue:order?maxMessagesPerTask=1000"
What if there are only 500 messages in the queue? Should you wait until you have received an additional 500, so the total is 1,000? And what if you restart your application, etc.?
It's a bit of a strange use case. The Camel JMS component is designed to continuously consume from the queue. If you want to stop, then look at the Control Bus EIP, with which you can control Camel routes and stop them. Also look at RoutePolicy, which lets you control routes as well; for example, the throttling route policy can start/stop routes depending on load (a sketch follows after the links below).
The CiA2 book (Camel in Action, 2nd edition) also covers managing and controlling Camel routes; see the management chapter.
http://camel.apache.org/controlbus.html
http://camel.apache.org/routepolicy.html
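As a sketch of the Control Bus idea, assuming Camel 2.x with Java 8 (the route id, endpoints, and counter are assumptions, not a drop-in solution):
// java.util.concurrent.atomic.AtomicInteger
AtomicInteger counter = new AtomicInteger();

from("jms:queue:order").routeId("firstThousand")
    .to("bean:orderService?method=handle") // hypothetical processing step
    .filter(exchange -> counter.incrementAndGet() >= 1000)
    // async=true so the route is not stopped from within its own thread
    .to("controlbus:route?routeId=firstThousand&action=stop&async=true");
Note that messages already prefetched by the JMS consumer may still be delivered before the stop takes effect, so the cut-off is approximate.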
I am trying to implement a replay mechanism with Camel, i.e., I have to retrieve all the messages already persisted and forward them to the appropriate Camel route to reprocess. This will be triggered by a Quartz scheduler.
I achieved this as follows:
1) Once the Quartz scheduler is triggered, forward to a processor that queries the DB and sets the messages, as a list, in the Camel exchange properties.
2) Use Camel's Loop, in which the LoopProcessor sets the appropriate XML on the exchange in each iteration.
3) Forward it to ActiveMQ, from where it is forwarded to the appropriate Camel route to reprocess.
Everything works fine.
However, I see the following two issues:
a) There might be 'n' messages (10,000+) held in the Camel exchange properties in the form of a List. I can remove each message from the list once it has been sent for processing, which I think would help performance and memory usage.
b) I don't want to forward all 10,000+ messages to ActiveMQ at once, which I guess will exhaust it. Is there a better mechanism for forwarding 10,000+ messages to ActiveMQ?
I am thinking of using SEDA/VM (using different Camel contexts). How well would this work, considering the questions above?
Thanks.
Regards
Senthil Kumar Sekar
If the number of messages is a problem, then not all messages should be loaded at once.
Process as follows (see also my answer to your other SO question):
1. Limit the number of results when querying the DB.
2. Set a marker (e.g. a processedFlag) on the DB entries that have been processed.
3. Go back to 1 and query only the entries not yet processed, until all records are processed (a sketch follows below).
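A sketch of that loop in plain JDBC, assuming a javax.sql.DataSource, a Camel ProducerTemplate, and a messages table with id, payload, and processed columns (all names are assumptions; LIMIT syntax varies by database):
// Page through unprocessed rows in chunks of 1,000 and mark each as done,
// so only one chunk is ever held in memory.
try (Connection con = dataSource.getConnection()) {
    boolean more = true;
    while (more) {
        more = false;
        try (PreparedStatement select = con.prepareStatement(
                "SELECT id, payload FROM messages WHERE processed = 0 LIMIT 1000");
             ResultSet rs = select.executeQuery()) {
            while (rs.next()) {
                more = true;
                // hand the payload to the replay route instead of keeping the list around
                producerTemplate.sendBody("activemq:queue:replay", rs.getString("payload"));
                try (PreparedStatement update = con.prepareStatement(
                        "UPDATE messages SET processed = 1 WHERE id = ?")) {
                    update.setLong(1, rs.getLong("id"));
                    update.executeUpdate();
                }
            }
        }
    }
}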
However, you should test the ActiveMQ approach as well, to see whether 10,000+ messages are really a problem or not.
I'm using servicemix and camel to publish to some activemq topics:
from("vm:all").recipientList().simple(String.format("activemq:topic:%s.%s.%s.%s.%s.%s","${header.type}", ...
but a new producer thread gets created per topic. Any idea how I can control that?
Edit1:
Actually, I realized that I was creating too many topics and should use Selectors instead: http://www.andrejkoelewijn.com/blog/2011/02/21/camel-activemq-topic-route-with-jms-selector/
Thanks for the help!
The typical scenario is that Camel uses Spring's JmsTemplate to fire messages at ActiveMQ. That means it creates a new producer each time; in fact, it creates a new connection, session, and producer per message. That is usually not an issue, except performance-wise.
This is typically handled by org.apache.activemq.pool.PooledConnectionFactory, which caches connections, sessions, and producers for you. However, depending on your configuration, it might create "some" producers initially, but they will be reused.
Check your connection factory settings in activemq-broker.xml and make sure you are actually using the PooledConnectionFactory.
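For the Camel side, a minimal sketch of wiring the pool programmatically, assuming ActiveMQ 5.x and camel-jms (the broker URL and pool size are placeholders):
// org.apache.activemq.ActiveMQConnectionFactory
// org.apache.activemq.pool.PooledConnectionFactory
// org.apache.activemq.camel.component.ActiveMQComponent
ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
PooledConnectionFactory pooled = new PooledConnectionFactory();
pooled.setConnectionFactory(cf);
pooled.setMaxConnections(8); // connections, sessions and producers are reused

ActiveMQComponent activemq = new ActiveMQComponent();
activemq.setConnectionFactory(pooled);
camelContext.addComponent("activemq", activemq);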