Apache Camel aggregate messages within timeframe

Is there any way to aggregate messages within a time frame in Apache Camel?
I want to calculate the instantaneous TPS of my system by pushing messages into it and consuming them on the other side.
On the consumer side I want to build an aggregator which aggregates the messages received within some time frame (let's say 1 second) and performs some calculations on them.
So far I can only see that Camel provides inactivity-timeout handling for triggering aggregation.
Thanks.

See Camel's Aggregator EIP, which covers exactly this:
from("direct:start")
.aggregate(constant(true), new MyAggregationStrategy()).completionInterval(1000)
.to("mock:aggregated");

Related

Camel route - Filter all but first message

Can I filter messages so only one with a given correlation expression is forwarded?
I have a stream of messages from different devices. I want to keep an SQL table with all devices already encountered.
The trivial way would be to route all messages to an sql component with an insert statement. But this would create unnecessary load on the DB because the devices send at a high frequency.
My current solution is to have a Java predicate that returns true only the first time a device id is encountered since the last restart.
This works, but I would like to see if I can replace this with camel on-board methods - potentially making the route easier to understand.
Is there some way to use aggregation to only pass the first message with a given correlation value?
There is the Camel idempotent consumer that does exactly this.
With the help of a repository of already-processed messages, it drops any further message with the same identification characteristics.
This is very handy wherever you have at-least-once semantics on message delivery.
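A minimal sketch of such a route, assuming the device id arrives in a header called deviceId (the header name, endpoints and cache size here are made up for illustration):

from("direct:deviceEvents")
    // forward only the first message seen for each deviceId; duplicates are dropped
    .idempotentConsumer(header("deviceId"),
            MemoryIdempotentRepository.memoryIdempotentRepository(10000))
    .to("direct:insertDevice"); // e.g. the route that runs the SQL insert

MemoryIdempotentRepository is reset on restart, which matches your current behaviour; swap in a JDBC-backed or other persistent IdempotentRepository if the set of known devices must survive restarts.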

Camel Aggregator forceCompletionOnStop VS completeAllOnStop

What's the difference between forceCompletionOnStop and completeAllOnStop? Is it possible or recommended to use both together?
From the documentation:
forceCompletionOnStop Indicates to complete all current aggregated exchanges when the context is stopped
completeAllOnStop Indicates to wait to complete all current and partial (pending) aggregated exchanges when the context is stopped. This also means that we will wait for all pending exchanges which are stored in the aggregation repository to complete so the repository is empty before we can stop. You may want to enable this when using the memory based aggregation repository that is memory based only, and do not store data on disk. When this option is enabled, then the aggregator is waiting to complete all those exchanges before its stopped, when stopping CamelContext or the route using it.
It's a little confusing to me; they seem to be the same.
completeAllOnStop is a "normal" completion criteria. The Camel aggregator implements the ShutdownAware interface and the completeAllOnStop criteria indicates to the Camel context that the aggregator needs some extra time before shutdown to complete its aggregations.
forceCompletionOnStop on the other hand tries to complete all aggregations during the shutdown process (prepareShutdown).
So to me they seem very similar too: both try to complete all aggregations before shutdown of the Camel context. I would recommend using completeAllOnStop because this seems to be the more proactive way. See also the Camel docs for more info about the shutdown strategy.
I don't know if you get a "double check" if you configure them both :-)
Be aware that even forceCompletionOnStop is skipped if the shutdown is a forced shutdown! In this case Camel tries to shut down as fast as possible.
As far as I know, Camel does a forced shutdown when the normal shutdown does not succeed within a timeout.
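In the Java DSL both options are simply flags on the aggregate definition. A minimal sketch, with a made-up correlation header, completion size and endpoints:

from("direct:in")
    .aggregate(header("orderId"), new MyAggregationStrategy())
        .completionSize(100)
        // wait for current and pending aggregations to complete during shutdown;
        // forceCompletionOnStop() is the analogous flag for the other option
        .completeAllOnStop()
    .to("mock:out");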

What are the Camel constructs for "long-running routes"?

We are investigating Camel for use in a new system; one of our situations is that a set of steps is started, and some of the steps in the set can take hours or days to execute. We need to execute other steps after such long-elapsed-time steps are finished.
We also need to have the system survive reboot while some of the "routes" with these steps are in progress, i.e., the system might be rebooted, and we need the "routes" in progress to remain in their state and pick up where they left off.
I know we can use a queued messaging system such as JMS, and that messages put into such a queue can be persisted. What isn't clear to me is how (or whether) that fits into Camel -- would we need to treat the step(s) after each queue as its own route, so that it could, on startup, read from the queue? That's going to split our processing steps into many more 'routes' than we would have otherwise, but maybe that's the way it's done.
Is/are there Camel construct/constructs which assist with this kind of system? If I know their terms and basic outline, I can probably figure it out, but what I really need is an explanation of what the constructs do.
Camel is not a human-workflow / long-running-task system; for that kind of thing look at BPMS systems. Camel is a better fit for real-time / near-real-time integrations.
For long tasks you persist their state in some external system such as a message broker, database or BPMS, and then you can use Camel routes to process and move from one state to the next -- that is where Camel fits in, for example integrating with the many different systems you can reach out of the box with the 200+ Camel components.
Camel does provide graceful shutdown, so you can safely shut down or reboot Camel. But in the unlikely event of a crash, you may want to look at transactions and idempotency if you are talking about surviving a system crash.
You are referring to asynchronous processing of messages in routes. Camel has a couple of components that you can use to achieve this.
SEDA: in-memory, not persistent, and can only connect endpoints within the same CamelContext.
VM: in-memory, not persistent, and can call endpoints in different Camel contexts, but is limited to the same JVM. This component is an extension of SEDA.
JMS: persistence is configurable on the queue/broker stack. Much more heavyweight, but also more fault tolerant than SEDA/VM.
SEDA/VM can be used as low-overhead replacements for JMS components, and in some cases I would use them exclusively. In your case persistence is a requirement, so SEDA/VM is not an option, but to keep things simple the examples will use SEDA so you can get the basics up and running quickly.
The example assumes the following: we have a timer that kicks off, and then there are two processing steps it needs to run.
In this route the message flow is synchronous between the timer and the two process beans, roughly like the sketch below.
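A minimal sketch of that single synchronous route; the timer period and the processOne/processTwo bean names are placeholders:

from("timer:test?period=5000")
    // both steps run one after the other in the timer thread
    .to("bean:processOne")
    .to("bean:processTwo");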
If we wanted to make these steps asynchronous we would need to break each step into a route of its own. We would then connect these routes using one of the components listed at the beginning, as in the sketch below.
Notice we have three routes, each with only one "processing" step in it. The first route only has a timer, which fires a message to the SEDA queue called processOne. That message is received from the SEDA queue and sent to the Process_One bean. After this it is sent to the SEDA queue called processTwo, where it is received and passed to the Process_Two bean. All of this is done asynchronously.
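Sticking with the same placeholder names, the three routes could look like this:

// route 1: the timer just drops a message onto the first SEDA queue
from("timer:test?period=5000")
    .to("seda:processOne");

// route 2: first processing step, then hand off to the next queue
from("seda:processOne")
    .to("bean:processOne")
    .to("seda:processTwo");

// route 3: second processing step
from("seda:processTwo")
    .to("bean:processTwo");

Swapping seda: for jms: (plus the broker configuration) gives you the persistent variant.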
You can replace the SEDA components with JMS once you get to understand the concepts. I suspect that state tracking is going to be the most complicated part as Camel makes the asynchronous part easy.

Flink Kafka connector 0.10.0 Event time Clarification and ProcessFunction Clarification

I'm struggling with an issue regarding the event time of Flink's Kafka consumer connector.
Citing the Flink docs:
Since Apache Kafka 0.10+, Kafka’s messages can carry timestamps, indicating the time the event has occurred (see “event time” in Apache Flink) or the time when the message has been written to the Kafka broker.
The FlinkKafkaConsumer010 will emit records with the timestamp attached, if the time characteristic in Flink is set to TimeCharacteristic.EventTime (StreamExecutionEnvironment.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)).
The Kafka consumer does not emit watermarks.
Some questions and issues come to mind:
How do I know whether the timestamp taken is the time the event occurred or the time the message was written to the Kafka broker?
If the consumer does not emit watermarks and TimeCharacteristic.EventTime is set, does this mean a message that is late by a few days can still enter and be processed?
The main flow diagram does not contain a window function, and basically looks like the following: source(kafka) -> filter -> processFunction -> sink. Does this mean the event is fired at the moment it is consumed by the Kafka connector?
I currently use Kafka connector 0.10.0 with TimeCharacteristic.EventTime set, and use a ProcessFunction which is expected to do some state cleanup every X minutes.
However I'm seeing a strange situation: when I start the Flink program, the OnTimerContext contains timestamps which start from 0 and grow up to the current timestamp. Is this a bug?
Thanks in advance to all helpers!
That depends on the configuration of the Kafka producer that's creating these events. The message.timestamp.type property should be set to either CreateTime or LogAppendTime.
Your Flink application is responsible for creating watermarks; the Kafka consumer will take care of the timestamps, but not the watermarks. It doesn't matter how late an event is, it will still enter your pipeline (see the sketch after these answers for one way to assign watermarks yourself).
Yes.
It's not clear to me what part of this is strange.
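To expand on the watermark point: if you want lateness to be bounded, you have to assign watermarks yourself, for example directly on the consumer. This is a rough sketch only; MyEvent, MyEventSchema, kafkaProps, the topic name and the 10-second bound are all assumptions. The classes used are FlinkKafkaConsumer010, AssignerWithPeriodicWatermarks and Watermark.

// kafkaProps is your consumer Properties, env your StreamExecutionEnvironment
FlinkKafkaConsumer010<MyEvent> consumer =
        new FlinkKafkaConsumer010<>("my-topic", new MyEventSchema(), kafkaProps);

consumer.assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<MyEvent>() {
    private long maxSeenTimestamp = Long.MIN_VALUE;

    @Override
    public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
        // previousElementTimestamp is the timestamp the Kafka 0.10 consumer already attached
        maxSeenTimestamp = Math.max(maxSeenTimestamp, previousElementTimestamp);
        return previousElementTimestamp;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // anything more than 10 seconds behind the highest timestamp seen so far counts as late
        return new Watermark(maxSeenTimestamp == Long.MIN_VALUE
                ? Long.MIN_VALUE : maxSeenTimestamp - 10_000);
    }
});

DataStream<MyEvent> stream = env.addSource(consumer);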

activemq with camel: rate limit across all vs across one

I am researching ActiveMQ, in particular, integrating it into my java app using Camel.
Our architecture involves queuing jobs across multiple multithreaded VMs. I need in particular two kinds of rate limits:
per VM per time period (all threads)
across all VMs per time period
Is there a way to specify these in camel, or are all rate limits implemented on a per-consumer basis?
With the help of the Throttler, I think you can set up the rate limit per route. As the exchange information is not shared across Camel routes (let alone across VMs), I don't think it can work for all VMs.
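A minimal per-route sketch with the Throttler EIP; the endpoint names and the limit of 10 messages per second are just examples:

from("activemq:queue:jobs")
    // let at most 10 exchanges per 1000 ms through this route (per JVM only)
    .throttle(10).timePeriodMillis(1000)
    .to("bean:jobProcessor");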
The only way to implement this that meets the requirements (that I can imagine) is to have another process that monitors the queue. At a regular interval, it would send a message to a topic which would say whether we had exceeded the rate limit for that period of time. Any process that subscribes to the queue would have to subscribe to the topic, and when it received a message that the allotted number had been processed it would shut that route down.
