Camel # route steps vs memory/performance - apache-camel

It might be a silly question, but say I have a hughe message that I want to process with Camel. How will the number of steps in my route affect the memory usage? Does camel deep copy my message payload for every step in the route, even if the DSL-step only reads from the message or does it do something smart here?
Is it better to keep the route down and do things in a "hughe" bean for large messages or not?
This is an example route that does various things, but not changing the payload.
from("foo:bar")
.log(..)
.setProperty(..)
.setHeader(..)
.log(..)
.choice()
.when(simple(... ) )
.log(..)
.to(..)
.when(simple(..))
.log(..)
.to(..)
.end()

from my understanding, for a simple pipelined route like this, an Exchange is created containing the body once and passed along each step in the route. Other EIPs do cause the Exchange to be copied though (like multicast, wiretap, etc)...
as well, if you have steps along the route which interface with external resources which could result in any type of copy/clone/conversion/serialization of the body unnecessarily, then you might use something like the claim check pattern to reduce this.

The camel exchange is the same through the route the message objects are copied or recereated in the steps. The body is just referenced though. So normally you should not have a problem.
This is handled by each camel processor individually though. So some of the processors may copy the body. Typically this is the case when the processor really works on the body. So in this case it can not be avoided.

Related

in-Message copied in out-Message

I have this simple route in my RouteBuilder.
from("amq:MyQueue").routeId(routeId).log(LoggingLevel.DEBUG, "Log: ${in.headers} - ${in.body}")
As stated in the doc for HTTP-component:
Camel will store the HTTP response from the external server on the OUT body. All headers from the IN message will be copied to the OUT message, ...
I would like to know if this concept also applies to amq-component, routeId, and log? Is it the default behaviour, that IN always gets copied to OUT?
Thank you,
Hadi
First of all: The concept of IN and OUT messages is deprecated in Camel 3.x.
This is mentioned in the Camel 3 migration guide and also annotated on the getOut method of the Camel Exchange.
However, it is not (yet) removed, but what you can take from it: don't care about the OUT message. Use the getMessage method and don't use getIn and getOut anymore.
To answer your question:
Yes, most components behave like this
Every step in the route takes the (IN) message and processes it
The body is typically overwritten with the new processing result
The headers typically stay, new headers can be added
So while the Camel Exchange traverses the route, typically the body is continuously updated and the header list grows.
However, some components like aggregator create new messages based on an AggregationStrategy. In such cases nothing is copied automatically and you have to implement the strategy to your needs.

How to perform Camel routes with different exchanges synchronously?

I have some synchronous Camel routes with:
from("file:...")
...
.to("direct:next1")
from("direct:next1")
...
Now I'd like to run another route with a different exchange synchronously:
from("file:local/A")
...
.to("file:remote/A")
.to("direct:next2")
from("file:remote/A") // direct:next2 ?
...
How can I achieve this?
First off - the elephant in the room. Both of your file routes are going to start and run asynchronously by default because there is no dependency between those two routes at an EIP level to tell one to fire after the other.
There are two ways to solve this (e.g. to force an interdependency between the routes):
Require local/A to start the remote/A route (and to not attempt to restart the route if it's running.
Allow remote/A to poll for the file at given intervals.
Chain-starting the routes is a more sophisticated function of Camel and it allows you to precisely define the lifecycle of routes at your leisure. In this case, after your local/A file route finishes consuming, it would start up the remote/A route.
For that we can leverage the Control Bus EIP which allows us to control the lifecycle of routes directly.
from("file:local/A")
.routeId("localA")
.process(aProcessor)
.to("controlbus:route?routeId=remoteA&action=start");
from("file:remote/A")
.routeId("remoteA")
.autoStartup(false)
.process(aDifferentProcessor)
.to("controlbus:route?routeId=remoteA&action=suspend&async=true");
Polling is probably the most straightforward approach and it doesn't require much in the way of finickiness. Either the file exists or it doesn't when you go to process it, and you can choose to wait for a specific period of time when that file appears again.
from("file:local/A")
.routeId("localA")
.process(aProcessor);
from("file:remote/A?delay=2m")
.routeId("remoteA")
.process(aDifferentProcessor);
}
My preference would be to poll at a fixed interval or on a cron schedule, simply because it doesn't feel like you'd be gaining true synchronicity with either approach. You get the power to decide what fires when, but it's not 100% synchronous like direct routes.

Camel condition on aggregate of messages

I'm looking for a way to conditionally handle messages based on the aggregation of messages. I've looked into a lot of ways to do this, but it seems that Apache Camel doesn't support it. I'll explain the scenario and then the solutions I tried.
Scenario:
I'm trying to conditionally clean a directory. I poll from the directory every x days and fetch all the files (file://...). I route this into an aggregation, that aggregates the files into a single size (directorySize). I then check if this size passes a certain threshold.
Here is where the problem lies. I now want to remove certain files if this condition passes, but I don't have access to the original messages anymore because they were aggregated in a new exchange.
Solutions:
I tried to fetch the files again to process them. Problem is that you can't make a consumer fetch on demand as far as I know. I tried using pollEnrich, but that will only fetch a single file and not all files in the directory.
I tried to filter/stop the parent route. The problem here is that filter()/choice...stop()/end() will only stop the aggregated route with the directory size and not the parent route with the file messages. I can't conditionally process these.
I tried to move the aggregated condition to another route that I would call, but this causes the same problem as the first solution.
Things I consider doing:
Rewrite the aggregation strategy to not only aggregate the size, but also the files itself into a groupedExchange. This way I can split the aggregation again after the check. I don't really like this solution because it causes a lot boilerplate, both in code as during runtime.
Move the file size calculator to a processor instead of the aggregator. This would defeat the purpose of using camel in the first place.. I would manually be fetching the files and adding the sizes.. And that for every single file..
Use a ControlBus to dynamically start the delete route on that directory. Once again a lot of workaround to achieve something that I feel should be able to be done in a simple route.
I would like to set the calculated size on every parent message, but I have no clue how this could be achieved?
Another way to stop the parent route that I haven't thought of?
I'm a bit stunned that you can't elegantly filter messages based on the aggregation of these messages. Is there something that I missed in Camel that would provide an elegant solution? Or is this a case of the least bad solution?
Simple Schema
Message(File)
Message(File) --> AggregatedMessage(directorySize) --> delete certain Files?
Message(File)
Camel is really awesome, but sometimes it's sure difficult to see exactly which design pattern to use ;)
Firstly, you need to keep a copy of the file objects, because you don't know whether to delete them or not until you reach your threshold - there are basically (at least) two ways to do this.
Alternative 1
The first way is to use a List in an exchange property. This property will hang around no matter what you do with the exchange body. If you have a look at the source code for GroupedExchangeAggregationStrategy, it does precisely this:
list = new ArrayList<Exchange>();
answer.setProperty(Exchange.GROUPED_EXCHANGE, list);
// ...
list.add(newExchange);
Or you could do the same thing manually on your own exchange property. In any case, it's completely fine to use the Grouped aggregation strategy as you have done.
Alternative 2
The second way to "keep" old messages is to send a copy to a stopped SEDA queue. So you would do to("seda:xyz"). You define this queue as .noAutoStartup(). Then you can send messages to it and they will queue up on an internal queue, managed by camel. When you want to process the messages, you simply start it up via controlbus and stop it again afterwards.
Generally, messing around with starting and stopping queues should be avoided unless absolutely necessary, but that's certainly another way to do it
Suggested solution
I suggest you do as you have done (i.e. alternative 1):
aggregate via GroupedExchangeAggregationStrategy to keep the individual files in a list
Compute the total file size (use a processor, or do it along the way with a custom aggregation strategy)
Use a filter(simple("${body} < 123"))
"Unwind" your aggregation via a splitter(simple("${property.CamelGroupedExchange}"))
Delete your files one by one
Please let me know if this doesn'y makes sense, or if I have misunderstood your problem in any way.

Handling incoming JMSCorrelationId in an Apache Camel route

I have an camel route consuming on a JMS (activemq) queue targeted to be called in a request/reply manner. Inside this route I split the message and invoke another activemq queue (also in a request/reply manner).
Heres a minimal route showing the situation
<route>
<from uri="activemq:A" />
<split>
<xpath>/root/subpart</xpath>
<inOut uri="activemq:B" />
</split>
</route>
The problem is that Camel does not set a new JMSCorrelationId (since there is already one from the incoming message). If nothing is done, you get responses with unknown correlationId's and the exchanges never end.
I didn't go into details but my guess is that the same temporaryQueue is used for the hole splitter but that it (logically) expects different correlation id's for each of the messages. All using the same, it recieves the first and does not know what to do with the others.
What would be the best solution to handle the situation ?
The one I've found working is to save in another header the incoming JMSCorrelationId (not sure I need to though), and removing it. This is not really as clean as I would want it to be, but I couldn't think of something else. Any ideas ?
Essentially, your case is described in this Jira issue It seems there will be an addition in 2.11 where you can ask Camel to create a new corr-id.
So, in the meantime, why don't you continue what you had working - to remove the JMSCorrelationId header <removeHeader headerName="JMSCorrelationId" /> before you send it to "activemq:B"? I guess that is the best solution for now.
You could, of course, play with the "useMessageIDAsCorrelationID" option as well on the second endpoint.

What's the difference between "direct:" and to() in Apache Camel?

The DirectComponent documentation gives the following example:
from("activemq:queue:order.in")
.to("bean:orderServer?method=validate")
.to("direct:processOrder");
from("direct:processOrder")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
Is there any difference between that and the following?
from("activemq:queue:order.in")
.to("bean:orderServer?method=validate")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
I've tried to find documentation on what the behaviour of the to() method is on the Java DSL, but beyond the RouteDefinition javadoc (which gives the very curt "Sends the exchange to the given endpoint") I've come up blank :(
In the very case above, you will not notice much difference. The "direct" component is much like a method call.
Once you start build a bit more complex routes, you will want to segment them in several different parts for multiple reasons.
You can, for instance, create "sub routes" that could be reused among multiple routes in your Camel context. Much like you segment out methods in regular programming to allow reusability and make code more clear. The same goes for sub routes using, for instance the direct component.
The same approach can be extended. Say you want multiple protocols to be used as endpoints to your route. You can use the direct endpoint to create the main route, something like this:
// Three endpoints to one "main" route.
from("activemq:queue:order.in")
.to("direct:processOrder");
from("file:some/file/path")
.to("direct:processOrder");
from("jetty:http://0.0.0.0/order/in")
.to("direct:processOrder");
from("direct:processOrder")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
Another thing is that one route is created for each "from()" clause in DSL. A route is an artifact in Camel, and you could do certain administrative tasks towards it with the Camel API, such as start, stop, add, remove routes dynamically. The "to" clause is just an endpoint call.
Once starting to do some real cases with somewhat complexity in Camel, you will note that you cannot get too many "direct" routes.
Direct Component is used to name the logical segment of the route. This is similar process to naming procedures in structural programming.
In your example there is no difference in message flow. In the terms of structural programming, we could say that you make a kind of inline expansion to your route.
Another difference is Direct component doesn't has any thread pool, the direct consumer process method is invoked by the calling thread of direct producer.
Mainly its used for break the complex route configuration like in java we used to have method for reusability. And also by configuring threads at direct route we can reduce the work for calling thread .
from(A).to(B).to(OUT)
is chaining
A --- B --- OUT
But
from(A ).to( X)
from(B ).to( X)
from( X).to( OUT )
where X is a direct:?
is basically like a join
A
\____ OUT
/
B
obviously these are different behaviours, and with the second you could implement anylogic you wanted, not just a serial chain

Resources