How to perform Camel routes with different exchanges synchronously? - apache-camel

I have some synchronous Camel routes with:
from("file:...")
...
.to("direct:next1")
from("direct:next1")
...
Now I'd like to run another route with a different exchange synchronously:
from("file:local/A")
...
.to("file:remote/A")
.to("direct:next2")
from("file:remote/A") // direct:next2 ?
...
How can I achieve this?

First off - the elephant in the room. Both of your file routes are going to start and run asynchronously by default because there is no dependency between those two routes at an EIP level to tell one to fire after the other.
There are two ways to solve this (e.g. to force an interdependency between the routes):
Require local/A to start the remote/A route (and to not attempt to restart the route if it's running.
Allow remote/A to poll for the file at given intervals.
Chain-starting the routes is a more sophisticated function of Camel and it allows you to precisely define the lifecycle of routes at your leisure. In this case, after your local/A file route finishes consuming, it would start up the remote/A route.
For that we can leverage the Control Bus EIP which allows us to control the lifecycle of routes directly.
from("file:local/A")
.routeId("localA")
.process(aProcessor)
.to("controlbus:route?routeId=remoteA&action=start");
from("file:remote/A")
.routeId("remoteA")
.autoStartup(false)
.process(aDifferentProcessor)
.to("controlbus:route?routeId=remoteA&action=suspend&async=true");
Polling is probably the most straightforward approach and it doesn't require much in the way of finickiness. Either the file exists or it doesn't when you go to process it, and you can choose to wait for a specific period of time when that file appears again.
from("file:local/A")
.routeId("localA")
.process(aProcessor);
from("file:remote/A?delay=2m")
.routeId("remoteA")
.process(aDifferentProcessor);
}
My preference would be to poll at a fixed interval or on a cron schedule, simply because it doesn't feel like you'd be gaining true synchronicity with either approach. You get the power to decide what fires when, but it's not 100% synchronous like direct routes.

Related

Multiple consumers for the same endpoint is not allowed

I would like to read files from a directory with camel file consumer but I need my route to be transacted. So I can not use threads inside the rout.
Is it ok to write multiply routes to read from the same endpoint (same directory) with a little change between the uris (for example the sort type) , and like this to avoid the Multiple consumers for the same endpoint is not allowed exception ?
Yeah sure you can do that, mind that you will have competing consumes for the same files now, so mind about read-locks. By default Camel use the marker file.
You can also use different delay so they dont poll at the same interval/time. And you can sort by random to make less chance of processing the same files.

Is there a way to use global property in apache camel

I have a scheduler that puts some value(N or Y) into a topic for every 10 mins(usually 'N', unless something abnormal happens with topic). When the topic goes down, the scheduler will populate a property(kind of inter-scheduler communication), so that it can be used during scheduler's next cycle, as way of telling the scheduler that something bad happened during last cycle, so that, it'll place a different value('Y') in topic in this cycle. But the problem here is normal exchange property isn't helping. The property is always null during every scheduler cycle.
When i went through the http://camel.apache.org/schema/blueprint/camel-blueprint.xsd, looking out for something similar to global properties, i got this one "tns:properties"
which can be set at context level.
Can this be used as a global property?
is there a way to read/write it in my scheduler route?
I'm also thinking about having a bean with an instance variable to hold this inter-scheduler-communication property.
Can anyone suggest the right option?
What you've described sounds to me like a means for maintaining state between processes, and using the properties for that will be problematic for a number of reasons.
I suggest breaking the app into a couple different pieces, and use a shared OSGi service to maintain the state.
public interface MyScheduleState() {
public setSomeValue(String x)
public String getSomeValue()
}
Route 1: Timer starts the task .. check the service for values.. send event. if error occurs, sends error message to some queue://MY.ERRORS
Route 2: Listen for errors on MY.ERRORS and update the OSGi service with new values
This gives you control over the behavior and you can change how the "stateful service" stores its data.. either in memory, on disk as a file or in a cache" and your routes will never know the specifics.
Take a look to http://camel.apache.org/properties.html
It seems to be exactly you are looking for - context properties. You can set a property value on each cycle and it will be available in the next cycle too.

Camel condition on aggregate of messages

I'm looking for a way to conditionally handle messages based on the aggregation of messages. I've looked into a lot of ways to do this, but it seems that Apache Camel doesn't support it. I'll explain the scenario and then the solutions I tried.
Scenario:
I'm trying to conditionally clean a directory. I poll from the directory every x days and fetch all the files (file://...). I route this into an aggregation, that aggregates the files into a single size (directorySize). I then check if this size passes a certain threshold.
Here is where the problem lies. I now want to remove certain files if this condition passes, but I don't have access to the original messages anymore because they were aggregated in a new exchange.
Solutions:
I tried to fetch the files again to process them. Problem is that you can't make a consumer fetch on demand as far as I know. I tried using pollEnrich, but that will only fetch a single file and not all files in the directory.
I tried to filter/stop the parent route. The problem here is that filter()/choice...stop()/end() will only stop the aggregated route with the directory size and not the parent route with the file messages. I can't conditionally process these.
I tried to move the aggregated condition to another route that I would call, but this causes the same problem as the first solution.
Things I consider doing:
Rewrite the aggregation strategy to not only aggregate the size, but also the files itself into a groupedExchange. This way I can split the aggregation again after the check. I don't really like this solution because it causes a lot boilerplate, both in code as during runtime.
Move the file size calculator to a processor instead of the aggregator. This would defeat the purpose of using camel in the first place.. I would manually be fetching the files and adding the sizes.. And that for every single file..
Use a ControlBus to dynamically start the delete route on that directory. Once again a lot of workaround to achieve something that I feel should be able to be done in a simple route.
I would like to set the calculated size on every parent message, but I have no clue how this could be achieved?
Another way to stop the parent route that I haven't thought of?
I'm a bit stunned that you can't elegantly filter messages based on the aggregation of these messages. Is there something that I missed in Camel that would provide an elegant solution? Or is this a case of the least bad solution?
Simple Schema
Message(File)
Message(File) --> AggregatedMessage(directorySize) --> delete certain Files?
Message(File)
Camel is really awesome, but sometimes it's sure difficult to see exactly which design pattern to use ;)
Firstly, you need to keep a copy of the file objects, because you don't know whether to delete them or not until you reach your threshold - there are basically (at least) two ways to do this.
Alternative 1
The first way is to use a List in an exchange property. This property will hang around no matter what you do with the exchange body. If you have a look at the source code for GroupedExchangeAggregationStrategy, it does precisely this:
list = new ArrayList<Exchange>();
answer.setProperty(Exchange.GROUPED_EXCHANGE, list);
// ...
list.add(newExchange);
Or you could do the same thing manually on your own exchange property. In any case, it's completely fine to use the Grouped aggregation strategy as you have done.
Alternative 2
The second way to "keep" old messages is to send a copy to a stopped SEDA queue. So you would do to("seda:xyz"). You define this queue as .noAutoStartup(). Then you can send messages to it and they will queue up on an internal queue, managed by camel. When you want to process the messages, you simply start it up via controlbus and stop it again afterwards.
Generally, messing around with starting and stopping queues should be avoided unless absolutely necessary, but that's certainly another way to do it
Suggested solution
I suggest you do as you have done (i.e. alternative 1):
aggregate via GroupedExchangeAggregationStrategy to keep the individual files in a list
Compute the total file size (use a processor, or do it along the way with a custom aggregation strategy)
Use a filter(simple("${body} < 123"))
"Unwind" your aggregation via a splitter(simple("${property.CamelGroupedExchange}"))
Delete your files one by one
Please let me know if this doesn'y makes sense, or if I have misunderstood your problem in any way.

Camel # route steps vs memory/performance

It might be a silly question, but say I have a hughe message that I want to process with Camel. How will the number of steps in my route affect the memory usage? Does camel deep copy my message payload for every step in the route, even if the DSL-step only reads from the message or does it do something smart here?
Is it better to keep the route down and do things in a "hughe" bean for large messages or not?
This is an example route that does various things, but not changing the payload.
from("foo:bar")
.log(..)
.setProperty(..)
.setHeader(..)
.log(..)
.choice()
.when(simple(... ) )
.log(..)
.to(..)
.when(simple(..))
.log(..)
.to(..)
.end()
from my understanding, for a simple pipelined route like this, an Exchange is created containing the body once and passed along each step in the route. Other EIPs do cause the Exchange to be copied though (like multicast, wiretap, etc)...
as well, if you have steps along the route which interface with external resources which could result in any type of copy/clone/conversion/serialization of the body unnecessarily, then you might use something like the claim check pattern to reduce this.
The camel exchange is the same through the route the message objects are copied or recereated in the steps. The body is just referenced though. So normally you should not have a problem.
This is handled by each camel processor individually though. So some of the processors may copy the body. Typically this is the case when the processor really works on the body. So in this case it can not be avoided.

What's the difference between "direct:" and to() in Apache Camel?

The DirectComponent documentation gives the following example:
from("activemq:queue:order.in")
.to("bean:orderServer?method=validate")
.to("direct:processOrder");
from("direct:processOrder")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
Is there any difference between that and the following?
from("activemq:queue:order.in")
.to("bean:orderServer?method=validate")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
I've tried to find documentation on what the behaviour of the to() method is on the Java DSL, but beyond the RouteDefinition javadoc (which gives the very curt "Sends the exchange to the given endpoint") I've come up blank :(
In the very case above, you will not notice much difference. The "direct" component is much like a method call.
Once you start build a bit more complex routes, you will want to segment them in several different parts for multiple reasons.
You can, for instance, create "sub routes" that could be reused among multiple routes in your Camel context. Much like you segment out methods in regular programming to allow reusability and make code more clear. The same goes for sub routes using, for instance the direct component.
The same approach can be extended. Say you want multiple protocols to be used as endpoints to your route. You can use the direct endpoint to create the main route, something like this:
// Three endpoints to one "main" route.
from("activemq:queue:order.in")
.to("direct:processOrder");
from("file:some/file/path")
.to("direct:processOrder");
from("jetty:http://0.0.0.0/order/in")
.to("direct:processOrder");
from("direct:processOrder")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
Another thing is that one route is created for each "from()" clause in DSL. A route is an artifact in Camel, and you could do certain administrative tasks towards it with the Camel API, such as start, stop, add, remove routes dynamically. The "to" clause is just an endpoint call.
Once starting to do some real cases with somewhat complexity in Camel, you will note that you cannot get too many "direct" routes.
Direct Component is used to name the logical segment of the route. This is similar process to naming procedures in structural programming.
In your example there is no difference in message flow. In the terms of structural programming, we could say that you make a kind of inline expansion to your route.
Another difference is Direct component doesn't has any thread pool, the direct consumer process method is invoked by the calling thread of direct producer.
Mainly its used for break the complex route configuration like in java we used to have method for reusability. And also by configuring threads at direct route we can reduce the work for calling thread .
from(A).to(B).to(OUT)
is chaining
A --- B --- OUT
But
from(A ).to( X)
from(B ).to( X)
from( X).to( OUT )
where X is a direct:?
is basically like a join
A
\____ OUT
/
B
obviously these are different behaviours, and with the second you could implement anylogic you wanted, not just a serial chain

Resources