Camel aggregation from two queues in Java DSL - apache-camel

I have two queues that hold the same type of objects. I want to aggregate them into a single queue through the Java DSL. Could anyone tell me if this is possible? If so, are there any code references?

If I understand your question correctly, it is possible to do such a thing.
If you just need to drive them into a single route (without any aggregations, enrichments, etc.), you can proceed with this piece of code:
from("direct:queue1")
    .to("direct:start");

from("direct:queue2")
    .to("direct:start");

from("direct:start")
    // there goes your processing
If you need to aggregate them later on, use the Aggregator EIP, as sketched below. Or you can use the example from java-addict301's answer if it solves your case.
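For completeness, here is a minimal sketch of the Aggregator approach, placed inside a RouteBuilder's configure() method. The "orderId" correlation header, the completion size, and the body-concatenating merge are illustrative assumptions, not a prescribed solution:

// Minimal sketch: correlate on a hypothetical "orderId" header and
// concatenate String bodies. Note: in Camel 2.x AggregationStrategy lives in
// org.apache.camel.processor.aggregate; in 3.x it is org.apache.camel.AggregationStrategy.
from("direct:start")
    .aggregate(header("orderId"), new AggregationStrategy() {
        @Override
        public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
            if (oldExchange == null) {
                return newExchange; // first message for this correlation key
            }
            String merged = oldExchange.getIn().getBody(String.class)
                    + newExchange.getIn().getBody(String.class);
            oldExchange.getIn().setBody(merged);
            return oldExchange;
        }
    })
    .completionSize(2) // assume one message from each source queue
    .to("direct:aggregated");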

I believe this may be doable in Camel using the Content Enricher pattern.
Specifically, the following paradigm can be used to retrieve a message from one queue (where direct:start is) and enrich it with a message from the second queue (where direct:resource is). The combined message can then be built in your AggregationStrategy implementation class.
AggregationStrategy aggregationStrategy = ...

from("direct:start")
    .enrich("direct:resource", aggregationStrategy)
    .to("direct:result");

from("direct:resource")

in-Message copied in out-Message

I have this simple route in my RouteBuilder.
from("amq:MyQueue").routeId(routeId).log(LoggingLevel.DEBUG, "Log: ${in.headers} - ${in.body}")
As stated in the doc for HTTP-component:
Camel will store the HTTP response from the external server on the OUT body. All headers from the IN message will be copied to the OUT message, ...
I would like to know if this concept also applies to the amq component, routeId, and log? Is it the default behaviour that IN always gets copied to OUT?
Thank you,
Hadi
First of all: The concept of IN and OUT messages is deprecated in Camel 3.x.
This is mentioned in the Camel 3 migration guide and also annotated on the getOut method of the Camel Exchange.
However, it is not (yet) removed, but the takeaway is: don't care about the OUT message. Use the getMessage method and don't use getIn and getOut anymore.
To answer your question:
Yes, most components behave like this
Every step in the route takes the (IN) message and processes it
The body is typically overwritten with the new processing result
The headers typically stay, new headers can be added
So while the Camel Exchange traverses the route, typically the body is continuously updated and the header list grows.
However, some components, such as the aggregator, create new messages based on an AggregationStrategy. In such cases nothing is copied automatically and you have to implement the strategy according to your needs.
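As a small illustration of that recommendation, a processor written against the Camel 3.x API would use getMessage instead of getIn/getOut. This is a sketch, assuming a String body on the amq:MyQueue route from the question:

from("amq:MyQueue")
    .process(exchange -> {
        // Camel 3.x style: a single message on the exchange, no IN/OUT split
        String body = exchange.getMessage().getBody(String.class);
        exchange.getMessage().setBody(body.toUpperCase());
        // headers set earlier in the route are still available here
    })
    .log("${headers} - ${body}");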

Adding patterns dynamically in Apache Flink without restarting job

My use case is that I want to apply different CEP patterns to the same datastream. The CEP patterns arrive dynamically, and I want them to be added to Flink without having to restart the job. While all conditions can be handled via custom classes that implement IterativeCondition, my main problem is that the temporal condition accepts only a TimeWindow, which cannot be handled this way. Is there some way that the value passed to .within() can be set based on the input elements?
Something similar was asked here: Flink and Dynamic templates recognition
Best Answer:
"What one could add is a co-flat map operator which receives on one input channel the events and on the other input channel patterns. For each newly received pattern one either updates the existing NFA (this functionality is missing) or compiles a new one. In the latter case, one would apply incoming events to all stored NFAs."
I am trying to implement this but I am facing some difficulty. Specifically, on the point of "In the latter case, one would apply incoming events to all stored NFAs"
Reason being that I apply stream to pattern using: PatternStream matchStream = CEP.pattern(tmatchStream, pattern);
But the stream "tmatchStream" would not be defined inside the co-flatMap. Am I missing something here? Any help would be greatly appreciated.
Unfortunately the answer to the linked question is still valid. Flink CEP does not support dynamic patterns at the moment. There is already a JIRA ticket for that, though: FLINK-7129
The earliest reasonable target version for that feature will be 1.6.0.
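Until then, the co-flat map workaround from the quoted answer can only evaluate conditions manually, outside the CEP library. Below is a heavily hedged skeleton: Event and Rule are hypothetical types (Rule is assumed to expose a matches(Event) predicate and a name), and the rule list is kept in a plain field rather than checkpointed state:

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class DynamicRules {
    public static DataStream<String> applyDynamicRules(
            DataStream<Event> events, DataStream<Rule> rules) {
        return events.connect(rules)
            .flatMap(new CoFlatMapFunction<Event, Rule, String>() {
                // Plain field, i.e. lost on failure; a production version
                // should keep this in Flink operator state.
                private final List<Rule> activeRules = new ArrayList<>();

                @Override
                public void flatMap1(Event event, Collector<String> out) {
                    // apply every stored rule to each incoming event
                    for (Rule rule : activeRules) {
                        if (rule.matches(event)) {
                            out.collect("match: " + rule.getName());
                        }
                    }
                }

                @Override
                public void flatMap2(Rule rule, Collector<String> out) {
                    activeRules.add(rule); // register a newly received pattern
                }
            });
    }
}

Note that this only covers stateless conditions; reproducing CEP's temporal .within() semantics would additionally require timers, for example via a CoProcessFunction.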

Camel condition on aggregate of messages

I'm looking for a way to conditionally handle messages based on the aggregation of messages. I've looked into a lot of ways to do this, but it seems that Apache Camel doesn't support it. I'll explain the scenario and then the solutions I tried.
Scenario:
I'm trying to conditionally clean a directory. I poll the directory every x days and fetch all the files (file://...). I route this into an aggregator that combines the file sizes into a single total (directorySize). I then check whether this total passes a certain threshold.
Here is where the problem lies. I now want to remove certain files if this condition passes, but I don't have access to the original messages anymore because they were aggregated in a new exchange.
Solutions:
I tried to fetch the files again to process them. Problem is that you can't make a consumer fetch on demand as far as I know. I tried using pollEnrich, but that will only fetch a single file and not all files in the directory.
I tried to filter/stop the parent route. The problem here is that filter()/choice...stop()/end() will only stop the aggregated route with the directory size and not the parent route with the file messages. I can't conditionally process these.
I tried to move the aggregated condition to another route that I would call, but this causes the same problem as the first solution.
Things I consider doing:
Rewrite the aggregation strategy to not only aggregate the size, but also the files themselves into a grouped exchange. This way I can split the aggregation again after the check. I don't really like this solution because it causes a lot of boilerplate, both in code and at runtime.
Move the file size calculation to a processor instead of the aggregator. This would defeat the purpose of using Camel in the first place: I would manually be fetching the files and adding up the sizes, and that for every single file.
Use a ControlBus to dynamically start the delete route on that directory. Once again, a lot of workaround to achieve something that I feel should be possible in a simple route.
I would like to set the calculated size on every parent message, but I have no clue how this could be achieved?
Another way to stop the parent route that I haven't thought of?
I'm a bit stunned that you can't elegantly filter messages based on the aggregation of these messages. Is there something that I missed in Camel that would provide an elegant solution? Or is this a case of the least bad solution?
Simple Schema
Message(File)
Message(File) --> AggregatedMessage(directorySize) --> delete certain Files?
Message(File)
Camel is really awesome, but sometimes it sure is difficult to see exactly which design pattern to use ;)
Firstly, you need to keep a copy of the file objects, because you don't know whether to delete them or not until you reach your threshold - there are basically (at least) two ways to do this.
Alternative 1
The first way is to use a List in an exchange property. This property will hang around no matter what you do with the exchange body. If you have a look at the source code for GroupedExchangeAggregationStrategy, it does precisely this:
list = new ArrayList<Exchange>();
answer.setProperty(Exchange.GROUPED_EXCHANGE, list);
// ...
list.add(newExchange);
Or you could do the same thing manually on your own exchange property. In any case, it's completely fine to use the Grouped aggregation strategy as you have done.
Alternative 2
The second way to "keep" old messages is to send a copy to a stopped SEDA queue. So you would do to("seda:xyz") and define the route consuming from that endpoint with .noAutoStartup(). Then you can send messages to it and they will queue up on an internal queue, managed by Camel. When you want to process the messages, you simply start the route via the control bus and stop it again afterwards.
Generally, messing around with starting and stopping routes should be avoided unless absolutely necessary, but that's certainly another way to do it.
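A rough sketch of that alternative, assuming a route id "drain" for the stopped consumer (the endpoint names are arbitrary):

// Copy each incoming file message onto an in-memory holding queue.
// The consumer below is not started, so messages simply accumulate.
from("file:inbox?noop=true")
    .to("seda:holding")
    .to("direct:checkThreshold");

// Stopped consumer: only drains the queue once explicitly started.
from("seda:holding").routeId("drain")
    .noAutoStartup()
    .log("processing held file ${header.CamelFileName}");

// When the threshold condition passes, start it via the control bus
// (and stop it again afterwards with action=stop).
from("direct:startDrain")
    .to("controlbus:route?routeId=drain&action=start");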
Suggested solution
I suggest you do as you have done (i.e. alternative 1):
Aggregate via GroupedExchangeAggregationStrategy to keep the individual files in a list
Compute the total file size (use a processor, or do it along the way with a custom aggregation strategy)
Use a filter(simple("${body} > 123")) so that only batches above the threshold continue
"Unwind" your aggregation via a split(simple("${property.CamelGroupedExchange}"))
Delete your files one by one
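Putting those steps together inside a RouteBuilder, a sketch might look like this. The threshold value, the size summing, and the final logging step are illustrative assumptions:

from("file:the/directory?noop=true")
    // 1. group all files of one poll into a single exchange
    .aggregate(constant(true), new GroupedExchangeAggregationStrategy())
        .completionFromBatchConsumer()
    // 2. compute the total size of the grouped files
    .process(exchange -> {
        List<Exchange> group = exchange.getProperty(
                Exchange.GROUPED_EXCHANGE, List.class);
        long total = 0;
        for (Exchange e : group) {
            total += e.getIn().getHeader(Exchange.FILE_LENGTH, Long.class);
        }
        exchange.getIn().setBody(total);
    })
    // 3. only continue when the total passes the threshold
    .filter(simple("${body} > 123"))
    // 4. unwind the aggregation into the individual exchanges
    .split(simple("${property.CamelGroupedExchange}"))
    // 5. each split body is one of the original exchanges; unwrap and act on it
    .process(exchange -> {
        Exchange original = exchange.getIn().getBody(Exchange.class);
        // a real route would delete the file here
        System.out.println("would delete "
                + original.getIn().getHeader(Exchange.FILE_NAME));
    });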
Please let me know if this doesn't make sense, or if I have misunderstood your problem in any way.

Camel: Tracing history of exchanges when a splitter is used

I'm using Apache Camel, and trying to create a log of the history of the processing of each message in a workflow.
For simple straight-through workflows, where a message comes in, is processed by a few steps, and then leaves, this could be as simple as just keeping a sequential log of the exchanges. I can do this by writing a custom TraceEventHandler, which is called at each exchange and allows me to do logging.
However, if a splitter is involved, I don't know how to calculate the provenance of any given exchange. I could maintain my own log of exchanges, but in the case of a splitter, not all previous activity would be an ancestor of the current exchange. That is, if an incoming message is split into part1 and part2, which are then each processed separately, I don't want to consider the processing of part1 when calculating the history of part2.
A TraceEventHandler has this method:
@Override
public void traceExchange(ProcessorDefinition<?> node, Processor target,
        TraceInterceptor traceInterceptor, Exchange exchange) throws Exception {
}
and I expected that there would be an Exchange method like Exchange getPreviousExchange() that I could call inside traceExchange, but I can find no such thing.
Any advice? I'm not married to using a custom TraceEventHandler if there's a better way to do this.
Thanks.
You can find the previous exchange's id by looking up the exchange property with the key "CamelCorrelationId".
If you want to track the post-split processing as separate branches, then you need to consider the Camel property "CamelSplitIndex". This property will indicate which iteration of the split you're processing and when combined with the CamelCorrelationId as William suggested, will provide the full picture.
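Inside traceExchange, reading both properties might look like this (a sketch; log is assumed to be an SLF4J logger on the handler):

@Override
public void traceExchange(ProcessorDefinition<?> node, Processor target,
        TraceInterceptor traceInterceptor, Exchange exchange) throws Exception {
    // id of the exchange this one was correlated from (e.g. the pre-split parent)
    String correlationId = exchange.getProperty(Exchange.CORRELATION_ID, String.class);
    // which iteration of a split produced this exchange, null outside a split
    Integer splitIndex = exchange.getProperty(Exchange.SPLIT_INDEX, Integer.class);
    log.info("exchange {} correlated from {}, split index {}",
            exchange.getExchangeId(), correlationId, splitIndex);
}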

What's the difference between "direct:" and to() in Apache Camel?

The DirectComponent documentation gives the following example:
from("activemq:queue:order.in")
.to("bean:orderServer?method=validate")
.to("direct:processOrder");
from("direct:processOrder")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
Is there any difference between that and the following?
from("activemq:queue:order.in")
.to("bean:orderServer?method=validate")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
I've tried to find documentation on what the behaviour of the to() method is on the Java DSL, but beyond the RouteDefinition javadoc (which gives the very curt "Sends the exchange to the given endpoint") I've come up blank :(
In the case above, you will not notice much difference. The "direct" component is much like a method call.
Once you start building a bit more complex routes, you will want to segment them into several different parts for multiple reasons.
You can, for instance, create "sub routes" that can be reused among multiple routes in your Camel context, much like you segment out methods in regular programming to allow reusability and make the code clearer. The same goes for sub routes using, for instance, the direct component.
The same approach can be extended. Say you want multiple protocols to be used as endpoints to your route. You can use the direct endpoint to create the main route, something like this:
// Three endpoints to one "main" route.
from("activemq:queue:order.in")
.to("direct:processOrder");
from("file:some/file/path")
.to("direct:processOrder");
from("jetty:http://0.0.0.0/order/in")
.to("direct:processOrder");
from("direct:processOrder")
.to("bean:orderService?method=process")
.to("activemq:queue:order.out");
Another thing is that one route is created for each "from()" clause in the DSL. A route is an artifact in Camel, and you can perform certain administrative tasks on it with the Camel API, such as starting, stopping, adding, and removing routes dynamically. The "to" clause is just an endpoint call.
Once you start doing real cases of some complexity in Camel, you will find that you cannot have too many "direct" routes.
The Direct component is used to name a logical segment of a route. This is similar to naming procedures in structural programming.
In your example there is no difference in message flow. In terms of structural programming, we could say that you have made a kind of inline expansion of your route.
Another difference is that the Direct component doesn't have a thread pool; the direct consumer's process method is invoked by the calling thread of the direct producer.
It is mainly used to break up complex route configurations, much like methods are used in Java for reusability. Also, by configuring threads on the route behind a direct endpoint, you can reduce the work done by the calling thread, as the sketch below illustrates.
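To make the threading point concrete, a small sketch contrasting direct with the asynchronous seda component (the endpoint names are arbitrary):

// With direct, the producing route's thread executes the consumer route
// synchronously; with seda, the message is handed to the consumer's own thread.
from("timer:tick?period=5000")
    .to("direct:sync")   // runs on the timer thread
    .to("seda:async");   // queued, picked up by another thread

from("direct:sync")
    .log("direct runs on ${threadName}");  // same thread as the timer route

from("seda:async")
    .log("seda runs on ${threadName}");    // seda consumer thread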
from(A).to(B).to(OUT)

is chaining:

A --- B --- OUT

But

from(A).to(X);
from(B).to(X);
from(X).to(OUT);

where X is a direct: endpoint, is basically a join:

A
 \
  X --- OUT
 /
B

Obviously these are different behaviours, and with the second you could implement any logic you wanted, not just a serial chain.
