Camel: Poll Enrich with Aggregation - apache-camel

The Camel book, in the section 'Using pollEnrich to merge additional data with an existing message', shows that you can merge the oldExchange (from quartz) with the new one (from FTP).
My problem is that I have a file from a topic (the old exchange), I use pollEnrich to get a new file from an FTP server, and I want to merge these too. I am interested in setting some headers from the oldExchange on the newExchange.
The problem I am facing is that the oldExchange is always null.
I have read the aggregator examples in the Camel book, and they say: "The first message arrives for the first group. == null".
I don't understand: where is my oldExchange, the one from the topic? Why is the exchange not null only on the second iteration (for the same group)?
from("myTopic")
.pollEnrich()
.simple("ftp://myUrl&fileName=${in.headers.test}")
.aggregate((Exchange oldExchange, Exchange newExchange) -> {
final String oldHeader = oldExchange.getIn().getHeader("test", String.class);
newExchange.getIn().setHeader("test", oldHeader);
return newExchange;
})
I have read this: http://camel.465427.n5.nabble.com/Split-and-Aggregate-Old-Exchange-is-null-everytime-in-AggregationStrategy-td5746365.html#a5746405 and I still don't understand how both messages can belong to the same group.

The first message arrives for the first group. == null. I don't understand ...
This is true for a standard aggregation where you aggregate for example multiple incoming messages to one. In this case, on the first incoming message the aggregator is still empty and therefore the oldExchange (aggregator content) is null. You have to wait for another (second) message to be able to aggregate something.
However, in your case (enrich) the oldExchange should not be null because the first message, i.e. the message from your topic, is already there.
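To make the difference concrete, here is a minimal plain-Java model of who calls the strategy with what. `Msg` and `StrategyCallers` are hypothetical stand-ins, not Camel classes: the aggregator starts with an empty group, so its first call is `strategy(null, first)`, while the enricher always passes the message already in the route as `oldExchange`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical stand-in for a Camel message; it only records a body so we can see
// which message is passed as oldExchange and which as newExchange.
class Msg {
    final Object body;

    Msg(Object body) {
        this.body = body;
    }
}

class StrategyCallers {
    // Aggregator EIP model: the group starts empty, so the first call is strategy(null, first).
    static Msg aggregator(List<Msg> group, BinaryOperator<Msg> strategy) {
        Msg aggregated = null;
        for (Msg incoming : group) {
            aggregated = strategy.apply(aggregated, incoming);
        }
        return aggregated;
    }

    // Content enricher model: the message already in the route (from the topic) is
    // oldExchange and the polled resource (the ftp file) is newExchange.
    static Msg enricher(Msg original, Msg polled, BinaryOperator<Msg> strategy) {
        return strategy.apply(original, polled);
    }
}
```

Recording the calls shows `null` only as the aggregator's first `oldExchange`; the enricher never sees a null original.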
Have you tried to inspect the message from the topic in the debugger or log it out before it reaches the enricher? Just to make sure it is not empty.
Added after a test
This is fascinating, I tried this with a unit test and when I define the pollEnrich as you do, I get the inverse result: My consumed message routed by .from(...) is the oldExchange and my newExchange is always null.
However, if I define the pollEnrich "inline", it works fine
.pollEnrich("URI", Timeout, (AggregationStrategy))
I suspect that this is explainable if you analyze what the DSL does with these two definitions, but from my quick test perspective it looks a bit strange.

@burki True, it works as you said with the aggregationStrategy inside pollEnrich(), but I need simple() because I am calling the endpoint dynamically, and I cannot do that in pollEnrich(...) (or at least I don't know how).
I was able to solve like this:
from("myTopic")
.pollEnrich()
.simple("ftp://myUrl&fileName=${in.headers.test}")
.aggregationStrategy((Exchange oldExchange, Exchange newExchange) -> {
final String oldHeader = oldExchange.getIn().getHeader("test", String.class);
newExchange.getIn().setHeader("test", oldHeader);
return newExchange;
})
So instead of the .aggregate call, I am using .aggregationStrategy. What I understood is that the .aggregate call is for the standard aggregation (as @burki mentioned), where you aggregate multiple messages, while the .aggregationStrategy call configures how the enricher merges two messages (one of them from an external service).
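The working strategy above dereferences `oldExchange` unconditionally. A defensive variant could also handle the case where nothing was polled: with `pollEnrich` and a timeout, Camel may hand the strategy a null `newExchange` (an assumption worth checking against the Content Enricher docs for your Camel version). The sketch below models only the header-copy logic on plain maps, which are hypothetical stand-ins for the headers of `oldExchange.getIn()` and `newExchange.getIn()`.

```java
import java.util.HashMap;
import java.util.Map;

// Models only the header-copy logic of the working strategy. The maps are hypothetical
// stand-ins for oldExchange.getIn().getHeaders() and newExchange.getIn().getHeaders().
class HeaderCopyStrategy {
    static Map<String, Object> aggregate(Map<String, Object> oldHeaders, Map<String, Object> newHeaders) {
        if (newHeaders == null) {
            // Nothing was polled (assumed timeout case): keep the original message as-is.
            return oldHeaders;
        }
        if (oldHeaders != null) {
            Object test = oldHeaders.get("test");
            if (test != null) {
                // Carry the "test" header from the topic message over to the polled file.
                newHeaders.put("test", test);
            }
        }
        return newHeaders;
    }
}
```

Returning the polled message keeps the FTP file as the body, while the null checks keep the strategy safe if it is ever reused in an aggregator.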

Related

How to manually ack/nack a PubSub message in Camel Route

I am setting up a Camel route with ackMode=NONE, meaning acknowledgements are not done automatically. How do I explicitly acknowledge the message in the route?
In my Camel Route definition I've set ackMode to NONE. According to the documentation, I should be able to manually acknowledge the message downstream:
https://github.com/apache/camel/blob/master/components/camel-google-pubsub/src/main/docs/google-pubsub-component.adoc
"AUTO = exchange gets ack’ed/nack’ed on completion. NONE = downstream process has to ack/nack explicitly"
However I cannot figure out how to send the ack.
from("google-pubsub:<project>:<subscription>?concurrentConsumers=1&maxMessagesPerPoll=1&ackMode=NONE")
.bean("processingBean");
My PubSub subscription has an acknowledgement deadline of 10 seconds and so my message keeps getting re-sent every 10 seconds due to ackMode=NONE. This is as expected. However I cannot find a way to manually acknowledge the message once processing is complete and stop the re-deliveries.
I was able to dig through the Camel components and figure out how it is done. First I created a GooglePubsubConnectionFactory bean:
@Bean
public GooglePubsubConnectionFactory googlePubsubConnectionFactory() {
GooglePubsubConnectionFactory connectionFactory = new GooglePubsubConnectionFactory();
connectionFactory.setCredentialsFileLocation(pubsubKey);
return connectionFactory;
}
Then I was able to reference the ack id of the message from the header:
@Header(GooglePubsubConstants.ACK_ID) String ackId
Then I used the following code to acknowledge the message:
List<String> ackIdList = new ArrayList<>();
ackIdList.add(ackId);
AcknowledgeRequest ackRequest = new AcknowledgeRequest().setAckIds(ackIdList);
Pubsub pubsub = googlePubsubConnectionFactory.getDefaultClient();
pubsub.projects().subscriptions().acknowledge("projects/<my project>/subscriptions/<my subscription>", ackRequest).execute();
I think it is best to look at how the Camel component does it with ackMode=AUTO. Have a look at this class (method acknowledge).
But why do you want to do this extra work? Camel is your friend here: it simplifies integration by abstracting away low-level code.
So when you use ackMode=AUTO Camel automatically commits your successfully processed messages (when the message has successfully passed the whole route) and rolls back your not processable messages.
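A rough plain-Java model of what ackMode=AUTO does according to the documentation quoted in the question ("exchange gets ack'ed/nack'ed on completion"). `AutoAckModel` is hypothetical and is not the camel-google-pubsub implementation; it only shows that the ack decision is made once the route has finished, so the route code never handles ack IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of ackMode=AUTO: ack when the route completes, nack when it throws.
// It is not the camel-google-pubsub code; it only shows where the ack decision is made.
class AutoAckModel {
    final List<String> acked = new ArrayList<>();
    final List<String> nacked = new ArrayList<>();

    void deliver(String ackId, String payload, Consumer<String> route) {
        try {
            route.accept(payload);
            acked.add(ackId);   // whole route succeeded: ack, so no redelivery
        } catch (RuntimeException e) {
            nacked.add(ackId);  // processing failed: nack, so Pub/Sub redelivers
        }
    }
}
```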

Camel Idempotent Consumer incorrect behaviour for removeOnFailure=true

I would like to know if the below is expected behaviour for Camel idempotent consumer:
I have removeOnFailure=true for the route, which basically means that when the exchange fails, the idempotent consumer should remove the identifier from the repository. This leads to an interesting scenario that lets duplicates through.
Suppose I have identifier=12345 and the first attempt to process the exchange was successful, so the identifier is added to the idempotent repository. The next attempt with the same identifier (12345) fails because it is caught as a duplicate message (CamelDuplicateMessage). But at this point removeOnFailure=true removes the identifier from the repository, so the attempt after that goes through successfully without being caught as a duplicate. This leaves room for duplicates on the exchange.
Can someone advise if this is expected behaviour or some bug?
Sample Route:
from("direct:Route-DeDupeCheck").routeId("Route-DeDupeCheck")
.log(LoggingLevel.DEBUG, "~~~~~~~ Reached to Route-DeDupeCheck: ${property.xref}")
.idempotentConsumer(simple("${property.xref}"), MemoryIdempotentRepository.memoryIdempotentRepository()) //TODO: To replace with Redis DB for caching
.removeOnFailure(true)
.skipDuplicate(false)
.filter(exchangeProperty(Exchange.DUPLICATE_MESSAGE).isEqualTo(true))
.log("~~~~~~~ Duplicate Message Found!")
.to("amq:queue:{{jms.duplicateQueue}}?exchangePattern=InOnly") //TODO: To send this to Duplicate JMS Queue
.throwException(new AZBizException("409", "Duplicate Message!"));
Your basic premise is wrong.
Next attempt to use same identifier i.e 12345 fails as this is caught
as Duplicate Message (CamelDuplicateMessage)
A duplicate message is not considered a failure. It is simply skipped from further processing (unless you set the skipDuplicate option to false, in which case it continues with the CamelDuplicateMessage property set to true).
Hence the scenario you described cannot occur.
It is easy to test. Consider a route like this:
public void configure() throws Exception {
//getContext().setTracing(true); // uncomment to enable tracing
from("direct:abc")
.idempotentConsumer(header("myid"),
MemoryIdempotentRepository.memoryIdempotentRepository(200))
.removeOnFailure(true)
.log("Received id : ${header.myid}");
}
And a producer like this:
@EndpointInject(uri = "direct:abc")
ProducerTemplate producerTemplate;
for (int i = 0; i < 5; i++) {
producerTemplate.sendBodyAndHeader("somebody", "myid", "1");
}
What you see in the logs is
INFO 18768 --- [tp1402599109-31] route1 : Received id : 1
And only once.
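The behavior described in this answer can be modeled in a few lines of plain Java. `IdempotentModel` is a hypothetical sketch with duplicates skipped (the default skipDuplicate=true), not Camel's IdempotentConsumer: a duplicate returns early and is never treated as a failure, so removeOnFailure only removes keys for exchanges whose processing actually threw.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical in-memory model of an idempotent consumer with skipDuplicate=true
// and a configurable removeOnFailure. It is not Camel's implementation.
class IdempotentModel {
    private final Set<String> repository = new HashSet<>();
    private final boolean removeOnFailure;
    final List<String> processed = new ArrayList<>();

    IdempotentModel(boolean removeOnFailure) {
        this.removeOnFailure = removeOnFailure;
    }

    void send(String id, Consumer<String> route) {
        if (!repository.add(id)) {
            return; // duplicate: skipped, not a failure, so nothing is removed
        }
        try {
            route.accept(id);
            processed.add(id);
        } catch (RuntimeException e) {
            if (removeOnFailure) {
                repository.remove(id); // failed exchange: allow a later retry with the same id
            }
        }
    }
}
```

Sending id "1" five times processes it once, mirroring the single log line above; only a genuinely failing exchange frees its id for a retry.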

Camel: how to aggregate files based on exchange in pattern

I have a class to run my route. The input comes from a queue (which is filled by a route that runs a query and inserts the rows as messages on the queue).
These messages each contain a few headers:
- pdu_id: basically a prefix of the filename.
- pad: the path the files reside in.
What should happen: I want the files in that path named "pdu_id".* to be put in a tar; after that, a REST call should be made to delete the source documents.
I know a route has a from, but I basically need a route with a dynamic "from", and as the code example below shows, chaining from() calls doesn't do the trick.
The question is what to use instead. I could not find anything similar, but maybe I didn't use the right search terms, in which case I apologize.
public class ToDeleteTarAndDeleteRoute extends RouteBuilder {
@Override
public void configure() throws Exception
{
from("broker1:todelete.message_ids.queue")
.from("file:///?fileName=${in.header.pad}${in.header.pdu_id}.*")
.aggregate(new TarAggregationStrategy())
.constant(true)
.completionFromBatchConsumer()
.eagerCheckCompletion()
.to("file:///?fileName=${in.header.pad}${in.header.pdu_id}.tar")
.log("${header.pdu_id} tarred")
.setHeader(Exchange.HTTP_METHOD, constant("DELETE"))
.setHeader("Connection", constant("Close"))
.enrich()
.simple("http:127.0.0.1/restfuldb${header.pdu_id}?httpClient.authenticationPreemptive=true")
.log("${header.pdu_id} tarred and deleted.");
}
}
Yes, pollEnrich can help you do this. Use it something like this:
from("broker1:todelete.message_ids.queue")
.aggregationStrategy(new TarAggregationStrategy())
.pollEnrich()
.simple("file:///?fileName=${in.header.pad}/${in.header.pdu_id}.*")
.unmarshal().string()
.to("file:///?fileName=${in.header.pad}/${in.header.pdu_id}.tar")
.log("${header.pdu_id} tarred")
.setHeader(Exchange.HTTP_METHOD, constant("DELETE"))
.setHeader("Connection", constant("Close"))
.enrich()
.simple("http:127.0.0.1/restfuldb${header.pdu_id}?httpClient.authenticationPreemptive=true")
.log("${header.pdu_id} tarred and deleted.");
The solution currently consists of a few changes based on what @daBigBug answered:
pollEnrich's simple expression uses antInclude rather than fileName;
aggregate is placed after pollEnrich, as each batch is the set of files rather than the input from the queue. The input from the queue only provides the meta information that determines which actions are to be taken;
aggregationStrategy() is not possible in a RouteBuilder, so I used aggregate() instead;
I removed the unmarshal(); I don't see why it would be needed, and the files can contain binary content.
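To check which files the `antInclude=${in.header.pdu_id}.*` expression is meant to pick up, here is a small sketch. It uses a `java.nio` glob as a stand-in for Camel's Ant-style matcher (an assumption: for a simple `name.*` pattern on plain file names both should behave alike), and `PduFileSelector` is a hypothetical helper.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: which file names a "<pdu_id>.*" pattern selects in directory <pad>,
// and where the asker expects the resulting tar to end up.
class PduFileSelector {
    static List<String> select(List<String> fileNames, String pduId) {
        // java.nio glob used as a stand-in for Camel's antInclude matching.
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pduId + ".*");
        return fileNames.stream()
                .filter(name -> matcher.matches(Paths.get(name)))
                .collect(Collectors.toList());
    }

    static Path tarTarget(String pad, String pduId) {
        // Expected target: ${header.pad}/${header.pdu_id}.tar
        return Paths.get(pad, pduId + ".tar");
    }
}
```

One thing the model makes visible: a `1234.*` pattern would also match a `1234.tar` written earlier into the same directory, so writing the tar elsewhere may be safer.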
from("broker1:todelete.message_ids.queue")
.pollEnrich()
.simple("file:${in.header.pad}?antInclude=${in.header.pdu_id}.*")
.aggregate(new TarAggregationStrategy())
.constant(true)
.completionFromBatchConsumer()
.eagerCheckCompletion()
.log("tarring to: ${header.pad}${header.pdu_id}.tar")
.setHeader(Exchange.FILE_NAME, simple("${header.pdu_id}.tar"))
.setHeader(Exchange.FILE_PATH, simple("${header.pad}"))
.to("file://ignored")
...(and the rest of the operations);
I now see the files are getting picked up and even placed in a tar; however, both the filename and the location of the tar are unexpected (it's placed in ./ignored). Also, in the rest of the operations the exchange headers appear to be lost.
If anyone can help figure out how to preserve the headers safely, I'd be much obliged. Should I ask a new question for that, or should I rephrase this one?

Apache Camel splitter with hazelcast seda queue

I'm trying to build a file import process where a file is picked up in a subdirectory of a given folder (the subdirectory identifies the client the file is for), then the records are parsed, split, and sent on Hazelcast SEDA queues. I want to process each record as it's read off the Hazelcast SEDA queue; the processing then returns a status code (created, updated, or errored) that can be aggregated.
I'm also creating a job record when the file is first picked up and I want to update the job record with the final count of created, updated, and errors.
The JobProcessor below creates this record and sets the client Organization and Job objects in headers on the message. The CensusExcelDataFormat reads an Excel file and creates an Employee object for each line, then returns a Collection.
from("file:" + censusDirectory + "?recursive=true").idempotentConsumer(new SimpleExpression("file:name"), idempotentRepository)
.process(new JobProcessor(organizationService, jobService, Job.JobType.CENSUS))
.unmarshal(censusExcelDataFormat)
.split(body(), new ListAggregationStrategy()).parallelProcessing()
.to(ExchangePattern.InOut, "hazelcast:seda:process-employee-import").end()
.process(new JobCompletionProcessor(jobService))
.end();
from("hazelcast:seda:process-employee-import")
.idempotentConsumer(simple("${body.entityId}"), idempotentRepository)
.bean(employeeImporterJob, "importOrUpdate");
The problem I'm having is that the list aggregation happens immediately and instead of getting a list of statuses I'm getting the same list of Employee objects. I want the Employee objects to be sent on the SEDA queue and the return value from the processing on the queue to be aggregated then run through the JobCompletionProcessor to update the Job record.
The behavior you are seeing is the default behavior. The Apache Camel Splitter documentation states this clearly in its "What the Splitter returns" section:
Camel 2.2 or older: The Splitter will by default return the last
splitted message.
Camel 2.3 and newer: The Splitter will by default return the
original input message.
For all versions: You can override this by supplying your own
strategy as an AggregationStrategy. There is a sample on this page
(Split aggregate request/reply sample). Notice it's the same
strategy as the Aggregator supports. This Splitter can be viewed as
having a build in light weight Aggregator.
So you need to implement your own aggregation strategy for the splitter. To do this, create a new class that implements AggregationStrategy, something like the code below:
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;
public class MyAggregationStrategy implements AggregationStrategy
{
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
if (oldExchange == null) // this is null for the first split reply
{
// do some work on the first reply if needed, then use it as the running result
return newExchange;
}
/*
Here you put your code to calculate failed, updated, created,
e.g. by combining newExchange's reply into oldExchange.
*/
return oldExchange;
}
}
You can then use your custom aggregation strategy by specifying it like the following examples:
.split(body(), new MyAggregationStrategy()) //Java DSL
<split strategyRef="myAggregationStrategy"/> //XML Blueprint
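As a sketch of the counting logic, assuming `importOrUpdate` returns one of three statuses as the reply body (the `ImportStatus` enum and `StatusTally` class are hypothetical), the strategy starts a fresh tally when the old result is null and then increments one counter per reply, mirroring `aggregate(oldExchange, newExchange)`.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical status values returned by the per-record import (names taken from the question).
enum ImportStatus { CREATED, UPDATED, ERRORED }

// Models what a splitter AggregationStrategy would accumulate: oldResult is null for the
// first reply, then holds the running tally.
class StatusTally {
    static Map<ImportStatus, Integer> aggregate(Map<ImportStatus, Integer> oldResult, ImportStatus reply) {
        Map<ImportStatus, Integer> counts = oldResult;
        if (counts == null) {
            // First reply: start a fresh tally with every status at zero.
            counts = new EnumMap<>(ImportStatus.class);
            for (ImportStatus s : ImportStatus.values()) {
                counts.put(s, 0);
            }
        }
        counts.merge(reply, 1, Integer::sum);
        return counts;
    }

    // Feeds replies one by one, as the splitter would after each InOut call returns.
    static Map<ImportStatus, Integer> fold(List<ImportStatus> replies) {
        Map<ImportStatus, Integer> result = null;
        for (ImportStatus r : replies) {
            result = aggregate(result, r);
        }
        return result;
    }
}
```

In the real strategy the tally would live in the body (or a property) of the exchange returned from `aggregate`, so the JobCompletionProcessor can read the final counts.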

Apache Camel: access both request and reply message at end of route

I would like to process both request and response messages at the end of my route. However, I do not see a way how to access the original request message.
I have the terrible feeling I am struggling with some basic concept.
Here is a simple example route in DSL to outline my problem (streamCaching is enabled for the whole context):
from("activemq:queue:myQueue")
.to("log:" + getClass().getName() + "?showOut=true")
.to("http://localhost:8080/someBackend")
.log("Now in.Body returns this: ${in.body} and out.Body this: ${out.body}")
.to("log:" + getClass().getName() + "?showOut=true");
Here is an according excerpt from my logs (line-breaks edited for better reading). As one can see, the original SOAP message is lost once the http server replied, and the SOAP response object is stored in the inBody of the message.
2012-09-25 17:28:08,312 local.bar.foo.MyRouteBuilder INFO -
Exchange[ExchangePattern:InOut, BodyType:byte[],
Body:<?xml version="1.0" encoding="UTF-8"?><env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Header /><env:Body><urn:someRequest xmlns:urn="http://foo.bar.local/ns"></urn:someRequest></env:Body></env:Envelope>,
Out: null]
2012-09-25 17:28:08,398 org.apache.camel.component.http.HttpProducer DEBUG -
Executing http POST method: http://localhost:8080/someBackend
2012-09-25 17:28:09,389 org.apache.camel.component.http.HttpProducer DEBUG -
Http responseCode: 200
2012-09-25 17:28:09,392 route2 INFO -
Now in.Body returns this: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><ns2:someResponse xmlns:ns2="http://foo.bar.local/ns"</ns2:someResponse></soap:Body></soap:Envelope>
and out.Body this:
2012-09-25 17:28:09,392 local.bar.foo.MyRouteBuilder INFO -
Exchange[ExchangePattern:InOut,
BodyType:org.apache.camel.converter.stream.InputStreamCache,
Body:[Body is instance of org.apache.camel.StreamCache],
Out: null]
I would have expected in.body and out.body to be preserved across the whole route. Is that not the case?
Alternative solutions I am considering:
Make use of the Correlation Identifier pattern to correlate both request and reply. But would this preserve the message bodies as well? Also, my request/reply messages do not have unique identifiers for correlation.
Write a custom bean, which performs the call to the http backend, processing both request and reply objects (but this is basically a no-Camel solution, reinventing the wheel and hence not preferred)
Already failed approaches:
I tried to access the original request message using a Processor like this at the end of my route, with no success:
.process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
Message originalInMessage = exchange.getUnitOfWork().getOriginalInMessage();
logger.debug(originalInMessage.getBody(String.class));
logger.debug(exchange.getIn().getBody(String.class));
}
});
Thanks for any help
Simply store the original body of the in message in a header or a property and retrieve it at the end:
from("activemq:queue:myQueue")
.setProperty("origInBody", body())
.to("http://localhost:8080/someBackend")
After the http call you can then access the property origInBody.
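A plain-Java model of why the property survives while the body does not. `ExchangeModel` and `PreserveRequest` are hypothetical stand-ins, not Camel classes: the backend call replaces the body with the reply, but a value stored in a property beforehand is left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for an exchange: a body that each step may replace, plus properties.
class ExchangeModel {
    Object body;
    final Map<String, Object> properties = new HashMap<>();

    ExchangeModel(Object body) {
        this.body = body;
    }
}

class PreserveRequest {
    // Mirrors .setProperty("origInBody", body()) followed by .to("http://..."):
    // the http step replaces the body with the reply, the property keeps the request.
    static ExchangeModel callBackend(ExchangeModel ex, UnaryOperator<Object> backend) {
        ex.properties.put("origInBody", ex.body);
        ex.body = backend.apply(ex.body);
        return ex;
    }
}
```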
First, this article explains well how in and out work in Camel: http://camel.apache.org/using-getin-or-getout-methods-on-exchange.html
Typically, the out message is not used; rather, the in message is carried over from step to step.
In your case, where you want the original message to stay around until the end of the route, you could use the Content Enricher EIP: http://camel.apache.org/content-enricher.html
Your route would be something like this:
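public class MyAggregationStrategy implements AggregationStrategy {
public Exchange aggregate(Exchange orig, Exchange httpExchange) {
// if you want to check something in the http response, do it here;
// checking the response code header is just one example of such a check
Integer responseCode = httpExchange.getIn().getHeader(Exchange.HTTP_RESPONSE_CODE, Integer.class);
if (responseCode == null || responseCode != 200) {
throw new RuntimeException("Something went wrong");
}
// keep the original request as the result of the enrich
return orig;
}
}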
AggregationStrategy aggStrategy = new MyAggregationStrategy();
from("activemq:queue:myQueue")
.enrich("http://localhost:8080/someBackend", aggStrategy)
// keep processing the original request here if you like, in the "in" message
.log("Original request: ${body}");
One of the biggest problems with Camel is how easy it is to misuse. The best way to use it correctly is to think in terms of EIPs: one of the main goals of Camel is to implement the EIPs in its DSL.
Here is a list of EIPs
Now think about it. You want the request and the response at the end, but for what use? Logging, aggregation, ...? For logging, a correlationId should suffice, so I presume you need them to create a response based on both the request and the proxied response. If that's what you want, you could do something like
from("direct:receiveRequest")
.enrich("direct:proxyResponse", new RequestAndResponseAggregationStrategy())
You will have the opportunity to merge your Request (in oldExchange) and your Response (in newExchange).
With all due respect to Christian Schneider, I do think that putting the request in a property to reuse it later is a bad design. By doing so, you create side effects between your routes. If your route is a subroute of another, you may overwrite its property. If you store the body only to put it back later, you should maybe do something like
from("direct:receiveRequest")
.enrich("direct:subRouteToIgnoreResponse", AggregationStrategies.useOriginal())
A really bad design that I have used myself too many times is:
from("direct:receiveRequest")
.to("direct:subroute")
from("direct:subroute")
.setProperty("originalBody", body())
.to("direct:handling")
.transform(property("originalBody"));
This will lead to "properties/headers hell", and to routes that are just a successive call of processors.
And if you can't think of a solution to your problem with EIPs, maybe you should use Camel only to access its components. For example, something like:
from("http://api.com/services")
.process(new SomeNotTranslatableToEIPProcessor())
.to("ftp://destination")
But don't forget that those components have their own goals: creating a common abstraction over similar behavior (e.g., time-based polling consumers). If you have a very specific need, bending a Camel component to fit it can lead to a huge chunk of code that is not easily maintainable.
Don't let Camel become your Golden Hammer anti-pattern
I often use an aggregation strategy, which preserves the old exchange and puts the result of the enrich into a header:
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;
public class SetBodyToHeaderAggregationStrategy implements AggregationStrategy {
private String headerName = "newBody";
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
oldExchange.getIn().setHeader(headerName, newExchange.getIn().getBody());
return oldExchange;
}
@SuppressWarnings("unused")
public void setHeaderName(String headerName) {
this.headerName = headerName;
}
}
Now you can use it like this:
<enrich strategyRef="setBodyToHeaderAggregationStrategy">
<constant>dosomething</constant>
</enrich>
<log loggingLevel="INFO" message="Result body from enrich: ${header.newBody}. Original body: ${body}" loggerRef="log"/>
