Camel: how to aggregate files based on exchange in pattern - apache-camel

I have a class that runs my route. The input comes from a queue, which is filled by another route that runs a query and puts each row on the queue as a message.
Each message contains a few headers:
- pdu_id: basically a prefix of the file name.
- pad: the path the files reside in.
What should happen: I want the files in that path named "pdu_id".* to be put in a tar; after that, a REST call should be made to remove the documents' source.
I know a route has a from, but I basically need a route with a dynamic "from", and as the code example below shows, chaining froms doesn't do the trick.
The question is what to use instead. I could not find anything similar, but it may be that I didn't use the right search terms, in which case I apologize.
public class ToDeleteTarAndDeleteRoute extends RouteBuilder {
@Override
public void configure() throws Exception
{
from("broker1:todelete.message_ids.queue")
.from("file:///?fileName=${in.header.pad}${in.header.pdu_id}.*")
.aggregate(new TarAggregationStrategy())
.constant(true)
.completionFromBatchConsumer()
.eagerCheckCompletion()
.to("file:///?fileName=${in.header.pad}${in.header.pdu_id}.tar")
.log("${header.pdu_id} tarred")
.setHeader(Exchange.HTTP_METHOD, constant("DELETE"))
.setHeader("Connection", constant("Close"))
.enrich()
.simple("http:127.0.0.1/restfuldb${header.pdu_id}?httpClient.authenticationPreemptive=true")
.log("${header.pdu_id} tarred and deleted.");
}
}

Yes, pollEnrich can help you do this. Use it something like this:
from("broker1:todelete.message_ids.queue")
.aggregationStrategy(new TarAggregationStrategy())
.pollEnrich()
.simple("file:///?fileName=${in.header.pad}/${in.header.pdu_id}.*")
.unmarshal().string()
.to("file:///?fileName=${in.header.pad}/${in.header.pdu_id}.tar")
.log("${header.pdu_id} tarred")
.setHeader(Exchange.HTTP_METHOD, constant("DELETE"))
.setHeader("Connection", constant("Close"))
.enrich()
.simple("http:127.0.0.1/restfuldb${header.pdu_id}?httpClient.authenticationPreemptive=true")
.log("${header.pdu_id} tarred and deleted.");

My current solution consists of a few changes to what @daBigBug answered:
- pollEnrich's simple expression uses antInclude rather than fileName.
- aggregate is placed after pollEnrich, since each batch is the set of files rather than the input from the queue. The input from the queue only provides the meta information that determines which actions to take.
- aggregationStrategy() is not possible in a RouteBuilder; I used aggregate() instead.
- I removed the unmarshal(); I don't see why it would be needed, and the files can contain binary content.
from("broker1:todelete.message_ids.queue")
.pollEnrich()
.simple("file:${in.header.pad}?antInclude=${in.header.pdu_id}.*")
.aggregate(new TarAggregationStrategy())
.constant(true)
.completionFromBatchConsumer()
.eagerCheckCompletion()
.log("tarring to: ${header.pad}${header.pdu_id}.tar")
.setHeader(Exchange.FILE_NAME, simple("${header.pdu_id}.tar"))
.setHeader(Exchange.FILE_PATH, simple("${header.pad}"))
.to("file://ignored")
...(and the rest of the operations);
I now see the files are getting picked up and even placed in a tar. However, the file name of the tar is unexpected, as is its location (it is placed in ./ignored). Also, in the rest of the operations the exchange headers appear to be lost.
If anyone can help figure out how to preserve the headers in a safe way, I'd be much obliged. Should I open a new question for that, or should I rephrase this one?

Related

Camel: Poll Enrich with Aggregation

The Camel book, in the section 'Using pollEnrich to merge additional data with an existing message', shows that you can merge the oldExchange (from Quartz) with the new one (from FTP).
My situation is that I have a file from a topic (the old exchange), I use pollEnrich to get a new file from an FTP server, and I want to merge the two. I want to copy some headers from the oldExchange to the newExchange.
The problem I am facing is that the oldExchange is always null.
I have read the aggregator examples in the Camel book, and they say: "The first message arrives for the first group. == null".
I don't understand: where is my oldExchange then, the one from the topic? Why is the exchange non-null only on the second iteration (for the same group)?
from("myTopic")
.pollEnrich()
.simple("ftp://myUrl&fileName=${in.headers.test}")
.aggregate((Exchange oldExchange, Exchange newExchange) -> {
final String oldHeader = oldExchange.getIn().getHeader("test", String.class);
newExchange.getIn().setHeader("test", oldHeader);
return newExchange;
})
I have read this: http://camel.465427.n5.nabble.com/Split-and-Aggregate-Old-Exchange-is-null-everytime-in-AggregationStrategy-td5746365.html#a5746405 and I still don't understand how both messages can belong to the same group.
The first message arrives for the first group. == null. I don't understand ...
This is true for a standard aggregation where you aggregate for example multiple incoming messages to one. In this case, on the first incoming message the aggregator is still empty and therefore the oldExchange (aggregator content) is null. You have to wait for another (second) message to be able to aggregate something.
However, in your case (enrich) the oldExchange should not be null because the first message, i.e. the message from your topic, is already there.
Have you tried to inspect the message from the topic in the debugger or log it out before it reaches the enricher? Just to make sure it is not empty.
Added after a test
This is fascinating, I tried this with a unit test and when I define the pollEnrich as you do, I get the inverse result: My consumed message routed by .from(...) is the oldExchange and my newExchange is always null.
However, if I define the pollEnrich "inline", it works fine
.pollEnrich("URI", Timeout, (AggregationStrategy))
I suspect that this is explainable if you analyze what the DSL does with these two definitions, but from my quick test perspective it looks a bit strange.
@burki True, it works as you said with the aggregationStrategy inside pollEnrich(), but I need simple because I am calling an endpoint dynamically and I cannot do that in pollEnrich (or at least I don't know how).
I was able to solve like this:
from("myTopic")
.pollEnrich()
.simple("ftp://myUrl&fileName=${in.headers.test}")
.aggregationStrategy((Exchange oldExchange, Exchange newExchange) -> {
final String oldHeader = oldExchange.getIn().getHeader("test", String.class);
newExchange.getIn().setHeader("test", oldHeader);
return newExchange;
})
So instead of the .aggregate call I am using .aggregationStrategy. As I understand it, .aggregate is for standard aggregation (as @burki mentioned), when we want to aggregate multiple messages, while .aggregationStrategy can be used to merge two messages (one of them coming from an external service).
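The difference between the two calls can be pictured with a small plain-Java model. This is only a sketch and not Camel code: the class OldExchangeModel, its method names, and the use of String in place of Exchange are assumptions made for illustration. In the aggregator-style fold the strategy is first called with null because the group is still empty, whereas in the enrich-style call the current message is passed as the old value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical model (not Camel API): Strings stand in for Exchanges.
final class OldExchangeModel {

    // Aggregator style: each incoming message is folded into the group.
    // On the first message the group is still empty, so "old" is null.
    static String aggregateAll(List<String> messages, BinaryOperator<String> strategy) {
        String group = null;
        for (String message : messages) {
            group = strategy.apply(group, message);
        }
        return group;
    }

    // Enrich style: the strategy is called once, with the current message
    // as "old" and the fetched resource as "new".
    static String enrich(String current, String resource, BinaryOperator<String> strategy) {
        return strategy.apply(current, resource);
    }

    public static void main(String[] args) {
        // A strategy that tolerates a null "old" value, as an aggregator strategy must.
        BinaryOperator<String> strategy =
                (oldValue, newValue) -> oldValue == null ? newValue : oldValue + "+" + newValue;
        System.out.println(aggregateAll(Arrays.asList("a", "b", "c"), strategy));
        System.out.println(enrich("topicMessage", "ftpFile", strategy));
    }
}
```

A strategy written for the aggregator case has to handle a null old value on its first call; one used only for enrichment can normally rely on the old value being the message that entered the enricher.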

Camel for multiple files processing

I am new to Camel. I want to do file processing with Camel, but I haven't found a ready-made solution for my case. I have to process multiple files together if they exist. These files are uploaded to a specific folder with some delay between them (for example, we have two files A.csv and B.csv, and A.csv is uploaded 10 seconds later than B.csv, or vice versa). Also, if one file is absent for more than a specific time, I need to process only the one file that is there. Could anybody help me choose a pattern? As I understand it, I can use the Camel filter to make sure that both A.csv and B.csv are present and only then start processing, but that doesn't solve my problem.
This is the Aggregator EIP.
from("file:inputFolder")
.aggregate(constant(true), AggregationStrategies.groupedExchange())
.completionSize(2) //Wait for two files
.completionTimeout(60000) //Or process single file, if completionSize was not fulfilled within one minute
.to("log:do_something") //Here you can access List<Exchange> from message body
To group messages you can use a correlation Expression. For your example (grouping messages by the file name prefix before _) it could be something like this:
private final Expression CORRELATION_EXPRESSION = new Expression() {
    @Override
    public <T> T evaluate(Exchange exchange, Class<T> type) {
        final String fileName = exchange.getIn().getHeader(Exchange.FILE_NAME, String.class);
        // Use the part before the first '_' as the key; fall back to the whole name if there is no '_'
        final int idx = fileName.indexOf('_');
        final String correlationExpression = idx < 0 ? fileName : fileName.substring(0, idx);
        return exchange.getContext().getTypeConverter().convertTo(type, correlationExpression);
    }
};
And pass it to Aggregator:
from("file:inputDirectory")
.aggregate(CORRELATION_EXPRESSION, AggregationStrategies.groupedExchange())
...
See this gist for a full example: https://gist.github.com/bedlaj/a2a56aa9291bced8c0a8edebacaf22b0
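To see which files would end up in the same group, the prefix rule can be tried out without Camel. The following is a minimal plain-Java sketch; the class FileNameCorrelation, its methods, and the sample file names are hypothetical, and treating a name without an underscore as its own group is an assumption rather than something the answer specifies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Camel): models grouping by file-name prefix.
final class FileNameCorrelation {

    // The key is the part before the first '_'.
    // A name without '_' is used as its own key (assumption).
    static String correlationKey(String fileName) {
        int idx = fileName.indexOf('_');
        return idx < 0 ? fileName : fileName.substring(0, idx);
    }

    // Groups file names by their correlation key, sorted by key.
    static Map<String, List<String>> group(List<String> fileNames) {
        return fileNames.stream()
                .collect(Collectors.groupingBy(FileNameCorrelation::correlationKey,
                        TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList("order1_A.csv", "order1_B.csv", "order2_A.csv")));
    }
}
```

In the route, each such group would be completed either by completionSize or by completionTimeout, whichever is reached first.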

Using apache camel csv processor with pollEnrich pattern?

Apache Camel 2.12.1
Is it possible to use the Camel CSV component with a pollEnrich? Every example I see is like:
from("file:somefile.csv").marshal...
Whereas I'm using the pollEnrich, like:
pollEnrich("file:somefile.csv", new CSVAggregator())
So within CSVAggregator I have no CSV data, just a file, and I have to do the CSV processing myself. Is there a way of hooking the marshalling up to the enrich step somehow?
EDIT
To make this more general... eg:
from("direct:start")
.to("http:www.blah")
.enrich("file:someFile.csv", new CSVAggregationStrategy()) // <-- how can I call marshal() on this?
...
public class CSVAggregator implements AggregationStrategy {
@Override
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
/* Here I have:
oldExchange = results of http blah endpoint
newExchange = the someFile.csv GenericFile object */
}
Is there any way I can avoid this and use marshal().csv sort of call on the route itself?
Thanks,
Mr Tea
You can use any endpoint in enrich. That includes direct endpoints pointing to other routes. Your example...
Replace this:
from("direct:start")
    .to("http:www.blah")
    .enrich("file:someFile.csv", new CSVAggregationStrategy());
With this:
from("direct:start")
    .to("http:www.blah")
    .enrich("direct:readSomeFile", new CSVAggregationStrategy());
from("direct:readSomeFile")
    .to("file:someFile.csv")
    .unmarshal(myDataFormat);
I ran into the same issue and managed to solve it with the following code (note that I'm using the Scala DSL). My use case was slightly different: I wanted to load a CSV file and enrich it with data from an additional static CSV file.
from("direct:start") pollEnrich("file:c:/data/inbox?fileName=vipleaderboard.inclusions.csv&noop=true") unmarshal(csv)
from("file:c:/data/inbox?fileName=vipleaderboard.${date:now:yyyyMMdd}.csv") unmarshal(csv) enrich("direct:start", (current:Exchange, myStatic:Exchange) => {
// both exchange in bodies will contain lists instead of the file handles
})
Here the second route is the one which looks for a file in a specific directory. It unmarshals the CSV data from any matching file it finds and enriches it with the direct route defined in the preceding line. That route is pollEnriching with my static file and as I don't define an aggregation strategy it just replaces the contents of the body with the static file data. I can then unmarshal that from CSV and return the data.
The aggregation function in the second route then has access to both files' CSV data as List<List<String>> instead of just a file.

How can Apache Camel be used to monitor file changes?

I would like to monitor all of the files in a given directory for changes, i.e. an updated timestamp. This use case seems natural for Camel using the file component, but I can't seem to find a way to configure this behavior.
A URI like:
file:/some/directory
will consume the files in the provided directory but will delete them.
A URI like:
file:/some/directory?noop=true
consumes each file once when it is added or when the route is started.
It's surprising that there isn't an option along the lines of
consumeOnChange=true
Is there a straightforward way to monitor file changes and not delete the file after consuming?
You can do this by setting up the idempotentKey to tell Camel how a file is considered changed. For example if the file size changes, or its timestamp changes etc.
See more details at the Camel file documentation at: https://camel.apache.org/components/latest/file-component.html
See the section Avoiding reading the same file more than once (idempotent consumer). And read about idempotent and idempotentKey.
So something like
from("file:/somedir?noop=true&idempotentKey=${file:name}-${file:size}")
Or
from("file:/somedir?noop=true&idempotentKey=${file:name}-${file:modified}")
You can read here about the various ${file:xxx} tokens you can use: http://camel.apache.org/file-language.html
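The idea behind such a key can be modelled in a few lines of plain Java. This is only a sketch: the class IdempotentKeyModel and its methods are hypothetical and are not Camel's idempotent repository, which is more involved. It shows why a key built from the name and the modification timestamp lets an edited file be picked up again, while a key built from the name alone would not.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model (not Camel API) of a repository keyed by file name and timestamp.
final class IdempotentKeyModel {

    private final Set<String> seenKeys = new HashSet<>();

    // Mirrors the shape of the key ${file:name}-${file:modified}.
    static String key(String fileName, long lastModifiedMillis) {
        return fileName + "-" + lastModifiedMillis;
    }

    // Returns true the first time a given key is seen and false for repeats.
    boolean shouldProcess(String fileName, long lastModifiedMillis) {
        return seenKeys.add(key(fileName, lastModifiedMillis));
    }

    public static void main(String[] args) {
        IdempotentKeyModel repository = new IdempotentKeyModel();
        // First sighting of the file.
        System.out.println(repository.shouldProcess("a.log", 1000L));
        // Same name and timestamp: treated as already consumed.
        System.out.println(repository.shouldProcess("a.log", 1000L));
        // Same name, newer timestamp: treated as a changed file.
        System.out.println(repository.shouldProcess("a.log", 2000L));
    }
}
```

Because every distinct key is remembered, a file that changes often keeps adding entries; that is a property of this sketch and says nothing about how Camel bounds its own repository.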
Setting noop to true will result in Camel setting idempotent=true as well, despite the fact that idempotent is false by default.
The simplest solution to monitor files would be:
from("file:path?noop=true&idempotent=false&delay=60s")
This will check all files in the given directory for changes every minute.
This can be found in the Camel documentation at: http://camel.apache.org/file2.html.
I don't think Camel supports that specific feature, but with the existing options you can come up with a similar solution for monitoring a directory.
What you need to do is set a small delay value for checking the directory and maintain a repository of the files already read. Depending on how you configure the repository (by size, by file name, by a mix of them...), this solution can give you information about new files and modified files. As a caveat, it will be consuming the files in the directory very often.
Maybe you could use a solution other than Camel, such as Apache Commons VFS2 (I wrote an explanation of how to use it for this scenario: WatchService locks some files?).
I faced the same problem, i.e. I wanted to copy updated files as well (along with new files). Below is my configuration:
public static void main(String[] a) throws Exception {
CamelContext cc = new DefaultCamelContext();
cc.addRoutes(createRouteBuilder());
cc.start();
Thread.sleep(10 * 60 * 1000);
cc.stop();
}
protected static RouteBuilder createRouteBuilder() {
return new RouteBuilder() {
public void configure() {
from("file://D:/Production"
+ "?idempotent=true"
+ "&idempotentKey=${file:name}-${file:size}"
+ "&include=.*.log"
+ "&noop=true"
+ "&readLock=changed")
.to("file://D:/LogRepository");
}
};
}
My testing steps:
1. Run the program. It copies a few .log files from D:/Production to D:/LogRepository and then continues to poll the D:/Production directory.
2. I opened an already copied log, say A.log, from D:/Production (since noop=true nothing is moved), edited it with an editor so that the file size doubled, and saved it.
At this point I think Camel is supposed to copy that file again, since its size has changed and my route definition uses "idempotent=true&idempotentKey=${file:name}-${file:size}&readLock=changed". But Camel ignores the file.
When I use TRACE logging it says "Skipping as file is already in progress...", but I did not find any lock file in the D:/Production directory when I edited and saved the file.
I also checked that Camel still ignores the file if I replace A.log (same name but bigger size) in the D:/Production directory from outside.
However, everything works as expected if I remove the noop=true option.
Am I missing something?
If you want to monitor file changes in Camel, use the file-watch component.
Example: recursively watch all events (file creation, file deletion, file modification):
from("file-watch://some-directory")
.log("File event: ${header.CamelFileEventType} occurred on file ${header.CamelFileName} at ${header.CamelFileLastModified}");
You can see the complete documentation here:
Camel file-watch component

Apache Camel: access both request and reply message at end of route

I would like to process both the request and the response message at the end of my route. However, I do not see a way to access the original request message.
I have the terrible feeling I am struggling with some basic concept.
Here is a simple example route in DSL to outline my problem (streamCaching is enabled for the whole context):
from("activemq:queue:myQueue")
.to("log:" + getClass().getName() + "?showOut=true")
.to("http://localhost:8080/someBackend")
.log("Now in.Body returns this: ${in.body} and out.Body this: ${out.body}")
.to("log:" + getClass().getName() + "?showOut=true");
Here is the corresponding excerpt from my logs (line breaks edited for readability). As one can see, the original SOAP message is lost once the HTTP server has replied, and the SOAP response object is stored in the in body of the message.
2012-09-25 17:28:08,312 local.bar.foo.MyRouteBuilder INFO -
Exchange[ExchangePattern:InOut, BodyType:byte[],
Body:<?xml version="1.0" encoding="UTF-8"?><env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Header /><env:Body><urn:someRequest xmlns:urn="http://foo.bar.local/ns"></urn:someRequest></env:Body></env:Envelope>,
Out: null]
2012-09-25 17:28:08,398 org.apache.camel.component.http.HttpProducer DEBUG -
Executing http POST method: http://localhost:8080/someBackend
2012-09-25 17:28:09,389 org.apache.camel.component.http.HttpProducer DEBUG -
Http responseCode: 200
2012-09-25 17:28:09,392 route2 INFO -
Now in.Body returns this: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><ns2:someResponse xmlns:ns2="http://foo.bar.local/ns"</ns2:someResponse></soap:Body></soap:Envelope>
and out.Body this:
2012-09-25 17:28:09,392 local.bar.foo.MyRouteBuilder INFO -
Exchange[ExchangePattern:InOut,
BodyType:org.apache.camel.converter.stream.InputStreamCache,
Body:[Body is instance of org.apache.camel.StreamCache],
Out: null]
I would have expected in.body and out.body to be preserved across the whole route.
Alternative solutions I am considering:
- Make use of the Correlation Identifier pattern to correlate request and reply. But would this preserve the message bodies as well? Also, my request/reply messages do not have unique identifiers for correlation.
- Write a custom bean that performs the call to the HTTP backend and processes both the request and the reply object (but this is basically a no-Camel solution that reinvents the wheel, and hence not preferred).
Approaches that already failed:
- I tried to access the original request message using a Processor like this at the end of my route, with no success:
process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
Message originalInMessage = exchange.getUnitOfWork().getOriginalInMessage();
logger.debug(originalInMessage.getBody(String.class));
logger.debug(exchange.getIn().getBody(String.class));
}
});
Thanks for any help
Simply store the original body of the in message in a header or a property and retrieve it at the end:
from("activemq:queue:myQueue")
.setProperty("origInBody", body())
.to("http://localhost:8080/someBackend")
After the http call you can then access the property origInBody.
First, this article explains very well how in and out work in Camel: http://camel.apache.org/using-getin-or-getout-methods-on-exchange.html
Typically, the out message is not always used; rather, it is copied from the in message at each step.
In your case, where you want the original message to stay around until the end of the route, you could use the Content Enricher EIP: http://camel.apache.org/content-enricher.html
Your route would be something like this:
public class MyAggregationStrategy implements AggregationStrategy {
    @Override
    public Exchange aggregate(Exchange orig, Exchange httpExchange) {
        // If you want to check something in the HTTP exchange, you had better do it here, e.g.:
        // if (<httpExchange is not correct in some way>)
        //     throw new RuntimeException("Something went wrong");
        return orig;
    }
}
AggregationStrategy aggStrategy = new MyAggregationStrategy();
from("activemq:queue:myQueue")
    .enrich("http://localhost:8080/someBackend", aggStrategy)
    // keep processing the original request here if you like, in the "in" message
    ;
One of the biggest problems with Camel is how easy it is to misuse. The best way to use it correctly is to think in terms of EIPs: one of Camel's main goals is to implement EIPs in its DSL.
Here is a list of EIPs.
Now think about it. You want the request and the response at the end, but for what use? Logging, aggregation, ...? For logging, a correlationId should suffice, so I presume you need it to create a response based on both the request and the proxied response. If that's what you want, you could do something like:
from("direct:receiveRequest")
.enrich("direct:proxyResponse", new RequestAndResponseAggregationStrategy())
You will have the opportunity to merge your Request (in oldExchange) and your Response (in newExchange).
With all due respect to Christian Schneider, I do think that putting the request in a property to be reused later is a bad design. By doing so, you create side effects between your routes. If your route is a subroute of another, you may overwrite their property. If you store it in order to put it back later, maybe you should do something like:
from("direct:receiveRequest")
.enrich("direct:subRouteToIgnoreResponse", AggregationStrategies.useOriginal())
A really, really bad design that I have used too many times myself is this:
from("direct:receiveRequest")
    .to("direct:subroute");
from("direct:subroute")
    .setProperty("originalBody", body())
    .to("direct:handling")
    .transform(property("originalBody"));
This leads to "properties/headers hell", and to routes that are just a succession of processor calls.
And if you can't think of a solution to your problem in terms of EIPs, maybe you should use Camel only to access its components. For example, something like:
from("http://api.com/services")
    .process(new SomeNotTranslatableToEIPProcessor())
    .to("ftp://destination");
But don't forget that those components have their own goal: creating a common abstraction over similar behaviour (e.g. a time-based polling consumer). If you have a very specific need, trying to bend a Camel component to it can lead to a huge chunk of code that is not easily maintainable.
Don't let Camel become your Golden Hammer anti-pattern.
I often use an aggregation strategy, which preserves the old exchange and puts the result of the enrich into a header:
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;
public class SetBodyToHeaderAggregationStrategy implements AggregationStrategy {
private String headerName = "newBody";
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
oldExchange.getIn().setHeader(headerName, newExchange.getIn().getBody());
return oldExchange;
}
@SuppressWarnings("unused")
public void setHeaderName(String headerName) {
this.headerName = headerName;
}
}
Now you can use it like this:
<enrich strategyRef="setBodyToHeaderAggregationStrategy">
<constant>dosomething</constant>
</enrich>
<log loggingLevel="INFO" message="Result body from enrich: ${header.newBody}. Original body: ${body}" loggerRef="log"/>

Resources