We are using Apache Camel for routing and for extracting data from files.
I have a situation where I need to get data from a file on a shared folder and data from a database. I need to combine the data only when both sides have arrived. If either side has not arrived yet, the combine process should wait until both are present.
Is this possible? How can I achieve it? Is there any sample code?
Something must trigger the process - either the file or the database so pick one.
Then you can use the enricher pattern to fetch the other source (when its data is ready). An aggregation strategy is used to combine the data. You typically write the aggregation strategy in Java.
The link has examples of how to enrich and merge data. You can find out how to handle databases and files in the Camel documentation.
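To make the "wait until both sides are present" idea concrete, here is a minimal plain-Java sketch of the combine step. It is not Camel API. The class `TwoSidedJoin`, its method names, and the `"|"` separator are all made up for illustration, and it assumes both sides can be matched on a shared key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model (not Camel API): parks whichever side arrives first
// and releases a combined value only once both sides are present.
class TwoSidedJoin {
    private final Map<String, String> fileSide = new HashMap<>();
    private final Map<String, String> dbSide = new HashMap<>();

    // Called when the file data for a key arrives.
    Optional<String> onFile(String key, String data) {
        fileSide.put(key, data);
        return tryCombine(key);
    }

    // Called when the database data for a key arrives.
    Optional<String> onDb(String key, String data) {
        dbSide.put(key, data);
        return tryCombine(key);
    }

    // Returns empty while one side is still missing; otherwise combines
    // both sides and forgets them so the key can be reused.
    private Optional<String> tryCombine(String key) {
        if (!fileSide.containsKey(key) || !dbSide.containsKey(key)) {
            return Optional.empty();
        }
        return Optional.of(fileSide.remove(key) + "|" + dbSide.remove(key));
    }
}
```

In a Camel route the same role would be played by the aggregation strategy together with whatever completion condition you choose; the sketch only shows the logic.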
I use this to zip a processed file together with its processing log. I have attached an example; I hope it helps.
//Archived
from("direct:" + EnvironmentSetup.ARCHIVED)
.routeId(ROUTES.ARCHIVED.name())
.setHeader(HEADER_ZIP_AGG_ID, header(Exchange.FILE_NAME))
.setHeader(HEADER_AFTER_ZIP_DEST).constant(getArchiveUri())
.setHeader(HEADER_STATUS).constant(STATUS.SUCCESS)
.pipeline()
.to("direct:" + EnvironmentSetup.ARCHIVED_ZIP)
.end()
.pipeline()
.setHeader(Exchange.FILE_NAME, header(Exchange.FILE_NAME).append(".report"))
.setBody(header(ProcessManager.PROCESS_LOG).convertToString())
.to("direct:" + EnvironmentSetup.ARCHIVED_ZIP)
.end()
.end();
from(
"direct:" + EnvironmentSetup.DECRYPT_FAILED_ZIP,
"direct:"+EnvironmentSetup.PROCESS_FAILED_ZIP,
"direct:"+EnvironmentSetup.ARCHIVED_ZIP
)
.routeId("ZIP")
.aggregate(header(HEADER_ZIP_AGG_ID), new CopiedGroupedExchangeAggregationStrategy())
.completionSize(2)
.marshal(zipFileDataFormat)
.multicast()
.pipeline()
.setHeader(Exchange.FILE_NAME, simple(String.format(
"${in.header.%s}/${in.header.%s}", HEADER_EMAIL, Exchange.FILE_NAME))) //header(HEADER_EMAIL). header(Exchange.FILE_NAME))
//.dynamicRouter(header(HEADER_AFTER_ZIP_DEST))
.to("direct:dynamic")
.end()
.pipeline()
.marshal(encryption)
.setHeader(Exchange.FILE_NAME, simple(String.format(
"${in.header.%s}/${in.header.%s}.gpg", HEADER_EMAIL, Exchange.FILE_NAME)))
//.setHeader(Exchange.FILE_NAME, header(Exchange.FILE_NAME).append(".gpg"))
.to("direct:"+EnvironmentSetup.SEND_BACK)
.end()
.end() //end aggregate
.end();
CopiedGroupedExchangeAggregationStrategy.java
public class CopiedGroupedExchangeAggregationStrategy extends
        AbstractListAggregationStrategy<Exchange> {
    @Override
    public boolean isStoreAsBodyOnCompletion() {
        // returning true stores the aggregated list as the message body on completion
        return true;
    }
    @Override
    public Exchange getValue(Exchange exchange) {
        // copy each exchange so later changes do not affect the grouped list
        return exchange.copy();
    }
}
I am trying to setup a simple camel route which reads from a sqlite table and prints the record (later it would be written to a file).
The flow I have setup is below
bindToRegistry("sqlConsumer", new SqliteConsumer());
bindToRegistry("sqliteDatasource", dataSource());
from("sql:select * from recordsheet_record_1 where col_1 = 'A5'?dataSource=#sqliteDatasource")
.to("bean:sqlConsumer?method=consume")
.end();
And the SqliteConsumer is as below
public class SqliteConsumer {
public void consume(Map<String, Object> data, Exchange exchange) {
System.out.println("Map: '" + data + "'");
//TODO: append to file
}
}
When I execute the route, it should only execute once (prints once), but, it keeps on printing... Am I doing anything wrong here?
I am new to camel framework so any help or guide would be much appreciated.
Thanks.
It is a polling consumer, so it polls the source repeatedly according to its configuration. You can find more info here: https://camel.apache.org/components/latest/eips/polling-consumer.html
I have a class to run my route; The input comes from a queue (which is filled by a route that does a query and inserts the rows as messages on the queue)
These messages each contain a few headers:
- pdu_id: basically a prefix of the filename.
- pad: the path the files reside in
What is to happen: I want the files in the path named by their "pdu_id".* in a tar; After that a REST call is to be done to remove the documents source.
I know a route has a from; but basically I need a route with a dynamic "from", and as below code example shows, queueing froms doesn't do the trick.
The question is what to use instead. I could not find anything similar, though it may be that I didn't use the right search terms, in which case I'm deeply sorry.
public class ToDeleteTarAndDeleteRoute extends RouteBuilder {
@Override
public void configure() throws Exception
{
from("broker1:todelete.message_ids.queue")
.from("file:///?fileName=${in.header.pad}${in.header.pdu_id}.*")
.aggregate(new TarAggregationStrategy())
.constant(true)
.completionFromBatchConsumer()
.eagerCheckCompletion()
.to("file:///?fileName=${in.header.pad}${in.header.pdu_id}.tar")
.log("${header.pdu_id} tarred")
.setHeader(Exchange.HTTP_METHOD, constant("DELETE"))
.setHeader("Connection", constant("Close"))
.enrich()
.simple("http:127.0.0.1/restfuldb${header.pdu_id}?httpClient.authenticationPreemptive=true")
.log("${header.pdu_id} tarred and deleted.");
}
}
Yes. pollEnrich can help you do that. You would use it something like this:
from("broker1:todelete.message_ids.queue")
.aggregationStrategy(new TarAggregationStrategy())
.pollEnrich()
.simple("file:///?fileName=${in.header.pad}/${in.header.pdu_id}.*")
.unmarshal().string()
.to("file:///?fileName=${in.header.pad}/${in.header.pdu_id}.tar")
.log("${header.pdu_id} tarred")
.setHeader(Exchange.HTTP_METHOD, constant("DELETE"))
.setHeader("Connection", constant("Close"))
.enrich()
.simple("http:127.0.0.1/restfuldb${header.pdu_id}?httpClient.authenticationPreemptive=true")
.log("${header.pdu_id} tarred and deleted.");
The current solution consists of a few changes based on what @daBigBug answered:
- pollEnrich's simple expression uses antInclude rather than fileName.
- aggregate is placed after pollEnrich, because each batch is the set of files rather than the input from the queue. The input from the queue only provides meta information that determines which actions are to be taken.
- aggregationStrategy() is not possible in a RouteBuilder; I used aggregate() instead.
- I removed the unmarshal(); I don't see why it would be needed, and the files can contain binary content.
from("broker1:todelete.message_ids.queue")
.pollEnrich()
.simple("file:${in.header.pad}?antInclude=${in.header.pdu_id}.*")
.aggregate(new TarAggregationStrategy())
.constant(true)
.completionFromBatchConsumer()
.eagerCheckCompletion()
.log("tarring to: ${header.pad}${header.pdu_id}.tar")
.setHeader(Exchange.FILE_NAME, simple("${header.pdu_id}.tar"))
.setHeader(Exchange.FILE_PATH, simple("${header.pad}"))
.to("file://ignored")
...(and the rest of the operations);
I now see the files getting picked up and even placed in a tar. However, the filename of the tar is unexpected, as is its location (it is placed in ./ignored). Also, in the rest of the operations the exchange headers appear to be lost.
If anyone can help figure out how to preserve the headers in a safe way, I would be much obliged. Should I open a new question for that, or should I rephrase this one?
I'm trying to build a file import process where a file is picked up in a subdirectory of a given folder (the subdirectory identifies the client the file is for), then the records are parsed, split, and sent on Hazelcast SEDA queues. I want to process each record as it's read off the Hazelcast SEDA queue, and have it return a status code (created, updated, or errored) which can be aggregated.
I'm also creating a job record when the file is first picked up and I want to update the job record with the final count of created, updated, and errors.
The JobProcessor below creates this record and sets the client Organization and Job objects in headers on the message. The CensusExcelDataFormat reads an Excel file and creates an Employee object for each line, then returns a Collection.
from("file:" + censusDirectory + "?recursive=true").idempotentConsumer(new SimpleExpression("file:name"), idempotentRepository)
.process(new JobProcessor(organizationService, jobService, Job.JobType.CENSUS))
.unmarshal(censusExcelDataFormat)
.split(body(), new ListAggregationStrategy()).parallelProcessing()
.to(ExchangePattern.InOut, "hazelcast:seda:process-employee-import").end()
.process(new JobCompletionProcessor(jobService))
.end();
from("hazelcast:seda:process-employee-import")
.idempotentConsumer(simple("${body.entityId}"), idempotentRepository)
.bean(employeeImporterJob, "importOrUpdate");
The problem I'm having is that the list aggregation happens immediately and instead of getting a list of statuses I'm getting the same list of Employee objects. I want the Employee objects to be sent on the SEDA queue and the return value from the processing on the queue to be aggregated then run through the JobCompletionProcessor to update the Job record.
The behaviour you are seeing is the default. The Apache Camel Splitter documentation states this in its "What the Splitter returns" section:
Camel 2.2 or older: The Splitter will by default return the last
splitted message.
Camel 2.3 and newer: The Splitter will by default return the
original input message.
For all versions: You can override this by supplying your own
strategy as an AggregationStrategy. There is a sample on this page
(Split aggregate request/reply sample). Notice it's the same
strategy as the Aggregator supports. This Splitter can be viewed as
having a build in light weight Aggregator.
So, as you can see, you need to implement your own aggregation strategy for the splitter. To do this, create a new class that implements AggregationStrategy, something like the code below:
public class MyAggregationStrategy implements AggregationStrategy
{
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        if (oldExchange == null) // this is null for the first exchange
        {
            // do any first-time work if needed, then start from the first reply
            return newExchange;
        }
        /*
        Here you put your code to calculate failed, updated, created,
        typically by merging newExchange's result into oldExchange.
        */
        return oldExchange;
    }
}
You can then use your custom aggregation strategy by specifying it like the following examples:
.split(body(), new MyAggregationStrategy()) //Java DSL
<split strategyRef="myAggregationStrategy"/> //XML Blueprint
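As a rough illustration of what the body of such a strategy could compute, here is a plain-Java sketch that tallies created/updated/errored replies. It deliberately uses no Camel types so it runs on its own. The class `StatusTally`, the `Status` enum, and the idea of keeping the running totals in a map are assumptions for illustration, not something from the question.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the aggregate(old, new) contract without Camel types.
class StatusTally {
    enum Status { CREATED, UPDATED, ERRORED }

    // 'running' is null on the first call, just like oldExchange in a
    // Camel AggregationStrategy; later calls receive the previous result.
    static Map<Status, Integer> aggregate(Map<Status, Integer> running, Status next) {
        if (running == null) {
            running = new EnumMap<>(Status.class);
        }
        running.merge(next, 1, Integer::sum); // increment the counter for this status
        return running;
    }

    // Feeds every reply through aggregate(), the way the splitter would.
    // Returns null for an empty list because aggregate() is never called.
    static Map<Status, Integer> tally(List<Status> replies) {
        Map<Status, Integer> running = null;
        for (Status reply : replies) {
            running = aggregate(running, reply);
        }
        return running;
    }
}
```

In a real strategy the running totals would live on the old exchange (for example in its body or a header), and the final map is what something like the JobCompletionProcessor would read.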
Apache Camel 2.12.1
Is it possible to use the Camel CSV component with a pollEnrich? Every example I see is like:
from("file:somefile.csv").marshal...
Whereas I'm using the pollEnrich, like:
pollEnrich("file:somefile.csv", new CSVAggregator())
So within CSVAggregator I have no csv...I just have a file, which I have to do csv processing myself. So is there a way of hooking up the marshalling to the enrich bit somehow...?
EDIT
To make this more general... eg:
from("direct:start")
    .to("http:www.blah")
    .enrich("file:someFile.csv", new CSVAggregationStrategy()) // <-- how can I call marshal() on this?
...
public class CSVAggregator implements AggregationStrategy {
    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        /* Here I have:
           oldExchange = results of http blah endpoint
           newExchange = the someFile.csv GenericFile object */
        return oldExchange; // placeholder so the sketch compiles
    }
}
Is there any way I can avoid this and use marshal().csv sort of call on the route itself?
Thanks,
Mr Tea
You can use any endpoint in enrich. That includes direct endpoints pointing to other routes. Your example...
Replace this:
from("direct:start")
    .to("http:www.blah")
    .enrich("file:someFile.csv", new CSVAggregationStrategy());
With this:
from("direct:start")
    .to("http:www.blah")
    .enrich("direct:readSomeFile", new CSVAggregationStrategy());
from("direct:readSomeFile")
    // pollEnrich reads the file; to("file:...") would write the body to it instead
    .pollEnrich("file:someFile.csv")
    .unmarshal(myDataFormat);
I ran into the same issue and managed to solve it with the following code (note that I'm using the Scala DSL). My use case was slightly different: I wanted to load a CSV file and enrich it with data from an additional static CSV file.
from("direct:start") pollEnrich("file:c:/data/inbox?fileName=vipleaderboard.inclusions.csv&noop=true") unmarshal(csv)
from("file:c:/data/inbox?fileName=vipleaderboard.${date:now:yyyyMMdd}.csv") unmarshal(csv) enrich("direct:start", (current:Exchange, myStatic:Exchange) => {
// both exchange in bodies will contain lists instead of the file handles
})
Here the second route is the one which looks for a file in a specific directory. It unmarshals the CSV data from any matching file it finds and enriches it with the direct route defined in the preceding line. That route is pollEnriching with my static file and as I don't define an aggregation strategy it just replaces the contents of the body with the static file data. I can then unmarshal that from CSV and return the data.
The aggregation function in the second route then has access to both files' CSV data as List<List<String>> instead of just a file.
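Once both bodies are plain lists of rows, the merge itself is ordinary collection code. Here is a small Java sketch of one possible merge, written without Camel types. It assumes the static file is an inclusion list and that the first column of each row is the key; both are guesses based on the file name, and `CsvInclusionFilter` is a made-up name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical merge step for two unmarshalled CSV bodies (List<List<String>>).
class CsvInclusionFilter {
    // current:    rows from the dated CSV file
    // inclusions: rows from the static CSV file
    // Assumption: the first column of every row is the key to match on.
    static List<List<String>> keepIncluded(List<List<String>> current,
                                           List<List<String>> inclusions) {
        Set<String> allowed = new HashSet<>();
        for (List<String> row : inclusions) {
            if (!row.isEmpty()) {
                allowed.add(row.get(0));
            }
        }
        List<List<String>> kept = new ArrayList<>();
        for (List<String> row : current) {
            // keep only rows whose key appears in the inclusion list
            if (!row.isEmpty() && allowed.contains(row.get(0))) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

Inside the aggregation function you would read the two lists from the exchange bodies, call something like this, and set the result as the new body.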
I would like to process both request and response messages at the end of my route. However, I do not see a way how to access the original request message.
I have the terrible feeling I am struggling with some basic concept.
Here is a simple example route in DSL to outline my problem (streamCaching is enabled for the whole context):
from("activemq:queue:myQueue")
.to("log:" + getClass().getName() + "?showOut=true")
.to("http://localhost:8080/someBackend")
.log("Now in.Body returns this: ${in.body} and out.Body this: ${out.body}")
.to("log:" + getClass().getName() + "?showOut=true");
Here is the corresponding excerpt from my logs (line breaks edited for readability). As one can see, the original SOAP message is lost once the HTTP server has replied, and the SOAP response object is stored in the in body of the message.
2012-09-25 17:28:08,312 local.bar.foo.MyRouteBuilder INFO -
Exchange[ExchangePattern:InOut, BodyType:byte[],
Body:<?xml version="1.0" encoding="UTF-8"?><env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Header /><env:Body><urn:someRequest xmlns:urn="http://foo.bar.local/ns"></urn:someRequest></env:Body></env:Envelope>,
Out: null]
2012-09-25 17:28:08,398 org.apache.camel.component.http.HttpProducer DEBUG -
Executing http POST method: http://localhost:8080/someBackend
2012-09-25 17:28:09,389 org.apache.camel.component.http.HttpProducer DEBUG -
Http responseCode: 200
2012-09-25 17:28:09,392 route2 INFO -
Now in.Body returns this: <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><ns2:someResponse xmlns:ns2="http://foo.bar.local/ns"</ns2:someResponse></soap:Body></soap:Envelope>
and out.Body this:
2012-09-25 17:28:09,392 local.bar.foo.MyRouteBuilder INFO -
Exchange[ExchangePattern:InOut,
BodyType:org.apache.camel.converter.stream.InputStreamCache,
Body:[Body is instance of org.apache.camel.StreamCache],
Out: null]
I would have expected to have in.body and out.body be preserved across the whole route?
Alternative solutions I am considering:
Make use of the Correlation Identifier pattern to correlate both request and reply. But would this preserve the message bodies as well? Also, my request/reply messages do not have unique identifiers for correlation.
Write a custom bean, which performs the call to the http backend, processing both request and reply objects (but this is basically a no-Camel solution, reinventing the wheel and hence not preferred)
Already failed approaches:
I tried to access the original request message using a Processor like this at the end of my route, with no success:
process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
Message originalInMessage = exchange.getUnitOfWork().getOriginalInMessage();
logger.debug(originalInMessage.getBody(String.class));
logger.debug(exchange.getIn().getBody(String.class));
}
});
Thanks for any help
Simply store the original body of the in message in a header or a property and retrieve it at the end:
from("activemq:queue:myQueue")
    .setProperty("origInBody", body())
    .to("http://localhost:8080/someBackend");
After the http call you can then access the property origInBody.
First, this article explains very well how in and out work in Camel: http://camel.apache.org/using-getin-or-getout-methods-on-exchange.html
Typically, the out message is not always used, but rather copied from the in-message in each step.
In your case, where you want the original message to stay around til the end of the route, you could go ahead with the Enrichment EIP. http://camel.apache.org/content-enricher.html
Your route would be something like this:
public class MyAggregationStrategy implements AggregationStrategy {
    public Exchange aggregate(Exchange orig, Exchange httpExchange) {
        // if you want to check something about the HTTP reply, do it here;
        // isFailed() is only an example check, replace it with your own
        if (httpExchange.isFailed())
            throw new RuntimeException("Something went wrong");
        return orig;
    }
}
AggregationStrategy aggStrategy = new MyAggregationStrategy();
from("activemq:queue:myQueue")
    .enrich("http://localhost:8080/someBackend", aggStrategy);
// keep processing the original request after the enrich step if you like; it is still in the "in" message
One of the biggest problems with Camel is how easy it is to misuse. The best way to use it correctly is to think in terms of EIPs: one of the main goals of Camel is to implement EIPs in its DSL.
Here is a list of EIP
Now think about it. You want the request and the response at the end, but for what use? Logging, aggregation, something else? For logging, a correlation ID should suffice, so I presume you need both to create a response based on the request and the proxied response. If that's what you want, you could do something like
from("direct:receiveRequest")
    .enrich("direct:proxyResponse", new RequestAndResponseAggregationStrategy());
You will have the opportunity to merge your Request (in oldExchange) and your Response (in newExchange).
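To show why this keeps the request available, here is a tiny plain-Java model of the enrich step. It is only a sketch of the idea, not Camel's implementation. `EnrichModel` and its `enrich` method are hypothetical names, and message bodies are reduced to strings.

```java
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

// Hypothetical model of the content enricher idea with bodies as plain strings.
class EnrichModel {
    // Calls the resource with the request, then hands BOTH the untouched
    // request and the response to the strategy, which decides the result.
    static String enrich(String request,
                         UnaryOperator<String> resource,
                         BinaryOperator<String> strategy) {
        String response = resource.apply(request);
        return strategy.apply(request, response);
    }
}
```

For example, `EnrichModel.enrich("req", r -> "resp", (oldBody, newBody) -> oldBody + "+" + newBody)` yields `"req+resp"`, while a strategy of `(oldBody, newBody) -> oldBody` ignores the response and keeps the request.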
With all due respect to Christian Schneider, I do think that putting the request in a property to be reused later is bad design. By doing so, you create side effects between your routes. If your route is a subroute of another, you may overwrite the caller's property. If you only store the body in order to put it back later, you should maybe do something like
from("direct:receiveRequest")
    .enrich("direct:subRouteToIgnoreResponse", AggregationStrategies.useOriginal());
A really, really bad design that I have used too many times myself is this:
from("direct:receiveRequest")
    .to("direct:subroute");
from("direct:subroute")
    .setProperty("originalBody", body())
    .to("direct:handling")
    .transform(property("originalBody"));
This will lead to "properties/headers hell", and to routes that are just a successive call of processors.
And if you can't think of a solution to your problem in terms of EIPs, you should maybe use Camel only for access to its components. For example, something like:
from("http://api.com/services")
    .process(new SomeNotTranslatableToEIPProcessor())
    .to("ftp://destination");
But don't forget that those components have their own goal: creating a common abstraction over similar behaviour (e.g., a time-based polling consumer). If you have a very specific need, trying to bend a Camel component to fit it can lead to a huge chunk of code that is not easily maintainable.
Don't let Camel become your Golden Hammer anti-pattern.
I often use an aggregation strategy, which preserves the old exchange and puts the result of the enrich into a header:
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;
public class SetBodyToHeaderAggregationStrategy implements AggregationStrategy {
private String headerName = "newBody";
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
oldExchange.getIn().setHeader(headerName, newExchange.getIn().getBody());
return oldExchange;
}
@SuppressWarnings("unused")
public void setHeaderName(String headerName) {
this.headerName = headerName;
}
}
Now you can use it like this:
<enrich strategyRef="setBodyToHeaderAggregationStrategy">
<constant>dosomething</constant>
</enrich>
<log loggingLevel="INFO" message="Result body from enrich: ${header.newBody}. Original body: ${body}" loggerRef="log"/>