FTP consumer stream download SEDA queue fails during splitter TarIterator - apache-camel

First, a little background on my requirements:
- Download large tar.gz files from multiple dynamically defined read-only FTP/SFTP sites.
- Process the files within the .tar.gz based on the extension of the entry name.
- Using Camel 2.19.3.
My solution is to define new routes with download=false to only obtain a list of unprocessed files. A sample route is:
from("ftp://user#localhost/path?download=false&inProgressRepository=#inProgressRepo&idempotentRepository=#idemRepo&noop=true&readLock=changed&readLockMarkerFile=false&autoCreate=false&stepwise=false").to("seda:download?size=3&concurrentConsumers=3&blockWhenFull=true&purgeWhenStopping=true")
Send the file names to a SEDA queue that downloads the files with streamDownload and sends the RemoteFile to a processing route defined as:
from("seda:download?size=3&concurrentConsumers=3&blockWhenFull=true&purgeWhenStopping=true")
.process({
String fileName = exchange.getIn().getHeader(Exchange.FILE_NAME_ONLY, String.class);
CamelContext context = exchange.getContext();
ConsumerTemplate downloadConsumer = context.createConsumerTemplate();
Producer unpackProducer = context.getRoute("unpack").getEndpoint().createProducer();
Map<String,Object> parms = new HashMap<>();
parms.put("fileName", fileName);
parms.put("runLoggingLevel", "INFO");
parms.put("consumer.bridgeErrorHandler", "true");
parms.put("idempotentRepository", "#idemRepo");
parms.put("noop", "true");
parms.put("readLock", "changed");
parms.put("readLockLoggingLevel", "INFO");
parms.put("readLockMarkerFile", "false");
parms.put("initialDelay", "0");
parms.put("autoCreate", "false");
parms.put("maximumReconnectAttempts", "0");
parms.put("streamDownload", "true");
parms.put("stepwise", "false");
parms.put("throwExceptionOnConnectFailed", "true");
parms.put("useList", "false");
downloadConsumer.start();
Exchange downloadExchange = downloadConsumer.receive(URISupport.normalizeUri(URISupport.appendParametersToURI("ftp://user#localhost/path", parms));
unpackProducer.process(downloadExchange);
if (downloadExchange.isFailed()) {
LOGGER.error("unpack failed", downloadExchange.getException());
exchange.setException(downloadExchange.getException());
}
downloadConsumer.doneUoW(downloadExchange);
downloadConsumer.stop();
}
The unpack route is defined as:
from("direct:unpack").routeId("unpack")
.convertBodyTo(InputStream.class, null)
.split(new TarSplitter()).streaming()
.choice()
.when(header(FILE_NAME).regex(XML_FILTER))
.unmarshal().jacksonxml(POJO.class)
.endChoice()
.when(header(FILE_NAME).regex(XML2_FILTER))
.unmarshal().jacksonxml(POJO2.class)
.endChoice()
.end()
.end()
.to("file://...")
First, is this a good solution to support concurrent FTP consumers? I see that new FTPClient instances are created and processed on the same thread. Is there a better solution?
Second, with the SEDA queue there are random tar errors while processing the stream. If direct is used instead of seda, so that only a single file is processed, then no errors occur. This seems to point to a concurrency issue. Am I missing something obvious?
Thanks in advance for any help.

My mistake: I needed to add binary=true to the FTP endpoint URI.
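For reference, a minimal sketch of the fix against the parameter map above (everything else in the download processor stays the same):

// Without binary=true the FTP transfer runs in the default ASCII mode,
// which corrupts the gzip'd tar stream and produces the random TarIterator errors.
parms.put("binary", "true");
Exchange downloadExchange = downloadConsumer.receive(
        URISupport.normalizeUri(
                URISupport.appendParametersToURI("ftp://user@localhost/path", parms)));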

Related

Spring Integration + Inbound channel adapter + Recursive directory scanner

An inbound channel adapter is created with a poller to poll files present in the root directory and its subdirectories, e.g.
RootDir
|_abc.txt
|_subdirectory1
  |_subdirfile1.doc
The problem is that the inbound channel adapter is also reading the directory itself as a message:
@Bean
@InboundChannelAdapter(autoStartup = "false", value = "incomingchannel", poller = @Poller("custompoller"))
public MessageSource<File> fileReadingMessageSource(DirectoryScanner directoryScanner) {
    FileReadingMessageSource sourceReader = new FileReadingMessageSource();
    sourceReader.setScanner(directoryScanner);
    return sourceReader;
}
@Bean
public DirectoryScanner directoryScanner() {
    DirectoryScanner scanner = new RecursiveDirectoryScanner();
    CompositeFileListFilter filter = new CompositeFileListFilter<>(
            Arrays.asList(new AcceptOnceFileListFilter<>(), new RegexPatternFileListFilter(regex)));
    scanner.setFilter(filter);
    return scanner;
}
@Transformer(inputChannel = "incomingchannel", ...)
toRequest(Message<File> message) {
    message.getPayload()
}
Here message.getPayload() is printing subdirectory1, i.e. the directory is also read as a file message.
I can handle this explicitly in the transformer by checking whether the file is a directory and ignoring it, but I wanted to know: is there any way it can be filtered in the RecursiveDirectoryScanner attached to the inbound channel adapter?
This problem is probably related to this SO thread: Spring Integration + file reading message source _ Inbound Channel Adapter + Outbound Gateway.
You need to think twice about whether you are OK with losing the file tree. It sounded to me like you would like to restore the tree in the FileWritingMessageHandler. So, it is probably better to @Filter messages with a directory payload before sending them to that transformer.
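A minimal sketch of such a filter (the output channel name is hypothetical):

@Filter(inputChannel = "incomingchannel", outputChannel = "filesOnlyChannel")
public boolean acceptOnlyFiles(Message<File> message) {
    // Directory payloads are dropped here and never reach the transformer.
    return message.getPayload().isFile();
}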
If you still want to keep directories from being produced, consider using a ChainFileListFilter instead of the CompositeFileListFilter, and configure a RegexPatternFileListFilter first.
This way a directory filtered out by the RegexPatternFileListFilter (directories are skipped by default, see AbstractDirectoryAwareFileListFilter) won't reach the AcceptOnceFileListFilter at all. In your current configuration the AcceptOnceFileListFilter, being first, accepts a directory, and the next filter in the composition is effectively ignored.
UPDATE
What I mean should be like this:
@Bean
public DirectoryScanner directoryScanner() {
    DirectoryScanner scanner = new RecursiveDirectoryScanner();
    ChainFileListFilter filter = new ChainFileListFilter<>(
            Arrays.asList(new RegexPatternFileListFilter(regex), new AcceptOnceFileListFilter<>()));
    scanner.setFilter(filter);
    return scanner;
}
Nothing more. As long as your regex is just for files, any sub-directory would be skipped and not allowed to go downstream.
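For example (the pattern itself is hypothetical), a regex that requires a file extension can never match a bare directory name like subdirectory1:

// Matches abc.txt and subdirfile1.doc, but not subdirectory1.
String regex = ".*\\.(txt|doc)";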

Create Event-Driven Consumer on File Endpoint without RouteBuilder in Camel 2.24

I want to run a processor when a file appears in a directory. My file URL is like this:
file:{{file.root}}in?include=.*\.csv&charset=windows-1251&move=../out/done
The procedure that associates a URL with a processor is like this:
MessageProcessor getOrCreateConsumer(CamelContext context, String uri) {
    Endpoint endpoint = context.getEndpoint(uri);
    endpoint.setCamelContext(context); // added this out of desperation, doesn't help
    MessageProcessor processor = new MessageProcessor();
    try {
        Consumer consumer = endpoint.createConsumer(processor);
        endpoint.start(); // do we need this at all? works the same without it
        consumer.start();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    return processor;
}
MessageProcessor is a processor that does some things to an exchange.
Everything seems to work, except the file doesn't get moved to the ../out/done directory. While debugging I can't figure out where the endpoint gets configured to apply this post operation to the file message exchange.
I think I am missing some magic call that is normally invoked by a RouteBuilder and that will fully configure the file endpoint. Can you please help me out?

Let Camel handle various URI types

I would like to write a Camel route that receives a URI (it can be http, ftp, file, ...) and then fetches the data and stores it locally in a file.
This URI-String could be, for example:
"ftp://localhost/example.txt"
"file://tmp/example.txt"
"jms:queue:dataInputQueue"
...
Based on this string, the correct Camel component should be used to access the data. Something like a case/switch in Java:
(1) Receive URI (from uri="vm:incomingUri")
(2) Choose the "right" Camel component:
    switch(URI)
        case HTTP: use Camel HTTP component
        case FTP:  use Camel FTP component
        case JMS:  use Camel JMS component
        ...
(3) Read data from that URI, using the "right" Camel component
(4) Store the file locally (to uri="file://...")
Example:
From "vm:incomingUri" I read a String "ftp://localhost/example.txt". That what finally needs to happen now should be equivalent to this:
<route>
    <from uri="ftp://localhost/example.txt"/>
    <to uri="file://tmpDir/example.txt"/>
</route>
How would this look in Camel?
I believe one difficulty will be that, for the components you mention (HTTP, FTP, file, JMS), you may want to use either a producer or a consumer:
FTP, file: definitely a consumer, to read a file.
HTTP (or HTTP4): definitely a producer, to send a request to the server (the server's reply will be the new message body).
JMS: depends on whether you want to read from a queue (consumer), or send a message to a queue with a ReplyTo header and then wait for the answer (producer).
Producers:
If you are using Camel 2.16+, you can use the new "dynamic to" syntax. It's basically the same as a regular "to", except that the endpoint URI can be evaluated dynamically using a simple expression (or, optionally, another type of expression). Alternatively, you can use the enrich flavor of the content-enricher pattern, which also supports dynamic URIs starting with Camel 2.16.
If you are using an older version of Camel, or if you need to dynamically route to several endpoints (not just one), you can use the recipient list pattern.
Here's an example. We will transform the message body by calling an endpoint; the URI for that endpoint will be found in a header named TargetUri and will be evaluated dynamically for each message.
// An instance of this class is registered as 'testbean' in the registry. Instead of
// sending to this bean, I could send to a FTP or HTTP endpoint, or whatever.
public class TestBean {
    public String toUpperCase(final String str) {
        return str.toUpperCase();
    }
}

// This route sends a message to our example route for testing purposes. Of course, we
// could send any message as long as the 'TargetUri' header contains a valid endpoint URI.
from("file:inbox?move=done&moveFailed=failed")
    .setHeader("TargetUri").constant("bean:testbean?method=toUpperCase")
    .setBody().constant("foo")
    .to("direct:test");

// 1. The toD example:
from("direct:test")
    .toD("${header.TargetUri}")
    .to("log:myRoute");

// 2. The recipient list example:
from("direct:test")
    .recipientList(header("TargetUri"))
    .to("log:myRoute");

// 3. The enrich example:
from("direct:test")
    .enrich().simple("${header.TargetUri}") // add an AggregationStrategy if necessary
    .to("log:myRoute");
Consumers:
With Camel 2.16+, you can use the pollEnrich flavor of the content-enricher pattern.
For older versions of Camel, you can use a ConsumerTemplate in a processor.
// 4. The pollEnrich example (assuming the TargetUri header contains, e.g., a file
// or ftp uri):
from("direct:test")
    .pollEnrich().simple("${header.TargetUri}") // add an AggregationStrategy if necessary
    .to("log:myRoute");

// 5. The ConsumerTemplate example (same assumption as above)
from("direct:test")
    .process(new Processor() {
        @Override
        public void process(Exchange exchange) throws Exception {
            String uri = exchange.getIn().getHeader("TargetUri", String.class);
            ConsumerTemplate consumer = exchange.getContext().createConsumerTemplate();
            final Object data = consumer.receiveBody(uri);
            exchange.getIn().setBody(data);
        }
    })
    .to("log:myRoute");
Producer or consumer?
Sadly, I can't think of any really elegant solution to handle both - I think you will have to route to two branches based on the URI and the known components... Here's the sort of thing I might do (with Camel 2.16+); it's not very pretty:
// This example only handles http and ftp endpoints properly
from("direct:test")
    .choice()
        .when(header("TargetUri").startsWith("http"))
            .enrich().simple("${header.TargetUri}")
        .endChoice()
        .when(header("TargetUri").startsWith("ftp"))
            .pollEnrich().simple("${header.TargetUri}")
        .endChoice()
    .end()
    .to("log:myRoute");
It is possible by using
<to uri="{{some.endpoint}}"/>
but you would need to add it as a property:
<cm:property name="some.endpoint" value="SomeEndPoint"/>
You can plug in any endpoint you want: http, ftp, file, log, jms, vm, etc.
Example values for SomeEndPoint:
Log component: log:mock
JMS component: activemq:someQueueName
File component: file://someFileShare
VM component: vm:toSomeRoute
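The same placeholder resolution works in the Java DSL as well; a minimal sketch, assuming a PropertiesComponent is configured and some.endpoint is defined in its property file:

// 'some.endpoint' is resolved at route startup, so any component
// (http, ftp, file, jms, vm, ...) can be swapped in via configuration.
from("vm:incomingUri")
    .to("{{some.endpoint}}");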

Camel - data from two sources

We are using Apache Camel for routing and extracting data from files.
I have a situation where I need to get data from a file in a shared folder and data from a database. I need to combine the data only when the data from both sides has arrived. If either side has not arrived yet, my combine process should wait until both sides are present.
Is this possible? How can I achieve it? Any sample code?
Something must trigger the process - either the file or the database - so pick one.
Then you can use the enricher pattern to populate the other source (when its data is ready). An aggregation strategy is used to combine the data; you typically write the aggregation strategy in Java.
The link has examples of how to enrich and merge data. You can find out how to handle databases and files in the Camel documentation.
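For instance, a minimal sketch (the endpoint URIs and the merge logic are hypothetical), assuming the file triggers the process and the database side is polled:

// Hypothetical: the file's arrival starts the flow; pollEnrich then polls the
// database endpoint (waiting up to 60s) and the strategy merges both sides.
from("file:data/in?noop=true")
    .pollEnrich("sql:select * from pending_data?dataSource=#myDataSource", 60000,
            new AggregationStrategy() {
                @Override
                public Exchange aggregate(Exchange original, Exchange resource) {
                    // Application-specific merge of the file body and the database rows.
                    original.getIn().setHeader("dbRows", resource.getIn().getBody());
                    return original;
                }
            })
    .to("direct:combine");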
I use this to zip a processed file together with its processing log. I've attached an example; hope it helps you.
// Archived
from("direct:" + EnvironmentSetup.ARCHIVED)
    .routeId(ROUTES.ARCHIVED.name())
    .setHeader(HEADER_ZIP_AGG_ID, header(Exchange.FILE_NAME))
    .setHeader(HEADER_AFTER_ZIP_DEST).constant(getArchiveUri())
    .setHeader(HEADER_STATUS).constant(STATUS.SUCCESS)
    .pipeline()
        .to("direct:" + EnvironmentSetup.ARCHIVED_ZIP)
    .end()
    .pipeline()
        .setHeader(Exchange.FILE_NAME, header(Exchange.FILE_NAME).append(".report"))
        .setBody(header(ProcessManager.PROCESS_LOG).convertToString())
        .to("direct:" + EnvironmentSetup.ARCHIVED_ZIP)
    .end()
    .end();

from(
        "direct:" + EnvironmentSetup.DECRYPT_FAILED_ZIP,
        "direct:" + EnvironmentSetup.PROCESS_FAILED_ZIP,
        "direct:" + EnvironmentSetup.ARCHIVED_ZIP)
    .routeId("ZIP")
    .aggregate(header(HEADER_ZIP_AGG_ID), new CopiedGroupedExchangeAggregationStrategy())
        .completionSize(2)
    .marshal(zipFileDataFormat)
    .multicast()
        .pipeline()
            .setHeader(Exchange.FILE_NAME, simple(String.format(
                    "${in.header.%s}/${in.header.%s}", HEADER_EMAIL, Exchange.FILE_NAME))) //header(HEADER_EMAIL). header(Exchange.FILE_NAME))
            //.dynamicRouter(header(HEADER_AFTER_ZIP_DEST))
            .to("direct:dynamic")
        .end()
        .pipeline()
            .marshal(encryption)
            .setHeader(Exchange.FILE_NAME, simple(String.format(
                    "${in.header.%s}/${in.header.%s}.gpg", HEADER_EMAIL, Exchange.FILE_NAME)))
            //.setHeader(Exchange.FILE_NAME, header(Exchange.FILE_NAME).append(".gpg"))
            .to("direct:" + EnvironmentSetup.SEND_BACK)
        .end()
    .end() // end aggregate
    .end();
CopiedGroupedExchangeAggregationStrategy.java
public class CopiedGroupedExchangeAggregationStrategy extends
        AbstractListAggregationStrategy<Exchange> {

    @Override
    public boolean isStoreAsBodyOnCompletion() {
        // keep the list as a property to be compatible with old behavior
        return true;
    }

    @Override
    public Exchange getValue(Exchange exchange) {
        return exchange.copy();
    }
}

Camel: synchronization between parallel routes in same camel context

I'm working on a Camel prototype which uses two entry points in the same Camel context.
The first route consumes messages which are used to "configure" the application. Messages are loaded into a configuration repository through a configService bean:
// read configuration files
from("file:data/config?noop=true&include=.*.xml")
    .startupOrder(1)
    .to("bean:configService?method=loadConfiguration")
    .log("Configuration loaded");
The second route implements a recipient list EIP pattern, delivering a different kind of input message to a number of recipients, which are read dynamically from the same configuration repository:
// process some source files (using configuration)
from("file:data/source?noop=true")
    .startupOrder(2)
    .unmarshal()
    .to("setupProcessor") // set "recipients" header
    .recipientList(header("recipients"))
    // ...
The question that arises now is how to synchronize them, so that the second route "waits" if the first is processing new data.
I'm new to Apache Camel and pretty lost on how to approach such a problem; any suggestion would be appreciated.
Use aggregate in combination with the possibility to start and stop routes dynamically:
from("file:data/config?noop=true&include=.*.xml")
.id("route-config")
.aggregate(constant(true), new MyAggregationStrategy()).completionSize(2).completionTimeout(2000)
.process(new Processor() {
#Override
public void process(final Exchange exchange) throws Exception {
exchange.getContext().startRoute("route-source");
}
});
from("file:data/source?noop=true&idempotent=false")
.id("route-source") // the id is needed so that the route is found by the start and stop processors
.autoStartup(false) // this route is only started at runtime
.aggregate(constant(true), new MyAggregationStrategy()).completionSize(2).completionTimeout(2000)
.setHeader("recipients", constant("direct:end")) // this would be done in a separate processor
.recipientList(header("recipients"))
.to("seda:shutdown"); // shutdown asynchronously or the route would be waiting for pending exchanges
from("seda:shutdown")
.process(new Processor() {
#Override
public void process(final Exchange exchange) throws Exception {
exchange.getContext().stopRoute("route-source");
}
});
from("direct:end")
.log("End");
That way, route-source is only started when route-config is completed. route-config and consequently route-source are restarted if new files are found in the config directory.
You can also place an onCompletion (http://camel.apache.org/oncompletion.html) in the first route that activates the second one.
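A minimal sketch of that idea, reusing the route ids from the answer above (the exact wiring is an assumption):

// When route-config finishes an exchange, onCompletion fires and starts route-source.
from("file:data/config?noop=true&include=.*.xml")
    .id("route-config")
    .onCompletion()
        .process(exchange -> exchange.getContext().startRoute("route-source"))
    .end()
    .to("bean:configService?method=loadConfiguration");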
Apache Camel's file component will create a lock for the file being processed. Any other file consumer will not poll this file while the lock is present (unless you set consumer.exclusiveReadLock=false).
Source:
http://camel.apache.org/file.html => URI Options => consumer.exclusiveReadLock
