Apache camel: SFTP: files downloaded multiple times - apache-camel

I've defined the following camel route:
RouteBuilder rb = new RouteBuilder() {
    @Override
    public void configure() throws Exception {
        from("sftp://myhost//path/to/files/")
            .to("log:loggingCategory?level=INFO")
            .to("file:///tmp/");
    }
};
When I start the context with this route, Camel connects and downloads the files. My problem is that Camel keeps downloading the same files over and over until the context is shut down. Why does the FTP2 component do this, and how can I stop it?
I've included version 2.10.4 of camel-core and camel-ftp via maven.

The Idempotent Consumer does the trick. The docs of the FTP2 component refer to the File2 component, "as all the options there also applies for this component". There is a parameter "idempotent=true" that activates usage of an LRUCache:
Option to use the Idempotent Consumer EIP pattern to let Camel skip
already processed files. Will by default use a memory based LRUCache
that holds 1000 entries. If noop=true then idempotent will be enabled
as well to avoid consuming the same files over and over again.
My complete source definition now looks like this:
from("sftp://myhost//path/to/files/?username=user&password=secret&idempotent=true")
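To make the quoted behaviour concrete, here is a minimal plain-Java sketch of an in-memory LRU "already seen" cache. It is not Camel's own class: the name LruSeenFiles and its add method are hypothetical, chosen only to illustrate how a bounded cache lets a consumer skip files it has already processed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an in-memory idempotent repository; not Camel's implementation. */
final class LruSeenFiles {
    private final Map<String, Boolean> seen;

    LruSeenFiles(final int capacity) {
        // accessOrder=true makes the map keep entries in least-recently-used order.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                // Evict the least recently used entry once the capacity is exceeded.
                return size() > capacity;
            }
        };
    }

    /** Returns true the first time a file name is offered, false if it is still cached. */
    synchronized boolean add(String fileName) {
        return seen.putIfAbsent(fileName, Boolean.TRUE) == null;
    }
}
```

A consumer built on this would process a file only when add(fileName) returns true. Note the consequence of the bound: once an entry is evicted, the same file name is treated as new again, so a directory holding more files than the cache capacity could still be picked up repeatedly.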

From the camel ftp2 documentation:
The FTP consumer will by default leave the consumed files untouched on
the remote FTP server. You have to configure it explicitly if you want
it to delete the files or move them to another location. For example
you can use delete=true to delete the files, or use move=.done to move
the files into a hidden done sub directory.
To delete the file, change the route to
from("sftp://myhost//path/to/files?delete=true")
Ensure that the connected user has required permissions.

Related

restarting a route initialized with File component does not poll the existing files again

Thanks to JMX (java console), I try to restart a route with a file component consumer endpoint.
from("file:<some dir>?noop=true")
I am using the wiretap pattern to record the intermediate data transformation through other files endpoint.
On first start of the camel application, everything is fine, and all the files already present in the input directory are polled and processed.
But when I try to restart the route thanks to jmx, nothing happens.
I tried manually removing the .camel directory (created, I guess, by the default FileIdempotentRepository) before restarting the route, in vain.
I also tried to change the kind of IdempotentRepository with a MemoryIdempotentRepository :
from("file:<somedir>?noop=true").idempotentConsumer(header("CamelFileName"), MemoryIdempotentRepository.memoryIdempotentRepository(1000))
Even if I trigger the clear() operation of this MemoryIdempotentRepository before restarting the route in java console, nothing is polled from the input directory after restarting.
If I add a new file, it is picked up. Everything behaves as if there were a persistent history of the files that have already been polled once.
I wonder whether the option "noop=true" creates an unmanaged idempotent repository that I cannot control with JMX.
If true, the file is not moved or deleted in any way. This option is
good for readonly data, or for ETL type requirements. If noop=true,
Camel will set idempotent=true as well, to avoid consuming the same
files over and over again.
Any ideas?
(I am using camel-core 2.21)
I found the solution to my issue.
I was misusing idempotentConsumer; I needed to configure the endpoint's idempotent repository through the endpoint URI parameters.
First, create an entry in a bean registry:
registry.put("myIdempotentRepository", MemoryIdempotentRepository.memoryIdempotentRepository(1000));
Then, refer to this idempotentRepository in the endpoint:
from("file:<somedir>?noop=true&initialDelay=10&delay=1000&idempotentRepository=#myIdempotentRepository")
By doing this, GenericFileEndPoint:
will not create a default idempotentRepository
will add the idempotentRepository given in options of the endpoint to the services of the camel context. This means that it will be possible to manage it thanks to JMX
I think it would be useful to be allowed to manage the default idempotent repository in the FileEndPoint class of camel-core.

Apache Camel SFTP Consumer not deleting file after reading

I see a weird behavior with Apache Camel SFTP. Even with the delete=true option set, it doesn't delete the file after receiving it. I am using version 3.0.0-M3 of camel-ftp.
This is my SFTP configuration:
"sftp://<<HOST_NAME>>:<<PORT>>/<<PATH>>?username=<<USERNAME>>" +
"&password=<<PASSWORD>>" +
"&preferredAuthentications=password" +
"&readLock=changed" +
"&readLockMinAge=30000" +
"&delay=20000" +
"&delete=true";
Camel is able to read the file, but it doesn't delete the file after reading it. The docs say:
delete (consumer) -
If true, the file will be deleted after it is processed successfully.
How does Camel decide whether a file was processed successfully? Do we need to set an exchange property for Camel to mark it as processed successfully?
After receiving the file, all I am doing is passing it to another route, as follows:
from(endpointUri).to("direct:procesSftpFile");
Should I change it from direct to vm or seda?
It looks like nobody else has faced this issue, but I figured out where it was coming from.
The issue was not in the Camel SFTP component but in the code I was calling.
The second part of my flow looks like this:
from("direct:procesSftpFile")
.log("...")
// logging and other regular processing
....
// sending to vm InOnly
.to("vm:queue1?exchangePattern=InOnly")
.. some more processing..
.to("vm:queue2?exchangePattern=InOnly")
So the issue was with the calls to queue1 and queue2 in the snippet above.
Commenting them out fixed it, and SFTP started deleting the files. To call the VM endpoints, I used producerTemplate.asyncSend instead of to() as a workaround.
One thing I am still confused about: if we are using the InOnly exchange pattern, why does it affect the SFTP behavior? I should probably ask this in a separate question.

Apache Camel File Component dynamic source directory in from EndPoint

I am currently working on a Camel-based test harness application that processes groups of files from multiple folders and compares them with the files present in the local repository.
Is there any way to change the folder location of the from endpoint in a Camel route dynamically? I want to use a single route for polling files from multiple folders.
According to the answer on dynamically changing an endpoint in Camel, use the following procedure:
stop the route
remove the route
change the endpoint
add the route
start the route

How to stop camel from deleting FTP file when processing fails and exception is handled by an error handler

I have a route that reads from an FTP server and then processes the message. The route has a DeadLetterChannel error handler that routes the message to a bean when an exception is thrown while processing the message.
When an exception is handled by the error handler, Camel presumes everything went fine and still deletes the FTP file.
If I remove the error handler, Camel doesn't delete the file when there is an exception.
My question is: how can I have a DeadLetterChannel error handler and at the same time stop Camel from deleting the FTP file when processing fails?
You can set the option noop=true on the ftp endpoint. Then the file will be left alone.
You would then have to consider how to skip files that have already been picked up. For that you can use the idempotent repository to keep track of which files you have processed before. An alternative is to move the file when you are done.
As the ftp component extends the file component see details at: http://camel.apache.org/file2
You have several options to do that:
Do not use the delete=true option at all, and delete the file yourself in the "success" scenario. This would be relatively transparent.
When you enter the DLC, you can manipulate the endpoint from which you are consuming. To do that, define your own processor for the DLC in onPrepareFailure. See example: deadLetterChannel("jms:dlc").onPrepareFailure(new ErrorProcessor())
After that you can use the getContext() method to get the Camel context and one of the getEndpoint() methods to get your consumer endpoint.
When you have the endpoint, you can see which 'process factory' class is used with getProcessStrategy, and there you can update the delete flag to avoid deleting your file.
For this endpoint it is also possible to define your own 'processStrategy' class with the method setProcessStrategy. Please check for yourself which process strategy class is used in your case. You can then override the corresponding delete method, such as deleteLocalWorkFile, and simply do nothing.
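The first option above (deleting the file yourself only on success) can be sketched in plain Java. This is an illustration of the control flow only, not Camel code: it uses a local path to stand in for the remote file, and the names DeleteOnSuccess, FileHandler and consume are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class DeleteOnSuccess {
    /** Hypothetical callback standing in for whatever the route does with the file. */
    interface FileHandler {
        void handle(Path file) throws Exception;
    }

    /** Returns true if the file was handled and deleted, false if handling failed and the file was kept. */
    static boolean consume(Path file, FileHandler handler) throws IOException {
        try {
            handler.handle(file);
        } catch (Exception e) {
            // Failure path: leave the file in place so it can be retried or inspected.
            return false;
        }
        // Success path: only now is the source file removed.
        Files.delete(file);
        return true;
    }
}
```

In a real route the failure branch is where the dead letter handling would run, and the delete would have to be issued against the FTP server rather than the local file system.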

download specific file with camel and shutdown

Can I use Camel to download a specific list of files from an SFTP server and then shut down the service?
I know this should be a common question, but I can't figure out how to do it without waiting for the context to stop.
Can Camel ensure data integrity in some way?
I guess you can do something like this using a direct route, pollEnrich and a template:
from("direct:grabOneFile")
.pollEnrich("sftp://somewhere/blah/blah?fileName=foobar");
Then, from some Java code somewhere, just grab a Camel template and invoke the "direct:grabOneFile" route.
String ret = template.requestBody("direct:grabOneFile","",String.class);
In this case, you don't have to worry about when to shut down the camel context with the chance of having multiple files etc.
The Camel ftp component can only poll directories.
You can use a combination of maxMessagesPerPoll and fileName, like this:
from("ftp://.../xyz?maxMessagesPerPoll=x&fileName=y");
Also take a look at this.
This link has examples regarding shutdown.
Did you check this out at the bottom of the Camel FTP page.
