Camel sftp doesn't poll on Unix more than 2 levels deep - apache-camel

Camel sftp is unable to poll more than 2 levels deep when the Java code runs on Linux, but it works fine on Windows.
For example, polling files from
sftp://user@domain:22/folder1/folder2?...
works on both Unix and Windows. But when I use something like
sftp://user@domain:22/folder1/folder2/folder3?...,
the route always starts, yet when running on Unix it doesn't get the files in folder3.
Route: route22 started and consuming from:sftp://user@domain:22/folder1/folder2/folder3?...
The sftp is to the same Unix machine and the same paths are used.
I have tried with stepwise true and false, as well as with recursive.
Could anyone shed some light on this please?

The problem was caused by a Quartz trigger (attached to the route) that became corrupted. That happened because of a Camel bug that makes Camel unable to reconcile triggers when running in cluster mode if they fail for database reasons.

Related

Flink 1.5-SNAPSHOT web interface doesn't work

I recently came across a bug in Flink and reported it (https://issues.apache.org/jira/browse/FLINK-8685), then found out that it had already been reported and that a pull request had been created (https://github.com/apache/flink/pull/5174).
I have now cloned 1.5-SNAPSHOT, applied the patch and built Flink. It builds (whether or not the patch is applied), but when I run Flink (using start-cluster.sh), the web dashboard doesn't work and the command
tail log/flink-*-jobmanager-*.log
returns "tail: cannot open 'log/flink-*-jobmanager-*.log' for reading: No such file or directory".
I tested with a batch program and, surprisingly, it returned results on the terminal, but streaming programs and other things still don't work.
Any suggestions on this issue?
Thank you.
If the Flink dashboard does not start, change the port in the conf file and restart. Flink's default port may be occupied by another process on Windows.
Also change the Flink log level to DEBUG.
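Before editing the configuration, it can help to confirm that the port really is taken. Below is a minimal sketch in plain Java, assuming the dashboard uses Flink's default web port 8081 (pass a different port as an argument if your setup differs). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {

    /** Returns true if a server socket can currently be bound to the given local port. */
    public static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            // Binding succeeded, so nothing else is listening on this port.
            return true;
        } catch (IOException e) {
            // Binding failed, most likely because another process holds the port.
            return false;
        }
    }

    public static void main(String[] args) {
        // 8081 is assumed here as the dashboard port; override it on the command line.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8081;
        System.out.println("Port " + port + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```

If the port is reported as in use while Flink is stopped, another process owns it and changing the port in the conf file is a reasonable next step.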

Camel File Poller of a Mounted CIFS

We're experiencing a strange problem.
We have a file component monitoring a folder. This works perfectly if the path is either
a) myrelativepath - which is relative to the Karaf installation where the camel route is run; or
b) /tst/mypath - which reads from a folder from the root
If I set log level to DEBUG I see the logs of it polling based on my interval.
However, if I set the path to be:
/mnt/windowsshare - which is a mounted windows share.
I get nothing in the logs, I don't see the poll, and it doesn't pick up any files. Apparently the route is started, though.
Interestingly, I have another Camel route which writes a file to that location (a subfolder called inbound), and it writes files with no problem.
Any ideas?
I can get perhaps more logs tomorrow, but this is only happening in this environment where we have a windows share. And the share seems to be fine.
For testing, we have run Camel as root, and as root on the command line we have tested reading the files (via vi), and all is OK.
Any suggestions for things to look at?
Basically, make sure you don't have too many files in the antExclude... polling is logarithmic, and a fraction more makes polling very slow.
Needs more code analysis and VM introspection to understand why.

Apache Camel GenericFileOperationFailedException: 'Cannot rename file' locks exchange

We have an integration system based on Camel v2.16.1 that runs on a Jboss v6 Linux platform. There are multiple interfaces running simultaneously each with a different polling rate.
We are intermittently experiencing a 'Cannot rename file' issue, with Camel failing to back up successfully processed and transmitted files from the FTP source to the 'done' folder. Restarting the Camel application fixes the issue.
Basically, at regular intervals triggered by a quartz scheduler, the route:
picks up files from a source via FTP,
processes them (Smooks + XSL transformations),
delivers the generated flat file to an endpoint via FTP.
If multiple files are read from the source directory, then all the files are appended together in a temporary file before being processed.
The Camel FTP configuration uses the following URL:
ftp://xxxx/export?antInclude=dsciord_*.dat&inProgressRepository=#warehouseIntegrationIdempotentRepository&preMove=in_progress_bpo/$simple{date:now:yyyyMMddHHmm}/$simple{file:name}&move=done&consumer.bridgeErrorHandler=true
read files dsciord_*.dat from /export directory
use a custom inProgressRepository to store each read filename in a local DB (this was done to prevent contention with a second cluster node; however, only a single node is currently live, so this option is unnecessary and could be removed to speed up the process).
move files to an in_progress_bpo/201609061522 directory, where the subdirectory is created based on the date_timestamp.
move them to the in_progress_bpo/201609061522/done subdirectory once successfully processed.
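To make the directory layout described above concrete, here is a small plain-Java sketch that composes the same kind of paths. It does not use Camel and is not how Camel evaluates the preMove and move options internally. It only assumes that the date pattern yyyyMMddHHmm is formatted as written and that "done" is resolved relative to the in-progress directory, as the list above describes. Class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class PreMovePaths {

    // Same pattern letters as in the preMove option of the endpoint URL.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    /** Path a file is moved to before processing (the preMove target). */
    public static String inProgressPath(LocalDateTime now, String fileName) {
        return "in_progress_bpo/" + STAMP.format(now) + "/" + fileName;
    }

    /** Path a file is moved to after successful processing (the "done" subdirectory). */
    public static String donePath(String inProgressPath) {
        int slash = inProgressPath.lastIndexOf('/');
        return inProgressPath.substring(0, slash) + "/done/" + inProgressPath.substring(slash + 1);
    }

    public static void main(String[] args) {
        String inProgress = inProgressPath(LocalDateTime.of(2016, 9, 6, 5, 2), "dsciord_3605752.dat");
        System.out.println(inProgress);
        System.out.println(donePath(inProgress));
    }
}
```

Both paths are relative, so where they end up on the server depends on the directory the FTP session is in when the rename is issued.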
In the vast majority of cases the route works with no issues. Sometimes, however, the file(s) cannot be moved to the done folder (see error below). Even then, the route can sometimes continue successfully at the next polling cycle. In other cases the route enters a state in which, even though the Quartz scheduler triggers the poll, it fails to detect any files in the source /export directory even when there ARE files there.
org.apache.camel.component.file.GenericFileOperationFailedException: Cannot rename file: RemoteFile[in_progress_bpo/201609060502/dsciord_3605752.dat] to: RemoteFile[in_progress_bpo/201609060502/done/dsciord_3605752.dat]
Notes: We are using
a single instance of a ConsumerTemplate to handle our interfaces.
a custom inprogressRepository to store the file names read.
Obviously, there must be a system locking the source files and this is causing the Camel route to stop processing further files.
Any ideas/suggestions on debugging/resolving this issue would be greatly appreciated. The issues that I read through on the camel-users forum seem to deal with Windows-related deployments, or sometimes with Smooks failing to close the input stream. I've checked, and we don't use the
org.milyn.templating.xslt.XslTemplateProcessor#bypass method where Smooks fails to close the underlying input stream.
Finally I have been able to reproduce/identify the issue.
We are using a relative path as the location to move the processed files into once they have been successfully FTP-ed to the destination servers:
../../../u/4gl_upload/warehouse_integration_2/trs-server/export/in_progress_bpo/201609081030/done
However, for some reason, instead of traversing via the correct path to move the processed files, the Camel consumer creates a new subdirectory tree starting from the current working directory, which can become quite long, as follows. Hence the problem: it doesn't know where it is and it doesn't reset itself.
/u/4gl_upload/warehouse_integration_2/trs-server/u/4gl_upload/warehouse_integration_2/trs-server/export/in_progress_bpo/201609081030
This was reproduced with the option stepwise=false, which means it traverses the subdirectories in a single step instead of stepwise.
I still don't know what the best solution is.

Camel FTP Issue Related To Large File Transfer

I am working on a requirement to download large files through the camel-ftp component.
The route definition is as follows:
from("sftp://host:22?connectTimeout=30000&username=xxx&password=yyyy&localWorkDirectory=D:/templocation")
.to("file:///D:/mylocation");
I am looking for an answer to the below questions.
1. Does Camel SFTP support resuming a download after a server disconnect? I have observed that the .inprogress file gets deleted once a SocketTimeout/IOException is thrown from the underlying JSch library. My expectation is that Camel should re-establish the connection once it is available and resume downloading from the point where it left off.
2. Parameters such as connectTimeout, timeout and soTimeout have no effect. On Windows (Win 7), if the server stays disconnected for approximately 21 seconds, Camel deletes the .inprogress file. Is there any other parameter in the Camel FTP component that has to be set to control the consumer timeout? This becomes an issue if the file is very large (1 GB or more) and the server gets disconnected when more than 90% has been downloaded.
Any help in this regard will be highly appreciated.
@ClausIbsen:
Thank you so much for your answer. I would really appreciate your feedback on point 2.
I went through the Camel FTP component source code and found that SftpOperations.retrieveFileToFileInLocalWorkDirectory is the method that retrieves data through the JSch library.
The code is written so that any exception received from the underlying library, i.e. from channel.get(remoteName, os);, causes the .inprogress file to be deleted. I investigated the JSch library, which has an option to resume:
get(String src, OutputStream dst, SftpProgressMonitor monitor, int mode, long skip)
Downloads a file to an OutputStream.
I incorporated this API into the retrieveFileToFileInLocalWorkDirectory method by checking whether an .inprogress file exists and, if it does, reading its file size.
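if (fileSize > 0) {
    // A partial .inprogress file exists: resume, skipping the bytes already downloaded.
    channel.get(remoteName, os, progressMonitor, ChannelSftp.RESUME, fileSize);
} else {
    // No partial file: download from the beginning.
    channel.get(remoteName, os, progressMonitor);
}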
The ProgressMonitor implementation helps me track whether the download is complete.
fileSize = temp.length();
boolean isFileDownloadComplete = (fileSize == progressMonitor.getMax());
if (isFileDownloadComplete) {
    // rename and move the file
}
With the above implementation, and with the original file-deletion behaviour commented out, download resume is working.
I am able to resume the file download even after a server disconnect.
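The resume idea can also be tried without an SFTP server. The sketch below is plain Java and does not use Camel or JSch: a generic InputStream stands in for the remote file, and skipping already-downloaded bytes stands in for the skip argument of ChannelSftp.RESUME. The class and method names are hypothetical, and the completeness check mirrors the comparison against progressMonitor.getMax() above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ResumeCopy {

    /** Appends the part of source that is missing from partial and returns the resulting file size. */
    public static long resume(InputStream source, Path partial) throws IOException {
        long alreadyHave = Files.exists(partial) ? Files.size(partial) : 0L;

        // Skip the bytes that are already stored locally.
        long toSkip = alreadyHave;
        while (toSkip > 0) {
            long skipped = source.skip(toSkip);
            if (skipped <= 0) {
                // skip() may legally return 0, so fall back to reading a single byte.
                if (source.read() < 0) {
                    throw new IOException("Source is shorter than the partial file");
                }
                skipped = 1;
            }
            toSkip -= skipped;
        }

        // Open in append mode so the existing bytes are kept.
        try (OutputStream out = Files.newOutputStream(partial,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return Files.size(partial);
    }

    public static void main(String[] args) throws IOException {
        byte[] remote = "0123456789".getBytes("UTF-8");
        Path partial = Files.createTempFile("demo", ".inprogress");
        Files.write(partial, new byte[] {'0', '1', '2', '3'}); // simulate an interrupted download
        long size = resume(new ByteArrayInputStream(remote), partial);
        System.out.println("complete: " + (size == remote.length));
        Files.delete(partial);
    }
}
```

One thing this sketch makes visible: resuming is only safe if the remote file has not changed since the partial download started, because the local prefix is trusted without verification.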
I have a couple of questions here:
Do you foresee any implementation flaw in the above solution?
Is there any functionality I have missed that is going to be impacted?
I would really appreciate your feedback.

Use libfuse in a project, without root access (for installation)? FTP mounts & inotify/kqueue/FSEvents

I'd like my application to be able to show a directory listing from a remote FTP (or SFTP etc) location. When a file/directory changes in the remote directory tree, the application should update its view with the relevant changes.
Because traversing the entire tree is slow and wasteful, I'd like to use something along the lines of FSEvents (inotify/kqueues on Linux), but obviously these libraries are filesystem-based, and a connection to an FTP server is not the same as a mounted filesystem.
In order to make these libraries work, I'd need to actually mount a filesystem backed by FTP/SFTP on the local machine, then attach an FSEventStream (or kqueue etc) to this local mount. I know FUSE can do this, but is there any way I can use FUSE without the user having to first install it? I mean, can I bundle it with my (Mac) application and create mounts without having to put the user through the process of actually running an installer package to copy libfuse and the kernel modules into the system? Does it assume /dev/fuse exists, or can this live outside the /dev/ path, inside my application directory?
Nice Mac applications are installed with a simple drag & drop and I'd like to keep mine this way if possible. I'm unclear on if it's possible to use libfuse directly (provided the files are included with the app), without installing it in the system paths.
Alternatively, does anyone have any other suggestions for monitoring for changes over FTP, without polling?
Unfortunately FTP and SFTP do not support any form of client notification.
Much like HTTP they are based on a request/response scheme, where each data transfer is initiated by the client. What makes things worse is that, contrary to HTTP, there is no way to ask the server to inform the client of any changes since a specific date.
This means that not only do you have to use polling, but also that said polling will by no means be lightweight.
As far as FUSE is concerned, most available FTP and SFTP modules only update their view of the filesystem when a userspace application asks for a directory listing (e.g. hitting Refresh in a file browser window). They do not perform polling on their own. Your userspace application will have to initiate the refresh by polling the directory itself.
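Since polling cannot be avoided, the application ends up comparing successive directory listings itself. A minimal Java sketch of that comparison follows. It assumes each listing has already been fetched and reduced to a map from path to last-modified timestamp; how the listing is obtained over FTP/SFTP is left out. The class, method and event names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListingDiff {

    /**
     * Compares two snapshots (path -> last-modified timestamp) and returns
     * the changes as sorted strings such as "ADDED c.txt".
     */
    public static List<String> diff(Map<String, Long> previous, Map<String, Long> current) {
        List<String> changes = new ArrayList<>();
        for (Map.Entry<String, Long> entry : current.entrySet()) {
            Long oldStamp = previous.get(entry.getKey());
            if (oldStamp == null) {
                changes.add("ADDED " + entry.getKey());
            } else if (!oldStamp.equals(entry.getValue())) {
                changes.add("MODIFIED " + entry.getKey());
            }
        }
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) {
                changes.add("REMOVED " + path);
            }
        }
        Collections.sort(changes);
        return changes;
    }

    public static void main(String[] args) {
        Map<String, Long> before = new HashMap<>();
        before.put("a.txt", 1L);
        before.put("b.txt", 2L);

        Map<String, Long> after = new HashMap<>();
        after.put("b.txt", 3L);
        after.put("c.txt", 4L);

        for (String change : diff(before, after)) {
            System.out.println(change);
        }
    }
}
```

The cost noted above still applies: every poll has to list each directory of interest, so limiting the watched subtree and choosing a sensible interval matter more than the diff itself.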
EDIT:
To clarify a couple of things, recent versions of FUSE do support notification events. They simply pass through the events from the modules to the kernel. The modules still have to generate them, and in the case of an FTP/SFTP client module that is impossible without polling the server.
Also keep in mind that many current NFS implementations do not support change notifications either, despite the fact that NFSv4.1 has the necessary provisions. Many SMB/CIFS servers (esp. those in cheap Network-Attached-Storage embedded systems) also have limited to no support.
