Data loss on concurrent file write in camel - apache-camel

I am using Apache Camel for my file operations. My system runs in a cluster environment.
Let's say I have 4 instances:
Instance A
Instance B
Instance C
Instance D
Folders Structure
Input Folder: C:/app/input
Output Folder: C:/app/output
All four instances point to the same input folder. As per my business requirement, 8 files will be placed in the input folder and the output will be one consolidated file. Camel is losing data when the instances write to the output file concurrently.
Route:
from("file://C:/app/input")
.setHeader(Exchange.FILE_NAME, simple("output.txt"))
.to("file://C:/app/output?fileExist=Append")
.end();
Kindly help me resolve this issue. Is there anything like a write lock in Camel to avoid concurrent file writers? Thanks in advance.

You can use the doneFileName option of the file component; see http://camel.apache.org/file2.html for more information.
Avoid reading files currently being written by another application
Beware the JDK File IO API is a bit limited in detecting whether another application is currently writing/copying a file. And the implementation can be different depending on OS platform as well. This could lead to Camel thinking the file is not locked by another process and starting to consume it. Therefore you have to do your own investigation into what suits your environment. To help with this, Camel provides different readLock options and the doneFileName option that you can use. See also the section Consuming files from folders where others drop files directly.
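The readLock and doneFileName options mainly govern when a consumer picks a file up; as far as I can tell, they do not by themselves serialize several writers appending to one output file. As a rough illustration of what a "write lock" looks like at the JDK level, here is a minimal sketch using java.nio file locks. This is not a Camel API; the class and method names (LockedAppender, appendWithLock) are made up for the example, and whether the lock is honoured across machines on a shared folder depends on the OS and file system, so treat it as something to test in your environment.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Camel.
public class LockedAppender {

    /** Appends text to target while holding an exclusive OS-level lock on the file. */
    public static void appendWithLock(Path target, String text) throws IOException {
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // lock() blocks until the exclusive lock on the whole file is granted.
            try (FileLock lock = channel.lock()) {
                // Move to the end only after the lock is held, so the size is current.
                channel.position(channel.size());
                ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    channel.write(buf);
                }
                // Flush to disk before the lock is released.
                channel.force(true);
            }
        }
    }
}
```

File locks are held on behalf of the whole JVM, so threads inside one JVM still need ordinary synchronization; a second overlapping lock() call from the same JVM fails with OverlappingFileLockException instead of waiting.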

Related

Apache Camel GenericFileOperationFailedException: 'Cannot rename file' locks exchange

We have an integration system based on Camel v2.16.1 that runs on a Jboss v6 Linux platform. There are multiple interfaces running simultaneously each with a different polling rate.
We are intermittently experiencing a 'Cannot rename file' issue, where Camel fails to back up successfully processed and transmitted files from the FTP source to the 'done' folder. Restarting the Camel application fixes the issue.
Basically, at regular intervals triggered by a quartz scheduler, the route:
picks up files from a source via FTP,
processes them, smooks + xsl transformations
delivers the generated flat file to an endpoint via FTP.
If multiple files are read from the source directory, then all the files are appended together in a temporary file before being processed.
The Camel FTP configuration uses the following URL:
ftp://xxxx/export?antInclude=dsciord_*.dat&inProgressRepository=#warehouseIntegrationIdempotentRepository&preMove=in_progress_bpo/$simple{date:now:yyyyMMddHHmm}/$simple{file:name}&move=done&consumer.bridgeErrorHandler=true
read files dsciord_*.dat from /export directory
use a custom inProgressRepository to store the names of the files read in a local DB (this was done to prevent contention with a second cluster node; however, only a single node is currently live, so this option is unnecessary and could be removed to speed up the process).
move files to an in_progress_bpo/201609061522 directory, where the subdirectory is created based on the date_timestamp.
move them to the in_progress_bpo/201609061522/done subdirectory once successfully processed.
In the vast majority of cases the route works with no issues; however, sometimes the file(s) cannot be moved to the done folder (see the error below). Even then, the route can sometimes continue successfully at the next polling cycle. In other cases, the route enters a state where, even though the Quartz scheduler triggers the poll, it fails to detect any files in the source /export directory even when there ARE files there.
org.apache.camel.component.file.GenericFileOperationFailedException: Cannot rename file: RemoteFile[in_progress_bpo/201609060502/dsciord_3605752.dat] to: RemoteFile[in_progress_bpo/201609060502/done/dsciord_3605752.dat]
Notes: We are using
a single instance of a ConsumerTemplate to handle our interfaces.
a custom inprogressRepository to store the file names read.
Obviously, there must be a system locking the source files and this is causing the Camel route to stop processing further files.
Any ideas or suggestions on debugging or resolving this issue would be greatly appreciated. The issues I read through on the camel-users forum seem to deal with Windows-related deployments, or with Smooks failing to close the input stream. I've checked, and we don't use the org.milyn.templating.xslt.XslTemplateProcessor#bypass method, where Smooks fails to close the underlying input stream.
Finally I have been able to reproduce/identify the issue.
We are using a relative path as the location to move the processed files to once they have been successfully FTP-ed to the destination servers:
../../../u/4gl_upload/warehouse_integration_2/trs-server/export/in_progress_bpo/201609081030/done
However, for some reason, instead of traversing the correct path to move the processed files, the Camel consumer creates a new subdirectory tree starting from the current working directory, and this can become quite long, as follows. Hence the problem: it doesn't know where it is and it doesn't reset itself.
/u/4gl_upload/warehouse_integration_2/trs-server/u/4gl_upload/warehouse_integration_2/trs-server/export/in_progress_bpo/201609081030
This was reproduced with the option stepwise=false, which means it traverses the subdirectories in a single step instead of step wise.
I still don't know what the best solution is.

Programmatically read queue.xml file

This question is related to this problem: Programmatically read a Queue's parameters
Is there a way to read the content of the queue.xml file programmatically on App Engine? As far as I know, all filesystem operations are prohibited on GAE.
Only the functions that write to the file system are prohibited (because a writable file system does not exist in the sandbox); the read functions are available without problems.
A new File() object resolves relative paths against the root of your war folder (or webapp if it is a Maven project), so you can open any file under that folder.
You can try creating new File("WEB-INF/queue.xml") and then reading it with any of the usual ways to read XML.
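To make the last step concrete, here is a minimal sketch that parses the file with the JDK's built-in DOM parser and collects the queue names. It assumes the usual queue.xml layout of <queue> elements that each contain a <name> child; the class and method names (QueueXmlReader, readQueueNames) are only illustrative.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; assumes <queue><name>...</name></queue> entries.
public class QueueXmlReader {

    /** Returns the text of the first <name> child of every <queue> element. */
    public static List<String> readQueueNames(File queueXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(queueXml);

        List<String> names = new ArrayList<>();
        NodeList queues = doc.getElementsByTagName("queue");
        for (int i = 0; i < queues.getLength(); i++) {
            Element queue = (Element) queues.item(i);
            NodeList nameNodes = queue.getElementsByTagName("name");
            if (nameNodes.getLength() > 0) {
                names.add(nameNodes.item(0).getTextContent().trim());
            }
        }
        return names;
    }
}
```

Inside the application this would be called as QueueXmlReader.readQueueNames(new File("WEB-INF/queue.xml")); other settings such as the rate can be read the same way by changing the child tag name.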

Apache Camel 2.10.7 - monitor deletion of files from file system

I am using camel 2.10.7 with great success from servicemix to feed files from the local file system to my application.
The files shall remain on the file system, hence I use a configuration like this one.
from uri="file:../ange-data/vessels?noop=true&idempotentKey=${file:name}-${file:modified}"
This works great if I touch/update a file on the file system.
One issue remains: how can I detect in my Java code that a file has been removed from the file system by some other person or process?
I could not find any hint in the manual pages http://camel.apache.org/file-language.html or http://camel.apache.org/file2.html, but I believe it should be possible to get a message on file deletion?
You would need to use Java 7 nio2 which has a file watcher api where you can get notifications when files are added/removed etc.
Search the web / SO for details on this api, for example
http://docs.oracle.com/javase/tutorial/essential/io/notification.html
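As a starting point, here is a minimal sketch of the NIO.2 watcher restricted to delete events. The class and method names (DeletionWatcher, watchForDeletes, pollDeletions) are invented for the example, and it watches a single directory only, not subdirectories.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper built on the JDK WatchService API.
public class DeletionWatcher {

    /** Registers dir for delete notifications and returns the watcher. */
    public static WatchService watchForDeletes(Path dir) throws IOException {
        WatchService watcher = dir.getFileSystem().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_DELETE);
        return watcher;
    }

    /** Waits up to timeoutMillis for events and returns the names of deleted entries. */
    public static List<String> pollDeletions(WatchService watcher, long timeoutMillis)
            throws InterruptedException {
        List<String> deleted = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return deleted; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.ENTRY_DELETE) {
                // The context is the path of the entry relative to the watched directory.
                deleted.add(event.context().toString());
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return deleted;
    }
}
```

In a real application pollDeletions would run in a loop on its own thread, and each deleted name could then be handed to whatever code needs to react. How quickly events arrive depends on the platform's implementation.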

Sharing file locks

I am currently working on a file processing service that looks at a fileshare, where files are uploaded to via FTP.
For scalability, I've been asked to make this service load-balanceable, so it has to expect that other instances of the service on different machines may also be trying to process these files.
OK, so I thought I should be able to achieve this by obtaining an exclusive lock for my process before processing a file, and skipping any files that may already be locked by another process.
The crux of this approach is shown below (I've left out the error handling for simplicity):
using (FileStream fs = File.Open(myFile, FileMode.Open, FileAccess.ReadWrite, (FileShare.Read | FileShare.Delete)))
{
    // Do work
}
Q1: My process now has a lock on this file. I thought this would mean I could then access the same file (without using the stream) and still have the correct access to it, but based on testing it seems I only have the benefits of the lock through the stream. Is this correct?
(For example, before I included FileShare.Delete, File.Delete(myFile) failed)
The above lock ultimately uses the 'Write' permission to determine which service has the file, but is intended to allow other processes to still read the file. This is because the process that holds the lock attempts to verify that the file is a valid zip file, using a third-party library (Xceed.Zip). However, this fails, saying the file "is being used by another process". Using Reflector, I ultimately found that the problematic call is:
stream = this.m_info.Open(FileMode.Open, FileAccess.Read, FileShare.Read);
Now I would have expected this to work as it only wants to read the file, but it fails. The reason appears to be outlined in a similar question. However, as this is a 3rd party API I can't change their code to use ReadWrite.
Q2: Is there a way I can correctly lock the file so it will not be picked up by the other services, but it can still be verified as a zip file using the external API?
I feel like there should be a 'correct' way to do this, but at the moment the best I can come up with is to lock the file, move it away from the shared directory, and then verify it at the new location.
If you're planning to reactively handle this situation by handling UnauthorizedAccessException I think you're making a serious mistake.
This can be handled by proactively renaming files. For example you can configure your service to only read files whose name is in the format 'Filename.YYYYMMDD.txt'. Prior to processing the file, you can rename it to 'Filename.YYYYMMDD.processing'. Then after processing the file you rename it to 'Filename.YYYYMMDD.done'.
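The renaming step can be sketched as follows. The question is about .NET, but the idea is the same in any language; this example uses Java's Files.move with ATOMIC_MOVE, and the names FileClaimer and tryClaim are made up for illustration. It relies on the rename being atomic on the underlying file system, which is worth verifying for a network share.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical helper illustrating the claim-by-rename idea.
public class FileClaimer {

    /**
     * Tries to claim a file by renaming "name.ext" to "name.processing".
     * Returns the new path, or empty if the rename failed, for example
     * because another service already renamed the file.
     */
    public static Optional<Path> tryClaim(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot >= 0 ? name.substring(0, dot) : name;
        Path claimed = file.resolveSibling(base + ".processing");
        try {
            return Optional.of(Files.move(file, claimed, StandardCopyOption.ATOMIC_MOVE));
        } catch (IOException e) {
            // Source is gone or the move is not possible: treat the file as not ours.
            return Optional.empty();
        }
    }
}
```

A service would call tryClaim for each candidate file and process only those for which a path comes back, renaming to '.done' at the end in the same way.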
You can even take it a step further by making another service that enqueues the filenames. This service will be a FileSystemWatcher that listens for FileAdd operations. Once it receives that event, it queues the filename on a global message queue. Then each of your services will just dequeue filenames and no longer have to worry about concurrent access.
HTH

Apply file structure diff/patch on remote system?

Is there a tool that creates a diff of a file structure, perhaps based on an MD5 manifest? My goal is to send a package across the wire that contains new/updated files and a list of files to remove. It needs to copy over new/updated files and remove files that have been deleted from the source file structure.
You might try rsync. Depending on your needs, the command might be as simple as this:
rsync -az --del /path/to/master dup-site:/path/to/duplicate
Quoting from rsync's web site:
rsync is an open source utility that provides fast incremental file transfer. rsync is freely available under the GNU General Public License and is currently being maintained by Wayne Davison.
Or, if you prefer wikipedia:
rsync is a software application for Unix systems which synchronizes files and directories from one location to another while minimizing data transfer using delta encoding when appropriate. An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. rsync can copy or display directory contents and copy files, optionally using compression and recursion.
@vfilby I'm in the process of implementing something similar.
I've been using rsync for a while, but it gets funky when deploying to a remote server with permission changes that are out of my control. With rsync you can choose not to include permissions, but they still end up being considered for some reason.
I'm now using git diff. This works very well for text files. Diff generates patches, rather than a MANIFEST that you have to include with your files. The nice thing about patches is that there is already an established framework for using and testing them before they're applied.
For example, with the patch utility that comes standard on any *unix box, you can run the patch in dry-run mode. This will tell you whether the patch will actually apply before you run it for real. This helps you make sure that the files you're updating have not changed while you were preparing the patch.
If this is similar to what you're looking for, I can elaborate on my process.
