Mule File endpoint downloads an incomplete file (without EOF)

My Mule flow must download and process very large XML files. While the provider is writing a file into the input folder, the File endpoint may pick up the incomplete file, or the provider may fail to finish writing it for some reason. Does the Mule API provide a way to handle this situation?
Please advise.

The file connector doesn't download data; are you mixing it up with the HTTP connector? Or do you mean that the Mule file transport picks up a file while the file producer is still copying it?
If the latter, the best option is to write the file to a temporary location and then move it to the Mule pick-up folder, because moving a file (within the same file system) is an atomic operation.
Alternatively, you can use the fileAge attribute on the file connector to configure Mule to only pick up files that are older than the specified age. This works only if you know roughly the maximum time the file writer needs to write a file.
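A minimal Java sketch of the producer side of this pattern, assuming the producer is a Java program and that the staging folder and the Mule pick-up folder are on the same file system (an atomic rename is generally only possible within one file system). The class name, folder names and the `.part` suffix are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of "write elsewhere, then move" from the producer side.
// Assumption: stagingDir and pickupDir are on the same file system;
// otherwise ATOMIC_MOVE throws AtomicMoveNotSupportedException.
class AtomicDrop {

    static Path deliver(Path stagingDir, Path pickupDir, String fileName, byte[] content)
            throws IOException {
        // 1. Write the complete file where the Mule endpoint is NOT polling.
        Path tmp = stagingDir.resolve(fileName + ".part");
        Files.write(tmp, content);
        // 2. Rename it into the polled folder in one step, so the poller sees
        //    either no file or the whole file, never a partial one.
        Path target = pickupDir.resolve(fileName);
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```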
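Conceptually, an age filter only lets through files whose last-modified time is far enough in the past. The class below is a hypothetical plain-Java illustration of that check, not Mule's own implementation; its threshold is in milliseconds.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an age-based "is this file ready?" check.
// A file counts as ready once it has not been modified for minAgeMillis.
class FileAgeFilter {

    static boolean isOldEnough(File f, long minAgeMillis, long nowMillis) {
        return nowMillis - f.lastModified() >= minAgeMillis;
    }

    static List<File> readyFiles(File dir, long minAgeMillis, long nowMillis) {
        List<File> ready = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) return ready; // not a directory, or an I/O error
        for (File f : entries) {
            if (f.isFile() && isOldEnough(f, minAgeMillis, nowMillis)) {
                ready.add(f);
            }
        }
        return ready;
    }
}
```

As the answer says, this only helps if the chosen age is safely larger than the longest time a writer may pause while writing.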

Related

Reading a file using Apache Camel from an FTP location

I have a requirement to read files continuously from an FTP location and write them to a topic (e.g. Kafka) using Apache Camel. I am able to read the file at startup and write it to the topic. Is there a way in Camel to read files continuously from a folder whenever a new file comes into that location?

Apache Camel GenericFileOperationFailedException: 'Cannot rename file' locks exchange

We have an integration system based on Camel 2.16.1 that runs on JBoss 6 on Linux. Multiple interfaces run simultaneously, each with a different polling rate.
We are intermittently getting a 'Cannot rename file' error: Camel fails to move files that were successfully processed and transmitted from the FTP source into the 'done' backup folder. Restarting the Camel application fixes the issue.
Basically, at regular intervals triggered by a Quartz scheduler, the route:
picks up files from a source via FTP,
processes them with Smooks + XSL transformations,
delivers the generated flat file to an endpoint via FTP.
If multiple files are read from the source directory, then all the files are appended together in a temporary file before being processed.
The Camel FTP configuration uses the following URL:
ftp://xxxx/export?antInclude=dsciord_*.dat&inProgressRepository=#warehouseIntegrationIdempotentRepository&preMove=in_progress_bpo/$simple{date:now:yyyyMMddHHmm}/$simple{file:name}&move=done&consumer.bridgeErrorHandler=true
reads files matching dsciord_*.dat from the /export directory,
uses a custom inProgressRepository to store the names of the read files in a local DB (this was added to prevent contention with a second cluster node; however, only a single node is currently live, so this option is unnecessary and could be removed to speed up the process),
moves the files (preMove) to an in_progress_bpo/201609061522 directory, where the subdirectory name is a date/time stamp,
moves them (move) to the in_progress_bpo/201609061522/done subdirectory once they have been successfully processed.
In the vast majority of cases the route works with no issues; however, sometimes the file(s) cannot be moved to the done folder (see the error below). Even then, the route sometimes continues successfully at the next polling cycle, but in other cases it enters a state in which, even when the Quartz scheduler triggers the poll, the route fails to detect any files in the source /export directory even though there ARE files there.
org.apache.camel.component.file.GenericFileOperationFailedException: Cannot rename file: RemoteFile[in_progress_bpo/201609060502/dsciord_3605752.dat] to: RemoteFile[in_progress_bpo/201609060502/done/dsciord_3605752.dat]
Notes: We are using
a single instance of a ConsumerTemplate to handle our interfaces.
a custom inProgressRepository to store the names of the files read.
Presumably something is locking the source files, and this is causing the Camel route to stop processing further files.
Any ideas or suggestions on debugging or resolving this issue would be greatly appreciated. The issues I read about on the camel-users forum seem to involve Windows deployments or, sometimes, Smooks failing to close the input stream. I've checked, and we don't use the org.milyn.templating.xslt.XslTemplateProcessor#bypass method, in which Smooks fails to close the underlying input stream.
Finally I have been able to reproduce/identify the issue.
We use a relative path for the directory that processed files are moved into once they have been successfully FTP-ed to the destination servers:
../../../u/4gl_upload/warehouse_integration_2/trs-server/export/in_progress_bpo/201609081030/done
However, for some reason, instead of following the correct path to move the processed files, the Camel consumer creates a new subdirectory tree starting from the current working directory, which can get quite long, as shown below. Hence the problem: the consumer loses track of where it is and does not reset itself.
/u/4gl_upload/warehouse_integration_2/trs-server/u/4gl_upload/warehouse_integration_2/trs-server/export/in_progress_bpo/201609081030
This was reproduced with the option stepwise=false, which means it traverses the subdirectories in a single step instead of stepwise.
I still don't know what the best solution is.

Data loss on concurrent file writes in Camel

I am using Camel for file operations. My system runs in a clustered environment.
Let's say I have 4 instances:
Instance A
Instance B
Instance C
Instance D
Folder structure:
Input Folder: C:/app/input
Output Folder: C:/app/output
All four instances point to the input folder. Per our business requirements, 8 files are placed in the input folder and the output is a single consolidated file. Camel is losing data when the instances write to the output file concurrently.
Route:
from("file://C:/app/input")
.setHeader(Exchange.FILE_NAME, simple("output.txt"))
.to("file://C:/app/output?fileExist=Append")
.end();
Kindly help me resolve this issue. Is there anything like a write lock in Camel to avoid concurrent file writers? Thanks in advance.
You can use the doneFileName option of the file component; see http://camel.apache.org/file2.html for more information.
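The done-file convention behind that option can be sketched in plain Java: the producer writes the data file first and only then creates an empty marker file, and the consumer ignores any data file whose marker does not exist yet. The class name and the fixed `.done` suffix are assumptions for illustration; in Camel the marker name is configurable via doneFileName (for example `${file:name}.done`).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch of the "done file" convention. Assumption: marker = <data file name>.done
class DoneFileConvention {

    // Producer: write the data, then create the marker as the very last step.
    static Path write(Path dir, String name, byte[] content) throws IOException {
        Path data = Files.write(dir.resolve(name), content);
        Files.createFile(dir.resolve(name + ".done"));
        return data;
    }

    // Consumer: only data files that already have a marker are ready.
    static List<Path> readyToConsume(Path dir) throws IOException {
        List<Path> ready = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                String n = p.getFileName().toString();
                if (!n.endsWith(".done") && Files.exists(dir.resolve(n + ".done"))) {
                    ready.add(p);
                }
            }
        }
        return ready;
    }
}
```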
Avoid reading files currently being written by another application
Beware that the JDK File IO API is a bit limited in detecting whether another application is currently writing/copying a file, and the implementation can differ depending on the OS platform. This could lead Camel to think the file is not locked by another process and start consuming it. Therefore you have to do your own investigation into what suits your environment. To help with this, Camel provides different readLock options and the doneFileName option that you can use. See also the section Consuming files from folders where others drop files directly.
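The readLock and doneFileName options address the reading side. For the write side the question asks about (several instances appending to one output file), the JDK itself offers an OS-level lock via `FileChannel.lock()`. This is a plain-JDK sketch, not a Camel feature, and it assumes every writer goes through the same locking code: whether the lock stops processes that do not take it is system-dependent, Java file locks are held per JVM (so they coordinate separate processes, not threads in one JVM), and behaviour on shared network drives depends on the platform.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Plain-JDK sketch of an exclusive "write lock" around an append.
// Only cooperating writers that also call lock() are coordinated.
class LockedAppender {

    static void append(Path target, String text) throws IOException {
        try (FileChannel ch = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
             FileLock lock = ch.lock()) { // blocks until no other process holds the lock
            ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        } // try-with-resources releases the lock before closing the channel
    }
}
```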

Apache Camel 2.10.7 - monitor deletion of files from file system

I am using Camel 2.10.7 with great success from ServiceMix to feed files from the local file system to my application.
The files shall remain on the file system, hence I use a configuration like this one.
from uri="file:../ange-data/vessels?noop=true&idempotentKey=${file:name}-${file:modified}"
This works great if I touch/update a file on the file system.
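Touching a file triggers a new delivery because, with noop=true, the consumer remembers the keys it has already seen, and a key built from the file name plus its modification time changes whenever the file is updated. A minimal sketch of that idea, where a `HashSet` stands in for Camel's idempotent repository; the class is hypothetical and its key format is simplified, so it will not match Camel's exact `${file:name}-${file:modified}` string.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of name+modified idempotent keys.
// The HashSet stands in for an idempotent repository.
class IdempotentKeySketch {
    private final Set<String> seen = new HashSet<>();

    String keyFor(File f) {
        // Simplified key: name plus last-modified epoch millis.
        return f.getName() + "-" + f.lastModified();
    }

    boolean shouldProcess(File f) {
        // add() returns false if this name+timestamp was already consumed
        return seen.add(keyFor(f));
    }
}
```

Note that a deleted file simply produces no new key, which is why this mechanism cannot report deletions.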
Only issue remains: how can I then in my Java code detect that a file has been removed from the file system by some other person or process?
I could not find any hint in the manual pages http://camel.apache.org/file-language.html or http://camel.apache.org/file2.html, but I believe it should be possible to get a message on file deletion?
You would need to use Java 7 NIO.2, which has a file watcher API (WatchService) that notifies you when files are added, removed, etc.
Search the web / SO for details on this API, for example:
http://docs.oracle.com/javase/tutorial/essential/io/notification.html
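A minimal sketch of that approach with `java.nio.file.WatchService`, registering only for `ENTRY_DELETE` events. It would run alongside Camel rather than through it; the class name and timeout handling are illustrative. The watch covers direct children of the registered directory only; it is not recursive.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Sketch: report files deleted from one directory (non-recursive).
class DeletionWatcher implements AutoCloseable {
    private final WatchService watcher;

    DeletionWatcher(Path dir) throws IOException {
        watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_DELETE);
    }

    // Waits up to timeoutSeconds for one batch of events; returns deleted names.
    List<Path> awaitDeletions(long timeoutSeconds) throws InterruptedException {
        List<Path> deleted = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
        if (key == null) return deleted; // timed out
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.ENTRY_DELETE) {
                deleted.add((Path) event.context()); // relative to the watched dir
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return deleted;
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }
}
```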

Mule file transfer not deleting source files

I am using Mule 3.2 to move files from one location to another. The problem is that Mule keeps processing the same files again and again and does not delete them.
The console displays:
org.mule.transport.file.FileMessageReceiver: Lock obtained on file:
My config file is below:
<flow name="File-FTP-Bridge">
<file:inbound-endpoint path="${outbound.input.path}"
moveToDirectory="${outbound.input.backup.path}">
<file:filename-wildcard-filter
pattern="*.msg" />
</file:inbound-endpoint>
<ftp:outbound-endpoint user="${outbound.ftp.user}"
password="${outbound.ftp.password}" host="${outbound.ftp.host}"
path="${outbound.ftp.path}" port="${outbound.ftp.port}"
outputPattern="#[header:originalFilename]">
</ftp:outbound-endpoint>
</flow>
I could not find the root cause for this problem. Thanks in advance.
Your file endpoint is missing a pollingFrequency attribute, which means it uses the default of 1000 ms. This makes Mule poll files much faster than the FTP endpoint can process them. Try, for example:
pollingFrequency="10000"
If this is not good enough because the FTP upload has unpredictable performance (so Mule still retries a file that is being uploaded), and your files are small enough to fit in memory, try adding:
<object-to-byte-array-transformer />
between your inbound and outbound endpoints. This loads the file in memory and moves it right away to outbound.input.backup.path, before trying the FTP upload. Of course, if the FTP upload fails, you'll have to move the file back to outbound.input.path...
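The same in-memory idea, sketched in plain Java to make the order of operations and the manual rollback explicit. `Uploader` is a hypothetical interface standing in for the FTP outbound endpoint; this is not Mule code, only an illustration of the sequence described in the answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: load into memory, move out of the polled folder, upload, roll back on failure.
class LoadMoveUpload {

    // Hypothetical stand-in for the FTP outbound endpoint.
    interface Uploader {
        void upload(String fileName, byte[] content) throws IOException;
    }

    static void process(Path inputFile, Path backupDir, Uploader uploader) throws IOException {
        // 1. Load the whole file into memory (only sensible for small files).
        byte[] content = Files.readAllBytes(inputFile);
        // 2. Move it out of the polled folder right away so it is not picked up again.
        Path backup = Files.move(inputFile, backupDir.resolve(inputFile.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        try {
            // 3. Upload from memory.
            uploader.upload(inputFile.getFileName().toString(), content);
        } catch (IOException e) {
            // 4. On failure, move the file back so a later poll retries it.
            Files.move(backup, inputFile, StandardCopyOption.REPLACE_EXISTING);
            throw e;
        }
    }
}
```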
