Camel File Component - Consumer files overwrite - apache-camel

What happens in the Camel File component consumer if file (1) is being processed (so, per the documentation, it is locked: "By default the file is locked for the duration of the processing") and another file (2) with the same name is saved (via FTP or otherwise) into the directory of file (1)? Will file (2) overwrite the in-process file (1)?

It depends on the file system you are using and on the kind of locking you use with Apache Camel. Either way, overwriting a file is always a bad idea unless you are sure it is okay.
With FTP there is even less of a locking guarantee, because an FTP client cannot lock a file on a remote FTP server.
Study the Java file-locking API (java.nio.channels.FileLock; java.io.File itself has no locking methods) and how file locks work on the file system you use.
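As a rough illustration of the file locks mentioned above, here is a minimal sketch using the JDK's FileChannel.tryLock(). The class and method names (FileLockSketch, canLockExclusively) are hypothetical and not part of Camel. Whether such a lock actually stops another writer depends on the operating system and file system: on many systems the lock is only advisory, so a process that never asks for the lock (an FTP server, for instance) may still overwrite the file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not Camel API): probes whether a file can be locked exclusively. */
final class FileLockSketch {

    /**
     * Tries to take an exclusive lock on the whole file without blocking.
     * Returns true if the lock was obtained (it is released again before returning),
     * false if a conflicting lock is already held by this JVM or by another process.
     */
    static boolean canLockExclusively(Path file) throws IOException {
        // WRITE is required for an exclusive lock; the file must already exist.
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // another process holds an overlapping lock
            }
            lock.release();
            return true;
        } catch (OverlappingFileLockException e) {
            return false; // this JVM already holds an overlapping lock
        }
    }
}
```

A consumer could call such a check before touching a file, but the result only says something about other lock-aware programs; it is not a guarantee against overwrites.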

Related

Requesting size of Stream from consumer, before loading all the data

Hi all,
I have a problem with the Camel component I am developing: I'm not sure how to implement it in a way that is in line with the concepts of Camel.
The producer I'm developing talks to the HTTP API of our server, which is used to send messages with attachments.
Those attachments can potentially be very big, which is why the server expects the total file size before any upload is done.
Currently the producer only accepts io.File, nio.Path and GenericFile objects, because there I can read the file size before I upload the file.
Of course this is not a good way to do things, because it requires the (big) file to be available locally.
Connecting, for example, an FTP server as the consumer would mean that I have to download each file locally so I can upload it afterwards.
The obvious solution is to use streams to access and upload the data, but then I do not know how big the file is until I'm done uploading. That is not an option; I need the size in advance.
My question now is: what are the best practices for streaming files through Camel while also making the consumer give me the file size in advance?
Greets
Chris
For the File/FTP consumer, the exchange has a header with the key CamelFileLength (Exchange.FILE_LENGTH), which holds the file size on the remote FTP server as reported by the consumer's scan result.
Unlike a file size obtained locally, the value of CamelFileLength might differ from the actual size of the file your application receives:
ASCII transfer mode can change the line endings when the operating systems differ.
The file size might change between the consumer's scan and the moment the consumer picks the file up.

Camel file reading: race condition with 2 active servers

In our ESB project, we have a lot of routes reading files with the file2 or ftp protocol for further processing. It is important to note that the files we read locally (file2 protocol) are on mounted network shares accessed via different protocols (NFS, SMB).
Now we are facing issues with race conditions: both servers read the same file and process it. We have reduced the likelihood of that by using the preMove option, but from time to time duplicate reads still occur when both servers poll in the same millisecond. According to the documentation, an idempotentRepository together with readLock=idempotent could help, for example with Hazelcast.
However, I'm wondering if this is a suitable solution for my issue, as I don't really know if it will work in all cases. Both servers read the file within milliseconds of each other, so the information that one server has already processed the file needs to be available in the Hazelcast grid at the point in time when the second server tries to read. Is that possible? What happens if there are minimal latencies (e.g. network related)?
In addition to that, the setting readLock=idempotent is only available for file2 but not for ftp. How can the issue be solved there?
Again: the issue is not preventing duplicate files in general; it is solely about preventing the race condition.
AFAIK, the idempotent repository should, in your case, prevent both consumers from reading the same file.
The latency between detection of the file and the entry in hazelcast is not relevant because the file consumers do not enter what they read. Instead they both ask the repository for an exclusive read-lock. The first one wins, the second one is denied, so it continues to the next file.
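The "first one wins, the second one is denied" behaviour can be sketched with an atomic put-if-absent operation. This is only an illustration of the idea, not Camel's idempotent repository API: the class ReadLockSketch and its methods are hypothetical, and in a real cluster the map would have to live in a shared store (such as a Hazelcast grid) rather than in one JVM.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical stand-in for a shared idempotent repository (not Camel's API).
 * The atomic putIfAbsent call is what decides the race between two consumers.
 */
final class ReadLockSketch {
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    /** Atomically claims the file; only the first caller for a given name gets true. */
    boolean tryClaim(String fileName, String serverId) {
        return owners.putIfAbsent(fileName, serverId) == null;
    }

    /** Gives the claim up again, but only if serverId is the current owner. */
    void release(String fileName, String serverId) {
        owners.remove(fileName, serverId);
    }
}
```

Because the claim is a single atomic operation on the shared store, it does not matter how close together the two polls happen: whichever request the store handles first gets the claim, and the other consumer skips the file.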
If you want to minimize the potential of conflicts between the consumers you can turn on shuffle=true to randomize the ordering of files to consume.
As for the missing readLock=idempotent on the ftp consumer: you could perhaps build a separate transfer route with only one consumer that downloads the files. Your file-consumer route can then process them idempotently.

validity of file descriptors of device files

I have an app which opens device files of hard disks, such as /dev/sda.
Now let's say my app opens the disk and, in between operations on it, I disconnect the disk and connect a different disk, which again gets the device file /dev/sda.
Is the file descriptor still valid, or does Linux know it is a different disk and fail operations on that file descriptor accordingly?
A good way to deal with this would be to write a udev rule so that a particular hard disk with a particular vendor ID is mounted in a certain way. That way you would be certain that the file descriptor would fail if you unplugged one hard disk and connected another.

Apache Camel File Transfer for previous interrupted process

I have two Camel applications whose job is to read files from the same directory, process them and send them to a DB consumer. To do this, my endpoints look like this:
file:/data/air?preMove=thread&readLock=fileLock&idempotent=true&idempotentRepository=#fileStore&include=AIROUTPUTCDR_(.*).AIR.gz&move=/data/air/success&moveFailed=error
As you can see, the application polls files from the poll directory based on filters, moves them into the thread directory for reading, reads them, and moves them to the success folder.
But with this flow, if I kill an application and start it again, the files that were being processed will not be processed, because they are in the thread folder.
My question is: is there a way to resume reading the files that were interrupted?
Thanks
No. If you do a hard kill on the application while a file is pre-moved, you need to manually move these files from the pre-move folder back into the source folder so they can be picked up again.
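The manual move could also be scripted, for example as a step that runs before the routes start. Below is a minimal sketch using only the JDK; the class and method names are hypothetical, and it assumes that no other application instance is processing files in the pre-move folder while it runs (with two applications sharing one folder, that would have to be ensured separately).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical recovery helper (not part of Camel). */
final class PreMoveRecoverySketch {

    /**
     * Moves every regular file from preMoveDir back into sourceDir.
     * Files whose name already exists in sourceDir are left where they are,
     * so nothing is overwritten. Returns the number of files moved.
     */
    static int restoreInterrupted(Path preMoveDir, Path sourceDir) throws IOException {
        int moved = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(preMoveDir)) {
            for (Path file : files) {
                Path target = sourceDir.resolve(file.getFileName());
                if (Files.isRegularFile(file) && !Files.exists(target)) {
                    Files.move(file, target);
                    moved++;
                }
            }
        }
        return moved;
    }
}
```

For the endpoint above, the call would be something like restoreInterrupted(Paths.get("/data/air/thread"), Paths.get("/data/air")), assuming preMove=thread resolves to a subfolder of the polled directory.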

Apache Camel: How do I signal to other processes that I moved/renamed a file?

I am trying to develop a file receipt process using Camel. What I am trying to do seems simple enough:
receive a file
invoke a web service which will look at that file and generate some metadata
move the file to a new location based on that metadata
invoke subsequent process(es) which will act on the file in its new location
I have tried several different approaches, but none seem to work exactly as I would like. My main issue is that, since the file is not moved/renamed until the route is completed, I cannot signal from within that route to any downstream process that the file is available.
I need to invoke web services in order to determine the new name and location; once I do that, the body is changed and I cannot use a file producer to move the file from within the route.
I would really appreciate hearing any other solutions.
You can signal the processing routes and then have them poll using the doneFile functionality of the file component.
Your first process will copy the files, signal the processing routes, and when it is done copying the file it will write a done file. Once the done file has been written the file consumers in your processing routes will pick up the file you want to process. This guarantees that the file is written before it is processed.
Check out the "Using done files" section of the file component.
http://camel.apache.org/file2.html
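The done-file idea itself is independent of Camel and can be sketched with plain JDK calls: the writer creates the marker only after the data file is complete, and the reader ignores any data file that has no marker yet. The class name, method names and the ".done" suffix below are assumptions chosen for illustration; in Camel the marker name is configured on the file endpoint through the doneFileName option.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration of the done-file protocol (not Camel API). */
final class DoneFileSketch {

    /** Marker path for a data file: same folder, file name plus ".done" (assumed suffix). */
    static Path markerFor(Path dataFile) {
        return dataFile.resolveSibling(dataFile.getFileName() + ".done");
    }

    /** Writer side: write the data completely first, then create the marker last. */
    static void publish(Path dataFile, String content) throws IOException {
        Files.write(dataFile, content.getBytes(StandardCharsets.UTF_8));
        Files.createFile(markerFor(dataFile));
    }

    /** Reader side: a data file counts as ready only once its marker exists. */
    static boolean isReady(Path dataFile) {
        return Files.exists(dataFile) && Files.exists(markerFor(dataFile));
    }
}
```

The ordering is the whole point: because the marker is created last, a reader that sees the marker can assume the data file is no longer being written.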
With other components, you could have used the OnCompletion DSL syntax to trigger a post-route message for further processing.
However, with the file component this is not really doable, since the move/done step happens in parallel with that OnCompletion trigger, and you can't be sure that the file is really done.
You might have some luck with the Unit of Work API, which can register post-route execution logic (this is how the File component fires off the move when the route is done).
However, do you really need this logic?
I see that you might want to send a wake-up call to some file consumer, but does the file really have to be ready that very millisecond? Once you have received the trigger message, can't you just start polling for the file and grab it once it is ready? That's usually how you do things with file-based protocols (or just ignore the trigger and poll every now and then).
