How to process only the last file in a directory using Apache Camel's file component

I have a directory with files like this:
inbox/
data.20130813T1921.json
data.20130818T0123.json
data.20130901T1342.json
I'm using Apache Camel 2.11 and on process start, I only want to process one file: the latest. The other files can actually be ignored. Alternatively, the older files can be deleted once a new file has been processed.
I'm configuring my component using the following, but it obviously doesn't do what I need:
file:inbox/?noop=true
noop does keep the last file, but also all other files. On startup, Camel processes all existing files, which is more than I need.
What is the best way to only process the latest file?

You can use sorting to sort the files by name, and you may need to reverse the order so that the latest file comes first. Try it out to see which order you need. Then set maxMessagesPerPoll=1 to pick up only one file. You also need to set eagerMaxMessagesPerPoll=false so that the files are sorted before the limit is applied.
You can find details at: http://camel.apache.org/file2. See the section Sorting using sortBy for the sorting.
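The options above can be combined into one endpoint URI. The following is only a minimal sketch as a Java DSL route, assuming Camel 2.x with camel-core on the classpath. The class name and the log: target endpoint are placeholders that do not come from the question, and whether the reverse: prefix is needed should be checked against your own file names.

```java
import org.apache.camel.builder.RouteBuilder;

// Hypothetical route class; the class name and the "log:" target are placeholders.
class LatestFileRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("file:inbox/?noop=true"
                + "&sortBy=reverse:file:name"        // names end in a timestamp, so reverse order puts the newest first
                + "&maxMessagesPerPoll=1"            // pick up only one file per poll
                + "&eagerMaxMessagesPerPoll=false")  // sort first, then apply the limit
            .to("log:latest-file");
    }
}
```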

An alternative is to keep the sorting, this time so that the latest file comes last. Then you can use the Aggregator EIP to aggregate all the files, with org.apache.camel.processor.aggregate.UseLatestAggregationStrategy as the aggregation strategy so that only the last exchange (the latest file) is kept. You can then set delete=true on the file endpoint to delete the files when done. You also need to configure the aggregator with completionFromBatchConsumer=true.
The Aggregator EIP is documented here: http://camel.apache.org/aggregator2
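As a rough illustration of the aggregator approach, the route could look like the sketch below. It assumes Camel 2.x with camel-core on the classpath. The class name, the constant(true) correlation expression (which puts every file into the same group) and the log: target are choices made for this example, not something prescribed by the answer.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.UseLatestAggregationStrategy;

// Hypothetical route class; the class name and the "log:" target are placeholders.
class KeepLatestFileRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        // Ascending sort by name, so the file with the newest timestamp arrives last.
        from("file:inbox/?delete=true&sortBy=file:name")
            // One group for all files; the strategy keeps only the most recent exchange.
            .aggregate(constant(true), new UseLatestAggregationStrategy())
                // Complete the group once the whole polled batch has been aggregated.
                .completionFromBatchConsumer()
            .to("log:latest-file");
    }
}
```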

Related

Apache Camel doneFileName with changing name

I'm currently creating some routes, and I have a problem with one of them.
Usually I have a data file and a done file that has the same name prefixed by "ACK", and this works perfectly with Camel and the doneFileName option.
But for one of my routes the situation is different: I still receive two files, but both follow the same naming pattern, like MyFILE-{{timestamp}}. The data file contains the data, and the done file contains just "done".
So I need something that checks the content of a file and, if it is just "done", processes the other file.
Is there a way to handle this with camel?
The most pragmatic solution I see is to write an "adapter script" (bash or whatever you have at your disposal) that peeks into every file with a timestamp in its name.
If the file content is "done":
Look up the other "MyFILE-{{timestamp}}" (the data file) and rename it to "MyFILE"
Rename the done file to "MyFILE.done"
Camel can then import the data file using the standard done-file-option. Because both files are renamed to something without a timestamp, the peek-script ignores them after renaming.
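The adapter idea can be sketched in Java instead of bash, using only the standard library. This is a sketch under assumptions: the class and method names are made up for illustration, and it assumes the directory holds exactly one data/done pair at a time and that a done file is tiny and contains only the word "done".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical adapter; names are illustrative only.
final class DoneFileAdapter {

    /**
     * Renames a timestamped data/done pair to "MyFILE" and "MyFILE.done".
     * Returns true if a pair was found and renamed, false otherwise.
     * Assumption: at most one pair of "MyFILE-*" files exists at a time.
     */
    static boolean adapt(Path dir) throws IOException {
        List<Path> candidates = new ArrayList<>();
        // The glob requires the hyphen, so already renamed files are ignored.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "MyFILE-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    candidates.add(p);
                }
            }
        }
        if (candidates.size() != 2) {
            return false;
        }
        Path first = candidates.get(0);
        Path second = candidates.get(1);
        Path done;
        Path data;
        if (isDoneMarker(first) && !isDoneMarker(second)) {
            done = first;
            data = second;
        } else if (isDoneMarker(second) && !isDoneMarker(first)) {
            done = second;
            data = first;
        } else {
            return false;
        }
        // Rename the data file first so the marker only appears once the data is in place.
        Files.move(data, dir.resolve("MyFILE"));
        Files.move(done, dir.resolve("MyFILE.done"));
        return true;
    }

    /** Peeks into a file; only small files are read to avoid loading large data files. */
    private static boolean isDoneMarker(Path p) throws IOException {
        if (Files.size(p) > 16) {
            return false;
        }
        String content = new String(Files.readAllBytes(p), StandardCharsets.UTF_8).trim();
        return content.equals("done");
    }
}
```

After the rename, Camel could consume the data file with an endpoint along the lines of file:inbox?doneFileName=${file:name}.done, where inbox is a placeholder directory. Note that Files.move fails if a previous "MyFILE" has not been consumed yet, which this sketch does not handle.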

How to rename a file while using "move" in URL in apache camel

I have a URL like
url = "file:D:/inputFolder?move=D:/outputFolder". We build this URL dynamically.
I want to rename the file while moving it, so I changed it to something like this:
url = "file:D:/inputFolder?move=D:/outputFolder&fileName=abc.txt". But move and fileName do not seem to work together: the file is not renamed.
Is there any alternative? Please note that I want to do this with "move" only.
I also cannot use .setHeader(..).
Thanks,
Hi,
as far as I understand, you're trying to move the file with one single URI.
That is not really how Camel works.
The idea of Camel is to have a "consumer" and a "producer": the consumer loads data (e.g. your file) and the producer puts the data somewhere (e.g. saves the file into a folder).
That being said, here is what worked for me with a java route:
from("file:/home/chris/temp/camel/in")
.to("file:/home/chris/temp/camel/out/?fileName=test.txt");
The from part configures the folder where Camel looks for new files. A few notes on that:
The file component checks the folder every 0.5 seconds for new files. This can be changed with the delay parameter.
The noop option controls whether the input file is left in place, so the route effectively copies it (noop=true), or removed from the input folder, so it is effectively moved. It defaults to false, which means the file is moved.
In the to part you configure where the file is supposed to be moved. Here you can use the fileName parameter to rename the file.
Be careful with this though, because setting an option directly in the URI makes it "static".
What I mean by that is that the only way of changing the parameter is by completely reconfiguring the route or by restarting it, neither of which is something you would normally want to do.
Note 1:
Moving all files that are put into one folder to the same file name always overwrites the previous file by default.
You could, for example, use the fileExists parameter to append the content instead: fileExists=Append (see the Camel file documentation for details).
Note 2:
The file component has an option to copy the file and then delete the original instead of renaming it, which is sometimes necessary when you move a file onto a different drive and a simple rename does not work.
Also see the documentation for the Camel file component for details on that.
Note 3:
You can have multiple to() statements in the same route to have the file moved to multiple locations. For example:
from("file:/home/chris/temp/camel/in")
.to("file:/home/chris/temp/camel/out/?fileName=test.txt")
.to("smtp:....");
Hope I could help you and answer your question.
Greets
Chris
Two possible ways to achieve your goal.
Use both "consumer" and "producer"
This way, you are free to control where and how the destination is set, and you have great freedom to control the file name by using a processor/bean.
from("file:D:/inputFolder")
.to("file:D:/outputFolder?fileName=abc.txt")
Use "consumer" only
This way, you treat the work as source data control. It can be used when the file is moved within the same drive. The drawback is that the rename pattern is limited (refer to the Camel file language).
from("file:D:/inputFolder?move=${file:parent}/../outputFolder/abc.txt")

Get the latest/recent file from FTP to local using Talend

I have to create a job in Talend that connects to an FTP server. The server holds several files per day with the same prefix but a different timestamp (yyyymmddhhmmss) appended to the filename.
Example -
MyFile20151123142020.xml
MyFile20151123154748.xml
My requirement is to pick the latest or most recent file and copy it to my local machine.
I understand that this could be achieved either by referring to the latest timestamp in the filename or by referring to the last modified time. I thought of proceeding with the latter, and my job looks like below -
I don't know how to proceed further and how to use the latest mtime value to pick the most recent file.
After getting the file properties, we need to sort the files by mtime or by basename in descending order, then pick the first.
tSortRow : sort by mtime or basename if they have same pattern.
tSampleRow : "1" to get the first
tFTPGet : file mask = row3.basename (row3 the output flow of tSampleRow)
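The selection logic behind these three steps can be sketched in plain Java. This is not Talend component configuration, only an illustration of the sort-then-take-first idea. The FileInfo class is a hypothetical stand-in for a file-properties row with basename and mtime columns.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper; mirrors "sort descending, then take the first row".
final class LatestRemoteFile {

    /** Stand-in for one file-properties row (assumed columns: basename, mtime). */
    static final class FileInfo {
        final String basename;
        final long mtime;

        FileInfo(String basename, long mtime) {
            this.basename = basename;
            this.mtime = mtime;
        }
    }

    /** Returns the basename of the most recently modified file, if any. */
    static Optional<String> latestBasename(List<FileInfo> files) {
        List<FileInfo> sorted = new ArrayList<>(files);
        // Sort step: newest mtime first.
        sorted.sort(Comparator.comparingLong((FileInfo f) -> f.mtime).reversed());
        // Sample step: keep only the first row; its basename becomes the file mask.
        return sorted.isEmpty() ? Optional.empty() : Optional.of(sorted.get(0).basename);
    }
}
```

Because the timestamp in the file name has a fixed width (yyyymmddhhmmss), sorting by basename in descending order would select the same file as long as all files share the same prefix.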

Merge all files from a branch into main(using a command/script) cleartool

I want to merge all files from a particular branch into main using a script or a command. Is there a way to do it without checking out each file in the target, or should I do each file manually?
ClearCase is file-based, not repository-based: any merge will be done file by file, and actually first folder by folder (you merge the folders first, then the files).
The easiest way to initiate a merge based on a branch is to use the cleartool findmerge, which can use a version selector, like -fve/rsion .../branch1/LATEST.
See also "To prepare to merge"
The usual approach though is to use a view or a tag to select the elements you want to merge, as I described in "How merge sub branch to main branch using clearcase command line under linux?" (using -ftag).
Note that this works also in an UCM environment, based on activities (even though the deliver and rebase commands remain the recommended merge methods).

Clearcase: how to copy/fork a file?

In Clearcase, I want to copy (fork, split) a file while preserving its history. Something like svn cp old.txt new.txt. How do I do it?
It isn't possible to fork a file in ClearCase.
If you refactor your code and split a file in two, one of them will appear as a new file and you will lose the information about who coded it. The annotate command will report whoever split the file as the author of its lines.
UCM or not, you cannot easily duplicate the full history of a file.
The best way to isolate a history is still to create a branch, so that you can make new versions of that file without impacting the same file in the original branch.
Thinking 'svn cp' should be available in ClearCase might come from the fact that, in SVN, branches are directories, and a tool like cc2svn will actually replicate ClearCase branches using 'svn cp'.
But since branches are first-class citizens in ClearCase, it is best to reason in terms of branches rather than in terms of copy/fork.
From the main page of cc2svn:
There is a difference in creating the branches in ClearCase and SVN:
SVN copies all files from parent branch to the target like: svn cp branches/main branches/dev_branch
ClearCase creates the actual branch for file upon checkout operation only.
Pretty simply done
Check out parent folder
Move element you wish to duplicate to appropriate location (not within the checked out parent folder)
Undo Checkout of parent folder
All the files get returned to the original folder with history and also the duplicate ones remain in the new location with the history too. Now each file can be checked out and changed individually
