Apache camel file with doneFileName - apache-camel

I am just starting to look at Apache Camel (using Blueprint routes) and I am already stuck.
I need to process a set of CSV files with different formats. I get 5 files named foo_X_20160110.csv, where X specifies the type of CSV file and the name carries a date stamp. These files can be quite large, so a 'done' file is written once all files are written. The done file is named foo_trigger_20160110.csv.
I've seen the doneFileName option on the file component, but that only supports a static name (I have a date in the filename) or it expects a done file for each input file.
The files have to be processed in a fixed order, but the order in which they are written to the input directory is not guaranteed. Hence I need to wait for the done file.
Any idea how this can be done with Camel?
Any suggestions for good Camel books?

Here is an example from the documentation
http://camel.apache.org/file2.html
from("file:C:/temp/input.txt?doneFileName=done");
As you can see, doneFileName has the static value "done". But you can use standard Java to build dynamic names, e.g. from the current date in whatever format you need, and use string operations to construct the URI. Hope that helps.
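As a minimal sketch of that string construction, the following plain Java builds the done file name from the question (foo_trigger_20160110.csv) for a given date and appends it to a file endpoint URI. The class name, method names and the "input" directory are hypothetical. Note that the string is built once, when the route is defined, so a long-running route would keep the date it was started with.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds a file endpoint URI whose doneFileName carries a date stamp.
class DoneFileUri {

    // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, e.g. 20160110.
    static String doneFileName(LocalDate date) {
        return "foo_trigger_" + date.format(DateTimeFormatter.BASIC_ISO_DATE) + ".csv";
    }

    // The directory name is an assumption; only the doneFileName option comes from the answer.
    static String endpointUri(String directory, LocalDate date) {
        return "file:" + directory + "?doneFileName=" + doneFileName(date);
    }

    public static void main(String[] args) {
        // prints file:input?doneFileName=foo_trigger_20160110.csv
        System.out.println(endpointUri("input", LocalDate.of(2016, 1, 10)));
    }
}
```

The resulting string would then be passed to from(...) in the route definition.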
Update:
By the way, as mentioned in the documentation there is the option of dynamic placeholders for the doneFileName.
However, it's more common to have one done file per target file. This
means there is a 1:1 correlation. To do this you must use dynamic
placeholders in the doneFileName option. Currently Camel supports the
following two dynamic tokens: file:name and file:name.noext which must
be enclosed in ${ }. The consumer only supports the static part of the
done file name as either prefix or suffix (not both).
from("file:bar?doneFileName=${file:name}.done");
You can also use a prefix for the done file, such as:
from("file:bar?doneFileName=ready-${file:name}");

Related

JMeter: How to decode alphanumeric characters fetched in JMETER response to its valid value?

I have a JMeter test plan which basically downloads a file by breaking it into multiple parts.
However, these parts are received in encoded alphanumeric character format.
For instance, we have a .txt file which is broken down into 2 parts. Each part has an encoded set of characters. I have been successful so far in appending these characters into another file.
Is there a way to restore the contents of this file (holding the alphanumeric characters) to the original .txt file with its valid contents?
e.g. JMeter response: <data> aWJiZWFuLFBhbmFtYSxDb3NtZXRpY </data>
Can someone please suggest the steps to achieve this?
It looks Base64-encoded. You can use the __base64Decode() function (which can be installed as part of the Custom JMeter Functions bundle using the JMeter Plugins Manager):
${__base64Decode(aWJiZWFuLFBhbmFtYSxDb3NtZXRpY,)}
If you cannot or do not want to use JMeter Plugins, you can achieve the same with JMeter's built-in __groovy() function:
${__groovy(new String('aWJiZWFuLFBhbmFtYSxDb3NtZXRpY'.decodeBase64()),)}
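Since the question mentions a file delivered in several encoded parts, here is a rough plain-Java sketch (outside JMeter) that decodes each part separately and joins the decoded bytes. The class name and the sample parts are made up for illustration, and it assumes every part is a complete Base64 string on its own and that the original content is UTF-8 text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical helper: decodes Base64 parts one by one and joins the decoded bytes.
class PartDecoder {

    static String decodeParts(List<String> parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String part : parts) {
            // trim() drops the surrounding whitespace seen inside the <data> element
            byte[] bytes = Base64.getDecoder().decode(part.trim());
            out.write(bytes, 0, bytes.length);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two made-up parts: "UGFu" decodes to "Pan" and "YW1h" to "ama".
        System.out.println(decodeParts(Arrays.asList(" UGFu ", "YW1h"))); // prints Panama
    }
}
```

Decoding part by part avoids relying on the concatenated text still being valid Base64, which is only the case when each part's length is a multiple of four characters.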

camel aggregate lines and split into files of different sizes

My route reads a file with a number of lines and filters some lines out.
It splits the file into lines, filters them, and aggregates the result to a file.
The file URI is in append mode, so each aggregation is appended to it. A done file is created every time I write to it.
After the file is fully written, another route picks it up.
This second route splits the file into n files with an equal number of records. But I am running into an issue where the done file is updated for every aggregation in step 1.
How do I update the done file only when the aggregation is fully done?
I tried to use the property ${exchangeProperty.CamelBatchComplete} in route 1.
But that property is always set to true on aggregation...
It's hard to help with only a somewhat confusing description of your use case and no basic code example. However, you can just write the done file yourself when you are done; it's a few lines of Java code.
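Those few lines could look roughly like the sketch below, which creates an empty marker file next to the data file. The class name and the ".done" suffix are assumptions; the suffix has to match whatever doneFileName the consuming route expects. Where exactly to call it (for example once the last aggregation has completed) depends on the route, which the question does not show.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes an empty "<file name>.done" marker next to a data file.
class DoneFileWriter {

    static Path writeDoneFile(Path dataFile) {
        // e.g. out.csv -> out.csv.done in the same directory
        Path done = dataFile.resolveSibling(dataFile.getFileName() + ".done");
        try {
            // Creates the marker, or truncates it if it already exists.
            Files.write(done, new byte[0]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return done;
    }

    public static void main(String[] args) {
        // The data file path is made up for the demo; only the marker is created.
        Path data = Paths.get(System.getProperty("java.io.tmpdir"), "out.csv");
        System.out.println("Wrote " + writeDoneFile(data));
    }
}
```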

fileName to retrieve multiple file names

How do I get Apache Camel to fetch multiple files using fileName on the file component? Here is my route:
sftp://ftpserver:22/?username=blah&password=blah&stepwise=false&useList=false&ignoreFileNotFoundOrPermissionError=true&fileName=${file:onlyname.noext}.txt&delete=true&doneFileName=done")
The files that are on the ftp server are 1.txt, 2.txt, 3.txt. How do I retrieve all of them without having to do a fixed fileName such as fileName=3.txt?
You can use the Simple expression language in the fileName to provide a name pattern. There are some examples on the older documentation page http://camel.apache.org/file-language.html
In your case it should be sufficient to use fileName=*.txt.
Use the include option such as:
include=.*\\.txt
include takes a Java regular expression.
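To see which names such an expression would select, the pattern can be tried in plain Java first. This is only a sketch: the class name and file names are illustrative, and it assumes the option is applied as a full match against the file name, which is worth checking against your Camel version. In Java source the backslash is doubled, so ".*\\.txt" is the regular expression .*\.txt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: shows which file names a regular expression matches in full.
class IncludeFilter {

    static List<String> matching(String regex, List<String> names) {
        Pattern pattern = Pattern.compile(regex);
        return names.stream()
                .filter(name -> pattern.matcher(name).matches()) // whole name must match
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("1.txt", "2.txt", "3.txt", "done", "3.txt.tmp");
        // prints [1.txt, 2.txt, 3.txt]
        System.out.println(matching(".*\\.txt", names));
    }
}
```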
Alternatively, you may use antInclude:
antInclude=*.txt
EDIT:
antInclude may contain more than one file name:
antInclude=1.txt,2.txt
(At least this works for local files.)
EDIT:
If you are not permitted to list files on the FTP server, then you must set up a route for every single file:
for (int i = 0; i < 10; i++) {
    // one consumer per fixed file name: 0.txt, 1.txt, ...
    from("sftp://ftpserver:22/?fileName=" + i + ".txt")
        ...;
}
However, depending on the number of files, this is not really something I would like to do.

Apache Camel - Copying a large file into a consumer folder

I have a route that expects that various files will be copied into an incoming folder. The route will proceed to move these files into a temp folder where it will do other stuff. The route is as follows:
<route id="incoming" >
<from uri="file://my/path/incoming"/>
<to uri="file://my/path/incoming/temp"/>
</route>
The issue is that these files may be quite large, let's say 1 GB. Copying such a file into the incoming folder may take, say, 10 seconds. During those 10 seconds the consumer polls the directory and an exception is thrown because the partial file is still being copied. What workaround could I use?
I have tried all the readLock strategies (primarily changed) but I get an exception:
(The process cannot access the file because it is being used by another process)
The modified URI is as follows:
<from uri="file://my/file/path?readLockCheckInterval=3000&amp;readLock=changed"/>
Still no luck, though.
Check the readLock options in the File component
Used by consumer, to only poll the files if it has exclusive read-lock on the file (i.e. the file is not in-progress or being written). Camel will wait until the file lock is granted.
This option provides the built-in strategies:
markerFile Camel creates a marker file (fileName.camelLock) and then holds a lock on it.
changed is using file length/modification timestamp to detect whether the file is currently being copied or not. Will at least use 1 sec. to determine this, so this option cannot consume files as fast as the others, but can be more reliable as the JDK IO API cannot always determine whether a file is currently being used by another process. The option readLockCheckInterval can be used to set the check frequency.
fileLock is for using java.nio.channels.FileLock. This approach should be avoided when accessing a remote file system via a mount/share unless that file system supports distributed file locks.
rename tries to rename the file as a test of whether an exclusive read-lock can be obtained.
The readLock=changed option seems appropriate in this case. There can be issues if you have a very slow producer writing files to the incoming folder.
Another option is to use a done file. You can make the original producer create a done file after it has finished writing the file.
It's more common to have one done file per target file. This means there is
a 1:1 correlation. To do this you must use dynamic placeholders in the
doneFileName option. Currently Camel supports the following two
dynamic tokens: file:name and file:name.noext which must be enclosed
in ${ }. The consumer only supports the static part of the done file
name as either prefix or suffix (not both).
from("file:bar?doneFileName=${file:name}.done");
In this example, a file is only polled if a done file named <file name>.done exists.
Something like this will work. If a non-Camel system is copying your large file into InputDir, then that system must create the .DONE file after the file has been copied. Once the .DONE file is available, the route will start processing.
from("file://" + InputDir + "?delay=500&doneFileName=${file:name}.DONE")
.to("file://" + OutputDir + "?fileName=${date:now:yyyyMMdd}/${file:name}&doneFileName=${file:name}.DATA.READY.DONE");
Probably this is late in the game but use fileExist=Append in the route URI. Example:
<route id="incoming" >
<from uri="file://my/path/incoming"/>
<to uri="file://my/path/incoming/temp?fileExist=Append"/>
</route>

C - Reading multiple files

I just had a general question about how to approach a certain problem I'm facing. I'm fairly new to C, so bear with me here. Say I have a folder with 1000+ text files; the files are not named in any kind of numbered order, but they are alphabetical. For my problem I have files of stock data, and each file is named after the company's respective ticker. I want to write a program that will open each file, read the data, find the historical low, compare it to the current price, calculate the percent change, and then print it. Searching and calculating are not a problem; the problem is getting the program to go through and open each file. The only way I can see to attack this is to create a text file containing all of the ticker symbols, have the program read that into an array, and then run a loop that opens the first filename in the array, performs the calculations, prints the output, closes the file, and then loops back around to the second element (the next ticker symbol) in the array. This would be fairly simple to set up (I think), but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this? I'm not really asking for code (unless there is some amazing function in C that will do this for me ;) ), just some advice from more experienced C programmers.
Thanks :)
Edit: This is on Linux, sorry I forgot to mention that!
Under Linux/Unix (BSD, OS X, POSIX, etc.) you can use opendir / readdir to go through the directory structure. No need to generate static files that need to be updated, when the file system has the information you want. If you only want a sub-set of stocks at a given time, then using glob would be quicker, there is also scandir.
I don't know what Win32 (Windows / Platform SDK) functions are called, if you are developing using Visual C++ as your C compiler. Searching MSDN Library should help you.
Assuming you're running on linux...
ls /path/to/text/files > names.txt
is exactly what you want.
Use opendir() on Linux:
http://linux.die.net/man/3/opendir
Example:
http://snippets.dzone.com/posts/show/5734
In pseudocode it would look like this. I cannot give exact code as I'm not 100% sure this is the correct approach...
for each directory entry
    scan the filename
    extract the ticker name from the filename
    open the file
    read the data
    create a record consisting of the filename, data.....
    close the file
    add the record to a list/array...
sort the list/array into alphabetical order based on the ticker name in the filename...
You could vary it slightly if you wish: first scan the filenames in the directory entries and sort them, building a record for each filename, then go back to the start of the list/array and open each file individually, reading the data and putting it into the record...
Hope this helps,
best regards,
Tom.
There are no functions in standard C that have any notion of a "directory". You will need to use some kind of platform-specific function to do this. For some examples, take a look at this post from Cprogramming.com.
Personally, I prefer using the opendir()/readdir() approach as shown in the second example. It works natively under Linux and also on Windows if you are using Cygwin.
Approach 1) I would just have a specific directory in which I have ONLY these files containing the ticker data and nothing else. I would then use the C readdir API to list all files in the directory and iterate over each one performing the data processing that you require. Which ticker the file applies to is determined only by the filename.
Pros: Easy to code
Cons: It really depends where the files are stored and where they come from.
Approach 2) Change the file format so the ticker files start with a magic code identifying that this is a ticker file, and a string containing the name. As before use readdir to iterate through all files in the folder and open each file, ensure that the magic number is set and read the ticker name from the file, and process the data as before
Pros: More flexible than before. Filename needn't reflect name of ticker
Cons: Harder to code, file format may be fixed.
but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this?
I have solved the exact same problem a while back, albeit for personal uses :)
What I did was to use the OS shell commands to generate a list of those files and redirected the output to a text file and had my program run through them.
On UNIX, there's the handy glob function:
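#include <glob.h>
#include <stdio.h>

int main(void)
{
    glob_t results = {0};
    size_t i;
    glob("*.txt", 0, NULL, &results); /* match every .txt file in the current directory */
    for (i = 0; i < results.gl_pathc; i++)
        printf("%s\n", results.gl_pathv[i]); /* one matched path per line */
    globfree(&results);
    return 0;
}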
On Linux or a related system, you could use the fts library. It's designed for traversing file hierarchies: man fts,
or even something as simple as readdir
If you are on Windows, you can use its Directory Management APIs, more specifically the FindFirstFile function, used with wildcards, in conjunction with FindNextFile.
