How do I rename a file with a custom pattern in Camel? - apache-camel

I am splitting a large file and need to generate filenames like
file.x.y.txt
where x is the split number and y is the total number of splits.
The files are already generated as file.x.txt through a complicated process.
I have this so far:
from("file://out?include=*.txt&move=${file:name}.<how to set this>.txt")
I could not figure out how to pass the y value to the consumer.
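One possible approach (a sketch only, with several assumptions: the directory name out, the header name totalSplits, and the premise that all part files are present before the route starts). Because the move expression is evaluated only after the exchange has been routed, a header set inside the route can be referenced from it:
import java.io.File;
import org.apache.camel.builder.RouteBuilder;

public class RenameSplitsRoute extends RouteBuilder {
    // Assumption: all part files are already in "out" before the consumer
    // starts, so counting the .txt files once gives the total number y.
    private final long totalSplits =
            new File("out").listFiles((dir, name) -> name.endsWith(".txt")).length;

    @Override
    public void configure() {
        // move is evaluated after routing, so it can see the header set below;
        // file:name.noext strips the .txt so the result is file.x.y.txt
        from("file://out?include=.*\\.txt&move=${file:name.noext}.${header.totalSplits}.txt")
            .setHeader("totalSplits", constant(totalSplits))
            .to("log:renamed");
    }
}
If the total cannot be derived by counting (for example because the splitter is still producing parts), the process that generates the parts would instead have to record the total somewhere the route can read it.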

Related

Loop over files of the same length and add their columns respectively

I have a directory that contains Nfiles text files called myfile_i.txt (with i an integer).
Each text file contains 2 columns (let's call them x and y) and Nsteps rows of floating numbers.
I would like to loop over all the files in the directory and do the following computation in C:
while [condition to go through all files called myfile_i.txt in the directory]
{
    for (ii = istart; ii < Nsteps; ii++)   /* with istart >= 0 and istart < Nsteps, otherwise error */
    {
        dsq[ii] = (x[ii]-x[istart])*(x[ii]-x[istart]) + (y[ii]-y[istart])*(y[ii]-y[istart]);
    }
}
where istart is an input parameter (an integer) that I choose as a starting point in terms of iterations (or, if you will, time steps).
Then at the end I would need to divide this result (which I believe is a column vector) by the number of steps (basically, Nsteps - istart).
I used to do something like this in Matlab, but I have lots of these large files, and loading them all and computing this is too slow for my computer; this is why I want to use (and learn) C.
What should my program look like in order to accomplish this computation?

camel aggregate lines and split into files of different sizes

My route reads a file with a number of lines and filters some lines out.
It splits the file on lines, filters them, and aggregates the results to a file.
The file URI is in append mode, so each aggregation is appended to it. A done file is created every time I write to it.
After the file is fully written, another route picks it up.
This route splits the file into n files with an equal number of records. But I am running into an issue where the done file is updated for every aggregation in step 1.
How do I update the done file only when the aggregation is fully done?
I tried to use the property ${exchangeProperty.CamelBatchComplete} in route 1.
But that property is always set to true on aggregation...
It's hard to help with a somewhat confusing description of your use case and no basic code example. However, you can just write the done file yourself when you are done; it's a few lines of Java code.
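For example, a minimal sketch of that idea (the marker-file naming convention and the helper class here are assumptions, not the asker's code):
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DoneFileWriter {
    // Call this once you know the aggregation has fully completed,
    // e.g. from a processor at the very end of the aggregating route.
    public static void markDone(String dataFile) throws Exception {
        Path done = Paths.get(dataFile + ".done"); // assumed convention: <file>.done
        if (!Files.exists(done)) {
            Files.createFile(done);
        }
    }
}
If route 1 currently sets the file producer's doneFileName option, drop it so that the marker is written only by this code, and keep route 2 watching for the same name.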

Reading many (1000+) files with dlmread - Loop with varying filenames?

I'm very new to MATLAB, or coding for that matter.
I'm running a simulation which outputs thousands of files. These files are .vtk and are read correctly by dlmread.
I tried reading one of them, defining it as a matrix and extracting column vectors out of this matrix. This works fine. What I need now is to read not just one of them, but all of them. The filenames vary by a number, for example cover1000.vtk, cover2000.vtk, ...., cover1200000.vtk.
I want all of them to be read with dlmread and stored as different matrices. How do I do that? Here is what I have right now, working with one file at a time:
A_1000 = dlmread('cover1000.vtk') % matrix containing the values from the vtk file
fx_1000 = A_1000(1:20,1) % extracting a vector with specific values
fx_ave_1000 = sum(fx_1000)/length(fx_1000) % average of the values in the extracted vector
I'm thinking of a loop, but how do I create a loop with varying file names?
Also I've read that a loop is not the best idea and that cell arrays would be better. But I have absolutely no idea how to implement any of this.
Thanks for your help!
cheers
You can use the function dir to list all the vtk files in your directory, then loop over those files.
filename = dir('*.vtk'); % list all the vtk files in the current directory
for ii = 1:length(filename)
    A = dlmread(filename(ii).name); % matrix containing the values from the ii-th vtk file
    fx{ii} = A(1:20,1); % extract the vector with the specific values
    fx_ave{ii} = sum(fx{ii})/length(fx{ii}); % average of the values in the extracted vector
end
The results are now stored in two cell arrays: fx and fx_ave.

Write output in different files for different input files using mapreduce

How do I write output to different files for different input files using MapReduce? For example,
suppose I want to calculate the term frequency of terms per file from video.txt and outlier.txt, and store the results in video1.txt and outlier1.txt respectively?
In your mapper, append the filename to each word you find; your key would then be 'word+filename'. Make sure that your partitioner uses the 'filename' part for partitioning, so that all words from the same file end up at the same reducer.
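A sketch of what that could look like with the Hadoop Java API (the class names, the '@' separator, and the whitespace tokenization are illustrative choices, not part of the original answer):
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class TermFrequencyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text compositeKey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Name of the file this split came from, e.g. video.txt or outlier.txt
        String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
        for (String word : value.toString().split("\\s+")) {
            if (word.isEmpty()) continue;
            compositeKey.set(word + "@" + fileName); // key = word + filename
            context.write(compositeKey, ONE);
        }
    }

    // Partition on the filename part only, so every word from the same
    // input file lands on the same reducer.
    public static class FilePartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String fileName = key.toString().substring(key.toString().lastIndexOf('@') + 1);
            return (fileName.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }
}
The reducer can then simply sum the counts per composite key to get the term frequency per file.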

Concatenate a large number of HDF5 files

I have about 500 HDF5 files, each of about 1.5 GB.
Each of the files has exactly the same structure: 7 compound (int, double, double) datasets and a variable number of samples.
Now I want to concatenate all these files by concatenating each of the datasets, so that at the end I have a single 750 GB file with my 7 datasets.
Currently I am running an h5py script which:
creates an HDF5 file with the right datasets and unlimited maximum sizes
opens all the files in sequence
checks the number of samples (as it is variable)
resizes the global file
appends the data
This obviously takes many hours; would you have a suggestion for improving it?
I am working on a cluster, so I could use HDF5 in parallel, but I am not good enough at C programming to implement something myself; I would need a tool that is already written.
I found that most of the time was spent resizing the file, as I was resizing at each step, so I am now first going through all my files to get their lengths (they are variable).
Then I create the global h5 file, setting the total length to the sum of the lengths of all the files.
Only after this phase do I fill the h5 file with the data from all the small files.
Now it takes about 10 seconds per file, so the whole job should take less than 2 hours, whereas before it was taking much longer.
I get that answering this earns me a necro badge - but things have improved for me in this area recently.
In Julia this takes a few seconds.
Create a txt file that lists all the hdf5 file paths (you can use bash to do this in one go if there are lots)
In a loop, read each line of the txt file and use label$i = h5read(original_filepath$i, "/label")
Concatenate all the labels: label = [label label$i]
Then just write: h5write(data_file_path, "/label", label)
Same can be done if you have groups or more complicated hdf5 files.
Ashley's answer worked well for me. Here is an implementation of her suggestion in Julia:
Make a text file listing the files to concatenate, using bash:
ls -rt $somedirectory/$somerootfilename-*.hdf5 >> listofHDF5files.txt
Write a julia script to concatenate multiple files into one file:
# concatenate_HDF5.jl
using HDF5

inputfilepath = ARGS[1]   # text file listing the HDF5 files to concatenate
outputfilepath = ARGS[2]  # path of the concatenated output file

data = nothing
for line in eachline(inputfilepath)
    path = strip(line)    # eachline already drops the newline; strip is a safeguard
    println(path)
    datai = h5read(path, "/data")
    # In this case we concatenate on the 4th dimension
    global data = data === nothing ? datai : cat(data, datai; dims=4)
end
h5write(outputfilepath, "/data", data)
Then execute the script file above using:
julia concatenate_HDF5.jl listofHDF5files.txt final_concatenated_HDF5.hdf5
