camel aggregate lines and split into files of different sizes - apache-camel

My first route reads a file with a number of lines and filters some of them out.
It splits the file into lines, filters them, and aggregates the result into a file.
The file URI is in append mode, so each aggregation is appended to it. A done file is created every time I write to it.
After the file is fully written, another route picks it up.
That route splits the file into n files with an equal number of records. But I am running into an issue: the done file is updated for every aggregation in step 1.
How do I update the done file only when the aggregation is fully done?
I tried to use the property ${exchangeProperty.CamelBatchComplete} in route 1.
But that property is always set to true on aggregation...

It's harder to help with just a somewhat confusing description of your use case and no basic code example. However, you can just write the done file yourself when you are done; it's a few lines of Java code.
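As a rough illustration of the "few lines of Java" mentioned above, here is a minimal sketch using only the JDK. The class name `DoneFileWriter`, the `.done` suffix, and the idea of calling it once after the last aggregation has been written are assumptions, not something taken from the original route.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DoneFileWriter {
    /** Creates an empty marker file next to the data file, e.g. out.txt -> out.txt.done. */
    static Path writeDoneFile(Path dataFile) throws IOException {
        // Assumed naming convention: data file name plus ".done"
        Path done = dataFile.resolveSibling(dataFile.getFileName() + ".done");
        // An empty file is enough; only its existence matters to the consumer.
        Files.write(done, new byte[0]);
        return done;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("route1");
        Path data = Files.write(dir.resolve("filtered.txt"),
                "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("done file: " + writeDoneFile(data));
    }
}
```

The method would have to be invoked from wherever the route knows the aggregation has finished, and the done file option would have to be dropped from the producing file URI so that Camel does not also write one per append.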

Related

how to add a header to a large file in camel?

I have finally managed to split a large file and reaggregate it into smaller (but still very large) files.
At the end of the writing I have the count of records in each file. This needs to be added to each of the smaller files as a header.
What is the best way to accomplish this in a performant way?
Possibilities I considered:
1. Write a file with each header as the split data files are being generated. At the end, match up each header with its data file and concatenate them. I am running into issues with how to read the file in a non-polling way and how to trigger the concat phase. This would also require rewriting the entire big files.
2. Keep the file headers in a message header or on the exchange and, when all files are written, read all the files from the directory, find the matching header for each, and add it to the output. This would require rewriting the entire big files.
3. Add a dummy header with placeholders for the count data and somehow modify the data files in place. This seems the most performant, but I am not sure how to do it.
Header: 3 records
a
b
c
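The placeholder idea from the list of possibilities can be sketched with `java.io.RandomAccessFile`, under the assumption that the header is allowed to have a fixed width. The count is space-padded to 10 characters here, so the header looks slightly different from the example above. The class name `PlaceholderHeader` and the padding width are illustrative choices.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Locale;

class PlaceholderHeader {
    // Fixed width, so the real header occupies exactly the bytes of the placeholder.
    static final int COUNT_WIDTH = 10;

    static String header(long count) {
        // Count is right-aligned and space-padded to COUNT_WIDTH characters.
        return String.format(Locale.ROOT, "Header: %" + COUNT_WIDTH + "d records\n", count);
    }

    /** Starts a new data file with a header whose count is still 0. */
    static void writePlaceholder(Path file) throws IOException {
        Files.write(file, header(0).getBytes(StandardCharsets.US_ASCII));
    }

    /** Overwrites the first header bytes in place; the data after them is not touched. */
    static void patchCount(Path file, long count) throws IOException {
        byte[] bytes = header(count).getBytes(StandardCharsets.US_ASCII);
        if (bytes.length != header(0).length()) {
            throw new IllegalArgumentException("count does not fit in " + COUNT_WIDTH + " digits");
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(0);
            raf.write(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("part", ".txt");
        writePlaceholder(file);
        Files.write(file, "a\nb\nc\n".getBytes(StandardCharsets.US_ASCII), StandardOpenOption.APPEND);
        patchCount(file, 3);
        Files.readAllLines(file, StandardCharsets.US_ASCII).forEach(System.out::println);
    }
}
```

This only works because the patched header has exactly the same byte length as the placeholder; a variable-width header would still force a rewrite of the whole file.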

SQL Server: copy headers after bat to extract data

I have many files that are extracted to .txt with a batch file, but they don't have headers. I've read that a possible solution is to append the exported rows to a .txt file that already contains the headers.
With this:
echo. >> titles.txt
type data.txt >> titles.txt
This takes a lot of time and is not efficient, since it appends the big data file to the file with the headers.
Another possible solution is to hardcode the titles in the SQL query, but this will change the type of the columns (if they are numeric they will be changed to varchar).
Is there a way to insert the headers as the first row of the data .txt file, rather than the other way around?
I might be wrong, but as far as I know (and from earlier experiments doing exactly this): no, it is not possible. The commands mentioned act on the file sequentially. You can open a file for reading, writing, or appending. If you open titles.txt for writing, it is overwritten and therefore emptied. If you open it for appending, you can only add to the end of the file, so the data can only be written after the header.
The only way it might work, though it is pretty nasty, is to append the header to the end of the file and, during later processing (e.g. in xls or whatever), re-sort the rows and move the last one to the beginning. But as mentioned: nasty and not really the way to go.
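To make the sequential constraint concrete, here is a small Java sketch (the original question uses a batch file, so the language and the name `HeaderPrepender` are assumptions). It does the same thing as the echo/type pair: the header is written first and the data is streamed after it into a new file. It does not avoid copying the data.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HeaderPrepender {
    /** Writes headerLine followed by the contents of data into target. */
    static void prepend(String headerLine, Path data, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            out.write((headerLine + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
            // Streams the data file to the output without loading it all into memory.
            Files.copy(data, out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("export");
        Path data = Files.write(dir.resolve("data.txt"),
                "1,foo\n2,bar\n".getBytes(StandardCharsets.UTF_8));
        Path target = dir.resolve("with_titles.txt");
        prepend("id,name", data, target);
        Files.readAllLines(target, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```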
If the number of files to process is a bigger problem than any individual file size, switching from bcp to sqlcmd might help.

Apache camel file with doneFileName

I am just starting to look at Apache Camel (using Blueprint routes) and I am already stuck.
I need to process a set of CSV files with different formats. I get 5 files named foo_X_20160110.csv, where X specifies the type of CSV file and the files have a date stamp. These files can be quite large, so a 'done' file is written once all files are written. The done file is named foo_trigger_20160110.csv.
I've seen the doneFileName option on the file component, but it only supports a static name (I have a date in the filename) or expects a done file for each input file.
The files have to be processed in a fixed order, but the order in which they are written to the input directory is not guaranteed. Hence I need to wait for the done file.
Any idea how this can be done with Camel?
Any suggestions for good Camel books?
Here is an example from the documentation
http://camel.apache.org/file2.html
from("file:C:/temp/input.txt?doneFileName=done");
As you can see, doneFileName has the static value "done". But you can use standard Java to build dynamic names, e.g. with the current date in some format, and use string operations to construct the URI. Hope that helps.
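The string construction suggested above might look like the following sketch. It only builds the URI text and uses no Camel API; the class name `DoneFileUri` and the directory `/data/in` are placeholders, and the `foo_trigger_` prefix is taken from the question.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DoneFileUri {
    /** Builds a file endpoint URI whose done file name carries the given date. */
    static String inputUri(String directory, LocalDate date) {
        // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, e.g. 20160110
        String stamp = date.format(DateTimeFormatter.BASIC_ISO_DATE);
        return "file:" + directory + "?doneFileName=foo_trigger_" + stamp + ".csv";
    }

    public static void main(String[] args) {
        // prints file:/data/in?doneFileName=foo_trigger_20160110.csv
        System.out.println(inputUri("/data/in", LocalDate.of(2016, 1, 10)));
    }
}
```

The resulting string would be passed to `from(...)`. Note that the date is fixed at the moment the string is built, so a route created this way keeps looking for that one day's trigger file.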
Update:
By the way, as mentioned in the documentation, there is the option of dynamic placeholders for the doneFileName.
However it's more common to have one done file per target file. This
means there is a 1:1 correlation. To do this you must use dynamic
placeholders in the doneFileName option. Currently Camel supports the
following two dynamic tokens: file:name and file:name.noext which must
be enclosed in ${ }. The consumer only supports the static part of the
done file name as either prefix or suffix (not both).
from("file:bar?doneFileName=${file:name}.done");
You can also use a prefix for the done file, such as:
from("file:bar?doneFileName=ready-${file:name}");

identifying data file type

I have a huge 1.9 GB data file without an extension that I need to open and get some data from. The problem is that I need to know what extension it should have and what software I can use to view the data in a table.
Here is the picture:
The file has only 2 lines. I already tried opening it as CSV in Excel, but it did not work. Any help?
I have never used it, but you could try this:
http://mark0.net/soft-tridnet-e.html
explained here:
http://www.labnol.org/software/unknown-file-extensions/20568/
The third "column" of that line looks 99% likely to be from PHP's print_r function (with newlines imploded so it can be stored on a single line).
There may not be a "format" or program to open it with if it's just some app's custom debug/output log.
A quick Google search found a few programs to split large files into smaller units. That may make it easier to load into something (may or may not be n++) for reading.
It shouldn't be too hard to mash out a script to read the lines and reconstitute the session "array" into a more readable format (read: vertical, not inline), but it would have to be a one-off custom job, since no one other than the holder of your file would have a use for it.

How to replace a line on the middle of a txt file in C?

I am reading info (numbers) from a txt file, and then adding to those numbers others that I have in another file with the same structure.
At the start of each line in the file is a number that identifies a specific product. That code allows me to search for the same product in the other file. In my program I have to add the other "variables" from one file to the other, and then write the result back to the same place in one of those files.
I didn't open either of those files with a or a+; I used r and r+ because I want to replace information in lines that may be in the middle of the file, not at the end of it.
The program compiles and runs, but when it comes to replacing the info in the file, it just doesn't do anything.
How should I resolve the problem?
A program can replace (overwrite) text in the middle of the file. But the question is whether or not this should be performed.
In order to insert larger text or smaller text (and close up the gap), a new text file must be written. This is assuming the file is not fixed width. The fundamental rule is to copy all original text before the insertion to a new file. Write the new text. Finally write the remaining original text. This is a lot of work and will slow down even the simplest programs.
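The copy-to-a-new-file rule described above can be sketched as follows. The question is about C, but the sketch is in Java to stay consistent with the other examples on this page; the same three steps apply with fopen/fgets/fputs. The class name `LineReplacer` and the assumption that the product id is the first whitespace-separated token on each line are illustrative.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class LineReplacer {
    /** Rewrites file so that every line starting with productId becomes newLine. */
    static void replaceLine(Path file, String productId, String newLine) throws IOException {
        // Temporary file in the same directory as the original.
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "replace", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // A line matches when its first whitespace-separated token is the product id.
                boolean match = line.split("\\s+", 2)[0].equals(productId);
                out.write(match ? newLine : line);
                out.newLine();
            }
        }
        // Swap the rewritten copy in for the original.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("products", ".txt");
        Files.write(file, "1 10\n2 20\n3 30\n".getBytes(StandardCharsets.UTF_8));
        replaceLine(file, "2", "2 25");
        Files.readAllLines(file, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

Because the replacement line may be longer or shorter than the original, nothing here depends on fixed-width records.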
I suggest you design your data layout before you go any further. Also consider using a database, see my post: At what point is it worth using a database?
Your objective is to design the data to minimize duplication and data fetching.
