How to add a header to a large file in Camel? - apache-camel

I have finally managed to split a large file and reaggregate it into smaller (but still very large) files.
At the end of writing I have the count of records in each file. This count needs to be added to each of the smaller files as a header.
What is the best way to accomplish this in a performant way?
Possibilities I considered:
1. Write a separate file with each header while the split data files are being generated. At the end, match up each header with its data file and concatenate them. I am running into issues with how to read the files in a non-polling way and how to trigger the concat phase. This would also require rewriting the entire big files.
2. Keep the file headers in a message header or on the exchange, and when all files are written, read all the files from the directory, find the matching header and add it to the output. This would require rewriting the entire big files.
3. Add a dummy header with placeholders for the count data and somehow modify the data files in place. This seems the most performant, but I am not sure how to do it.
Header: 3 records
a
b
c
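A minimal sketch of the third possibility (placeholder header patched in place), in plain Java outside of any Camel route. The class name, the 40-character header width and the space padding are assumptions for illustration; the only real requirement is that the placeholder and the final header occupy exactly the same number of bytes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class PlaceholderHeader {
    // Hypothetical fixed width; the header text must never exceed it,
    // otherwise patching would overwrite the first record.
    static final int HEADER_WIDTH = 40;

    // Pads the header with trailing spaces to a constant length.
    static String headerLine(long count) {
        return String.format("%-" + HEADER_WIDTH + "s\n", "Header: " + count + " records");
    }

    // Writes a placeholder header, streams the records, then patches the header.
    static long writeWithPlaceholder(Path file, Iterable<String> records) throws IOException {
        long count = 0;
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(headerLine(0)); // placeholder, same length as the final header
            for (String record : records) {
                out.write(record);
                out.write('\n');
                count++;
            }
        }
        patchHeader(file, count);
        return count;
    }

    // Overwrites only the first HEADER_WIDTH + 1 bytes; the data is not rewritten.
    static void patchHeader(Path file, long count) throws IOException {
        byte[] header = headerLine(count).getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(0);
            raf.write(header);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("split-", ".txt");
        writeWithPlaceholder(file, List.of("a", "b", "c"));
        // First line is "Header: 3 records" followed by padding spaces, then a, b, c.
        Files.readAllLines(file).forEach(System.out::println);
    }
}
```

In a route, something like patchHeader would have to run once per output file after the last append to that file; how that completion is detected is not covered by this sketch. Whatever reads the file later also has to tolerate the trailing spaces in the header line.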

Related

Append data to file and make sure it doesn't get corrupted

I have an existing file and I'd like to append data to it and make sure it can never (or almost never) get corrupted, even if something fails while the appended data is being written.
One method for ensuring files won't get corrupted is to write the data to a temp file, and then rename/mv the temp file to the original file.
But doing so with append is trickier.
I have the whole file content in memory (it's not a huge file), so I have two options in mind:
1. Copy the original file to a temp file, append the data to the temp file and then mv/rename the temp file to the original file.
2. Write the whole content of the file (including the data I want to append) to a temp file and then mv/rename the temp file to the original file.
The downside of both options is that they're slower than just appending the data to the original file. Are there better ways to do this?
If not, which option is faster?
I need this to work on Windows, Linux and macOS.
I'm not sure if the programming language I'm using is relevant, but I'm using Rust to write the data.
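For illustration, the first option (copy to a temp file, append, then rename) could look roughly like this. It is written in Java rather than Rust, and the class and method names are made up for this sketch. The temp file is created next to the target so the final rename stays on one file system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

class SafeAppend {
    // Copies target to a sibling temp file, appends there, then renames over target.
    static void appendViaTempFile(Path target, byte[] extra) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "append-", ".tmp");
        try {
            Files.copy(target, tmp, StandardCopyOption.REPLACE_EXISTING);
            try (FileChannel ch = FileChannel.open(tmp,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                ByteBuffer buf = ByteBuffer.wrap(extra);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                ch.force(true); // ask the OS to flush the temp file before renaming
            }
            // With ATOMIC_MOVE the other options are ignored, and whether an
            // existing target is replaced is implementation specific per the
            // JDK documentation, so this needs checking on each platform.
            Files.move(tmp, target,
                    StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // only does something if the move did not happen
        }
    }
}
```

If anything fails before the move, the original file has not been touched; the cost is one full copy of the file per append.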

camel aggregate lines and split into files of different sizes

My route reads a file with a number of lines and filters some lines out.
It splits the file on lines, filters them, and aggregates to a file.
The file URI is in append mode, so each aggregation is appended to it. A done file is created every time I write to it.
After the file is fully written, another route picks up the file.
This route splits the file into n files with an equal number of records. But I am running into an issue where the done file is updated for every aggregation in step 1.
How do I update the done file only when the aggregation is fully done?
I tried to use the property ${exchangeProperty.CamelBatchComplete} in route 1.
But that property is always set to true on aggregation...
It's hard to help with only a somewhat confusing description of your use case and no basic code example. However, you can simply write the done file yourself when you are done; it's a few lines of Java code.
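A rough idea of what those few lines could look like, as plain Java with no Camel API involved. The ".done" suffix and the class name are assumptions for this sketch; the marker name has to match whatever the consuming route is configured to look for.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class DoneMarker {
    // Creates (or truncates) an empty marker file next to the data file,
    // e.g. data.txt -> data.txt.done. Call it once, after the last append.
    static Path writeDoneFile(Path dataFile) throws IOException {
        Path done = dataFile.resolveSibling(dataFile.getFileName() + ".done");
        Files.write(done, new byte[0]);
        return done;
    }
}
```

The method could be called from a processor or bean at the point where the whole input has been handled, instead of letting the file endpoint create the done file on every write.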

SQL Server: copy headers after bat to extract data

I have many files that are extracted to .txt with a batch file, but they don't have headers. I've read that a possible solution is to append the exported rows to a .txt file that already contains the headers.
Like this:
echo. >> titles.txt
type data.txt >> titles.txt
This takes a lot of time and is not efficient, since it appends the big data file to the file containing the titles.
Another possible solution is to hardcode the titles in the SQL query, but this changes the type of the columns (if they are numeric they will be changed to varchar).
Is there a way to insert the headers as the first row of the data .txt file instead of doing it the other way around?
I might be wrong, but as far as I know (including from earlier experiments doing exactly what you describe): no, it is not possible. These commands act on the file sequentially. You can open a file for reading, for writing or for appending. If you open titles.txt for writing, it is overwritten and therefore emptied. If you open it for appending, you can only add to the end of the file, so the data can only be written after the header. The only way it might work, which is pretty nasty, is to append the titles to the end of the file and, during later processing (e.g. xls or whatever), re-sort the rows and move the last one to the beginning. But as mentioned: nasty, and not really the way to go.
If the number of files to process is a bigger problem than any individual file size, switching from bcp to sqlcmd might help.

How to replace a line on the middle of a txt file in C?

I am reading info (numbers) from a txt file, and then adding to those numbers others that I have in another file with the same structure.
At the start of each line in the file is a number that identifies a specific product. That code allows me to search for the same product in the other file. In my program I have to add the other "variables" from one file to the other, and then replace the line in the same place in one of those files.
I didn't open either of those files with a or a+; I used r and r+ because I want to replace the information in lines that may be in the middle of the file, not at the end of it.
The program compiles and runs, but when it comes to replacing the info in the file, it just doesn't do anything.
How should I resolve the problem?
A program can replace (overwrite) text in the middle of the file. But the question is whether or not this should be performed.
In order to insert larger text or smaller text (and close up the gap), a new text file must be written. This is assuming the file is not fixed width. The fundamental rule is to copy all original text before the insertion to a new file. Write the new text. Finally write the remaining original text. This is a lot of work and will slow down even the simplest programs.
I suggest you design your data layout before you go any further. Also consider using a database, see my post: At what point is it worth using a database?
Your objective is to design the data to minimize duplication and data fetching.
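The copy, write, copy rule above can be sketched as follows. This is in Java rather than C, and it assumes (hypothetically) that each line starts with the product code followed by a space; the class and method names are invented for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ReplaceLine {
    // Rewrites the file into a temp file, swapping in newLine for every line
    // whose first field equals productId. Returns how many lines were replaced.
    static int replaceLine(Path file, String productId, String newLine) throws IOException {
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "rewrite-", ".tmp");
        int replaced = 0;
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                boolean match = line.equals(productId) || line.startsWith(productId + " ");
                out.write(match ? newLine : line); // lines before and after are copied unchanged
                out.write('\n');
                if (match) {
                    replaced++;
                }
            }
        }
        // Swap the rewritten file into place once it is complete.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        return replaced;
    }
}
```

Because the new line may be longer or shorter than the old one, nothing is overwritten in place; the whole file is rewritten, which is the cost the answer warns about.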

How do I insert data at the top of a CSV file?

How can I go back to the very beginning of a csv file and add rows?
(I'm printing to a CSV file from C using fprintf(). At the end of printing thousands of rows (5 columns) of data, I would like to go back to the top of the file and insert some dynamic header data (based on how things went printing everything). )
Thank You.
Due to the way files are structured, this is more or less impossible. In order to accomplish what you want:
write csv data to file1
write header to file2
copy contents of file1 to file2
delete file1
Or you can hold the csv data in ram and write it to file after you're finished processing and know the header.
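The four file steps above might look like this in Java (the question uses C; the class and method names here are only illustrative). The data file is streamed into the target, so it never has to fit in memory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PrependHeader {
    // dataFile is "file1" (already written), target is "file2".
    static void prepend(Path dataFile, Path target, String header) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            out.write((header + "\n").getBytes(StandardCharsets.UTF_8)); // write header to file2
            Files.copy(dataFile, out);                                   // copy file1 into file2
        }
        Files.delete(dataFile);                                          // delete file1
    }
}
```

This still copies every byte of the data once, which is why reserving space for the header up front is faster for very large files.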
Another option is to set aside a certain number of bytes for the header, which will work much faster for large files at minimal space cost. Since the space is allocated in the file at the start of the write, there aren't any issues going back and filling it in. Reopen the file as random access ("r+"), which points to the top of the file by default, write header, and close.
The simplest way would be to simply store the entire contents of the file in memory until you are finished, write out the header, and then write out the rest of the file.
If memory is an issue and you can't safely store the entire file in memory, or just don't want to, then you could write out the bulk of the CSV data to a temporary file, then when you are finished, write the header out to the primary file, and copy the data from the temporary file to the primary file in a loop.
If you wanted to be fancy, after writing the main CSV data out to the primary file, you could loop through the file from the beginning: read into memory the data that you're about to overwrite with the header, write the header over the top of that data, and then keep going, reading each chunk into memory and overwriting it with the previous one until you reach the end and append the final chunk. In this way you "insert" data at the beginning by moving the rest of the file down. I really wouldn't recommend this, as it will mostly just add complexity without much benefit, unless there is a specific reason you can't do something simpler like using a temporary file.
I think that is not possible. Probably the easiest way would be to write the output to a temporary file, then create the data you need as the dynamic header, write them to the target file and append the previously created temporary file.
write enough blank spaces in the first line
write data
seek(0)
write header - last column will be padded with spaces
