How to modify a single file inside a very large zip without re-writing the entire zip? - c

I have large zip files that contain huge files. There are "metadata" text files within the zip archives that need to be modified. However, it is not possible to extract the entire zip and re-compress it. I need to locate the target text file inside the zip, edit it, and possibly append the change to the zip file. The file name of the text file is always the same, so it can be hard-coded. Is this possible? Is there a better way?

There are two approaches. First, if you're just trying to avoid recompressing the entire zip file, you can use any existing zip utility to update a single file in the archive. This effectively copies the entire archive into a new one with the replaced entry, then deletes the old zip file. The data that is not being replaced is not recompressed, so it should be relatively fast: about the time it takes to copy the zip archive.
Second, if you want to avoid copying the entire zip file, you can effectively delete the entry you want to replace by changing its name in both the local and central headers in the zip file (keeping the name the same length) to a name that you won't use otherwise and that indicates that the file should be ignored, e.g. by replacing the first character of the name with a tilde. Then you can append a new entry with the updated text file. This requires rewriting the central directory at the end of the zip file, which is pretty small.
(A suggestion in another answer to not refer to the unwanted entry in the central directory will not necessarily work, depending on the utility being used to read the zip file. Some utilities will read the local headers for the zip file entry information, and ignore the central directory. Other utilities will do the opposite. So the local and central entry information should be kept in sync.)
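The second approach can be sketched roughly as follows. This is a minimal illustration in Python rather than C (the byte offsets are the same in either language), and it rests on several assumptions: the function name replace_entry_in_place is made up for this example, the entry name is plain ASCII, and the archive is a single-file, non-ZIP64 zip. A really huge archive will probably be ZIP64, which needs extra handling that is left out here. The tombstone name is written into both the local and the central header so the two stay in sync, and the standard zipfile module in append mode then writes the new entry and a fresh central directory at the end.

```python
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CEN_SIG = b"PK\x01\x02"   # central directory file header
LOC_SIG = b"PK\x03\x04"   # local file header


def replace_entry_in_place(zip_path, name, new_data):
    """Tombstone `name` in both headers, then append a fresh copy of it."""
    old = name.encode("ascii")
    tomb = b"~" + old[1:]  # same length, so no other bytes have to move
    found = False
    with open(zip_path, "r+b") as f:
        # The end record lives in the last 22..65557 bytes of the file.
        size = f.seek(0, 2)
        f.seek(max(0, size - (22 + 65535)))
        tail = f.read()
        eocd = tail.rfind(EOCD_SIG)
        if eocd < 0:
            raise ValueError("end of central directory record not found")
        count, cd_size, cd_offset = struct.unpack("<HII", tail[eocd + 10:eocd + 20])
        if count == 0xFFFF or cd_offset == 0xFFFFFFFF:
            raise NotImplementedError("ZIP64 is not handled in this sketch")
        f.seek(cd_offset)
        cd = f.read(cd_size)
        pos = 0
        for _ in range(count):
            if cd[pos:pos + 4] != CEN_SIG:
                raise ValueError("corrupt central directory")
            name_len, extra_len, comment_len = struct.unpack("<HHH", cd[pos + 28:pos + 34])
            (local_off,) = struct.unpack("<I", cd[pos + 42:pos + 46])
            if cd[pos + 46:pos + 46 + name_len] == old:
                # Make sure the local header carries the same name first.
                f.seek(local_off)
                local = f.read(30 + name_len)
                if local[:4] != LOC_SIG or local[30:] != old:
                    raise ValueError("local header does not match central directory")
                f.seek(local_off + 30)        # name field of the local header
                f.write(tomb)
                f.seek(cd_offset + pos + 46)  # name field of the central header
                f.write(tomb)
                found = True
            pos += 46 + name_len + extra_len + comment_len
    if not found:
        raise KeyError(name)
    # Append mode starts writing where the old central directory began and
    # writes a new central directory when the archive is closed.
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, new_data)
```

After a call such as replace_entry_in_place("archive.zip", "meta.txt", "new text"), the archive holds the old data under ~eta.txt and the new data under meta.txt; only the two name fields and the tail of the file are written.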

Quoting the question: "There are 'metadata' text files within the zip archives that need to be modified. However, it is not possible to extract the entire zip and re-compress it."
This is a good lesson in why, when dealing with huge datasets, keeping the metadata in the same place as the data is a bad idea.
The .zip file format isn't particularly complicated, and it is definitely possible to replace something inside it. The problem is that the new data may be larger than the old data and no longer fit into the old location, which is why there is no standard routine or tool for an in-place replacement.
If you are skilled enough, you can in theory write your own zip handling functions to provide a "file replace" routine. If it is only about the (smallish) metadata, you do not even need to compress it. The .zip "central directory" is located at the end of the file, after the compressed data (the format was optimized for appending new files). The general concept is: read the central directory into memory, append the modified file after the compressed data, update the in-memory central directory with the new offset of the modified file, and write the central directory back after the modified file. (The old file would still be sitting somewhere inside the .zip, but would no longer be referenced by the central directory.) All of these operations happen at the end of the file, without touching the rest of the archive's content.
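A rough sketch of that concept, in Python for brevity (the same offsets apply in C). It is an illustration under stated assumptions, not a finished tool: append_and_repoint is a made-up name, the entry name is ASCII, the archive is not ZIP64, and the new entry is stored uncompressed with a fixed 1980-01-01 timestamp. The old entry's bytes stay in the archive as dead space; a reader that scans local headers instead of the central directory will still see it.

```python
import struct
import zlib


def append_and_repoint(zip_path, name, new_data):
    """Append `new_data` as a stored entry and point the directory at it."""
    raw_name = name.encode("ascii")
    crc = zlib.crc32(new_data) & 0xFFFFFFFF
    n = len(new_data)
    with open(zip_path, "r+b") as f:
        # Locate the end-of-central-directory record near the end of the file.
        size = f.seek(0, 2)
        f.seek(max(0, size - (22 + 65535)))
        tail = f.read()
        eocd = tail.rfind(b"PK\x05\x06")
        if eocd < 0:
            raise ValueError("end of central directory record not found")
        count, cd_size, cd_offset = struct.unpack("<HII", tail[eocd + 10:eocd + 20])
        if count == 0xFFFF or cd_offset == 0xFFFFFFFF:
            raise NotImplementedError("ZIP64 is not handled in this sketch")
        (zip_comment_len,) = struct.unpack("<H", tail[eocd + 20:eocd + 22])
        comment = tail[eocd + 20:eocd + 22 + zip_comment_len]  # length field + text

        # Read the central directory and keep every record except the old one.
        f.seek(cd_offset)
        cd = f.read(cd_size)
        kept, pos = [], 0
        for _ in range(count):
            name_len, extra_len, comment_len = struct.unpack("<HHH", cd[pos + 28:pos + 34])
            end = pos + 46 + name_len + extra_len + comment_len
            if cd[pos + 46:pos + 46 + name_len] != raw_name:
                kept.append(cd[pos:end])
            pos = end

        # Build the new local header and central record (method 0 = stored).
        new_off = cd_offset  # the new entry starts where the old directory was
        local = struct.pack("<4sHHHHHIIIHH", b"PK\x03\x04", 20, 0, 0, 0, 0x21,
                            crc, n, n, len(raw_name), 0)
        central = struct.pack("<4sHHHHHHIIIHHHHHII", b"PK\x01\x02", 20, 20, 0, 0,
                              0, 0x21, crc, n, n, len(raw_name), 0, 0, 0, 0, 0,
                              new_off)
        kept.append(central + raw_name)
        new_cd = b"".join(kept)

        # Write entry, directory and end record, all at the tail of the file.
        f.seek(new_off)
        f.write(local + raw_name + new_data)
        new_cd_offset = f.tell()
        f.write(new_cd)
        f.write(struct.pack("<4sHHHHII", b"PK\x05\x06", 0, 0, len(kept),
                            len(kept), len(new_cd), new_cd_offset))
        f.write(comment)
        f.truncate()
```

Note that the old central directory is overwritten before the new one is complete, so an interruption in between leaves the archive without a directory; saving a copy of the directory bytes first would be a sensible precaution.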
But practically speaking, I would recommend simply keeping the data and the metadata separate.

Related

Update file across multiple folder locations?

I need something that can copy a specified file any and everywhere on my drive (or computer) where that file already exists; i.e. update a file. I tried to search this site, in case I'm not the first, and found this:
CMD command line: copy file to multiple locations at the same time
But not quite the same.
Example:
Say I have a file called CurrentList.txt, and I have copies of it all over my hard drive.  But then I change it and I want all the copies to update.  So I want to copy the newer one over all the others.  It could 'copy if newer', but generally I know it's newer, so it could also just find every instance and copy over it.
I was originally going to use some kind of .bat file that would have to iterate over every folder seeking the file in question, but my batch file programming is limited/rusty.  Then I looked to see if xcopy could do it, but I don't think so...
For how I will use it most, I generally know where those files are going to be, so it actually might be as good or better if I could specify it to (using example), "copy CurrentList.txt, overwriting all other copies wherever found in the C:\Lists folder and all subfolders".
I would really like to be able to have it in a context menu, so I could (from a file explorer) right click on a file or selected files and choose the option to distribute it.
Thanks in advance for any ideas.
Use the "replace" command...
replace CurrentList.txt C:\Lists /s
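If a script is preferred over the built-in command, the same idea (find every file with the same name under a folder and overwrite it) might look like this in Python. This is only a sketch: update_copies is a hypothetical helper, and the name comparison is case-sensitive, which differs from how Windows usually treats file names.

```python
import os
import shutil


def update_copies(source, root):
    """Overwrite every file under `root` that has the same name as `source`."""
    name = os.path.basename(source)
    source_real = os.path.realpath(source)
    updated = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            target = os.path.join(dirpath, name)
            # Skip the source itself if it happens to live under `root`.
            if os.path.realpath(target) != source_real:
                shutil.copy2(source, target)  # copies content and timestamps
                updated.append(target)
    return updated
```

Calling update_copies("CurrentList.txt", r"C:\Lists") would return the list of paths that were overwritten.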

how to get the type of the file before its compression

For example, suppose we have a file file.txt that, after compression, is now file.new (new is the new extension). How can I recover the .txt extension that has been lost?
I need that to decompress the file.
In general, if you lose the file name extension you can't get it back. It's as simple as this.
However, there might be a chance, depending on the compression format. Some formats store the original file name (along with other information) in the compressed file, and the decompressor will be able to recreate those properties.
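gzip is one example of such a format: its header has an optional field for the original file name. Assuming the compressed file really is gzip (the question does not say which format is used) and the compressor filled that field in, a sketch like the following could read the name back. The function name gzip_original_name is made up for this example.

```python
import struct


def gzip_original_name(path):
    """Return the original file name stored in a gzip header, or None."""
    with open(path, "rb") as f:
        header = f.read(10)  # fixed part of the gzip header
        if len(header) < 10 or header[:2] != b"\x1f\x8b":
            raise ValueError("not a gzip file")
        flags = header[3]
        if flags & 0x04:  # FEXTRA: skip the optional extra field
            (xlen,) = struct.unpack("<H", f.read(2))
            f.seek(xlen, 1)
        if not flags & 0x08:  # FNAME not set: no name was stored
            return None
        name = bytearray()
        while True:
            byte = f.read(1)
            if not byte or byte == b"\x00":  # the name is zero-terminated
                break
            name += byte
        return name.decode("latin-1")
```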
Anyway, it's good practice to name a compressed file with an additional extension, in your case file.txt.new.
Oh, and you don't need to know the file name extension to uncompress the compressed file. Just uncompress it and give it a temporary name. As @MarcoBonelli said, file contents and file name extensions have no fixed relation. Extensions are just a convention for handling files conveniently.
For example: you can rename an EXE to DOCX. Windows will show the Word icon, but the file is still an executable. Windows will not attempt to run it, though.
To know what a file contains can be difficult. The magic number Marco linked to might give you some hint.
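A minimal sketch of such a magic-number check is shown below. The table is a small, hand-picked selection of well-known signatures, not a complete list, and sniff is a hypothetical helper name. A match is only a hint: a DOCX file, for instance, is itself a zip container and would be reported as "zip".

```python
# Leading bytes of a few common formats (a partial, illustrative table).
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"PK\x03\x04", "zip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"MZ", "windows executable"),
]


def sniff(path):
    """Guess a file's type from its first bytes; return None if unknown."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, kind in MAGIC:
        if head.startswith(magic):
            return kind
    return None
```

With this, an EXE that was renamed to DOCX is still reported as a Windows executable, because only the content is inspected and the extension is ignored.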

How would I store different types of data in one file

I need to store data in a file in this format
word, audio, jpeg
How would I store that all in one file? Is it even possible, or would I need to store links to other data files in place of the audio and jpeg? Would I need a custom file format?
1. Your own filetype
As mentioned by @Ken White, you would need to create your own custom file format for this sort of thing, which would then mean writing your own parser. This could be done in almost any language you want, but since you are planning on using the Word format, C# might be the best fit. However, this technique could be quite complicated, and thoroughly testing your file compressor/decompressor could take a relatively large amount of time; it may still be the best option depending on your needs.
2. Command line utilities
Another way to go about this would be to use a bash script to combine all of the files into one file, and then decompress it at the other end. For example the steps could involve:
Combine files using windows copy / linux cat command on command line
Create a metadata file of your own that says how many files are in this custom file, and how many bytes each one takes up (could be a short XML or JSON file, for example).
Use the linux split command or install a Windows command line file splitter program (here's just one example) to split the file back into whatever components have made it up.
This way you only have to create a really small file type, and let the OS utilities handle the combining of them for you.
Example on Windows:
Copy all of the files in your current directory into one output file called 'file.custom'
copy /b * file.custom
Generate custom file format describing metadata (i.e. get the file size on disk in C# example here). This is just maybe what I would do in JSON. SO formatting was being annoying so here's a link (Copy paste it into an editor or online JSON viewer).
Use a decompress windows / linux command line tool to decompress each files to the exact length (and export it back to the exact name) specified in the JSON (metadata) file. (More info on splitting files on this post).
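The combine, describe and split steps above could also be done in one small script instead of separate command line tools. The following Python sketch assumes a JSON sidecar file named after the combined file (file.custom.json); that naming, the JSON layout, and the function names pack and unpack are choices made for this example only. Each input file is read fully into memory, which is fine for a sketch but not for very large inputs.

```python
import json
import os


def pack(paths, out_path):
    """Concatenate files into out_path and record names/sizes in a JSON index."""
    index = []
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as f:
                data = f.read()
            out.write(data)
            index.append({"name": os.path.basename(path), "size": len(data)})
    with open(out_path + ".json", "w") as meta:
        json.dump(index, meta)


def unpack(packed_path, dest_dir):
    """Split packed_path back into its parts using the JSON index."""
    with open(packed_path + ".json") as meta:
        index = json.load(meta)
    os.makedirs(dest_dir, exist_ok=True)
    with open(packed_path, "rb") as packed:
        for entry in index:
            # basename() guards against directory parts in the stored name
            target = os.path.join(dest_dir, os.path.basename(entry["name"]))
            with open(target, "wb") as f:
                f.write(packed.read(entry["size"]))
```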
3. ZIP files
You could always store all of the files in a compressed zip file, and then use a zip compressor/expander as and when you like to retrieve any of the file formats stored within.
I found a couple of examples:
Combining multiple files into one ZIP file in only C# .net,
Unzipping ZIP files in C#
Zipping & Unzipping with only windows built-in utilities
Zipping & Unzipping in Linux command line
Good Zipping/Unzipping library in Java
Zipping/Unzipping in Python
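For the Python case, a minimal sketch using the standard zipfile module could look like this. The helper names bundle and extract_one are made up for illustration; the zipfile calls themselves are from the standard library.

```python
import os
import zipfile


def bundle(paths, zip_path):
    """Store several files of any type in one zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # arcname drops the directory part so only the file name is stored
            zf.write(path, arcname=os.path.basename(path))


def extract_one(zip_path, member, dest_dir):
    """Extract a single member and return the path it was written to."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.extract(member, dest_dir)
```

Formats such as JPEG and most audio codecs are already compressed, so the zip mainly acts as a container here and the size savings will be small.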

cmd- Copying file content into specific lines of an existing file

I'm trying to make a batch file that will copy the contents of a .cfg file into another .cfg file. The problem I'm having is that I want the contents of the first file to be placed at specific lines of the destination file, for example, placing the contents between line 300 and 343 and overwriting the original content within those lines.
Any way of doing this?
If there isn't a way to detect specific lines maybe there is a way to detect a specific string, like an ID?
If you are allowed to use 3rd party tools in your environment you can use a regex CLI tool to find and then replace the lines / values you need. The tool can be called using batch scripts.
Example Tools from another question:
https://superuser.com/questions/339118/regex-replace-from-command-line
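If a scripting language is an option instead of a regex tool, the line-range replacement itself is short. The sketch below is in Python and could be called from a batch script; replace_lines and line_of are hypothetical helper names, and line numbers are 1-based and inclusive. The second helper covers the "detect a specific string" variant by returning the line number of the first line that contains a marker.

```python
def replace_lines(dest_path, src_path, first, last):
    """Replace lines first..last (1-based, inclusive) of dest with all of src."""
    with open(dest_path) as f:
        dest = f.readlines()
    with open(src_path) as f:
        src = f.readlines()
    if not 1 <= first <= last <= len(dest):
        raise ValueError("line range is outside the destination file")
    if src and not src[-1].endswith("\n"):
        src[-1] += "\n"  # keep the following destination line on its own line
    dest[first - 1:last] = src
    with open(dest_path, "w") as f:
        f.writelines(dest)


def line_of(path, marker):
    """Return the 1-based number of the first line containing marker, or None."""
    with open(path) as f:
        for number, line in enumerate(f, start=1):
            if marker in line:
                return number
    return None
```

For the example in the question, replace_lines("dest.cfg", "source.cfg", 300, 343) would overwrite lines 300 through 343; the two file names are placeholders.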

Using a batch file to copy multiple files with the same name, and paste into new folder with differing names

I have a long list of files that are auto-produced every month. They all have the same file name, with a sequential file extension, like this: file.001, file.002, file.003
Each file has differing information, despite having the same name. What I need to do is copy them from their home directory and paste them into a new directory with names that reflect their purpose, and as text files, like this: Budget.txt, Expense.txt, Retention.txt
Is it possible to do this with a batch file? I've been unable to find a method that works. Any help would be appreciated.
EDIT: I've tried that solution, and it works as far as it goes. The frustrating thing is that the extensions are not always the same, but they are always sequentially numbered.
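One possible way to cope with extensions that change from month to month is to rely on their order rather than their exact values. The sketch below is not taken from this thread and uses Python rather than a batch file; copy_with_names is a made-up helper, and it assumes that the first file in numeric order is always the budget, the second the expenses, and so on.

```python
import os
import re
import shutil


def copy_with_names(src_dir, dest_dir, names):
    """Copy numerically-suffixed files, in numeric order, to dest_dir/<name>.txt."""
    numbered = sorted(
        (f for f in os.listdir(src_dir) if re.fullmatch(r".+\.\d+", f)),
        key=lambda f: int(f.rsplit(".", 1)[1]),  # sort by the numeric extension
    )
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for src_name, new_name in zip(numbered, names):
        target = os.path.join(dest_dir, new_name + ".txt")
        shutil.copy2(os.path.join(src_dir, src_name), target)
        copied.append(target)
    return copied
```

A call such as copy_with_names("home", "reports", ["Budget", "Expense", "Retention"]) pairs the lowest-numbered file with Budget.txt and so on; the folder names are placeholders. Files beyond the length of the name list are ignored.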
