ZIP file specification encryption header - file

I am reading the zip file specification and there is no explanation as to how the encryption header for a file in the archive is structured. The header order is like this:
[local file header 1]
[encryption header 1]
[file data 1]
[data descriptor 1]
After the local file header the specification says the following while skipping the encryption header part:
Immediately following the local header for a file
SHOULD be placed the compressed or stored data for the file.
If the file is encrypted, the encryption header for the file
SHOULD be placed after the local header and before the file
data. The series of [local file header][encryption header]
[file data][data descriptor] repeats for each file in the
.ZIP archive.
I am searching for how this encryption header is structured because this specification does not explain it. Does anyone know how this works?

6.1 Traditional PKWARE Decryption
...
6.1.3 Each encrypted file has an extra 12 bytes stored at the start
of the data area defining the encryption header for that file. The
encryption header is originally set to random values, and then
itself encrypted, using three, 32-bit keys. The key values are
initialized using the supplied encryption password. After each byte
is encrypted, the keys are then updated using pseudo-random number
generation techniques in combination with the same CRC-32 algorithm
used in PKZIP and described elsewhere in this document.
...
The specification for the decryption header depends on the encryption algorithm used. There is an Traditional PKWARE Encryption (standard) but this is out of date and therefore a custom encryption/decryption should be used.

RAR 5.0 archive format looks a good document at https://www.rarlab.com/technote.htm
You might find it helpful in the project sharpcompress:
https://github.com/adamhathcock/sharpcompress/tree/master/src/SharpCompress/Common/Zip

Related

How to know the extension when decrypting files that do not know the original extension

Hi I am wondering what to do when decrypting an encrypted file that does not know the original extension while studying aes ctr encryption from this link at https://www.gurutechnologies.net/blog/aes-ctr-encryption-in-c/ To
Thank you in advance.
The "classic" UNIX way (as done by compress, gzip, bzip2, xz, ...) is to keep the original file name and append (not replace) your own extension, so if you encrypt the file foo.txt, you'll output - say - a foo.txt.enc file. When decrypting, just remove the .enc extension.
You can use the Linux file command to see what a file may contain and append the extension manually.
If you're running windows, a quick google search showed that TrID has similar functionality.

How to read, manipulate and write .docx file in c

I am reading .docx file in a buffer and writing it to a new file successfully. (Using fread and fwrite in C) However now I want to enhance the scope of this project for the purpose of encryption. For which I want to be able to manipulate the buffer, then write it in new file.
Now one question might be, what manipulation do I need?
It could be anything really, like I'd write character 's' in buffer's location 15. Like below, and then write this new buffer (having character 's' at location 15, but the rest of the buffer remains unchanged) in a new .docx file.
buffer[15] = 's';
When I did this, the file that was created was corrupt. Since I am not fully aware of the structure of .docx file, this byte number 15 could be some potential identifier, or header, or any important information of .docx file needed for creating a non-corrupt file.
However, the things I know about .docx internal structure are:
It consists of XML files, zipped together.
The content that is written in .docx file, (for e.g. I have a file named test.docx, and it contains "Hello, how are you?") then the contents "Hello, how are you?" are stored in XML files.
There is a .rels (not confirm) extension file, among those files that are zipped together, that tells MS word about where the content is stored in file, i.e. where to look for content.
Apart from these 3 points I don't know much about structure of .docx file. Now considering all this, I want to be able to extract the contents of .docx file, from the XML files zipped together, read it (in C) in a buffer, change the buffer as I need it, and create a new file, with the new content that is present in the buffer.
Can someone guide me through this?
Also kindly mention, if I need to provide code, or any other essential details. Thanks in advance.
EDIT
PURPOSE OF ALL THIS:
I want to do all this for encryption. As by encrypting a file (using AES) the whole file will become unreadable, corrupt and everything inside will be changed from its place. When I decrypt that file, the file is unable to open. My guess is, as AES decryption algo does not know how to parse the contents recovered from decrypting the encrypted file, in to a new .docx file, thus it is unable to place the contents/structure properly in its place.
I have tried it. Original docx file was of 14 KB, encrypted docx file was of 14 KB as well as the decrypted docx file. But when I try to open the decrypted file, it says file is corrupt. Also I tried to check it in HEX editor. Decrypted file has just 00 bytes after exactly 30 Bytes.
DOCX files are based on OPC and OOXML. OPC is based on Zip. OOXML is based on XML. Therefore, you can use Zip and XML tools to operate on DOCX files. Beyond this, you'll have to be more specific about what you wish to do in order to receive better guidance.
Poking characters into random index locations in an XML file is operating at the wrong level of abstraction.

File extension detection mechanism

How Application will detect file extension?
I knew that every file has header that contains all the information related to that file.
My question is how application will use that header to detect that file?
Every file in file system associated some metadata with it for example, if i changed audio file's extension from .mp3 to .txt and then I opened that file with VLC but still VLC is able to play that file.
I found out that every file has header section which contains all the information related to that file.
I want to know how can I access that header?
Just to give you some more details:
A file extension is basically a way to indicate the format of the data (for example, TIFF image files have a format specification).
This way an application can check if the file it handles is of the right format.
Some applications don't check (or accept wrong) file formats and just tries to use them as the format it needs. So for your .mp3 file, the data in this file is not changed when you simply change the extension to .txt.
When VLC reads the .txt byte by byte and interprets it as a .mp3 it can just extract the correct music data from that file.
Now some files include a header for extra validation of what kind of format the data inside the file is. For example a unicode text file (should) include a BOM to indicate how the data in the file needs to be handled. This way an application can check whether the header tag matches the expected header and so it knows for sure that your '.txt` file actually contains data in the 'mp3' format.
Now there are quite some applications to read those header tags, but they are often specific for each format. This TIFF Tag Viewer for example (I used it in the past to check the header tags from my TIFF files).
So or you could just open your file with some kind of hex viewer and then look at the format specifications what every bytes means, or you search Google for a header viewer for the format you want to see them.

Simple library to encapsulate a file in an uncompressed ZIP file?

I want to send a file as an email attachment, but at present there is an email filter that prevents that. Is there a simple method or library to encapsulate a file of any length inside an uncompressed ZIP file? I'd like to avoid adding an actual ZIP library that compresses, if I can. For one thing, the file I'm sending is already compressed.
The zip format has a stored method (method 0) that would allow you to simply enclose the file in the appropriate headers. See the PKWare appnote.txt for a description of the format. You would need to calculate the CRC-32 of the data to include in the headers.

Hexadecimal virus signatures database

Over the past couple of weeks, I was in the process of developing a simple virus scanner. It works great but my question is does anybody know where I can get a database (a single file) that contains 8000 or more virus signatures WITH their names, and possibly risk meter (high, low, unknown)?
Try the ClamAV database. This also includes some more complex signatures, but some are just byte sequences.
The CVD file format is a compressed tar file with a header block attached; see here for header information, or this PDF for the real details.
As I understand it, you should be able to decompress it with
dd if=file.cvd bs=512 skip=1 | tar zxvf -
This will unpack to a collection of various files; for files that have simple hex signatures, these will be found in a file with the extension .db. Not all of these signatures are pure hex -- many of them contain wildcards such as ?? for "allow any byte here", * for "allow any number of intervening bytes here", (-4096) for "allow up to 4k of intervening bytes here", and so forth.

Resources