Zip on-the-fly compression library in C for streaming

Is there a library for creating zip files (the ZIP file format, not gzip or any other compression format) on the fly (so I can start sending the file while it is still being compressed), for very large files (4 GB and above)?
The compression ratio does not matter much (mostly media files).
The library has to have a C interface and work on Debian and OS X.

libarchive supports any format you want, on the fly, and even in-memory files.

zlib supports compressing in chunks. You should be able to start sending a small chunk right after compressing it, while the library is still compressing the next chunk (see this example).
(Unfortunately, the ZIP central directory is stored at the end of the archive, so the file will be unusable until it is complete on the receiver's side.)

While this question is old and already answered, I will note a new potential solution for those who find it later.
I needed something very similar: a portable and very small library that creates ZIP archives in a streaming fashion in C. Not finding anything that fit the bill, I created one that uses zlib, available here:
https://github.com/CTrabant/fdzipstream
That code depends only on zlib and essentially provides a simple interface for creating ZIP archives. Most importantly (for me), the output can be streamed to a pipe, socket, or whatever, since the output stream does not need to be seekable. The code is very small: a single source file and a header file. It works on OS X and Linux, and probably elsewhere. Hope it helps someone beyond just me...


Is GZIP compression output stable?

I need to store some chunks of data remotely and compare them to see if there are duplicates.
I will compile a specific C program, and I would like to compress these chunks with GZIP.
My doubt is: if I compress the same chunk of data with the same C program using a gzip library on different computers, will it give the exact same result or could it give different compressed results?
Target PC/Servers could be with different Linux OSs like Ubuntu/CentOs/Debian, etc.
May I force the same result by statically linking a specific gzip library?
if I compress the same chunk of data with the same C program using a gzip library on different computers, will it give the exact same result or could it give different compressed results?
While it may be true in the majority of cases, I don't think you can safely make this assumption. The compressed output can differ depending on the default compression level and the coding used by the library. For example, the GNU gzip tool uses LZ77, while OpenBSD's gzip is based on compress (according to Wikipedia). I don't know whether this difference comes from different libraries or different configurations of the same library, but I would nonetheless avoid assuming that a generic chunk of gzipped data is exactly the same when compressed by different implementations.
May I force the same result by statically linking a specific gzip library?
Yes, this could be a solution. Using the same version of the same library with the same configuration across different systems would give you the same compressed output.
You could also avoid this problem in other ways:
Perform the compression on the server, and only send uncompressed data (this is probably not a good solution as sending uncompressed data is slow).
Use hashes of the uncompressed data, store them on the server, and check them by making the client send a hash first, and then the compressed data in case the server says the hash doesn't match (i.e. the chunk is not a duplicate). This also has the advantage of only needing to check the hash (and avoiding compression altogether if the hash matches).
Similar to option 2, use hashes of the uncompressed data, but always send compressed data to the server. The server then does decompression (which can be easily done in memory using a relatively small buffer) and hashes the uncompressed data to check if the received chunk is a duplicate before storing it.
No, not unless you are 100% certain you are using exactly the same version of the same source code with the same settings, and that you have disabled the modified timestamp in the gzip header.
It's not clear what you're doing with these compressed chunks, but if the idea is to have less to transmit and compare, then you can do far better with a hash. Use a SHA-256 on your uncompressed chunks, and then you can transmit and compare those in no time. The probability of an accidental match is so infinitesimally small, you'd have to wait for all the stars to go out to see one such occurrence.
May I force the same result by statically linking a specific gzip library?
That's not enough; you also need the same compression level at the very least, as well as any other options your particular library might have (usually it's just the level).
If you use the same version of the library and the same compression level, then it's likely that the output is identical (or stable, as you call it). That's not a very strong guarantee however, I'd recommend using a hashing function instead, that's what they're meant for.

decompress multiple files from resource in c

I have a Visual C project where I want to include an archive containing multiple files in a directory structure.
I would like to programmatically extract it to somewhere on disk, preferably using a small library (by small I mean just a few .c and .h files; size doesn't really matter otherwise). But I only seem to find libraries that compress or decompress data directly. I looked at LZO/lzop/miniLZO, but nothing there says it can decompress an entire directory tree, even though I already used lzop to compress the archive.
Thanks
zlib and accompanying (in the contrib) minizip library support .zip format decompression.
You can pack the .zip file as a resource in your executable. To get the raw data from .exe use the FindResource, SizeofResource, LoadResource, LockResource APIs.
Then see minizip's samples to see how to decompress and read zlib's documentation to overload the I/O callbacks.
Disclaimer: I did this for Linderdaum Engine's virtual file system and there is now support for .zip, .tar, .rar (uncompressed) and .tar.gz formats. The code for VFS is in Src/Linderdaum/Core/VFS and it is under MIT license for non-commercial use. It's C++, but the I/O wrappers for zlib use C-style API and the code is pretty straightforward.

Programmatically extract files from dd image in C

I have a few dd images and I wanted to know the best way of extracting files out of these using C. The images are of mixed file systems (fat32, ntfs, ext2/3) and the host machine doing the extraction would be an Ubuntu box, so you can assume kernel headers and GNU C library, etc.
Natively would be best, but external libraries that do the job would also be fine. A link to an example would be perfect.
This is a significant effort. You'd have to essentially reimplement filesystem drivers for NTFS, FAT, and EXT2. I've done this for FAT and NTFS, but it took more than two years, though much of that was reverse engineering NTFS.
Consider using the loop-mount option of the mount command, so you can use the Ubuntu filesystem drivers and not reinvent a significantly large wheel. Then you can peruse the mounted filesystems.
Why programatically with C?
sudo mount -o loop,offset=[offset] -t auto [where] [what]
Where
offset is the offset of the filesystem from the beginning of the disk image, in bytes
where is where on the current filesystem to mount the image
what is the disk image you're mounting
Look at The Sleuth Kit, which should work with all of the file system types you listed:
The original part of Sleuth Kit is a C library and collection of command line file and volume system forensic analysis tools. The file system tools allow you to examine file systems of a suspect computer in a non-intrusive fashion. Because the tools do not rely on the operating system to process the file systems, deleted and hidden content is shown. It runs on Windows and Unix platforms.
The Sleuth Kit's existing toolset is a great place to start if you're looking for sample code to work from.
Check out:
ext2fuse or fuse-ext2 projects, they contain some ext2/ext3 implementations on user space using FUSE.
ntfs-3g, an NTFS implementation also using FUSE.
FUSE itself, as there are lots of filesystem implementations on top of it.

C library to read from zip archives

Is there a portable C library to access .zip archives? "gzip" or "zlib" (the closest I could find) only handle compressed data, I need to be able to list the files inside the archive, and access each one individually, and if they're compressed using the 'deflate' method, I can use zlib on it.
Minizip, maybe?
http://www.winimage.com/zLibDll/minizip.html
The zip that comes with Linux and BSD is actually called Info-ZIP, which is here. Personally, I have not tried such a thing, but the Info-ZIP front page states "Info-ZIP's primary compression engine has also been spun off into the free zlib compression library", so you might want to check out zlib. The zlib page has a FAQ with an answer to your specific question. I would start by studying how Info-ZIP works. Good luck.
7-zip has a complete SDK, with example sources and a lot of functionality.
Take a look here.

Decompress PNG using zlib

How can I use the zlib library to decompress a PNG file? I need to read a PNG file using C under the gcc compiler.
Why not use libpng? The PNG file format is fairly simple, but there are many different possible variations and encoding methods and it can be fairly tedious to ensure you cover all of the cases. Something like libpng handles all the conversion and stuff for you automatically.
I once wrote a basic Java library for reading/writing PNG files: http://code.google.com/p/pngj/
It does not support paletted images, but apart from that [update: it now supports all PNG variants] it's fairly complete and simple, and the code has no external dependencies (it only uses the standard JSE API, which includes zip decompression). The code is available, and I guess you could port it to C without much effort.
If this is a homework assignment and you really are restricted to the standard C library, you need to be looking at the official PNG file format specification: http://www.w3.org/TR/PNG/. However, are you sure you really need to decode the PNG file? If all you need to do is display it somehow, you're headed down the wrong path.
It will be rather complex and time-consuming to write a decoder for any general PNG file, but not too bad for simple ones. In fact, because the PNG format allows pieces of it to be compressed, doing it with only the standard C library would require you to implement zlib (DEFLATE) decompression yourself (a reasonable homework assignment for a mid-level undergrad course, but my guess is that you would have spent a lot of time discussing compression algorithms before this was assigned to you).
However, it isn't terribly difficult if you restrict yourself to non-compressed, non-interlaced PNG files. I once wrote a decoder in Python that handled only the easy cases in a couple of hours, so I'm sure it's doable in C.
You should probably read up on how a binary file format works and use a hex editor instead of a text editor to look at the files. Generally you should use libpng to handle PNG files, as stated earlier, but if you want to decode them yourself you have a lot of reading to do.
I recommend reading this http://www.libpng.org/pub/png/book/chapter13.html
