zlib get gzip-file-size before compressing - c

I have the following problem:
I'm using zlib in C to gzip files. Compressing (using z_stream, deflateInit2, ...) is no problem, but I need to know the size of the gzipped file before I compress it. Is this possible, or is the only option I have to count the bytes while compressing?
Thanks in advance!

but I need to know the size of the gzipped file before I compress it
If you mean that you need the size in order to compress (perhaps to allocate a buffer to hold the output), then you are mistaken: the whole point of z_stream is to let you compress input chunks into output chunks.
is the only option I have to count the bytes while compressing
Yes, you need to apply the compression algorithm to know the resulting size.

Related

Zlib decompress bytes with unknown compressed length in C

I am trying to write my own PNG reader without any external libraries. I need to use zlib to decompress the PNG's IDAT chunk. I have managed to do it in Python using zlib.decompress(), and I am trying to replicate it in C. I was reading over zlib's docs and found uncompress(), however it requires a destination length, which I would not know.
I could set the destination to be much larger than possible for the PNG, but this seems like a cop-out and would break my program if I had a really big picture. However, I have found a function, inflate(), which can be called multiple times. If I could use it, I could realloc() memory as needed with each call. Yet I don't understand the docs for it very well and have not found many examples of this type of thing. Could anyone provide some code or help point me in the right direction?
You do know the destination length. Exactly. The PNG header information tells you how many rows, how many columns, and how many bytes per pixel. Multiply it all out, add a byte per row for the filtering, and you have your answer.
Allocate that amount of memory, and decompress into that.
Note that there can be multiple IDAT chunks, but combined they contain a single zlib stream.
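The multiplication described above might look like the sketch below, assuming a non-interlaced PNG. The function name png_inflated_size is hypothetical. The channel counts per color type are those defined by the PNG format. Interlaced (Adam7) images need a per-pass calculation and are not handled here.

```c
#include <stdint.h>

/* Hypothetical helper: size in bytes of the inflated IDAT data for a
 * non-interlaced PNG, computed from the IHDR fields.
 * Returns 0 for an unknown color type. */
uint64_t png_inflated_size(uint32_t width, uint32_t height,
                           int bit_depth, int color_type)
{
    unsigned channels;
    uint64_t row_bytes;

    switch (color_type) {
    case 0: channels = 1; break;   /* grayscale */
    case 2: channels = 3; break;   /* RGB */
    case 3: channels = 1; break;   /* palette index */
    case 4: channels = 2; break;   /* grayscale + alpha */
    case 6: channels = 4; break;   /* RGBA */
    default: return 0;
    }

    /* Bits per row, rounded up to whole bytes (bit depths below 8 pack
     * several pixels into one byte). */
    row_bytes = ((uint64_t)width * channels * (unsigned)bit_depth + 7) / 8;

    /* Each row is preceded by one filter-type byte. */
    return (uint64_t)height * (row_bytes + 1);
}
```

For example, a 4x4 RGBA image at 8 bits per channel has 16-byte rows, so the inflated data is 4 * (16 + 1) = 68 bytes.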

How to estimate the progress of decompressing a bzip2 file using a C function?

I can use the gzoffset function in zlib to estimate the remaining uncompressed file size. Is there a similar function in the bzip2 library? If not, is there any trick that I can use?
Just track the amount of compressed data consumed. When you have processed xx% of the compressed data, you have generated approximately xx% of the uncompressed data.
gzoffset() does not tell you anything about the remaining uncompressed file size. It only tells you the current offset in the compressed file, that is, how many compressed bytes have been consumed so far. You can get the same information simply by counting the compressed bytes you have read so far.
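A minimal sketch of that estimate, assuming you know the total compressed size (for example from the file size) and keep a running count of the compressed bytes consumed. progress_percent is a hypothetical helper, not part of libbz2 or zlib.

```c
/* Hypothetical helper: estimated progress in percent (0..100), based on how
 * much of the compressed input has been consumed so far. This is only an
 * approximation, because the compression ratio can vary across the file. */
int progress_percent(unsigned long long consumed, unsigned long long total)
{
    if (total == 0)
        return 100;            /* nothing to do counts as finished */
    if (consumed > total)
        consumed = total;      /* clamp in case the caller over-counts */
    return (int)(consumed * 100 / total);
}
```

With libbz2's low-level interface, the running count could be taken from the amount of input handed to BZ2_bzDecompress so far.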

Compressing a ZIP file with LZW compression creates a compressed file that is too large

I tried to compress a ZIP file using an LZW compression method (code provided at the following link):
http://rosettacode.org/wiki/LZW_compression#C
The encoded file is longer than the original file. What is the reason for that?
Could anybody please help me understand what is actually happening?
It is impossible for a lossless compressor to compress every file to a shorter file.
This is because there are 256^N files that are N bytes long, but there are (256^N - 1)/255 files that are shorter than N bytes. So not every file can be mapped to a shorter file.
More than that, if any file becomes shorter, then some shorter file had to give up its spot to make that possible. So some files must become larger.
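The counting argument can be checked for small N with a short sketch. The function names are hypothetical, and N is limited to 7 so that 256^N fits in 64 bits.

```c
#include <stdint.h>

/* Number of distinct files that are exactly n bytes long: 256^n.
 * Only valid for n <= 7, so the result fits in a uint64_t. */
uint64_t files_of_length(unsigned n)
{
    uint64_t count = 1;
    while (n-- > 0)
        count *= 256;
    return count;
}

/* Number of distinct files shorter than n bytes:
 * 256^0 + 256^1 + ... + 256^(n-1), including the empty file. */
uint64_t files_shorter_than(unsigned n)
{
    uint64_t total = 0;
    unsigned k;
    for (k = 0; k < n; k++)
        total += files_of_length(k);
    return total;
}
```

For N = 2 there are 65536 two-byte files but only 1 + 256 = 257 shorter ones, which matches (256^2 - 1)/255 = 257, so most two-byte files have no shorter file left to map to.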
Lossless compression works by recognizing common patterns in typical files created by humans and converting long high-probability sequences of bytes to shorter sequences. The price for this is that some sequences become longer. The goal of the design is to make typical files compress, but atypical files must get longer.
If a compressor does its job, redundant information is removed from a file, and the output is similar to random data. Then the output cannot be compressed further. A ZIP file is typically already compressed, which is why running LZW over it makes it larger.

How to use RtlDecompressBuffer without knowing the size of the uncompressed data?

I would like to use the WINAPI RtlDecompressBuffer in User Mode to decompress a buffer previously compressed using RtlCompressBuffer. I have the code for compression but it seems that in order to decompress I need to know the size of the uncompressed data as the function needs it as a parameter.
How can I do this without knowing the size of the uncompressed data?
Perhaps I should use RtlDecompressFragment.
A code sample would be great!
Thanks in advance.
You don't need to know the size of the uncompressed data. All you have to do is reserve enough memory to hold all the uncompressed data and pass that to the API.
If your buffer isn't big enough, the API will return STATUS_BAD_COMPRESSION_BUFFER and you then have to allocate a bigger buffer for the uncompressed data.
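The grow-and-retry strategy can be sketched without the Windows API. In the sketch below, rle_decode is a stand-in decompressor (simple run-length decoding) used only so the code runs anywhere; both function names are hypothetical. On Windows, the call to rle_decode is where RtlDecompressBuffer would go, and the "too small" result corresponds to STATUS_BAD_COMPRESSION_BUFFER.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in decompressor: input is a sequence of (count, byte) pairs.
 * Returns 0 on success and stores the output length in *out_len,
 * or 1 if the output buffer is too small. */
int rle_decode(const unsigned char *in, size_t in_len,
               unsigned char *out, size_t out_cap, size_t *out_len)
{
    size_t n = 0, i;
    for (i = 0; i + 1 < in_len; i += 2) {
        if (in[i] > out_cap - n)
            return 1;                   /* output buffer too small */
        memset(out + n, in[i + 1], in[i]);
        n += in[i];
    }
    *out_len = n;
    return 0;
}

/* Doubles the output buffer until decoding succeeds or max_cap would be
 * exceeded. Returns a malloc'd buffer (caller frees) or NULL on failure. */
unsigned char *decompress_growing(const unsigned char *in, size_t in_len,
                                  size_t initial_cap, size_t max_cap,
                                  size_t *out_len)
{
    size_t cap = initial_cap ? initial_cap : 1;
    while (cap <= max_cap) {
        unsigned char *buf = malloc(cap);
        if (!buf)
            return NULL;
        if (rle_decode(in, in_len, buf, cap, out_len) == 0)
            return buf;                 /* it fit */
        free(buf);
        if (cap > max_cap / 2)
            break;                      /* doubling again would exceed the limit */
        cap *= 2;
    }
    return NULL;
}
```

The max_cap limit is a design choice of this sketch: it keeps corrupt input from driving the loop into unbounded allocation.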
Why not add (while compressing) a simple header (the first 4 bytes) to the buffer containing the uncompressed size?
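A minimal sketch of such a header, assuming a 4-byte little-endian size field; the byte order and the function names are assumptions for illustration, not anything the API prescribes. The decompressor reads the first 4 bytes, allocates exactly that much, and passes the rest of the buffer to the decompression call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes `v` as 4 little-endian bytes at `dst`. */
void put_u32_le(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v & 0xff);
    dst[1] = (unsigned char)((v >> 8) & 0xff);
    dst[2] = (unsigned char)((v >> 16) & 0xff);
    dst[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Reads 4 little-endian bytes from `src`. */
uint32_t get_u32_le(const unsigned char *src)
{
    return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}

/* Builds [4-byte uncompressed size][compressed bytes] in `dst`, which must
 * have room for comp_len + 4 bytes. Returns the number of bytes written. */
size_t add_size_header(unsigned char *dst, uint32_t uncompressed_size,
                       const unsigned char *comp, size_t comp_len)
{
    put_u32_le(dst, uncompressed_size);
    memcpy(dst + 4, comp, comp_len);
    return comp_len + 4;
}
```

Writing the bytes explicitly, rather than copying a uint32_t with memcpy, keeps the format the same on little-endian and big-endian machines.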

LZO Decompression Buffer Size

I am using MiniLZO on a project for some really simple compression tasks. I am compressing with one program, and decompressing with another. I'd like to know how much space to allocate for the decompression buffer. I am fine with over-allocating space, if it can save me the trouble of having to annotate my output file with an integer declaring how much space the decompressed data should take. How would I figure out how much space it could possibly take?
After some consideration, I think this question boils down to the following: What is the maximum compression ratio of lzo1x compression?
Since you control both the compressor and the decompressor, I suggest you compress the input in fixed-sized blocks. In my application I compress up to 64KB in each block, then emit the size of the compressed block and the compressed data itself, so the compressed stream actually looks like a series of compressed blocks:
length_of_block_1
block_1
length_of_block_2
block_2
...
The decompressor just reads each compressed block and decompresses it into a 64KB buffer, since I know the block was produced by compressing at most 64KB of input.
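Reading such a stream could be sketched as below. The answer does not say how the length is encoded, so the 4-byte little-endian length field is an assumption, and next_block is a hypothetical name. The sketch only walks the framing; each returned block would then be handed to the LZO decompressor together with a 64KB output buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one [4-byte little-endian length][payload] record starting at
 * stream[*pos]. On success, stores the payload pointer and length,
 * advances *pos past the record, and returns 1. Returns 0 at the end of
 * the stream or if the record is truncated. */
int next_block(const unsigned char *stream, size_t stream_len, size_t *pos,
               const unsigned char **block, uint32_t *block_len)
{
    const unsigned char *p;
    uint32_t len;

    if (*pos > stream_len || stream_len - *pos < 4)
        return 0;                       /* no room for a length field */

    p = stream + *pos;
    len = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
          ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

    if (len > stream_len - *pos - 4)
        return 0;                       /* payload is truncated */

    *block = p + 4;
    *block_len = len;
    *pos += 4 + (size_t)len;
    return 1;
}
```

A caller starts with pos = 0 and loops while next_block() returns 1, decompressing one block per iteration.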
Hope that helps,
Eric Melski
The max size of the decompressed data is clearly the same as the max size of the data you compressed in the first place.
If there is an upper bound on your input size then I guess you can use it, but I have to say the usual way of doing this is to add a header to your compressed buffer which specifies the uncompressed size.
