How to estimate the progress of decompressing a bzip2 file in C? - zlib

I could use the gzoffset() function in zlib to estimate the remaining uncompressed file size. Is there a similar function in the bzip2 library? If not, is there any trick that I can use?

Just track the amount of compressed data consumed. When you have processed xx% of the compressed data, you have generated approximately xx% of the uncompressed data.
gzoffset() does not tell you anything about the remaining uncompressed file size either. It returns the current offset in the compressed file, which you can compare against the total compressed file size to get the progress estimate described above.

Related

zlib get gzip-file-size before compressing

I have the following problem:
I'm using zlib in C to gzip files. The compression (using z_stream, deflateInit2, ...) is no problem, but I need to know the size of the gzipped file before I compress it. Is this possible, or is my only option to count the bytes while compressing?
Thanks in advance!
but I need to know the size of the gzipped file before I compress it
If you mean that you need it in order to compress (perhaps to allocate a buffer to hold the data), then you are mistaken; the whole point of z_stream is to let you compress input chunks into output chunks.
is the only option I have to count the bytes while compressing
Yes, you need to apply the compression algorithm to know the resulting size.

Zlib decompress bytes with unknown compressed length in C

I am trying to write my own PNG reader without any external libraries. I need to use zlib to decompress the PNG's IDAT chunk. I have managed to do it in Python using zlib.decompress(), and I am trying to replicate it in C. I was reading over zlib's docs and found uncompress(), but it requires a destination length, which I would not know.
I could set the destination to be much larger than possible for the PNG, but this seems like a cop-out and would break my program if I had a really big picture. However, I have found a function, inflate(), which can be called multiple times; if I used it, I could realloc() memory as needed with each call. Yet I don't understand the docs for it very well and have not found many examples of this kind of thing. Could anyone provide some code or point me in the right direction?
You do know the destination length. Exactly. The PNG header information tells you how many rows, how many columns, and how many bytes per pixel. Multiply it all out, add a byte per row for the filtering, and you have your answer.
Allocate that amount of memory, and decompress into that.
Note that there can be multiple IDAT chunks, but combined they contain a single zlib stream.

find and replace data on gzip content efficiently

My C Linux-based program's inputs are:
char *in_str, char *find_str, char *replacing_str
in_str is compressed data (gzip).
The program needs to search for find_str within the uncompressed input data, replace it with replacing_str, and then recompress the data.
The trivial way to do this is to use one of the many available gzip compress/uncompress libraries to uncompress the data, manipulate the uncompressed data, and then recompress the output. However, I need to make it as efficient as possible (it is a real-time program).
I wonder whether it is more efficient to use an on-the-fly library (e.g. zlibc) approach or to simply do the operation as described above.
It may be important to mention that:
the find_str and replacing_str strings are a small portion of the data
their lengths are not equal
find_str is expected to appear about 4 or 5 times
the uncompressed data length is ~2K - 6K bytes
Is anyone familiar with an efficient way to implement this?
Thanks
You are going to have to decompress no matter what in order to search for the strings. (You might be able to get away with doing that only once and building an index. However, that index might be much larger than the uncompressed data, so you might as well just store the data uncompressed instead.)
You can avoid recompressing all of it by preparing the gzip file ahead of time to be compressed in smaller historyless units, using, for example, the Z_FULL_FLUSH option of zlib. This will reduce compression slightly, depending on how often you do it, but will greatly speed up building the output if only one of many blocks needs to be recompressed.

How to use RtlDecompressBuffer without knowing the size of the uncompressed data?

I would like to use the WINAPI RtlDecompressBuffer in User Mode to decompress a buffer previously compressed using RtlCompressBuffer. I have the code for compression but it seems that in order to decompress I need to know the size of the uncompressed data as the function needs it as a parameter.
How can I do this without knowing the size of the uncompressed data?
Perhaps I should use RtlDecompressFragment.
A code sample would be great!
Thanks in advance.
You don't need to know the size of the uncompressed data. All you have to do is reserve enough memory to hold all the uncompressed data and pass that to the API.
If your buffer isn't big enough, the API will return STATUS_BAD_COMPRESSION_BUFFER and you then have to allocate a bigger buffer for the uncompressed data.
Why not add (while compressing) a simple header (the first 4 bytes) to the buffer containing the uncompressed size?

LZO Decompression Buffer Size

I am using MiniLZO on a project for some really simple compression tasks. I am compressing with one program, and decompressing with another. I'd like to know how much space to allocate for the decompression buffer. I am fine with over-allocating space, if it can save me the trouble of having to annotate my output file with an integer declaring how much space the decompressed data should take. How would I figure out how much space it could possibly take?
After some consideration, I think this question boils down to the following: What is the maximum compression ratio of lzo1x compression?
Since you control both the compressor and the decompressor, I suggest you compress the input in fixed-sized blocks. In my application I compress up to 64KB in each block, then emit the size of the compressed block and the compressed data itself, so the compressed stream actually looks like a series of compressed blocks:
length_of_block_1
block_1
length_of_block_2
block_2
...
The decompressor just reads each compressed block and decompresses it into a 64KB buffer, since I know the block was produced by compressing a 64KB block.
Hope that helps,
Eric Melski
The max size of the decompressed data is clearly the same as the max size of the data you compressed in the first place.
If there is an upper bound on your input size then I guess you can use it, but I have to say the usual way of doing this is to add a header to your compressed buffer which specifies the uncompressed size.