Is there any maximum limit on file size for Huffman compression?

As far as Huffman compression goes, it can compress any file containing ASCII symbols, etc. However, is there any maximum limit on the file size?

No, there is no limit. You can string Huffman codes for as long as you like.

Related

zlib get gzip-file-size before compressing

I have the following problem:
I'm using zlib in C to gzip files. The compression itself (using z_stream, deflateInit2, ...) is no problem, but I need to know the size of the gzipped file before I compress it. Is this possible, or is the only option I have to count the bytes while compressing?
Thanks in advance!
but I need to know the size of the gzipped file before I compress it
If you mean that you need it in order to compress (perhaps to allocate a buffer to hold the data), then you are mistaken: the whole point of z_stream is to let you compress the input in chunks and receive the output in chunks.
is the only option I have to count the bytes while compressing
Yes, you need to apply the compression algorithm to know the resulting size.
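If counting during compression is indeed what you end up doing, the following is a minimal sketch in the spirit of zlib's zpipe example; the function name, buffer size, and the gzip wrapper choice (windowBits of 15 + 16) are illustrative assumptions, not the original poster's code.

    /* Sketch: count compressed bytes while deflating in chunks with z_stream. */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    #define CHUNK 16384

    /* Returns the total number of gzip bytes written to "out", or -1 on error. */
    long gzip_and_count(FILE *in, FILE *out)
    {
        unsigned char inbuf[CHUNK], outbuf[CHUNK];
        z_stream strm;
        long total = 0;
        int flush, ret;

        memset(&strm, 0, sizeof(strm));
        /* windowBits of 15 + 16 asks zlib to write a gzip header and trailer. */
        if (deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                         15 + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
            return -1;

        do {
            strm.avail_in = fread(inbuf, 1, CHUNK, in);
            strm.next_in = inbuf;
            flush = feof(in) ? Z_FINISH : Z_NO_FLUSH;
            do {
                strm.avail_out = CHUNK;
                strm.next_out = outbuf;
                ret = deflate(&strm, flush);
                size_t have = CHUNK - strm.avail_out;
                total += (long)have;              /* count the output as it is produced */
                fwrite(outbuf, 1, have, out);
            } while (strm.avail_out == 0);
        } while (flush != Z_FINISH);

        deflateEnd(&strm);
        return ret == Z_STREAM_END ? total : -1;
    }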

Compression mechanism

I know that Huffman encoding is a popular technique for file compression, and I know that it works by encoding more frequent characters with shorter bits. The problem is you can only decode that if you have the tree. Do you actually have to send over the tree as well? If so, in what form? Details please.
Yes, you have to send a representation of the code first. The Huffman code is made canonical, so that you can just send the number of bits in the code corresponding to each symbol. Then the canonical code can be reconstructed from the lengths at the other end. You never have to send the tree.
The lengths can themselves be compressed as well, for another level of efficiency, as well as complexity. See the deflate specification for an example of how Huffman codes are transmitted efficiently.
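For reference, here is a sketch of that canonical-code reconstruction, following the procedure given in the deflate specification (RFC 1951, section 3.2.2); MAX_BITS and the calling convention are assumptions made for illustration.

    #include <stdio.h>

    #define MAX_BITS 15

    /* lengths[i] is the code length in bits of symbol i (0 = symbol unused);
     * codes[i] receives the canonical Huffman code for symbol i. */
    void build_canonical(const int *lengths, unsigned *codes, int nsyms)
    {
        int bl_count[MAX_BITS + 1] = {0};
        unsigned next_code[MAX_BITS + 1] = {0};
        unsigned code = 0;

        /* Count how many symbols use each code length. */
        for (int i = 0; i < nsyms; i++)
            bl_count[lengths[i]]++;
        bl_count[0] = 0;

        /* Compute the smallest code for each length. */
        for (int bits = 1; bits <= MAX_BITS; bits++) {
            code = (code + bl_count[bits - 1]) << 1;
            next_code[bits] = code;
        }

        /* Assign consecutive codes to symbols of the same length. */
        for (int i = 0; i < nsyms; i++)
            if (lengths[i] != 0)
                codes[i] = next_code[lengths[i]]++;
    }

    int main(void)
    {
        /* Example from RFC 1951: symbols A..H with these code lengths. */
        const int lengths[8] = {3, 3, 3, 3, 3, 2, 4, 4};
        unsigned codes[8];
        build_canonical(lengths, codes, 8);
        for (int i = 0; i < 8; i++)
            printf("%c: length %d, code %u\n", 'A' + i, lengths[i], codes[i]);
        return 0;
    }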
How exactly the Huffman tree is transferred depends on the compression format.
Static Huffman encodes the tree. The Deflate algorithm only encodes the number of bits per symbol.
For Adaptive Huffman, there is no need to explicitly encode the tree, as the tree is re-built (or just slightly modified) from time to time. The initial tree is then hardcoded.

Can a stream contain some fixed Huffman compressed blocks and some dynamic Huffman compressed blocks?

Is it possible to compress a stream with some blocks compressed with fixed Huffman coding and some blocks compressed with dynamic Huffman coding? If yes, is it decompressible?
Yes, and yes. A stream can contain a mix of all three block types, with the third type being stored.
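To make the mixing concrete: every deflate block starts with its own 3-bit header (BFINAL, then BTYPE, per RFC 1951), so each block independently declares whether it is stored, fixed-Huffman, or dynamic-Huffman. Below is a toy sketch of reading such a header; the in-memory buffer and its contents are assumptions for illustration, not zlib code.

    #include <stdio.h>
    #include <stddef.h>

    /* Toy LSB-first bit reader over an in-memory buffer, as deflate requires. */
    static const unsigned char buf[] = { 0x03 };   /* example: BFINAL=1, BTYPE=01 */
    static size_t bitpos = 0;

    static unsigned read_bits(int n)
    {
        unsigned v = 0;
        for (int i = 0; i < n; i++, bitpos++)
            v |= ((buf[bitpos >> 3] >> (bitpos & 7)) & 1u) << i;
        return v;
    }

    int main(void)
    {
        /* Each block begins with BFINAL (1 bit) and BTYPE (2 bits), so
         * consecutive blocks in one stream may use different types. */
        unsigned bfinal = read_bits(1);
        unsigned btype  = read_bits(2);
        printf("BFINAL=%u, BTYPE=%u (0=stored, 1=fixed, 2=dynamic)\n", bfinal, btype);
        return 0;
    }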

Compressing a ZIP file using LZW compression creates a compressed file that is too large

I tried to compress a ZIP file using an LZW compression method (code provided at the following link):
http://rosettacode.org/wiki/LZW_compression#C
The encoded file it creates is much longer than the original file. What is the reason for that?
Please help me understand what is actually happening.
It is impossible for lossless compression to compress every file to a shorter file.
This is because there are 256^N files that are N bytes long, but only (256^N - 1)/255 files that are shorter than N bytes (for N = 1, for example, there are 256 one-byte files but only one shorter file, the empty one). So not every file can be mapped to a shorter file.
More than that, if any file becomes shorter, then some shorter file had to give up its spot to make that possible. So some files must become larger.
Lossless compression works by recognizing common patterns in typical files created by humans and converting long high-probability sequences of bytes to shorter sequences. The price for this is that some sequences become longer. The goal of the design is to make typical files compress, but atypical files must get longer.
If a compressor does its job, the redundant information is removed from the file and the output resembles random data. Such output cannot be compressed further.
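One way to see this in practice is to feed high-entropy data (a stand-in for the already-compressed contents of a ZIP file) to a compressor and compare sizes. The sketch below uses zlib's one-shot compress2(); the input size and use of rand() are illustrative assumptions.

    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    int main(void)
    {
        const uLong n = 100000;
        unsigned char *src = malloc(n);
        for (uLong i = 0; i < n; i++)
            src[i] = (unsigned char)rand();   /* stand-in for incompressible data */

        uLongf dest_len = compressBound(n);   /* worst-case output size */
        unsigned char *dest = malloc(dest_len);
        if (compress2(dest, &dest_len, src, n, Z_BEST_COMPRESSION) == Z_OK)
            /* Expect dest_len to come out slightly *larger* than n. */
            printf("input: %lu bytes, output: %lu bytes\n",
                   (unsigned long)n, (unsigned long)dest_len);

        free(src);
        free(dest);
        return 0;
    }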

Array Compression Algorithm

I have an array of 10345 bytes. I want to compress the array and then decompress it. Kindly suggest a compression algorithm that can reduce the size of the array. I am using the C language, and the array is of unsigned char type.
Rephrased: Can someone suggest a general-purpose compression algorithm (or library) for C/C++?
zlib
Lossless Compression Algorithms
The number of bytes to compress has very little to do with the choice of compression algorithm, although it does affect the implementation. For example, when you have fewer than 2^15 bytes to compress and you are using zlib, you will want to specify a windowBits value of less than 15. This parameter, passed to deflateInit2(), controls the size of the "look-back" window (it is distinct from the compression level). If your data is shorter than 16 KB, a 32 KB look-back window will never be more than half full; in that case, each pointer into the window needs one less bit, giving roughly a 1/15th edge on compression compared to leaving zlib at its 32 KB maximum.
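A sketch of that point is below; the 16 KB threshold and the helper name are illustrative assumptions rather than a fixed rule.

    #include <string.h>
    #include <zlib.h>

    /* Pick a smaller windowBits when the whole input is known to fit in a
     * 16 KB window; otherwise use the 32 KB maximum. */
    int init_deflate_for(z_stream *strm, unsigned long input_len)
    {
        memset(strm, 0, sizeof(*strm));
        int window_bits = (input_len <= 16384) ? 14 : 15;
        return deflateInit2(strm, Z_BEST_COMPRESSION, Z_DEFLATED,
                            window_bits, 8, Z_DEFAULT_STRATEGY);
    }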
The content of the data is what matters. If you are sending images with mostly background, then you might want Run Length Encoding (used by Windows .BMP, for example).
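For the run-length-encoding case, a minimal byte-oriented sketch (not the actual .BMP RLE8 format) might look like this:

    #include <stdio.h>
    #include <stddef.h>

    /* Encodes src[0..len) as (count, value) pairs with count <= 255.
     * Returns the number of bytes written; dst must hold up to 2*len bytes. */
    size_t rle_encode(const unsigned char *src, size_t len, unsigned char *dst)
    {
        size_t out = 0, i = 0;
        while (i < len) {
            unsigned char value = src[i];
            size_t run = 1;
            while (i + run < len && src[i + run] == value && run < 255)
                run++;
            dst[out++] = (unsigned char)run;
            dst[out++] = value;
            i += run;
        }
        return out;
    }

    int main(void)
    {
        unsigned char in[] = "aaaaabbbccccccc";
        unsigned char out[2 * sizeof(in)];
        size_t n = rle_encode(in, sizeof(in) - 1, out);
        printf("encoded %zu input bytes into %zu output bytes\n",
               sizeof(in) - 1, n);
        return 0;
    }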
If you are sending mostly English text, then you would want to use something like zlib, which combines Huffman coding with LZ77-style look-back dictionary compression.
If your data has been encrypted, then attempting to compress it will not succeed.
If your data is a particular type of signal, and you can tolerate some loss of detail, then you may want to transform it into frequency space and send only the principal components. (e.g., JPEG, MP3)

Resources