How to decode deflate block header from the deflate compression output result - zlib

I am trying to decode the block header bits from the output bytes of a deflate compression.
char a[50] = "Hello";
char b[50]; // output buffer for the compressed data
z_stream defstream;
defstream.zalloc = Z_NULL;
defstream.zfree = Z_NULL;
defstream.opaque = Z_NULL;
defstream.avail_in = (uInt)strlen(a)+1;
defstream.next_in = (Bytef *)a;
defstream.avail_out = (uInt)sizeof(b);
defstream.next_out = (Bytef *)b;
deflateInit(&defstream, Z_BEST_COMPRESSION);
deflate(&defstream, Z_FINISH);
deflateEnd(&defstream);
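// Use total_out rather than strlen(b): the compressed output is binary and may contain zero bytes.
for (uLong i=0; i<defstream.total_out; i++) {
printf("--- byte[%lu]=%hhx\n", i, b[i]);
}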
The result:
--- byte[0]=78
--- byte[1]=da
--- byte[2]=f3
and so on.
I just want to understand which bits are the 3-bit block header as described in the deflate specification. The first bit is the final-block flag, BFINAL. The next two bits specify the BTYPE.
Based on this result, 0x78, the first 3 bits are 000, which means BFINAL=0 and BTYPE=00 (no compression). But this does not seem right to me. The BTYPE should be either 01 or 10.
Am I missing something here? Can someone please help?
Reference:
deflate specification

You are making a zlib stream, not a raw deflate stream. So the 78 da is the zlib header, not deflate compressed data. The deflate data starts with f3. The low three bits of that are 011. The low 1 is BFINAL (this is the last block), and the 01 is BTYPE (fixed Huffman codes).
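To make the bit layout concrete, here is a minimal sketch in C that pulls the fields out of the first three bytes of a zlib stream. The struct and function names are made up for illustration and are not part of zlib; the field layout follows RFC 1950 (zlib header) and RFC 1951 (deflate block header), where bits are read starting from the least significant bit of each byte.

```c
#include <stdio.h>

/* Fields of the 2-byte zlib header (RFC 1950) and of the 3-bit
   deflate block header (RFC 1951). Illustrative names, not zlib API. */
struct stream_info {
    int cm;       /* compression method, 8 = deflate */
    int cinfo;    /* log2(window size) - 8 */
    int fdict;    /* 1 if a preset dictionary is required */
    int flevel;   /* compression level hint, 0..3 */
    int check_ok; /* 1 if CMF*256 + FLG is a multiple of 31 */
    int bfinal;   /* 1 if the first deflate block is also the last */
    int btype;    /* 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman */
};

/* p must point at the first three bytes of a zlib stream:
   CMF, FLG, and the first byte of deflate data. */
struct stream_info decode_headers(const unsigned char *p)
{
    struct stream_info s;
    unsigned cmf = p[0], flg = p[1], first = p[2];

    s.cm = cmf & 0x0f;            /* low nibble of CMF */
    s.cinfo = cmf >> 4;           /* high nibble of CMF */
    s.fdict = (flg >> 5) & 1;     /* bit 5 of FLG */
    s.flevel = flg >> 6;          /* bits 6-7 of FLG */
    s.check_ok = ((cmf << 8) | flg) % 31 == 0;
    s.bfinal = first & 1;         /* lowest bit of the first data byte */
    s.btype = (first >> 1) & 3;   /* next two bits */
    return s;
}

void print_headers(const unsigned char *p)
{
    struct stream_info s = decode_headers(p);
    printf("CM=%d CINFO=%d FDICT=%d FLEVEL=%d check=%s BFINAL=%d BTYPE=%d\n",
           s.cm, s.cinfo, s.fdict, s.flevel,
           s.check_ok ? "ok" : "bad", s.bfinal, s.btype);
}
```

For the bytes 78 da f3 from the question this yields CM=8, CINFO=7, FDICT=0, FLEVEL=3, a valid header check, BFINAL=1 and BTYPE=1, which matches the explanation above.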

Related

When is it valid to call inflateSetDictionary() when trying to decompress raw deflate data that was compressed with a dictionary?

The Problem
When is it valid to call inflateSetDictionary() when trying to decompress raw deflate data that was compressed with a compression dictionary?
According to the zlib manual, it is stated that inflateSetDictionary() can be called "at any time". However, it is unclear to me what "at any time" actually means. If we are allowed to call inflateSetDictionary() "at any time", then I interpret it as being valid to call inflateSetDictionary() after calling inflate(). However, doing so results in inflate() returning an "invalid distance too far back" error.
My Code
I created a simple application to compress the string "hello" using raw deflate, with a compression dictionary that also consists of the byte sequence "hello":
#define BUF_SIZE 16384
#define WINDOW_BITS -15 // Negative for raw.
#define MEM_LEVEL 8
const unsigned char dictionary[] = "hello";
unsigned char uncompressed[BUF_SIZE] = "hello";
unsigned char compressed[BUF_SIZE];
z_stream deflate_stream;
deflate_stream.zalloc = Z_NULL;
deflate_stream.zfree = Z_NULL;
deflate_stream.opaque = Z_NULL;
deflateInit2(&deflate_stream,
Z_DEFAULT_COMPRESSION,
Z_DEFLATED,
WINDOW_BITS,
MEM_LEVEL,
Z_DEFAULT_STRATEGY);
deflateSetDictionary(&deflate_stream, dictionary, sizeof(dictionary));
deflate_stream.avail_in = (uInt)strlen((const char *)uncompressed) + 1;
deflate_stream.next_in = (Bytef *)uncompressed;
deflate_stream.avail_out = BUF_SIZE;
deflate_stream.next_out = (Bytef *)compressed;
deflate(&deflate_stream, Z_FINISH);
deflateEnd(&deflate_stream);
This produced 4 bytes of raw deflate data into the compressed buffer:
uLong compressed_size = deflate_stream.total_out;
printf("Compressed size is: %lu\n", compressed_size); // prints Compressed size is: 4
I then attempted to decompress this data back into the string "hello". The zlib manual states that I would need to use raw inflate to decompress raw deflate data:
unsigned char decompressed[BUF_SIZE];
z_stream inflate_stream;
inflate_stream.zalloc = Z_NULL;
inflate_stream.zfree = Z_NULL;
inflate_stream.opaque = Z_NULL;
inflateInit2(&inflate_stream, WINDOW_BITS);
inflate_stream.avail_in = (uInt)compressed_size;
inflate_stream.next_in = (Bytef *)compressed;
inflate_stream.avail_out = BUF_SIZE;
inflate_stream.next_out = (Bytef *)decompressed;
int r = inflate(&inflate_stream, Z_FINISH);
According to the zlib manual, I would expect that inflate() should return Z_NEED_DICT, and I would then call inflateSetDictionary() with a subsequent call to inflate():
// Must be called immediately after a call of inflate, if that call returned Z_NEED_DICT.
if (r == Z_NEED_DICT) {
inflateSetDictionary(&inflate_stream, dictionary, sizeof(dictionary));
r = inflate(&inflate_stream, Z_FINISH);
}
if (r != Z_STREAM_END) {
printf("inflate: %s\n", inflate_stream.msg);
return 1;
}
inflateEnd(&inflate_stream);
printf("Decompressed size is: %lu\n", (unsigned long)strlen((const char *)decompressed));
printf("Decompressed string is: %s\n", decompressed);
However, what ends up happening is that inflate() will not return Z_NEED_DICT, and instead return Z_DATA_ERROR, with the value of inflate_stream.msg being set to "invalid distance too far back".
Even if I were to adjust my code so that inflateSetDictionary() is called regardless of the return value of inflate(), the subsequent inflate() call will still fail with Z_DATA_ERROR due to "invalid distance too far back".
My Question
So far, my code works correctly if I were to use the default zlib encoding by setting WINDOW_BITS to 15, as opposed to -15 for the raw encoding.
My code also works correctly if I were to move the call for inflateSetDictionary() before the call to inflate().
However, it's not clear to me why my existing code does not allow inflate() to return Z_NEED_DICT, so that I can make a subsequent call to inflateSetDictionary().
Is there a mistake in my code somewhere that is preventing inflate() from returning Z_NEED_DICT? Or can inflateSetDictionary() only be called prior to inflate() for the raw encoding, contrary to what the zlib manual states?
inflate() will only return Z_NEED_DICT for a zlib stream, where the need for a dictionary is indicated by a bit in the zlib header, followed by the Adler-32 of the dictionary that was used for compression to verify or select the dictionary. There is no such indication in a raw deflate stream. There is no way for inflate() to know from a raw deflate stream whether or not the data was compressed with a dictionary. It is up to you to know what is needed for decompression, since you made the raw deflate stream in the first place.
Since you did a deflateSetDictionary() before compressing anything, it is up to you to do an inflateSetDictionary() at the same place, before you decompress, after the inflateInit2(). As you have found, you need to insert:
inflateSetDictionary(&inflate_stream, dictionary, sizeof(dictionary));
right after the inflateInit2(). Then decompression will be successful.
Yes, you can do inflateSetDictionary() at any point during a raw deflate decompression. However it will only work if you are doing it at the same point at which you did the corresponding deflateSetDictionary() when compressing.

Z_MEM_ERROR Zlib deflateInit2() embedded device

Please mind this code:
#define CHUNK 0x4000
z_stream strm;
unsigned char out[CHUNK];
int ret;
strm.zalloc = Z_NULL;
strm.zfree = Z_NULL;
strm.opaque = Z_NULL;
int windowsBits = 15;
int GZIP_ENCODING = 16;
ret = deflateInit2(&strm, Z_BEST_SPEED, Z_DEFLATED, windowsBits | GZIP_ENCODING, 1,
Z_DEFAULT_STRATEGY);
if(ret == Z_OK) {
strm.next_in = (z_const unsigned char *)answer;
strm.avail_in = strlen(answer);
do {
strm.avail_out = CHUNK;
strm.next_out = out;
ret = deflate(&strm, Z_FINISH);
} while (strm.avail_out == 0);
}
/* clean up and return */
(void)deflateEnd(&strm);
Here answer is an unsigned char array of 200 elements, the last one being \0. It is filled in between the four declarations and the rest of the code.
deflateInit2() fails with Z_MEM_ERROR.
I'm working on an STM32F4 (microcontroller). My RAM was almost full (~87%) before I tried to implement the compression.
I got this part working once with different parameters, but then I had an error later in the program (I want to send the gzip'ed string to an HTTP output). The error was:
unrecognized encoding.
I have about 30 KB of free RAM.
zlib's deflate normally needs about 256K of RAM. See zlib technical details. 30K is a bit restrictive, but you can still get deflate to work using the memLevel and windowBits parameters to reduce the memory footprint. From that page:
deflate memory usage (bytes) = (1 << (windowBits+2)) + (1 << (memLevel+9))
So you can get there with a memLevel of 5, and a windowBits of 11, taking about 24K (plus some other structures). This will reduce the compression effectiveness somewhat, but at least it will work. (You can still add 16 to windowBits for gzip encoding.)
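The formula is easy to check with a few lines of C. This is only a sketch of the arithmetic quoted above; the function name is made up, and the result excludes the smaller fixed-size structures that zlib also allocates.

```c
#include <stdio.h>

/* Approximate deflate memory use in bytes, per the formula quoted above:
   (1 << (windowBits + 2)) + (1 << (memLevel + 9)).
   windowBits is the plain window size exponent (8..15), without the
   +16 that selects gzip encoding. Illustrative helper, not zlib API. */
unsigned long deflate_mem_bytes(int windowBits, int memLevel)
{
    return (1UL << (windowBits + 2)) + (1UL << (memLevel + 9));
}

void print_deflate_mem(int windowBits, int memLevel)
{
    printf("windowBits=%d memLevel=%d -> %lu bytes\n",
           windowBits, memLevel, deflate_mem_bytes(windowBits, memLevel));
}
```

With the defaults (15, 8) this gives 262144 bytes (256K), with (11, 5) it gives 24576 bytes (24K), and with the question's settings (15, 1) it gives 132096 bytes, which is why the window size rather than memLevel is what exhausts the 30 KB here.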

C. Loop compression + send (gzip) ZLIB

I'm currently building an HTTP server in C.
Please mind this piece of code :
#define CHUNK 0x4000
z_stream strm;
unsigned char out[CHUNK];
int ret;
char buff[200];
strm.zalloc = Z_NULL;
strm.zfree = Z_NULL;
strm.opaque = Z_NULL;
int windowsBits = 15;
int GZIP_ENCODING = 16;
ret = deflateInit2(&strm, Z_BEST_SPEED, Z_DEFLATED, windowsBits | GZIP_ENCODING, 1,
Z_DEFAULT_STRATEGY);
fill(buff); //fill buff with infos
do {
strm.next_in = (z_const unsigned char *)buff;
strm.avail_in = strlen(buff);
do {
strm.avail_out = CHUNK;
strm.next_out = out;
ret = deflate(&strm, Z_FINISH);
} while (strm.avail_out == 0);
send_to_client(out); //sending a part of the gzip encoded string
fill(buff);
}while(strlen(buff)!=0);
The idea is: I'm trying to send gzip'ed buffers, one by one, that (when concatenated) form the whole response body.
But for now, my client (a browser) only gets the contents of the first buffer. There are no errors at all, though.
How do I achieve this? How can I gzip buffers inside a loop so that I can send them on every iteration?
First off, you need to do something with the generated deflate data after each deflate() call. Your code discards the compressed data generated in the inner loop. From this example, after the deflate() you would need something like:
have = CHUNK - strm.avail_out;
if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
(void)deflateEnd(&strm);
return Z_ERRNO;
}
That's where your send_to_client needs to be, sending have bytes.
In your case, your CHUNK is so much larger than your buff, that loop is always executing only once, so you are not discarding any data. However that is only happening because of the Z_FINISH, so when you make the next fix, you will be discarding data with your current code.
Second, you are finishing the deflate() stream each time after no more than 199 bytes of input. This will greatly limit how much compression you can get. Furthermore, you are sending individual gzip streams, for which most browsers will only interpret the first one. (This is actually a bug in those browsers, but I don't imagine they will be fixed.)
You need to give the compressor at least tens to hundreds of Kbytes to work with in order to get decent compression. You need to use Z_NO_FLUSH instead of Z_FINISH until you get to the last buff you want to send. Then you use Z_FINISH. Again, take a look at the example and read the comments.

Unable to extract data which is compressed as per IETF RFC 1951: "DEFLATE Compressed Data Format Specification version 1.3"

I want to decompress data which is (or is supposed to be, according to the specification I'm working from) in the DEFLATE compression format specified in RFC 1951. I'm using the zlib library in C.
I referred to this example in github :
https://gist.github.com/gaurav1981/9f8d9bb7542b22f575df
And modified it just to decompress my data:
char dData[MAX_LENGTH];
char cData[MAX_LENGTH];
for(i=0; i < (size-4); i++)
{
cData[i] = *(data + i);
}
//cData[i] = '\0';
printf("Compressed size is: %lu\n", strlen(cData));
z_stream infstream;
infstream.zalloc = Z_NULL;
infstream.zfree = Z_NULL;
infstream.opaque = Z_NULL;
// setup "b" as the input and "c" as the compressed output
//infstream.avail_in = (uInt)((char*)defstream.next_out - b); // size of input
//infstream.avail_in = (uInt)((char*)defstream.next_out - cData);
infstream.avail_in = (uInt)(size - 4);
infstream.next_in = (Bytef *)cData; // input char array
infstream.avail_out = (uInt)sizeof(dData); // size of output
infstream.next_out = (Bytef *)dData; // output char array
// the actual DE-compression work.
inflateInit(&infstream);
inflate(&infstream, Z_NO_FLUSH);
inflateEnd(&infstream);
printf("Uncompressed size is: %lu\n", strlen(dData));
size = strlen(dData);
My uncompressed size is 0. Can someone tell me what's wrong with my code?
I even wrote the data into a file and saved it as .gz and .zip, but I got an error when I tried to extract it (I'm running Ubuntu 14.04).
Could someone also be kind enough to analyse my data and extract it, if that is possible? My data:
6374 492d 2d29 4ece c849 cc4b
294a 4cc9 cc57 f02e cd29 292d 6292 7780
30f2 1293 338a 3293 334a 52f3 98c4 0b9c
4a93 33b2 8b32 4b32 b399 d405 4212 d353
8b4b 320b 0a00
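Instead of inflateInit(), you need to call inflateInit2() with the second argument being -15, in order to decompress raw deflate data.
Also, strlen() is the wrong way to measure binary data: it stops at the first \0 byte. Your output buffer starts with \0, so strlen returns 0, and that is what you print as the length of the uncompressed data. Use infstream.total_out instead.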

How to Encode and Decode websocket permessage-deflate using zlib library in C?

Hi all,
I am trying to encode a WebSocket payload using permessage-deflate. I am using some test code, but my encoding is not correct. Can you please help me?
Here is my code:
char a[180] = "{\"type\":\"message\",\"channel\":\"C04U0F9CW\",\"text\":\"TestingMessage\",\"id\":757}";
char b[180];
char c[180];
int i;
// deflate
// zlib struct
z_stream defstream;
defstream.zalloc = Z_NULL;
defstream.zfree = Z_NULL;
defstream.opaque = Z_NULL;
defstream.avail_in = (uInt)strlen(a)+1; // size of input, string + terminator
defstream.next_in = (Bytef *)a; // input char array
defstream.avail_out = (uInt)sizeof(b); // size of output
defstream.next_out = (Bytef *)b; // output char array
//deflateInit(&defstream, Z_SYNC_FLUSH);
deflateInit2(&defstream, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15, 8, Z_DEFAULT_STRATEGY);
deflate(&defstream, Z_SYNC_FLUSH);
deflateEnd(&defstream);
for(i=0;i<=(char*)defstream.next_out - b;i++)
{
printf("%X",(Bytef)b[i]);
}
printf("\n%s\n",(Bytef *)b);
// This is one way of getting the size of the output
printf("Deflated size is: %lu\n", (char*)defstream.next_out - b);
my output is
kml#kml-VirtualBox:~/zlib-1.2.11/test$ gcc test.c -lz && ./a.out
789CAA562AA92C4855B252CA4D2D2E4E4C4F55D2514ACE48CCCB4BCD18A391B98841AB8593A873454B522B4A804221A9C5259979E9BE70D599294A56E6A6E6B5C0000FFFF2E
x��V*�,HU�R�M-.NLOU�QJ�H��K��9���Y:�EKR+J�B!��%�y��pՙ)JV����
Deflated size is: 72
Inflate:
73
{"type":"message","channel":"C04U0F9CW","text":"TestingMessage","id":757}
I am expecting the payload to look like this:
AA562AA92C4855B252CA4D2D2E4E4C4F55D2514ACE48CCCB4BCD018A391B98841AB8593A8703454B522B4A804221A9C5259979E9BE70D599294A56E6A6E6B50000
This is from the Fiddler WebSocket traffic capture for that particular message.
Please help me get the proper permessage-deflate encoding.
Here is some more information from the upgrade request:
GET https://127.0.0.1/websoc.html HTTP/1.1
Host: mpmulti-yklu.slack-msgs.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Sec-WebSocket-Version: 13
Sec-WebSocket-Extensions: permessage-deflate
Sec-WebSocket-Key: FriWg0AWhiLhLXXnfJK8kA==
Connection: keep-alive, Upgrade
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
Response
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: S2Q+O/tRM0/mIG//Er2hDdqD0eY=
Sec-Websocket-Extensions: permessage-deflate; server_no_context_takeover; client_no_context_takeover
X-Via: haproxy-ext90
The code is mostly correct, but defstream.avail_in must be the exact length of the message, without the extra null terminator. If you provide the exact input length, it will work properly. Try it like this:
uInt l = (uInt)strlen(a);
defstream.avail_in = l; // size of input, string without the terminator
Look at how the z_stream fields are assigned in the code and this will become clearer.
I think many people try to decode this and do not get the proper value for permessage-deflate.
Check your output
"789CAA562AA92C4855B252CA4D2D2E4E4C4F55D2514ACE48CCCB4BCD18A391B98841AB8593A873454B522B4A804221A9C5259979E9BE70D599294A56E6A6E6B5C0000FFFF2E"
Here 789C is the zlib header, and 0000FFFF (printed by your code as 00FFFF) is the trailer of the deflate message. A single 0 appears in place of 00 because %X prints a zero byte as one digit instead of two. The extra C after B5 comes from the one extra input byte (the terminator). The trailing 2E is a byte past the end of the output, printed because your loop condition uses <=; it goes away when you try the code below:
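const unsigned char msgend[4] = {0x00, 0x00, 0xFF, 0xFF}; // sync-flush trailer
// Drop the trailer if it is present at the end of the output.
int d = memcmp(msgend, &b[defstream.total_out - 4], 4) ? 0 : 4;
// Start at 2 to skip the zlib header (78 9C); %02X prints every byte as two digits.
for(i=2;i<defstream.total_out - d;i++)
{
printf("%02X",(Bytef)b[i]);
}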
So your expected answer is
"AA562AA92C4855B252CA4D2D2E4E4C4F55D2514ACE48CCCB4BCD018A391B98841AB8593A8703454B522B4A804221A9C5259979E9BE70D599294A56E6A6E6B50000"
The last two bytes of that payload are 0, so each of them is now printed as 00, because every byte is printed as two hex digits instead of dropping the leading nibble.
Please try this. It worked for me, and I am now able to decode properly.
