GZIP compression using zlib into buffer - c

I want to compress a memory buffer using gzip and put the compressed bytes into another memory buffer. I want to send the compressed buffer in the payload of an HTTP packet with Content-Encoding: gzip. I can easily do this with zlib for deflate compression (the compress() function). However, I don't see an API for what I need (gzip). zlib's gzip API compresses and writes to a file (gzwrite()), but I want to compress and write to a buffer.
Any ideas?
I am in C on Linux.

deflate() produces the zlib format by default. To get gzip output, use deflateInit2() and "add 16" to windowBits, as in the code below; windowBits is what switches the output to gzip format.
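#include <zlib.h>

// Compress input into output as a gzip stream (for Content-Encoding: gzip).
// Returns the number of compressed bytes, or -1 on error, including the case
// where output is too small to hold the complete stream.
int compressToGzip(const char* input, int inputSize, char* output, int outputSize)
{
    z_stream zs;
    zs.zalloc = Z_NULL;
    zs.zfree = Z_NULL;
    zs.opaque = Z_NULL;
    zs.avail_in = (uInt)inputSize;
    zs.next_in = (Bytef *)input;
    zs.avail_out = (uInt)outputSize;
    zs.next_out = (Bytef *)output;
    // There is no separate gzip macro; zlib.h says: "Add 16 to windowBits to write a
    // simple gzip header and trailer around the compressed data instead of a zlib wrapper".
    // 15 | 16 (= 31) means a 32 KB window plus the gzip wrapper.
    if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 | 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
        return -1;
    // With Z_FINISH, deflate() returns Z_STREAM_END only if the whole stream fit in output.
    int ret = deflate(&zs, Z_FINISH);
    uLong total = zs.total_out;
    deflateEnd(&zs);
    return (ret == Z_STREAM_END) ? (int)total : -1;
}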
Some relevant contents from their header:
"This library can optionally read and write gzip and raw deflate streams in
memory as well."
"Add 16 to windowBits to write a simple gzip header and trailer around the
compressed data instead of a zlib wrapper"
Oddly, the documentation of deflateInit2() in zlib.h is 1000+ lines away from its declaration; I wouldn't read that documentation again unless I had to.

No, the zlib API does in fact provide gzip compression in memory with the deflate functions. You need to actually read the documentation in zlib.h.

Gzip is a file format, which is why the utility functions provided seem to operate on a file descriptor. Use shm_open() to create an fd and mmap() it with sufficient memory. It is important that the data being written doesn't exceed the size of the mapped region, otherwise the write will fail; that's a limitation of mmapped regions.
Pass the fd to gzdopen().
But, as Mark suggested in his answer, using the basic (deflate) API is a better way.

Related

External file resources on an embedded system (C language with FAT)

My application/device runs on an ARM Cortex-M3 (STM32), without an OS but with FatFs, and needs to access many resource files (audio, images, etc.).
The code runs from internal flash (ROM, 256 KB).
The resource files are stored on external flash (SD card, 4 GB).
There is not much RAM (32 KB), so malloc-ing a complete file from the package is not an option.
As the user has access to the resources folder for atomic updates, I would like to package all these resource files into a single file (.dat, .rom, .whatever),
so that the user doesn't mishandle this data.
Can someone point me to a nice solution to do so?
I don't mind remapping fopen, fread, fseek and fclose in my application, but I would rather not start from scratch (coding the serializer, table of contents, parser, etc.). My system is quite limited (no malloc, no framework, just stdlib and FatFs).
Thanks for any input you can give me.
note: I'm not looking for a solution where the resources are embedded IN the code (ROM) as obviously they are way too big for that.
It should be possible to use fatfs recursively.
Drive 0 would be your real device, and drive 1 would be a file on drive 0. You can implement the disk_* functions like this
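#define BLOCKSIZE 512
FIL imagefile;   /* the image file on drive 0 that backs drive 1 */

DSTATUS disk_initialize(BYTE drv) {
    FRESULT r;
    if(drv == 0)
        return SD_initialize();
    else if(drv == 1) {
        /* drive 0 must already be mounted so that the image file can be opened */
        r = f_open(&imagefile, "0:/RESOURCE.DAT", FA_READ);
        if(r == FR_OK)
            return 0;
    }
    return STA_NOINIT;
}

/* The types of sector and count differ between FatFs releases;
   match the disk_read() prototype in your diskio.h. */
DRESULT disk_read(BYTE drv, BYTE *buff, DWORD sector, UINT count) {
    UINT br;
    FRESULT r;
    if(drv == 0)
        return SD_read_blocks(buff, sector, count);
    else if(drv == 1) {
        /* FatFs's seek function is f_lseek(); there is no f_seek() */
        r = f_lseek(&imagefile, sector*BLOCKSIZE);
        if(r != FR_OK)
            return RES_ERROR;
        r = f_read(&imagefile, buff, count*BLOCKSIZE, &br);
        if((r == FR_OK) && (br == count*BLOCKSIZE))
            return RES_OK;
    }
    return RES_ERROR;
}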
To create the filesystem image on Linux or other similar systems you'd need mkfs.msdos and the mtools package. See this SO post on how to do it. Might work on Windows with Cygwin, too.
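The drive-1 branch of disk_read() only maps a sector number to a byte offset in the image file (sector * 512), so that mapping can be tried on a PC against the generated image before moving to the target. A host-side sketch in which stdio's fseek()/fread() stand in for FatFs's f_lseek()/f_read(); image_read_sectors is a hypothetical name.

```c
#include <stdio.h>

#define BLOCKSIZE 512

/* Host-side stand-in for the drive-1 branch of disk_read(): the same
   sector -> byte offset mapping, with fseek()/fread() in place of
   f_lseek()/f_read(). Returns 0 on success, -1 on error or short read. */
static int image_read_sectors(FILE *img, unsigned char *buff,
                              unsigned long sector, unsigned count)
{
    /* A long offset is enough for a 100 MB image even where long is 32 bits. */
    if (fseek(img, (long)(sector * BLOCKSIZE), SEEK_SET) != 0)
        return -1;
    size_t want = (size_t)count * BLOCKSIZE;
    return fread(buff, 1, want, img) == want ? 0 : -1;
}
```

A short read past the end of the image is reported as an error, which corresponds to returning RES_ERROR on the target.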
To expand on what Joachim said above:
Popular archive formats that can store files uncompressed (zip only optionally) are cpio, tar, and zip. Any of the three would work just fine.
Here are a few more in-depth comments on using TAR or CPIO.
TAR
I've used tar before for the exact purpose, on an stm32 with FatFS, so can tell you it works. I chose it over cpio or zip because of its familiarity (most developers have seen it), ease of use, and rich command line tools.
GNU Tar gives you fine-grained control over order in which the files are placed in the archive and regexes to manipulate file names (--xform) or --exclude paths. You can pretty much guarantee you can get exactly the archive you're after with nothing more than GNU Tar and a makefile. I'm not sure the same can be said for cpio or zip.
This means it worked well for my build environment, but your requirements may vary.
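Walking a tar archive needs no separate table of contents: each entry is a 512-byte header (name at offset 0, size as octal ASCII at offset 124, checksum at offset 148) followed by the data padded to a multiple of 512 bytes, and the archive ends with zero-filled blocks. On the STM32 that means reading one 512-byte header at a time into a static buffer with f_read() and skipping the data with f_lseek(), with no malloc. A minimal sketch of the parsing part, assuming the POSIX ustar layout; the tar_* helpers are hypothetical, and the sketch ignores the ustar prefix field (names longer than 100 characters), GNU long-name entries and the type flag, so directories show up as zero-size entries.

```c
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK 512

/* Parse an octal ASCII field (tar stores numbers like "00000001750"). */
static unsigned long tar_octal(const unsigned char *p, size_t n)
{
    unsigned long v = 0;
    size_t i = 0;
    while (i < n && (p[i] == ' ' || p[i] == '\0'))   /* skip leading padding */
        i++;
    for (; i < n && p[i] >= '0' && p[i] <= '7'; i++)
        v = v * 8 + (unsigned long)(p[i] - '0');
    return v;
}

/* Inspect one 512-byte header block. Returns 1 and fills name/size for an
   entry, 0 for an all-zero block (end of archive), -1 on a bad checksum. */
static int tar_parse_header(const unsigned char *hdr, char name[101],
                            unsigned long *size)
{
    unsigned long sum = 0;
    int all_zero = 1;
    for (size_t i = 0; i < TAR_BLOCK; i++) {
        if (hdr[i])
            all_zero = 0;
        /* The checksum field itself (offset 148, 8 bytes) counts as spaces. */
        sum += (i >= 148 && i < 156) ? (unsigned long)' ' : hdr[i];
    }
    if (all_zero)
        return 0;
    if (sum != tar_octal(hdr + 148, 8))
        return -1;
    memcpy(name, hdr, 100);                 /* name field: offset 0, 100 bytes */
    name[100] = '\0';
    *size = tar_octal(hdr + 124, 12);       /* size field: offset 124, 12 bytes */
    return 1;
}

/* Offset of the next header: the data is padded to whole 512-byte blocks. */
static unsigned long tar_next_header(unsigned long hdr_offset, unsigned long size)
{
    return hdr_offset + TAR_BLOCK + (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
}
```

To find a resource, start at offset 0, read a header, compare the name, and otherwise jump to tar_next_header(offset, size); the file's data starts at offset + 512.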
CPIO
In my opinion, cpio has a much worse, harder-to-use set of command-line tools than tar, which is why I steer clear of it when I can. However, its file format is a little lighter-weight and might be even simpler to parse (not that tar is hard).
The Linux kernel project uses cpio for initramfs images, so that's probably the best / most mature example on the internet that you'll find on using it for this sort of purpose.
If you grab any kernel source tree, the tool usr/gen_init_cpio.c can be used to generate a cpio archive from a listing file in the format described in that source file.
The extraction code is in init/initramfs.c.
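The "newc" cpio format used for initramfs is plain ASCII: the magic "070701", thirteen 8-digit hex fields (the 7th is the file size, the 12th the name size including its NUL), the name, then the file data, with header plus name and the data each padded to a multiple of 4 bytes; the archive ends with an entry named TRAILER!!!. A minimal sketch of parsing one entry from a buffer; the cpio_* helpers are hypothetical, and the "070702" variant with checksums is not handled.

```c
#include <stddef.h>
#include <string.h>

#define CPIO_HDR_LEN 110   /* "070701" + 13 fields of 8 hex digits */

/* Parse one 8-digit hex field; returns -1 on a non-hex character. */
static long cpio_hex8(const char *p)
{
    long v = 0;
    for (int i = 0; i < 8; i++) {
        char c = p[i];
        int d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;
        v = v * 16 + d;
    }
    return v;
}

static size_t pad4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Parse the "newc" entry at buf[0..len). Returns 1 for a file entry (setting
   *name, *data, *size and *next, the offset of the following entry), 0 for
   the closing "TRAILER!!!" entry, and -1 for a malformed or truncated entry. */
static int cpio_newc_entry(const char *buf, size_t len, const char **name,
                           const char **data, size_t *size, size_t *next)
{
    if (len < CPIO_HDR_LEN || memcmp(buf, "070701", 6) != 0)
        return -1;
    long fsize = cpio_hex8(buf + 6 + 6 * 8);    /* c_filesize, 7th field */
    long nsize = cpio_hex8(buf + 6 + 11 * 8);   /* c_namesize, 12th field, counts the NUL */
    if (fsize < 0 || nsize <= 0)
        return -1;
    size_t data_off = pad4(CPIO_HDR_LEN + (size_t)nsize);  /* header + name, padded to 4 */
    if (data_off + (size_t)fsize > len)
        return -1;
    *name = buf + CPIO_HDR_LEN;
    if (nsize == 11 && memcmp(*name, "TRAILER!!!", 11) == 0)
        return 0;
    *data = buf + data_off;
    *size = (size_t)fsize;
    *next = pad4(data_off + (size_t)fsize);     /* file data, padded to 4 */
    return 1;
}
```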
ZIP
I've never used the zip format for this sort of purpose. So no real comment there.
Berendi found a very clever solution: use the existing fat library to access it recursively!
The implementation is quite simple, and after extensive testing, I'd like to post the code to use FatFs recursively and the commands used for single file fat generation.
First, let's generate a 100 MB FAT32 image file:
dd if=/dev/zero of=fatfile.fs bs=1024 count=102400
mkfs.vfat -F 32 -r 112 -S 512 -v fatfile.fs
Create/push content into it:
echo HelloWorld on Virtual FAT >> helloworld.txt
mcopy -i fatfile.fs helloworld.txt ::/
Change the diskio.c file to add Berendi's code, and also:
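DSTATUS disk_status (BYTE pdrv)
{
    DSTATUS status = STA_NOINIT;
    switch (pdrv)
    {
    case FATFS_DRIVE_VIRTUAL:
        printf("disk_status: FATFS_DRIVE_VIRTUAL\r\n");
        /* fall through: the image file lives on the SD card,
           so the virtual drive reports the SD card's status */
    case FATFS_DRIVE_ATA: /* SD CARD */
        status = FATFS_SD_SDIO_disk_status();
        break;
    }
    return status;
}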
Don't forget to add the enum for the drive names, and to set the number of volumes (in ffconf.h):
enum { FATFS_DRIVE_ATA = 0, FATFS_DRIVE_VIRTUAL = 1 };
#define _VOLUMES 2
Then mount the virtual FAT, and access it:
f_mount(&VirtualFAT, (TCHAR const*)"1:/", 1);
f_open(&file, "1:/test.txt", FA_READ);
Thanks a lot for your help.

Remove HTTP Header Info

In C is there a way to exclude the HTTP header information that comes with the data when using recv() on a socket? I am trying to read some binary data and all I want is the actual binary information, not the HTTP header information. The current data received looks like this:
HTTP/1.1 200 OK
Content-Length: 3314
Content-Type: image/jpeg
Last-Modified: Tue, 20 Mar 2012 14:51:34 GMT
Accept-Ranges: bytes
ETag: "45da99f1a86cd1:6b9"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Mon, 20 Aug 2012 14:10:08 GMT
Connection: close
╪ α
I would like only to read the binary portion of the file. (That's obviously not all the binary, only that much was displayed since I printed the output from my recv loop as a string and the first NULL char is after that small binary string).
I just need to get rid of the header portion, is there a simple way to do this?
You would be better off using an HTTP library such as libcurl.
If you want to do it yourself:
You can search for "\r\n\r\n" (two \r\n sequences), which separates the HTTP headers from the content, and use the buffer after that.
You also need to get Content-Length from the headers and read that many bytes as the HTTP content.
Something like:
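/* http_resp holds the bytes read by recv(), followed by a terminating '\0' */
httpbody = strstr(http_resp, "\r\n\r\n");
if(httpbody)
    httpbody += 4; /* move past the 4 characters of "\r\n\r\n" */
/* now httpbody points at the data, with the HTTP headers stripped */
Note: strstr() stops only at a '\0', so terminate the received data (leave room for the terminator in the buffer) and make sure the search cannot overrun it. strnstr() exists on BSD and macOS but not in glibc; on Linux, memmem() (a GNU extension) gives a length-bounded search. Also keep in mind that the body itself is binary, so handle it by length, not with string functions.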
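Because the body is binary and may contain NUL bytes, a length-bounded search avoids relying on a terminator at all. A minimal sketch using only the C standard library; http_body_offset, ascii_ieq and http_content_length are hypothetical helper names, and the sketch does not handle chunked transfer encoding (responses without Content-Length).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length-bounded search for the blank line ending the header. Unlike strstr(),
   it needs no '\0' terminator. Returns the offset of the first body byte, or
   -1 if "\r\n\r\n" has not been received yet. */
static long http_body_offset(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    return -1;
}

/* ASCII case-insensitive comparison of n bytes (header names ignore case). */
static int ascii_ieq(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Value of Content-Length within the header block buf[0..hdr_len),
   or -1 if the header is absent or has no digits. */
static long http_content_length(const char *buf, size_t hdr_len)
{
    static const char name[] = "Content-Length:";
    const size_t nlen = sizeof name - 1;
    for (size_t i = 0; i + nlen <= hdr_len; i++) {
        /* Only match at the start of a header line. */
        if ((i == 0 || buf[i - 1] == '\n') && ascii_ieq(buf + i, name, nlen)) {
            size_t j = i + nlen;
            long v = -1;
            while (j < hdr_len && (buf[j] == ' ' || buf[j] == '\t'))
                j++;                               /* optional whitespace */
            while (j < hdr_len && isdigit((unsigned char)buf[j]))
                v = (v < 0 ? 0 : v * 10) + (buf[j++] - '0');
            return v;
        }
    }
    return -1;
}
```

With off = http_body_offset(buf, len), the body is complete once len - off reaches the Content-Length value; until then, keep calling recv() and appending to the buffer.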
I think you need to extract the value of the Content-Length to know the size of the binary data to be read otherwise it will be impossible to know whether all data has been received. A simple approach to consume, and mostly ignore, the header portion is to read the incoming data byte-by-byte until "\r\n\r\n" is encountered, which indicates the end of the header section and the beginning of the content.
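The byte-by-byte approach can be written as a small state machine that counts how much of "\r\n\r\n" has been matched, while keeping the header bytes so that Content-Length can still be extracted. A minimal sketch, assuming a connected stream socket; read_http_header is a hypothetical name. One recv() call per byte costs a system call per header byte, which is usually acceptable for headers of a few hundred bytes.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read an HTTP header from a connected socket one byte at a time, stopping
   right after the blank line "\r\n\r\n", so the next recv() returns the first
   body byte. The header is copied into hdr and NUL-terminated. Returns the
   header length, or -1 on EOF, error, or a header that does not fit in cap. */
static long read_http_header(int fd, char *hdr, size_t cap)
{
    size_t n = 0;
    int matched = 0;            /* how much of "\r\n\r\n" has been seen: 0..4 */
    while (matched < 4) {
        char c;
        if (recv(fd, &c, 1, 0) != 1 || n + 1 >= cap)
            return -1;
        hdr[n++] = c;
        if ((matched % 2 == 0 && c == '\r') || (matched % 2 == 1 && c == '\n'))
            matched++;
        else
            matched = (c == '\r') ? 1 : 0;   /* a stray '\r' may start a new match */
    }
    hdr[n] = '\0';
    return (long)n;
}
```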

Unzip a zip file using zlib

I have an archive.zip which contains two encrypted ".txt" files. I would like to decompress the archive in order to retrieve those 2 files.
Here's what I've done so far:
FILE *FileIn = fopen("./archive.zip", "rb");
if (FileIn)
printf("file opened\n");
else
printf("unable to open file\n");
fseek(FileIn, 0, SEEK_END);
unsigned long FileInSize = ftell(FileIn);
printf("size of input compressed file : %lu\n", FileInSize);
void *CompDataBuff = malloc(FileInSize);
void *UnCompDataBuff = NULL;
int fd = open ("archive.zip", O_RDONLY);
CompDataBuff = mmap(NULL, FileInSize, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
printf("buffer read : %s\n", (char *)CompDataBuff);
uLongf UnCompSize = (FileInSize * 11/10 + 12);
UnCompDataBuff = malloc(UnCompSize);
int ret_uncp ;
ret_uncp = uncompress((Bytef*)UnCompDataBuff, &UnCompSize, (const Bytef*)CompDataBuff,FileInSize);
printf("size of uncompressed data : %lu\n", UnCompSize);
if (ret_uncp == Z_OK){
printf("uncompression ok\n");
printf("uncompressed data : %s\n",(char *)UnCompDataBuff);
}
if (ret_uncp == Z_MEM_ERROR)
printf("uncompression memory error\n");
if (ret_uncp == Z_BUF_ERROR)
printf("uncompression buffer error\n");
if (ret_uncp == Z_DATA_ERROR)
printf("uncompression data error\n");
I always get "uncompression data error" and I don't know why. And then I would like to know how to retrieve the 2 files with my data uncompressed.
zip is a file format that wraps header and trailer information around compressed data streams in order to represent a set of files and directories. The compressed data streams are almost always deflate data streams, which can in fact be generated and decoded by zlib. zlib also provides the crc32 function which can be used to generate and check the crc values in the zip wrapper information.
What zlib does not do by itself is decode and deconstruct the zip structure. You can either write your own code to do that using the specification (not very hard to do), or you can use the minizip routines in the contrib/minizip directory of the zlib distribution, which provides functions to open, access, and close zip files.
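As an illustration of the zip structure, each entry is preceded by a "local file header" with a fixed 30-byte part (signature "PK\3\4", flags, compression method, CRC-32, sizes, name and extra-field lengths, all little-endian), followed by the name, the extra field, and the compressed data. A minimal sketch of parsing that header from a buffer; zip_local_entry and zip_parse_local are hypothetical names. When method is 8, the data is a raw deflate stream, which zlib can inflate after inflateInit2(&strm, -MAX_WBITS) (negative windowBits means no zlib or gzip wrapper); uncompress() expects a zlib wrapper, so a zip file fed to it fails with Z_DATA_ERROR. If bit 0 of flags is set, the entry is encrypted, which zlib does not handle. A robust reader should locate entries through the central directory at the end of the archive, since with bit 3 set the sizes in the local header may be zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fields of one ZIP local file header. */
struct zip_local_entry {
    uint16_t flags;            /* bit 0: encrypted, bit 3: sizes in data descriptor */
    uint16_t method;           /* 0 = stored, 8 = deflate */
    uint32_t crc32;
    uint32_t comp_size;
    uint32_t uncomp_size;
    const char *name;          /* not NUL-terminated */
    uint16_t name_len;
    const unsigned char *data; /* compressed bytes (raw deflate when method == 8) */
};

static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the local file header at buf[0..len). Returns 0 on success, -1 if
   the signature is wrong or the header, name or extra field is truncated. */
static int zip_parse_local(const unsigned char *buf, size_t len,
                           struct zip_local_entry *e)
{
    if (len < 30 || rd32(buf) != 0x04034b50u)   /* "PK\3\4" read little-endian */
        return -1;
    e->flags       = rd16(buf + 6);
    e->method      = rd16(buf + 8);
    e->crc32       = rd32(buf + 14);
    e->comp_size   = rd32(buf + 18);
    e->uncomp_size = rd32(buf + 22);
    e->name_len    = rd16(buf + 26);
    uint16_t extra_len = rd16(buf + 28);
    if (len < 30u + e->name_len + extra_len)
        return -1;
    e->name = (const char *)buf + 30;
    e->data = buf + 30 + e->name_len + extra_len;
    return 0;
}
```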
Zlib is not a library for handling .zip files. It supports decompressing zlib and gzip streams, both of which work on the level of a single stream of data, rather than an "archive" format like .zip.
You would need a different library (for one example, libzip; there are many others) to open and manipulate .zip archives.
As mentioned, zlib only handles compression; it doesn't archive. When you zip or unzip, what you are actually doing is adding files to, or extracting files from, an archive that happens to be in zip format (there are other formats like rar, 7zip and so on).
If you want to create zips or unzip files, you have to handle the zip format, and minizip is a nice library: robust, and it has been around for quite a long time.
There is a contrib for minizip, https://github.com/nmoinvaz/minizip, with examples of how to use it. It is not that hard; you can check minizip.c and miniunz.c for code showing how to use it. (Minizip uses zlib for the compression.)
I also ended up building a library that wraps minizip, adds a bunch of nice features, and makes it easier to use and more object-oriented. It lets you zip entire folders, streams, vectors, etc., as well as do everything entirely in memory.
Repo with examples here: https://github.com/sebastiandev/zipper
Beta pre-release: https://github.com/sebastiandev/zipper/releases/
Code looks something like:
Zipper zipper("ziptest.zip");
zipper.add("somefile.txt");
zipper.add("myFolder");
zipper.close();
If you are using C++, try this example, which calls the unzip source:
https://github.com/fatalfeel/proton_sdk_source/blob/master/shared/FileSystem/FileSystemZip.cpp
https://github.com/fatalfeel/proton_sdk_source/tree/master/shared/util/unzip

Creating a pcap file

I need to save UDP packets to a file and would like to use the pcap format to reuse the various tools available (wireshark, tcpdump, ...).
There is some information in this thread, but I can't find how to write the global file header, 'struct pcap_file_header'.
pcap_t* pd = pcap_open_dead(DLT_RAW, 65535);
pcap_dumper_t* pdumper = pcap_dump_open(pd, filename);
struct pcap_file_header file_hdr;
file_hdr.magic_number = 0xa1b2c3d4;
file_hdr.version_major = 2;
file_hdr.version_minor = 4;
file_hdr.thiszone = 0;
file_hdr.sigfigs = 0;
file_hdr.snaplen = 65535;
file_hdr.linktype = 1;
// How do I write file_hdr to m_pdumper?
while( (len = recvmsg(sd, &msg_hdr, 0)) > 0 )
pcap_dump((u_char*)m_pdumper, &m_pcap_pkthdr, (const u_char*)&data);
How should I write the global file header?
If there is no specific pcap function available, how can I retrieve the file descriptor to insert the header using write()?
You shouldn't need to write that header: pcap_dump_open() writes it for you, using the snaplen and link type of the handle created by pcap_open_dead(). You only need to fill out and write that header yourself if you want to write the file directly instead of using pcap_dump and friends. There's an example here of a trivial program that writes out a pcap file with those functions.
original answer, concerning writing the file directly:
I can't remember exactly how this works, but a while ago I wrote a patch to redir that writes out pcap files; you may be able to use it as an example.
You can find it attached to this Debian bug.
Some of it is for faking the Ethernet and IP headers, and it may not be applicable, since you're using pcap_dump_open and pcap_dump whereas the patch linked above writes out the pcap file without using any libraries, but I'll leave it here anyway in case it helps.
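Writing the file directly, without libpcap, only takes the 24-byte global header followed by a 16-byte record header (timestamp seconds, microseconds, captured length, original length) in front of each packet. A minimal sketch, assuming the classic microsecond format written in host byte order (readers detect the byte order from the magic number); rawpcap_write_header and rawpcap_write_packet are hypothetical names, not libpcap functions. It uses link type 101 (LINKTYPE_RAW), the value stored in the file for raw IP captures; the question's linktype = 1 would instead declare Ethernet frames.

```c
#include <stdint.h>
#include <stdio.h>

#define LINKTYPE_RAW 101   /* raw IPv4/IPv6 packets, no link-layer header */

/* Global header, field by field to avoid any struct padding:
   magic, version 2.4, thiszone, sigfigs, snaplen, link type (24 bytes). */
static int rawpcap_write_header(FILE *f, uint32_t snaplen, uint32_t linktype)
{
    uint32_t magic = 0xa1b2c3d4u;
    uint16_t major = 2, minor = 4;
    int32_t thiszone = 0;
    uint32_t sigfigs = 0;
    return (fwrite(&magic, 4, 1, f) == 1 && fwrite(&major, 2, 1, f) == 1 &&
            fwrite(&minor, 2, 1, f) == 1 && fwrite(&thiszone, 4, 1, f) == 1 &&
            fwrite(&sigfigs, 4, 1, f) == 1 && fwrite(&snaplen, 4, 1, f) == 1 &&
            fwrite(&linktype, 4, 1, f) == 1) ? 0 : -1;
}

/* One record: 16-byte header followed by the packet bytes. */
static int rawpcap_write_packet(FILE *f, uint32_t ts_sec, uint32_t ts_usec,
                                const void *data, uint32_t len)
{
    uint32_t rec[4] = { ts_sec, ts_usec, len, len };   /* caplen == original len */
    if (fwrite(rec, sizeof rec, 1, f) != 1)
        return -1;
    return fwrite(data, 1, len, f) == len ? 0 : -1;
}
```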
If you are interested in UDP and TCP only, you should use DLT_EN10MB instead of DLT_RAW (cf. pcap_open_dead) to simulate a full capture of UDP packets.
It works much better when examining the file in Wireshark.
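With DLT_EN10MB, each raw IP packet needs a 14-byte Ethernet II header in front of it: destination MAC, source MAC, and the EtherType. A minimal sketch; wrap_ipv4_in_ethernet is a hypothetical helper, the MAC addresses are zero placeholders, and only IPv4 (EtherType 0x0800) is handled. The resulting frame length goes into both caplen and len of the record header.

```c
#include <stddef.h>
#include <string.h>

#define ETH_HDR_LEN 14

/* Prepend a synthetic Ethernet II header to a raw IPv4 packet so it can be
   stored under DLT_EN10MB. Returns the frame length, or 0 if frame_cap is
   too small. */
static size_t wrap_ipv4_in_ethernet(const unsigned char *ip, size_t ip_len,
                                    unsigned char *frame, size_t frame_cap)
{
    if (frame_cap < ETH_HDR_LEN + ip_len)
        return 0;
    memset(frame, 0, 12);          /* destination + source MAC: placeholders */
    frame[12] = 0x08;              /* EtherType 0x0800 (IPv4), big-endian */
    frame[13] = 0x00;
    memcpy(frame + ETH_HDR_LEN, ip, ip_len);
    return ETH_HDR_LEN + ip_len;
}
```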

how to reassemble tcp and decode http info in c code?

I am working with libpcap to inspect HTTP traffic. libpcap cannot reassemble TCP segments,
and there are many corner cases to deal with manually. I also read the Wireshark source code, but it's too big.
Is there any open-source C code that can reassemble TCP and dissect HTTP data?
I have hacked on the code of driftnet, tcpflow, pcap etc. before.
tcpflow can reassemble dumps from, e.g., tcpdump. A "typical" chain of work could be:
$ tcpdump -nnieth0 -w dump.raw
# capture for a while, then stop with Ctrl-C
$ mkdir tmp && cd tmp
tmp/$ tcpflow -r ../dump.raw
# This reassembles each TCP flow into a separate file
# Now one can investigate each transfer from those separate files
# Next, join them into one:
tmp/$ cat * > ../dump.flow
tmp/$ cd ..
# Extract some data
$ foremost -i dump.flow
I believe you can find some useful code in the sources of these tools.
Else:
An HTTP parsing library: HTTP Parser
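The core idea of reassembly is to place each segment's payload at its position in the byte stream (sequence number minus the first data sequence number) and only hand the in-order prefix to the HTTP parser. A very small sketch, assuming one direction of one connection and a bounded stream; tcp_stream and its functions are hypothetical, and it ignores FIN/RST, connection tracking by address/port 4-tuple, and data beyond STREAM_CAP, all of which tools such as tcpflow handle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STREAM_CAP 4096   /* hypothetical fixed limit for this sketch */

/* One direction of one TCP connection, reassembled into a flat buffer. */
struct tcp_stream {
    uint32_t first_seq;               /* sequence number of the first data byte */
    unsigned char data[STREAM_CAP];
    unsigned char have[STREAM_CAP];   /* 1 once that byte has arrived */
    size_t contiguous;                /* bytes available in order from the start */
};

/* first_seq is the SYN's sequence number + 1 (the SYN consumes one number). */
static void tcp_stream_init(struct tcp_stream *s, uint32_t first_seq)
{
    memset(s, 0, sizeof *s);
    s->first_seq = first_seq;
}

/* Store a segment's payload at its position in the stream. Out-of-order,
   duplicate and overlapping segments are fine; bytes past STREAM_CAP are
   dropped. Returns how many bytes are now available in order. */
static size_t tcp_stream_add(struct tcp_stream *s, uint32_t seq,
                             const unsigned char *payload, size_t len)
{
    uint32_t off = seq - s->first_seq;    /* unsigned arithmetic handles wraparound */
    if (off < STREAM_CAP) {
        for (size_t i = 0; i < len && off + i < STREAM_CAP; i++) {
            s->data[off + i] = payload[i];
            s->have[off + i] = 1;
        }
    }
    while (s->contiguous < STREAM_CAP && s->have[s->contiguous])
        s->contiguous++;
    return s->contiguous;
}
```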
The easiest way to do this is to download Wireshark and open the pcap file, or capture live. Then right-click any packet and choose "Follow TCP Stream"; you will see your HTTP data in the window that opens.
If you want to build a program from scratch:
A pcap file for a TCP transaction is laid out something like this:
[pcap_file_header] --24 bytes, once at the start of the file
for each packet
[packet header] --16 bytes; contains the timestamp and the captured length (caplen)
[link-layer header] --e.g. 14 bytes for Ethernet
[ip_header] --20 bytes or more (IHL field * 4)
[tcp_header] --20 bytes or more (data offset field * 4)
[payload] --the rest of the packet
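Now, to read the data, first open your pcap file as a stream.
Read the pcap file header (24 bytes; look up struct pcap_file_header).
Then start a loop:
for each packet
read the 16-byte packet header from the file or stream, then caplen bytes of packet data
skip the link-layer header, then the IP and TCP headers, using the lengths stored in those headers
for example, for Ethernet with no IP or TCP options: pointer = pointer + 14 (Ethernet) + 20 (IP) + 20 (TCP)
the pointer should now be pointing to your data
its length is the IP total length minus the IP and TCP header lengths (caplen also counts the headers and any Ethernet padding), so read that many bytes and print them character by character.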
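The loop above can be sketched in C as follows, assuming a classic microsecond-resolution pcap file in the host's byte order, Ethernet link type (1), and IPv4 only; pcapf_read_header and pcapf_next_tcp_payload are hypothetical names, not libpcap functions. It takes header lengths from the IHL and data-offset fields instead of assuming 20 bytes, and bounds the payload by the IP total length because short Ethernet frames carry padding. It does not reassemble segments.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Check the 24-byte global header: classic pcap magic in host byte order and
   LINKTYPE_ETHERNET (1). Returns 0 if usable, -1 otherwise. */
static int pcapf_read_header(FILE *f)
{
    unsigned char gh[24];
    uint32_t magic, linktype;
    if (fread(gh, 1, sizeof gh, f) != sizeof gh)
        return -1;
    memcpy(&magic, gh, 4);
    memcpy(&linktype, gh + 20, 4);
    return (magic == 0xa1b2c3d4u && linktype == 1) ? 0 : -1;
}

/* Advance to the next IPv4/TCP packet and point *payload at its TCP payload
   (possibly empty, e.g. for pure ACKs). Returns 1 if found, 0 at end of file
   or on a truncated record. *payload points into a static buffer that the
   next call overwrites (not thread-safe). */
static int pcapf_next_tcp_payload(FILE *f, const unsigned char **payload, size_t *len)
{
    static unsigned char pkt[65536];
    uint32_t rec[4];                      /* ts_sec, ts_usec, caplen, orig len */

    while (fread(rec, sizeof rec, 1, f) == 1) {
        uint32_t caplen = rec[2];
        if (caplen > sizeof pkt || fread(pkt, 1, caplen, f) != caplen)
            return 0;
        /* Ethernet header: 14 bytes; EtherType 0x0800 means IPv4. */
        if (caplen < 14 + 20 || pkt[12] != 0x08 || pkt[13] != 0x00)
            continue;
        const unsigned char *ip = pkt + 14;
        size_t ip_hlen = (size_t)(ip[0] & 0x0f) * 4;     /* IHL is in 32-bit words */
        size_t ip_total = ((size_t)ip[2] << 8) | ip[3];  /* IPv4 total length */
        if ((ip[0] >> 4) != 4 || ip[9] != 6 || ip_hlen < 20)
            continue;                     /* not IPv4, or protocol is not TCP (6) */
        if (ip_total > caplen - 14)
            ip_total = caplen - 14;       /* packet cut short by the snaplen */
        if (ip_total < ip_hlen + 20)
            continue;
        const unsigned char *tcp = ip + ip_hlen;
        size_t tcp_hlen = (size_t)(tcp[12] >> 4) * 4;    /* data offset, in words */
        if (tcp_hlen < 20 || ip_hlen + tcp_hlen > ip_total)
            continue;
        *payload = tcp + tcp_hlen;
        *len = ip_total - ip_hlen - tcp_hlen;
        return 1;
    }
    return 0;
}
```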
The smallest open-source TCP/IP stack I know of is uIP, but it is a bit unusual, as it is designed for extremely small systems (microcontrollers).
Another small open-source TCP/IP stack, which is a bit more traditional, is lwIP.
