Unzip a zip file using zlib - c

I have an archive.zip which contains two encrypted ".txt" files. I would like to decompress the archive in order to retrieve those two files.
Here's what I've done so far:
FILE *FileIn = fopen("./archive.zip", "rb");
if (FileIn)
printf("file opened\n");
else
printf("unable to open file\n");
fseek(FileIn, 0, SEEK_END);
unsigned long FileInSize = ftell(FileIn);
printf("size of input compressed file : %lu\n", FileInSize);
void *CompDataBuff = malloc(FileInSize);
void *UnCompDataBuff = NULL;
int fd = open ("archive.zip", O_RDONLY);
CompDataBuff = mmap(NULL, FileInSize, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
printf("buffer read : %s\n", (char *)CompDataBuff);
uLongf UnCompSize = (FileInSize * 11/10 + 12);
UnCompDataBuff = malloc(UnCompSize);
int ret_uncp ;
ret_uncp = uncompress((Bytef*)UnCompDataBuff, &UnCompSize, (const Bytef*)CompDataBuff,FileInSize);
printf("size of uncompressed data : %lu\n", UnCompSize);
if (ret_uncp == Z_OK){
printf("uncompression ok\n");
printf("uncompressed data : %s\n",(char *)UnCompDataBuff);
}
if (ret_uncp == Z_MEM_ERROR)
printf("uncompression memory error\n");
if (ret_uncp == Z_BUF_ERROR)
printf("uncompression buffer error\n");
if (ret_uncp == Z_DATA_ERROR)
printf("uncompression data error\n");
I always get "uncompression data error" and I don't know why. And then I would like to know how to retrieve the 2 files with my data uncompressed.

zip is a file format that wraps header and trailer information around compressed data streams in order to represent a set of files and directories. The compressed data streams are almost always deflate data streams, which can in fact be generated and decoded by zlib. zlib also provides the crc32 function which can be used to generate and check the crc values in the zip wrapper information.
What zlib does not do by itself is decode and deconstruct the zip structure. You can either write your own code to do that using the specification (not very hard to do), or you can use the minizip routines in the contrib/minizip directory of the zlib distribution, which provides functions to open, access, and close zip files.
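To give a sense of what decoding the zip structure by hand involves, here is a minimal sketch that parses one local file header from an archive already loaded into memory. The field offsets follow the published zip specification (APPNOTE); the struct and function names are made up for this illustration. It ignores the central directory, data descriptors and ZIP64, which a real reader would need to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fields of a zip local file header (hypothetical struct, for illustration). */
typedef struct {
    uint16_t flags;              /* bit 0 set = entry is encrypted */
    uint16_t method;             /* 0 = stored, 8 = deflate */
    uint32_t crc32;
    uint32_t compressed_size;
    uint32_t uncompressed_size;
    char     name[256];          /* NUL-terminated, truncated if longer */
    size_t   data_offset;        /* where the entry's data starts in buf */
} zip_local_header;

/* zip stores all integers little-endian. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the local file header at buf[pos]. Returns 0 on success, -1 on bad data. */
int zip_parse_local_header(const unsigned char *buf, size_t len, size_t pos,
                           zip_local_header *out)
{
    assert(buf != NULL && out != NULL);
    if (pos > len || len - pos < 30)
        return -1;                       /* fixed part of the header is 30 bytes */
    const unsigned char *p = buf + pos;
    if (rd32(p) != 0x04034b50u)
        return -1;                       /* signature "PK\3\4" */

    out->flags             = rd16(p + 6);
    out->method            = rd16(p + 8);
    out->crc32             = rd32(p + 14);
    out->compressed_size   = rd32(p + 18);
    out->uncompressed_size = rd32(p + 22);
    uint16_t name_len  = rd16(p + 26);
    uint16_t extra_len = rd16(p + 28);
    if (len - pos - 30 < (size_t)name_len + extra_len)
        return -1;                       /* name or extra field runs past the buffer */

    size_t n = name_len < sizeof out->name - 1 ? name_len : sizeof out->name - 1;
    memcpy(out->name, p + 30, n);
    out->name[n] = '\0';
    out->data_offset = pos + 30 + name_len + extra_len;
    return 0;
}
```

If method is 8, the bytes starting at data_offset form a raw deflate stream, which zlib can decode with inflateInit2() and a windowBits value of -15. If bit 0 of flags is set, the entry is encrypted and has to be decrypted before it can be inflated.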

Zlib is not a library for handling .zip files. It supports decompressing zlib and gzip streams, both of which work on the level of a single stream of data, rather than an "archive" format like .zip.
You would need a different library (for one example, libzip; there are many others) to open and manipulate .zip archives.

As mentioned, zlib only handles compression; it doesn't handle archives. When you zip or unzip, you are adding files to or extracting files from an archive that happens to be in the zip format (there are other formats such as rar, 7zip and so on).
If you want to create zips or unzip files you have to handle the zip format, and minizip is a nice, robust library that has been around for quite a long time.
There is a contrib for minizip at https://github.com/nmoinvaz/minizip with examples of how to use it. It is not that hard, and you can check minizip.c and miniunz.c for code showing how to use it. (Minizip uses zlib for the compression.)
I also ended up building a library that wraps minizip, adds a bunch of nice features, and makes it easier to use and more object oriented. It lets you zip entire folders, streams, vectors, etc., as well as do everything entirely in memory.
Repo with examples here: https://github.com/sebastiandev/zipper
Beta pre-release: https://github.com/sebastiandev/zipper/releases/
Code looks something like:
Zipper zipper("ziptest.zip");
zipper.add("somefile.txt");
zipper.add("myFolder");
zipper.close();

If you are using C++, try this example.
It calls the unzip source:
https://github.com/fatalfeel/proton_sdk_source/blob/master/shared/FileSystem/FileSystemZip.cpp
https://github.com/fatalfeel/proton_sdk_source/tree/master/shared/util/unzip

Related

Detecting file MIME in C

I have files with wrong extensions, and try to find the correct MIME in a C script.
For a PDF file with txt extension, magic (#include <magic.h>)
const char *mime;
magic_t magic;
magic = magic_open(MAGIC_MIME_TYPE);
magic_load(magic, NULL);
magic_compile(magic, NULL);
mime = magic_file(magic, filename);
printf("%s\n", mime);
magic_close(magic);
returned
application/octet-stream
which is not very helpful.
GIO 2.0 (#include <gio/gio.h>)
char *content_type = g_content_type_guess (file_name, NULL, 0, &is_certain);
if (content_type != NULL)
{
char *mime_type = g_content_type_get_mime_type (content_type);
g_print ("Content type for file '%s': %s (certain: %s)\n"
"MIME type for content type: %s\n",
file_name,
content_type,
is_certain ? "yes" : "no",
mime_type);
g_free (mime_type);
}
returned
Content type for file 'test.txt': text/plain (certain: no)
MIME type for content type: text/plain
However, file command in Linux returns the correct MIME:
file test.txt
test.txt: PDF document, version 1.6
This should not be the expected behavior of these well-established C libraries. What am I doing wrong?
It is true that the file utility is built on top of libmagic, but what really determines the returned values is the flags passed to magic_open (or set afterwards with magic_setflags) and the MIME type database in use.
The library can use either a pre-compiled database or a raw database (which has to be compiled by calling magic_compile), which is your case. According to the documentation, when NULL is passed the default database files are /usr/local/share/misc/magic for the raw database (on Debian, /usr/share/misc/magic is a link to ../file/magic/, which is empty) and magic.mgc in the same parent directory.
The compiled database is placed in the working directory by default, and on my Debian system it seems to be empty (consistent with the default raw database directory being empty). After realizing this, I tried your example with magic_compile removed, and it seems to improve things significantly.

GZIP compression using zlib into buffer

I want to compress a memory buffer using gzip and put the compressed bytes into another memory buffer. I want to send the compressed buffer in the payload of an HTTP packet with Content-Encoding: gzip. I can easily do this using zlib for deflate compression (the compress() function). However, I see no API for what I need (gzip): the gzip API in zlib compresses and writes to a file (gzwrite()), whereas I want to compress and write to a buffer.
Any ideas?
I am in C on Linux.
deflate() produces the zlib format by default. To produce gzip output, initialize the stream with deflateInit2() and add 16 to windowBits, as in the code below; windowBits is what switches the wrapper to gzip.
#include <zlib.h>

// Compress input into output as a gzip stream.
// Returns the number of compressed bytes, or -1 on error.
int compressToGzip(const char* input, int inputSize, char* output, int outputSize)
{
    z_stream zs;
    zs.zalloc = Z_NULL;
    zs.zfree = Z_NULL;
    zs.opaque = Z_NULL;
    zs.avail_in = (uInt)inputSize;
    zs.next_in = (Bytef *)input;
    zs.avail_out = (uInt)outputSize;
    zs.next_out = (Bytef *)output;
    // zlib has no dedicated macro for gzip encoding; zlib.h says:
    // "Add 16 to windowBits to write a simple gzip header and trailer around the compressed data instead of a zlib wrapper"
    if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 | 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
        return -1;
    // With Z_FINISH, any result other than Z_STREAM_END means the stream was not
    // completed in this call (typically because the output buffer was too small).
    int ret = deflate(&zs, Z_FINISH);
    deflateEnd(&zs);
    if (ret != Z_STREAM_END)
        return -1;
    return (int)zs.total_out;
}
Some relevant contents from their header:
"This library can optionally read and write gzip and raw deflate streams in
memory as well."
"Add 16 to windowBits to write a simple gzip header and trailer around the
compressed data instead of a zlib wrapper"
Oddly, the documentation for deflateInit2() is more than 1000 lines away from its declaration in the header, so I wouldn't reread it unless I had to.
No, the zlib API does in fact provide gzip compression in memory with the deflate functions. You need to actually read the documentation in zlib.h.
Gzip is a file format, which is why the utility functions provided operate on a file descriptor. Use shm_open() to create an fd and mmap() it with sufficient memory. The data being written must not extend past the size of the mapped region, otherwise the write will fail; that is a limitation of mmapped regions.
Pass the fd to gzdopen().
But as Mark suggested in his answer, using the basic API interface is a better way.

External file resource on embedded system (C language with FAT)

My application/device is running on an ARM Cortex M3 (STM32), without an OS but with FatFs, and needs to access many resource files (audio, image, etc.).
The code runs from internal flash (ROM, 256 KB).
The resource files are stored on external flash (SD card, 4 GB).
There is not much RAM (32 KB), so loading a complete file from the package into memory is not an option.
As the user has access to the resources folder for atomic updates, I would like to package all these resource files in a single file (.dat, .rom, .whatever)
so the user doesn't mishandle the data.
Can someone point me to a nice solution to do so?
I don't mind remapping fopen, fread, fseek and fclose in my application, but I would rather not start from scratch (coding the serializer, table of contents, parser, etc.). My system is quite limited (no malloc, no framework, just stdlib and FatFs).
Thanks for any input you can give me.
note: I'm not looking for a solution where the resources are embedded IN the code (ROM) as obviously they are way too big for that.
It should be possible to use fatfs recursively.
Drive 0 would be your real device, and drive 1 would be a file on drive 0. You can implement the disk_* functions like this
#define BLOCKSIZE 512
FIL imagefile;
DSTATUS disk_initialize(BYTE drv) {
    FRESULT r;
    if(drv == 0)
        return SD_initialize();   /* your existing SD card driver */
    else if(drv == 1) {
        /* drive 1 is backed by an image file stored on drive 0 */
        r = f_open(&imagefile, "0:/RESOURCE.DAT", FA_READ);
        if(r == FR_OK)
            return 0;
    }
    return STA_NOINIT;
}
DRESULT disk_read(BYTE drv, BYTE *buff, DWORD sector, DWORD count) {
    FRESULT r;
    UINT br;
    if(drv == 0)
        return SD_read_blocks(buff, sector, count);
    else if(drv == 1) {
        /* translate the sector number into a byte offset inside the image file */
        r = f_lseek(&imagefile, sector*BLOCKSIZE);
        if(r != FR_OK)
            return RES_ERROR;
        r = f_read(&imagefile, buff, count*BLOCKSIZE, &br);
        if((r == FR_OK) && (br == count*BLOCKSIZE))
            return RES_OK;
    }
    return RES_ERROR;
}
To create the filesystem image on Linux or other similar systems you'd need mkfs.msdos and the mtools package. See this SO post on how to do it. Might work on Windows with Cygwin, too.
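Before flashing anything, the sector-to-offset arithmetic used for drive 1 can be checked on a PC against the generated image. The sketch below is only a host-side stand-in, not FatFs code: it uses stdio's fseek/fread in place of f_lseek/f_read, and the function name is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

#define BLOCKSIZE 512

/* Read `count` sectors starting at `sector` from an image file opened with
   stdio. Mirrors the drive-1 branch of disk_read, with fseek/fread standing
   in for the FatFs calls. Returns 0 on success, -1 on error or short read. */
int image_read_sectors(FILE *image, unsigned char *buff,
                       unsigned long sector, unsigned count)
{
    assert(image != NULL && buff != NULL);
    /* a sector number maps to a byte offset of sector * BLOCKSIZE */
    if (fseek(image, (long)(sector * BLOCKSIZE), SEEK_SET) != 0)
        return -1;
    size_t want = (size_t)count * BLOCKSIZE;
    return fread(buff, 1, want, image) == want ? 0 : -1;
}
```

Reading sector 0 of a FAT image this way should return the boot sector, whose last two bytes are 0x55 0xAA.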
To expand on what Joachim said above:
Popular choices of uncompressed (sometimes) archive formats are cpio, tar, and zip. Any of the 3 would work just fine.
Here are a few more in-depth comments on using TAR or CPIO.
TAR
I've used tar before for the exact purpose, on an stm32 with FatFS, so can tell you it works. I chose it over cpio or zip because of its familiarity (most developers have seen it), ease of use, and rich command line tools.
GNU Tar gives you fine-grained control over order in which the files are placed in the archive and regexes to manipulate file names (--xform) or --exclude paths. You can pretty much guarantee you can get exactly the archive you're after with nothing more than GNU Tar and a makefile. I'm not sure the same can be said for cpio or zip.
This means it worked well for my build environment, but your requirements may vary.
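For a sense of how little parsing tar needs, here is a minimal sketch that locates a file inside an uncompressed tar archive. It relies only on the standard header layout (512-byte blocks, name at offset 0, size as octal text at offset 124); the function names are invented for this example. It works on a memory buffer to keep the example short, which would not fit in 32 KB of RAM; on the device the same loop would read one 512-byte header at a time with f_read and skip the data with f_lseek.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK 512

/* Parse an octal text field of at most n bytes (terminated by NUL or space). */
static unsigned long tar_octal(const unsigned char *p, size_t n)
{
    unsigned long v = 0;
    size_t i = 0;
    while (i < n && p[i] == ' ')
        i++;
    for (; i < n && p[i] >= '0' && p[i] <= '7'; i++)
        v = v * 8 + (unsigned long)(p[i] - '0');
    return v;
}

/* Look for the entry named `wanted` in the archive `ar` of `len` bytes.
   Returns the offset of the file data and stores its size in *size_out,
   or returns -1 if the entry is not found. */
long tar_find(const unsigned char *ar, size_t len, const char *wanted,
              unsigned long *size_out)
{
    assert(ar != NULL && wanted != NULL && size_out != NULL);
    size_t pos = 0;
    /* an all-zero header block marks the end of the archive */
    while (pos + TAR_BLOCK <= len && ar[pos] != '\0') {
        const unsigned char *h = ar + pos;
        unsigned long size = tar_octal(h + 124, 12);
        if (strncmp((const char *)h, wanted, 100) == 0) {
            *size_out = size;
            return (long)(pos + TAR_BLOCK);
        }
        /* skip the header plus the data, rounded up to whole blocks */
        pos += TAR_BLOCK + (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
    }
    return -1;
}
```

Because each entry's data is stored contiguously and uncompressed, the returned offset and size are all an fread-style wrapper needs to serve reads from the package.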
CPIO
In my opinion, cpio has a much worse, harder-to-use set of command line tools than tar, which is why I steer clear of it when I can. However, its file format is a little lighter-weight and might be even simpler to parse (not that tar is hard).
The Linux kernel project uses cpio for initramfs images, so that's probably the best / most mature example on the internet that you'll find on using it for this sort of purpose.
If you grab any kernel source tree, the tool usr/gen_init_cpio.c can be used to generate a cpio archive from a cpio listing file, whose format is described in that source file.
The extraction code is in init/initramfs.c.
ZIP
I've never used the zip format for this sort of purpose. So no real comment there.
Berendi found a very clever solution: use the existing fat library to access it recursively!
The implementation is quite simple, and after extensive testing, I'd like to post the code to use FatFs recursively and the commands used for single file fat generation.
First, let's generate a 100 MB FAT32 file:
dd if=/dev/zero of=fatfile.fs bs=1024 count=102400
mkfs.vfat -F 32 -r 112 -S 512 -v fatfile.fs
Create/push content into it:
echo HelloWorld on Virtual FAT >> helloworld.txt
mcopy -i fatfile.fs helloworld.txt ::/
Change the diskio.c file, to add Berendi's code but also:
DSTATUS disk_status (BYTE pdrv)
{
    DSTATUS status = STA_NOINIT;
    switch (pdrv)
    {
        case FATFS_DRIVE_VIRTUAL:
            printf("disk_status: FATFS_DRIVE_VIRTUAL\r\n" );
            /* fall through: the image file lives on the SD card */
        case FATFS_DRIVE_ATA: /* SD CARD */
            status = FATFS_SD_SDIO_disk_status();
            break;
    }
    return status;
}
Don't forget to add the enum for the drive name, and the number of volumes:
#define _VOLUMES 2
Then mount the virtual FAT, and access it:
f_mount(&VirtualFAT, (TCHAR const*)"1:/", 1);
f_open(&file, "1:/test.txt", FA_READ);
Thanks a lot for your help.

Find the oldest file within a directory in C on Windows

I am working on a C project and I am trying to find the oldest file within a directory so that once the oldest file has been found, it can be deleted. I cannot find anything on how to do this in C on Windows; I have found ways to do it in Linux, but I need a version for Windows.
Basically you scan the directory, same as in Linux (but you could check out the Boost library also).
The time and date data are already available in the directory scan structure:
HANDLE fh;
WIN32_FIND_DATA *fd;
char *oldestFile;
FILETIME oldest = {-1U, -1U};   // latest possible time, so any file compares older
// Buffer to hold file name
oldestFile = malloc(MAX_PATH);
fd = malloc(sizeof(WIN32_FIND_DATA));
// directory_name must be a search pattern ending in a wildcard, e.g. "dir\\*"
if (INVALID_HANDLE_VALUE == (fh = FindFirstFile(directory_name, fd)))
{
    // Signal error, free memory, (and return an error code?)
    free(fd);
    free(oldestFile);
    return -1;   // assuming the enclosing function returns int
}
// OK to proceed
do
{
    if(fd->dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
        continue;
    // Keep this entry if it is older than the oldest one seen so far
    if ((fd->ftCreationTime.dwHighDateTime < oldest.dwHighDateTime)
        || (fd->ftCreationTime.dwHighDateTime == oldest.dwHighDateTime
            && fd->ftCreationTime.dwLowDateTime < oldest.dwLowDateTime))
    {
        oldest.dwHighDateTime = fd->ftCreationTime.dwHighDateTime; // ftLastAccessTime? ftLastWriteTime?
        oldest.dwLowDateTime = fd->ftCreationTime.dwLowDateTime;
        // cFileName is the bare file name, not the full path
        strncpy(oldestFile, fd->cFileName, MAX_PATH);
    }
} while(FindNextFile(fh, fd));
FindClose(fh);
free(fd); fd = NULL;
You'll want to use the FindFirstFile/FindNextFile combination on Windows to get the files in the directory. You can then either use stat as you would in Linux, or GetFileAttributesEx to check the dates.
Although Windows is not fully POSIX compliant, its C runtime provides stat() (as _stat()), so you should be able to scan a directory and call stat() on the files.
You could use the GetFileAttributesEx() function which populates a WIN32_FILE_ATTRIBUTE_DATA struct which has three time related members:
ftCreationTime
ftLastAccessTime
ftLastWriteTime
You can compare whichever of these is more relevant and keep track of the oldest file found during iteration. Once iteration is over, delete it using DeleteFile(). The time members are of type FILETIME and can be compared using CompareFileTime().
Or use GetFileTime() to obtain the relevant time attribute, as commented by BeyondSora.
For finding the details of file in windows you'll have to refer to File Allocation Table which includes all the details about the files.
Check here for the coding part to read FAT

SIC Assembler I/O

I've coded a SIC assembler and everything seems to be working fine except for the I/O aspect of it.
I've loaded the object code into memory (converted char format into machine representation), but when I call SICRun(); to execute the code, I get an error stating "devf1 cannot be found".
I know this is related to the input/output device instructions in the source code.
The c file states that it depends on external files, most notably, Dev[6]. Am I supposed to create this myself? My instructor did not give us any other files to work with. Any insight?
Example: TD OUTPUT ;TEST OUTPUT DEVICE
This directory contains the source code (source.asm), header file (sic.h) and the SIC simulator (sicengine.c)
From the sicengine.c source file it looks as though the devf1 (also devf2/devf3) file is expected to exist so this 'input device' can be read from (fopen is passed "r" as a parameter):
if (opcode == 216) { /* RD */
/* ... */
if ((Dev[Devcode] = fopen(SICFile[Devcode],"r")) == NULL) {
printf("cannot open file %s\n", SICFile[Devcode]);
exit(1);
}
The comment in the code about depending on file Dev[6] is ambiguous. It really means the names of the files in the Dev array, which are devf1, devf2 and devf3 (input devices) and devf04, devf05 and devf06 (output devices).
I would suggest creating files devf1, devf2 and devf3.
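If you would rather create the input files from code than by hand, a small helper along these lines could be run once in the simulator's working directory. This is only a sketch: the helper name is made up, and the exact file names (including their case) should be taken from the SICFile array in your copy of sicengine.c.

```c
#include <assert.h>
#include <stdio.h>

/* Create each named file if it does not exist yet. Existing contents are
   kept, because "a" (append) mode never truncates a file.
   Returns how many of the files could be opened. */
int ensure_files_exist(const char *const names[], int count)
{
    assert(names != NULL);
    int ok = 0;
    for (int i = 0; i < count; i++) {
        FILE *f = fopen(names[i], "a");
        if (f != NULL) {
            fclose(f);
            ok++;
        }
    }
    return ok;
}
```

Calling it with an array holding "devf1", "devf2" and "devf3" and a count of 3 leaves three files that the simulator's fopen(..., "r") call can open. They will be empty, so an RD instruction will hit end of file immediately unless you put input data in them.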
