Detecting file MIME in C

I have files with the wrong extensions and am trying to find the correct MIME type in a C program.
For a PDF file with a .txt extension, libmagic (#include <magic.h>)
const char *mime;
magic_t magic;
magic = magic_open(MAGIC_MIME_TYPE);
magic_load(magic, NULL);
magic_compile(magic, NULL);
mime = magic_file(magic, filename);
printf("%s\n", mime);
magic_close(magic);
returned
application/octet-stream
which is not very helpful.
GIO 2.0 (#include <gio/gio.h>)
char *content_type = g_content_type_guess (file_name, NULL, 0, &is_certain);
if (content_type != NULL)
{
char *mime_type = g_content_type_get_mime_type (content_type);
g_print ("Content type for file '%s': %s (certain: %s)\n"
"MIME type for content type: %s\n",
file_name,
content_type,
is_certain ? "yes" : "no",
mime_type);
g_free (mime_type);
}
returned
Content type for file 'test.txt': text/plain (certain: no)
MIME type for content type: text/plain
However, the file command on Linux reports the correct type:
file test.txt
test.txt: PDF document, version 1.6
This should not be the expected behavior of these well-established C libraries. What am I doing wrong?

It is true that the file utility is built on top of libmagic, but what really determines the returned values is the flags passed to magic_open (or the corresponding setter functions) and the magic database that is used.
The library can work with either a pre-compiled database or a raw database (which has to be compiled by calling magic_compile); the latter is your case. According to the documentation, when NULL is passed the default database files are /usr/local/share/misc/magic for the raw database (on Debian, /usr/share/misc/magic is a link to ../file/magic/, which is empty) and magic.mgc in the same parent directory.
The compiled database is written to the working directory by default, and on my Debian system it seems to be empty (consistent with the default raw database directory being empty). After realizing this, I tried your example with the magic_compile call removed, and the results seem to improve significantly.
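To make concrete what "detecting by content" means, as opposed to guessing from the extension, here is a minimal hand-rolled sketch that looks only at the leading bytes for the PDF signature "%PDF-". It is not how libmagic is implemented, and sniff_mime is a hypothetical helper name chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libmagic or GIO): guess a MIME type
 * from the first bytes of a file, ignoring the file name entirely. */
const char *sniff_mime(const unsigned char *buf, size_t len)
{
    static const char pdf_sig[] = "%PDF-";      /* PDF files start with this */
    const size_t sig_len = sizeof(pdf_sig) - 1; /* exclude the trailing NUL */

    if (buf != NULL && len >= sig_len && memcmp(buf, pdf_sig, sig_len) == 0)
        return "application/pdf";
    return "application/octet-stream";          /* unknown content */
}
```

In a real program the buffer would come from reading the first few bytes of the file; a working magic database lets libmagic do the same kind of check for a very large number of formats.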

Related

readdir is returning lots of all-0xFF 8.3 names from fat32 filesystem

I'm working on an ESP32 platform with IDF 4.1, using code like this:
struct dirent * dirent;
while((dirent = readdir(dir)) != nullptr) {
ESP_LOGI("ConfigServer", "Found %s, id %d, type %d", dirent->d_name, dirent->d_ino, dirent->d_type);
if(dirent->d_name[0] == '\377') {
++invalid_ctr;
} else {
// do something with the file info
}
}
closedir(dir);
I had to add the bit where invalid_ctr is incremented, because I started getting loads of iterations where dirent->d_name was "\377\377\377\377\377\377\377\377.\377\377\377" (rendered as inverse-video "?" characters in my terminal). The code not shown involves feeding that name to stat(), which would return the same values as the last valid file encountered. The log entry would look like this:
I (608261) ConfigServer: Found ��������.���, id 0, type 2
Type 2 represents a directory. This is happening on a partition on the onboard flash, formatted by the IDF library's "format if mount failed" option at mount. So perhaps my assumption of FAT32 is invalid. I do know that IDF uses FatFs internally.
Is this indicative of an error on the filesystem? Is it expected to need to filter out such trash on a typical iteration with readdir()?

The FAT component in ESP IDF has support for long file names disabled by default. Run idf.py menuconfig, then "Component config → FAT Filesystem support → Long filename support" to enable it.

SHCreateItemFromParsingName return FILE_NOT_FOUND when filename specified

I am trying to get an IShellItem for a file so that I can copy it from a system directory to another directory with the IFileOperation COM interface. I have to use IFileOperation specifically for this.
When I specify the full file name, SHCreateItemFromParsingName() returns ERROR_FILE_NOT_FOUND, although the file is present in the directory. When I remove the file name from the path below and use only the folder path, everything seems fine and the return value is S_OK.
//...
CoInitialize(NULL);
//...
WCHAR szSourceDll[MAX_PATH * 2];
wcscpy_s(szSourceDll, MAX_PATH, L"C:\\Windows\\System32\\sysprep\\cryptbase.dll");
r = CoCreateInstance(&CLSID_FileOperation, NULL, CLSCTX_INPROC_SERVER | CLSCTX_LOCAL_SERVER | CLSCTX_INPROC_HANDLER, &IID_IFileOperation, &FileOperation1);
if (r != S_OK) return;
FileOperation1->lpVtbl->SetOperationFlags(FileOperation1, FOF_NOCONFIRMATION | FOFX_NOCOPYHOOKS | FOFX_REQUIREELEVATION);
r = SHCreateItemFromParsingName(szSourceDll, NULL, &IID_IShellItem, &isrc);
//...
CoUninitialize();
//...
Why does this code, written in C, not work with file names? How can I create an IShellItem instance for a file in a system folder in order to copy it?
P.S.
Windows 7 x64, C, Visual Studio 2015, v140 platform toolset, additional dependencies: Msi.lib;Wuguid.lib;ole32.lib;ntdll.lib
P.P.S.
It works properly with files in the user's directories...

Assuming your application is compiled as a 32-bit application and running on a 64-bit OS, a file-not-found error is probably correct, because your application is redirected to the 32-bit system directory (%WinDir%\SysWOW64).
In most cases, whenever a 32-bit application attempts to access %windir%\System32, %windir%\lastgood\system32, or %windir%\regedit.exe, the access is redirected to an architecture-specific path.
For more information, see File System Redirector on MSDN.
You could temporarily turn off redirection in your thread but it is not really safe to do this when calling shell functions, only functions in kernel32. If the API you are calling internally uses LoadLibrary and/or COM then the API might fail because it will be unable to load from system32 while redirection is disabled.
You can also access the native system32 directory with the %WinDir%\SysNative backdoor. This only works in 32-bit applications on 64-bit Vista+ so you must do some version detection.

Writing my own HTTP Server - How to find relative path of a file

I'm currently writing an HTTP Server over UNIX Sockets in C, and I'm about to implement the part of the GET request that checks the requested file to make sure it has appropriate permissions.
Before I knew anything about HTTP servers, I set up an Apache server, and it is my understanding that there is a single directory which the HTTP server looks to find a requested file. I do not know if this is because the server somehow has no permissions outside of the directory, or if it actually validates the path to ensure it is inside the directory.
Now that I am about to implement this on my own, I'm not sure how to properly handle this. Is there a function in C that will allow me to determine if a path is inside a given directory (e.g. is foo/bar/../../baz inside foo/)?
In python, I would use os.path.relpath and check if the result starts with .., to ensure that the path is not outside the given directory.
For example, if the directory is /foo/bar/htdocs, and the given path is index.html/../../passwords.txt, I want ../passwords.txt, so I can see from the leading .. that the file is outside the /foo/bar/htdocs directory.

You'd be surprised how much of Python's I/O functionality more or less maps directly to what POSIX can do. :)
In other words, look up realpath().
It's awesome when POSIX has the more descriptive name for a function, with that extra letter included! :)

How to get the absolute path for a given relative path programmatically in Linux?
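#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    char resolved_path[PATH_MAX]; /* realpath() may write up to PATH_MAX bytes */
    if (realpath("../../", resolved_path) == NULL) {
        perror("realpath");
        return 1;
    }
    printf("\n%s\n", resolved_path);
    return 0;
}
You can try that, as the same user (unwind) answered there.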
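Building on realpath(), a containment check along the lines the question asks for could look like the sketch below. path_is_inside is a hypothetical helper name, and the sketch assumes a POSIX.1-2008 system where realpath() allocates the result when its second argument is NULL. Both paths must exist, since realpath() fails otherwise.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `path` resolves to `root` or to
 * something below it, 0 if it resolves elsewhere, and -1 if either
 * path cannot be resolved (for example because it does not exist). */
int path_is_inside(const char *root, const char *path)
{
    char *real_root = realpath(root, NULL); /* malloc'd by realpath() */
    char *real_path = realpath(path, NULL);
    int result = -1;

    if (real_root != NULL && real_path != NULL) {
        size_t n = strlen(real_root);
        /* Prefix match, then require a component boundary so that
         * "/foo/barbaz" is not treated as inside "/foo/bar".
         * n == 1 means the root is "/", which contains everything. */
        result = strncmp(real_path, real_root, n) == 0 &&
                 (n == 1 || real_path[n] == '\0' || real_path[n] == '/');
    }
    free(real_root); /* free(NULL) is a no-op */
    free(real_path);
    return result;
}
```

A server would typically treat both 0 and -1 as a reason to refuse the request. Because realpath() also resolves symbolic links, a link inside the document root that points outside it is reported as outside.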

The way it works is much simpler: once the server receives a request, it ONLY looks at its htdoc (static contents) directory to check if the requested resource exists:
const char *htdoc = "/opt/server/htdoc"; // here a sub-directory of the server program
const char *request = "GET /index.html"; // the client request
const char *req_path = strchr(request, ' ') + 1; // the URI path, including its leading '/'
char filepath[512]; // build the file-system path
snprintf(filepath, sizeof(filepath), "%s%s", htdoc, req_path);
FILE *f = fopen(filepath, "r"); // try to open the file
...
Note that this code is unsafe because it does not check whether the request escapes the htdoc directory through "../" sequences (and other tricks). You should also use stat() to make sure that the file is a regular file and that the server has permission to read it.
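The stat() check mentioned above might be sketched as follows. is_servable_file is a hypothetical helper name, and using access() for the permission test is an assumption of this sketch rather than something the answer prescribes.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `path` names a regular file that
 * the current process is allowed to read, and 0 otherwise. */
int is_servable_file(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;                   /* missing or not reachable */
    if (!S_ISREG(st.st_mode))
        return 0;                   /* directory, device, FIFO, ... */
    return access(path, R_OK) == 0; /* readable by this process? */
}
```

This does not replace the path containment check: it only answers whether the already-validated path points at something that can sensibly be sent to the client. The file can also change between this check and the later fopen(), so the open itself must still be checked for failure.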

As a simple (but incomplete) solution, I just decided to write a bit of code to check the file path for any ".." components.
int is_valid_fname(const char *fname) {
    const char *it = fname;
    while (1) {
        /* reject a path component that is exactly ".." */
        if (strncmp(it, "..", 2) == 0 && (it[2] == '/' || it[2] == '\0')) {
            return 0;
        }
        it = strchr(it, '/');
        if (it == NULL) break;
        it++;
    }
    return 1;
}

Unzip a zip file using zlib

I have an archive.zip which contains two encrypted ".txt" files. I would like to decompress the archive in order to retrieve those 2 files.
Here's what I've done so far:
FILE *FileIn = fopen("./archive.zip", "rb");
if (FileIn)
printf("file opened\n");
else
printf("unable to open file\n");
fseek(FileIn, 0, SEEK_END);
unsigned long FileInSize = ftell(FileIn);
printf("size of input compressed file : %u\n", FileInSize);
void *CompDataBuff = malloc(FileInSize);
void *UnCompDataBuff = NULL;
int fd = open ("archive.zip", O_RDONLY);
CompDataBuff = mmap(NULL, FileInSize, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
printf("buffer read : %s\n", (char *)CompDataBuff);
uLongf UnCompSize = (FileInSize * 11/10 + 12);
UnCompDataBuff = malloc(UnCompSize);
int ret_uncp ;
ret_uncp = uncompress((Bytef*)UnCompDataBuff, &UnCompSize, (const Bytef*)CompDataBuff,FileInSize);
printf("size of uncompressed data : %u\n", UnCompSize);
if (ret_uncp == Z_OK){
printf("uncompression ok\n");
printf("uncompressed data : %s\n",(char *)UnCompDataBuff);
}
if (ret_uncp == Z_MEM_ERROR)
printf("uncompression memory error\n");
if (ret_uncp == Z_BUF_ERROR)
printf("uncompression buffer error\n");
if (ret_uncp == Z_DATA_ERROR)
printf("uncompression data error\n");
I always get "uncompression data error" and I don't know why. And then I would like to know how to retrieve the 2 files with my data uncompressed.

zip is a file format that wraps header and trailer information around compressed data streams in order to represent a set of files and directories. The compressed data streams are almost always deflate data streams, which can in fact be generated and decoded by zlib. zlib also provides the crc32 function which can be used to generate and check the crc values in the zip wrapper information.
What zlib does not do by itself is decode and deconstruct the zip structure. You can either write your own code to do that using the specification (not very hard to do), or you can use the minizip routines in the contrib/minizip directory of the zlib distribution, which provides functions to open, access, and close zip files.
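As a rough illustration of what "deconstructing the zip structure" involves, the sketch below parses the fixed 30-byte local file header that precedes each entry's data. The struct and function names are hypothetical. The field offsets follow the published ZIP format, but this is only a fragment: a real reader should locate entries through the central directory at the end of the archive.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: the local-header fields needed to find the data. */
struct zip_local_header {
    uint16_t method;            /* 0 = stored, 8 = deflate */
    uint32_t compressed_size;
    uint32_t uncompressed_size;
    uint16_t name_len;
    uint16_t extra_len;
};

/* ZIP stores all multi-byte fields in little-endian order. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the offset of the entry's data within buf, or 0 if buf does
 * not start with a complete local file header. */
size_t zip_parse_local_header(const unsigned char *buf, size_t len,
                              struct zip_local_header *out)
{
    if (len < 30 || rd32(buf) != 0x04034b50u) /* signature "PK\3\4" */
        return 0;
    out->method            = rd16(buf + 8);
    out->compressed_size   = rd32(buf + 18);
    out->uncompressed_size = rd32(buf + 22);
    out->name_len          = rd16(buf + 26);
    out->extra_len         = rd16(buf + 28);

    size_t data_off = 30u + out->name_len + out->extra_len;
    return data_off <= len ? data_off : 0;
}
```

This also shows why passing the whole file to uncompress() produces a data error: the buffer starts with a "PK" header, not with a zlib stream. The entry data of a deflate-compressed member is a raw deflate stream, which zlib decodes through inflateInit2() with a negative windowBits value rather than through uncompress().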

Zlib is not a library for handling .zip files. It supports decompressing zlib and gzip streams, both of which work on the level of a single stream of data, rather than an "archive" format like .zip.
You would need a different library (for one example, libzip; there are many others) to open and manipulate .zip archives.

As mentioned, zlib only handles compression; it does not handle archives. When you zip or unzip, you are adding files to or extracting files from an archive that happens to be in the zip format (there are other formats such as rar, 7zip and so on).
If you want to create or extract zip files you have to handle the zip format, and minizip is a nice, robust library that has been around for quite a long time.
There is a contrib for minizip https://github.com/nmoinvaz/minizip with examples on how to use it. It is not that hard, and you can check minizip.c and miniunz.c for code showing how to use it. (Minizip uses zlib for the compression.)
I also ended up building a library that wraps minizip, adds a bunch of nice features and makes it easier to use and more object-oriented. It lets you zip entire folders, streams, vectors, etc., and can do everything entirely in memory.
Repo with examples here: https://github.com/sebastiandev/zipper
Beta pre-release: https://github.com/sebastiandev/zipper/releases/
Code looks something like:
Zipper zipper("ziptest.zip");
zipper.add("somefile.txt");
zipper.add("myFolder");
zipper.close();

If you are using C++, try this example, which calls the unzip source:
https://github.com/fatalfeel/proton_sdk_source/blob/master/shared/FileSystem/FileSystemZip.cpp
https://github.com/fatalfeel/proton_sdk_source/tree/master/shared/util/unzip

SIC Assembler I/O

I've coded a SIC assembler and everything seems to be working fine except for the I/O aspect of it.
I've loaded the object code into memory (converted char format into machine representation), but when I call SICRun(); to execute the code, I get an error stating "devf1 cannot be found".
I know this is related to the input/output device instructions in the source code.
The c file states that it depends on external files, most notably, Dev[6]. Am I supposed to create this myself? My instructor did not give us any other files to work with. Any insight?
Example: TD OUTPUT ;TEST OUTPUT DEVICE
This directory contains the source code (source.asm), header file (sic.h) and the SIC simulator (sicengine.c)

From the sicengine.c source file it looks as though the devf1 (also devf2/devf3) file is expected to exist so that this 'input device' can be read from (fopen is passed "r" as a parameter):
if (opcode == 216) { /* RD */
/* ... */
if ((Dev[Devcode] = fopen(SICFile[Devcode],"r")) == NULL) {
printf("cannot open file %s\n", SICFile[Devcode]);
exit(1);
}
The comment in the code about depending on file Dev[6] is ambiguous. It really means the names of the files in the Dev array, which are devf1, devf2 and devf3 (input devices) and devf04, devf05 and devf05 (output devices).
I would suggest creating files devf1, devf2 and devf3.
