Universal archive unpacker library

Universal archive unpacker library - archive

A lot of antiviruses can unpack most archives, found on users harddrives. They dissect .zip, .rar, .chm, .exe, .msi (and other installers) and a lot lot more. Also they can unpack an executable (get resources from them, unpack packed executable and unpack SFX archives).
For example, the popular old russian antivirus "Drweb" have support for many archive formats:
Dr.Web knows many types of archives. At present they are:
ZIP, 7ZIP, ARJ, RAR, LHA, HA, GZIP, TAR, BZIP2, MS CAB, WISE, MSI, ISO, CPIO, RPM, DEB
Is there a library or utility which allows me to unpack most archives?
Is there a library to unpack an executables like it is done by antiviruses?

An utility/library to unpack a lot of archives is 7-Zip
Supported formats:
Packing / unpacking: 7z, XZ, BZIP2, GZIP, TAR,
ZIP and WIM
Unpacking only: ARJ, CAB,
CHM, CPIO, CramFS, DEB, DMG, FAT, HFS,
ISO, LZH, LZMA, MBR, MSI, NSIS, NTFS,
RAR, RPM, SquashFS, UDF, VHD, WIM, XAR
and Z.
UniExtract also supports the unpacking of executables.
Universal Extractor is a program
designed to decompress and extract
files from any type of archive or
installer, such as ZIP or RAR files,
self-extracting EXE files, application
installers, etc. It's able to support so
many varied file formats by utilizing
the many backend utilities listed in
the credits at the bottom of the page.
Please note that Universal Extractor
is not intended to be a general
purpose archiving program. It cannot
(and never will) create archives, and
therefore cannot fully replace
archivers such as 7-Zip or WinRAR.
What it will do, however, is allow you
to extract files from virtually any
type of archive, regardless of source,
file format, compression method, etc.
The original motivation behind this
project was to create an easy,
convenient way to extract files from
various types of installation packages
without the need to remember arcane
command line switches or track down
separate utilities to handle the
unpacking. Over time, and with the
encouragement of its many users and
the fine folks over on the MSFN forum,
it has evolved into a mature and very
capable unarchiving utility that many,
including myself, find indispensable.

7zip is good but supports only two dozens of file formats:
Packing / unpacking: 7z, XZ, BZIP2, GZIP, TAR, ZIP and WIM
Unpacking only: ARJ, CAB, CHM, CPIO, CramFS, DEB, DMG, FAT, HFS, ISO, LZH, LZMA, MBR, MSI, NSIS, NTFS, RAR, RPM, SquashFS, UDF, VHD, WIM, XAR and Z.
Universal Unpacker supports more (and it uses good TrID binary signature detector):
.7z, .ace, .arc, .arj, .bin, .cue, .bz2, .tar, .cpio, .deb, .cdi, .b64, .uu, .uue, .xx,
.xxe, .yenc, .ntx, .gz, .img, .cab, .1, .lib, .iso, .kgb, kge, .lzh, .lha, .lzma, .lzo, .Z, .tz, .lzx, .mht, .cab, .imf, .chm, .??_, .lit, .hlp .nbh, .nrg, .exe, .dbx, .pea, .rar, .rpm, .sis, .sit, .xz, .uha, UPX, .wim, .msm, .msi, .msp, .msu, .zip, .jar, .zoo
There is PeaZIP open source archiver (LGPL) with support of:
Read (browse, extract, test): 7z, apk, bz, bz2, bzip2, tbz2, tbz, gz, tpz, tar, zip, zipx, z01, smzip, arj, cab, chm, chi, chq, chw, hxs, hxi, hxr, hxq, hxw, lit, cpio, deb, lzh, lha, rar, rpm, z, taz, tz, iso, jar, ear, war, lha, pet, pup, pak, pk3, pk4, slp, [Content], xpi, wim, u3p, lzma86, lzma, udf, xar, dmg, hfs, part1, split, swm, tpz, kmz, xz, txz, vhd, mslz, apm, mbr, fat, ntfs, exe, dll, sys, msi, msp, ods, ots, odm, oth, oxt, odb, odf, odg, otg, odp, otp, odt, ott, gnm, doc, dot, xls, xlt, ppt, pps, pot, docx, dotx, xlsx, xltx, swf, flv, quad, balz, zpaq, paq8f, paq8jd, paq8l, paq8o, lpaq1, lpaq5, lpaq8, ace, arc, wrc, 001, pea, cbz, cbr, cba, cb7, cbt (and more...)

Related

Why would file checksums inconsistently fail?

I created a ~2MiB file.
dd if=/dev/urandom of=file.bin bs=2M count=1
Then I copied that file a large number of times and generated a checksum for each (identical) copy.
for i in `seq 50000`;
do
name="file.${i}.bin"
cp file.bin "${name}"
sha512sum "${name}" > "${name}.sha512"
done
I then verified all of those checksummed files with a validation script to run sha512sum against each file.
for file in `find . -regex ".*\.sha512"`
do
sha512sum --check --quiet "${file}" || (
cat "${file}" && sha512sum "${file%.sha512}"
)
done
I just created these files, and when I validate them moments later, I see intermittent failures and inconsistencies in the data (console text truncated for readability)
will:/mnt/usb $ for file in `find ...
file.5602.bin: FAILED
sha512sum: WARNING: 1 computed checksum did NOT match
91fc201a3812e93ef3d4890 ... file.5602.bin
b176e8e3ea63a223130f3a0 ... ./file.5602.bin
The checksum files are all identical since the source files are all identical
The problem seems to be that my computer is, seemingly at random, generating the wrong checksum for some of my files when I go to validate. A different file fails the checksum every time, and files that previously failed will pass.
will:/mnt/usb $ for file in `find ...
sha512sum: WARNING: 1 computed checksum did NOT match
91fc201a3812e93ef3d4890 ... file.3248.bin
442a1d8805ed134c9ab5252 ... ./file.3248.bin
Keep in mind that all of these files are identical.
I see the same behavior with SATA SSD and HDD, and USB devices, with md5 and sha512, with xfs, btrfs, ext4, and vfat. I tried live booting to another OS. I see this same stranger behavior regardless. I also see rsync --checksum for these files thinks checksums are wrong and re-copies these files even though they have not changed.
What could explain this behavior? Since it's happening on multiple devices with all the scenarios I described, I doubt this is bit rot. My kernel logs show no obvious errors. I would assume this is a hardware issue based on my troubleshooting, but how can this be diagnosed? Is it the CPU, the motherboard, the RAM?

What could explain this behavior? How can this be diagnosed?
From what I've read, a number of issues could explain this behavior. Bad disk(s), bad PSU (power supply), bad RAM, filesystem issues.
I tried the following to determine what was happening. I repeated the experiment with different...
Disks
Types of disks (SDD vs HDD)
External drives (3.5 and 2.5 enclosures)
Flash drives (USB 2 and 3 on various ports)
Filesystems (ext4, vfat (fat32), xfs, btrfs)
Different PSU
Different OS (live boot)
Nothing seemed to resolve this.
Finally, I gave memtest86+ v5.0.1 a try via an Ubuntu live USB.
voila. It found bad memory. Through process of elimination I determined one of my memory sticks was bad, and then tested the other over night to ensure it was in good shape. I re-ran my experiment again and I am seeing consistent checksums on all my files.
What a subtle bug. I only noticed this bad behavior by accident. If I hadn't been messing around with file checksums, I do not think I would have found this bad RAM.
This makes me want to regularly schedule a routine in which I verify and test my RAM. A consequence of this bad memory stick is that some of my test data did end up corrupt, but more often than not, the checksum verifications were just interimmitent failures.
In one sample data pool, all the checksums start with cb2848ca0e1ff27202a309408ec76..., because all ~50,000 files are identical.
Though, there are two files that are corrupt, but this is not bit rot or file integrity damage.
What seems most likely is that these files were created with corruption because cp encountered bad RAM when I created these files. Those files consistently return bad checksums of 58fe24f0e00229e8399dc6668b9... and bd85b51065ce5ec31ad7ebf3..., while the other 49,998 files return the same checksum.
This has been a fun extremely frustrating experiment in debugging.

Alpine APK Package Repositories, how are the checksums calculated?

I'm trying to work out how the pull checksum for packages is calculated within Alpine APK package repositories. The documentation regarding the format is lacking in any detail.
When I run apk index -o APKINDEX.unsigned.tar.gz *.apk which generates the repository. When you extract the txt file from inside the gz, it contains the following...
C:Q17KXT6xFVWz4EZDIbkcvXQ/uz9ys=
P:redis-server
V:3.2.3-0
A:noarch
S:2784844
I:102400
T:An advanced key-value store
U:http://redis.io/
L:
D:linux-headers
I'm interested in how the very first line is generated. I've tried to read the actual source that's used to generate this, but I'm not a C programmer, so it's hard for me to comprehend as it jumps all over the place.
The two files mentioned in the documentation are database.c and package.c.
Incase this somewhat helps, the original APK file has these various hashes...
CRC32 = ac17ea88
MD5 = a035ecf940a67a6572ff40afad4f396a
SHA1 = eca5d3eb11555b3e0464321b91cbd743fbb3f72b
SHA256 = 24bc1f03409b0856d84758d6d44b2f04737bbc260815c525581258a5b4bf6df4

The pull checksum is the sha1sum of the second tar.gz file in the apk file, containing the .PKGINFO file.
The Alpine APK package is actually a concatenation in disguise of 3 tar.gz files.
We can split the package below using gunzip-split into 3 .gz files, then rename them to .tar.gz
./gunzip-split -s -o ./out/ strace-5.14-r0.apk
mv ./out/file_1.gz ./out/file_1.tar.gz
mv ./out/file_2.gz ./out/file_2.tar.gz
mv ./out/file_3.gz ./out/file_3.tar.gz
sha1sum ./out/file_2.tar.gz
7a266425df7bfd7ce9a42c71a015ea2ae5715838 out/file_2.tar.gz
tar tvf out/file_2.tar.gz
-rw-r--r-- root/root 702 2021-09-03 01:34 .PKGINFO
In the case of the strace package the checksum value can be derived as above:
apk index strace-5.14-r0.apk -o APKINDEX.tar.gz
tar xvf APKINDEX.tar.gz
cat APKINDEX
echo eiZkJd97/XzppCxxoBXqKuVxWDg=|base64 -d|xxd
00000000: 7a26 6425 df7b fd7c e9a4 2c71 a015 ea2a z&d%.{.|..,q...*
00000010: e571 5838 .qX8
When comparing them we see that they match.
References
https://github.com/martencassel/apk-tools/blob/master/README.md
https://gitlab.com/cg909/gunzip-split/-/releases
https://lists.alpinelinux.org/~alpine/devel/%3C257B6969-21FD-4D51-A8EC-95CB95CEF365%40ferrisellis.com%3E#%3C20180309152107.472e4144#vostro.util.wtbts.net%3E

So...
/* Internal cointainer for MD5 or SHA1 */
struct apk_checksum {
unsigned char data[20];
unsigned char type;
};
Basically take the C: value then chop off the Q from the front then base 64 decode. Chop off the last value (type which defaults to SHA1) then you have your sha1. This appears to be made of the CONTENTS of the package but that would take further looking into it.

You need to look here: https://git.alpinelinux.org/cgit/apk-tools/tree/src/blob.c#n492
It is apk_blob_pull_csum
First 'Q' stands for encoding
Next '1' stands for SHA1
Looks like this checksum is made database.c in apk_db_unpack_pkg:
apk_sign_ctx_init(&ctx.sctx, APK_SIGN_VERIFY_IDENTITY, &pkg->csum, db->keys_fd);
tar = apk_bstream_gunzip_mpart(bs, apk_sign_ctx_mpart_cb, &ctx.sctx);
r = apk_tar_parse(tar, apk_db_install_archive_entry, &ctx, TRUE, &db->id_cache);
but I'm not sure, because I failed to trace this code.
It is really not easy to understand what are they doing.

External file ressource on embedded system (C language with FAT)

My application/device is running on an ARM Cortex M3 (STM32), without OS but with a FatFs) and needs to access many resources files (audio, image, etc..)
The code runs from internal flash (ROM, 256Kb).
The resources files are stored on external flash (SD card, 4Gb).
There is not much RAM (32Kb), so malloc a complete file from package is not an option.
As the user has access to the resources folder for atomic update, I would like to package all theses resources files in a single (.dat, .rom, .whatever)
So the user doesn't mishandle theses data.
Can someone point me to a nice solution to do so?
I don't mind remapping fopen, fread, fseek and fclose in my application, but I would not like starting from scratch (coding the serializer, table of content, parser, etc...). My system is quite limited (no malloc, no framework, just stdlib and FatFs)
Thanks for any input you can give me.
note: I'm not looking for a solution where the resources are embedded IN the code (ROM) as obviously they are way too big for that.

It should be possible to use fatfs recursively.
Drive 0 would be your real device, and drive 1 would be a file on drive 0. You can implement the disk_* functions like this
#define BLOCKSIZE 512
FIL imagefile;
DSTATUS disk_initialize(BYTE drv) {
UINT r;
if(drv == 0)
return SD_initialize();
else if(drv == 1) {
r = f_open(&image, "0:/RESOURCE.DAT", FA_READ);
if(r == FR_OK)
return 0;
}
return STA_NOINIT;
}
DRESULT disk_read(BYTE drv, BYTE *buff, DWORD sector, DWORD count) {
UINT br, r;
if(drv == 0)
return SD_read_blocks(buff, sector, count);
else if(drv == 1) {
r = f_seek(&imagefile, sector*BLOCKSIZE);
if(r != FR_OK)
return RES_ERROR;
r = f_read(&imagefile, buff, count*BLOCKSIZE, &br);
if((r == FR_OK) && (br == count*BLOCKSIZE))
return RES_OK;
}
return RES_ERROR;
}
To create the filesystem image on Linux or other similar systems you'd need mkfs.msdos and the mtools package. See this SO post on how to do it. Might work on Windows with Cygwin, too.

To expand on what Joachim said above:
Popular choices of uncompressed (sometimes) archive formats are cpio, tar, and zip. Any of the 3 would work just fine.
Here are a few more in-depth comments on using TAR or CPIO.
TAR
I've used tar before for the exact purpose, on an stm32 with FatFS, so can tell you it works. I chose it over cpio or zip because of its familiarity (most developers have seen it), ease of use, and rich command line tools.
GNU Tar gives you fine-grained control over order in which the files are placed in the archive and regexes to manipulate file names (--xform) or --exclude paths. You can pretty much guarantee you can get exactly the archive you're after with nothing more than GNU Tar and a makefile. I'm not sure the same can be said for cpio or zip.
This means it worked well for my build environment, but your requirements may vary.
CPIO
The cpio has a much worse/harder to use set of command line tools than tar in my opinion. Which is why I steer clear of it when I can. However, its file format is a little lighter-weight and might be even simpler to parse (not that tar is hard).
The Linux kernel project uses cpio for initramfs images, so that's probably the best / most mature example on the internet that you'll find on using it for this sort of purpose.
If you grab any kernel source tree, the tool usr/gen_init_cpio.c can used to generate a cpio from a cpio listing file format described in that source file.
The extraction code is in init/initramfs.c.
ZIP
I've never used the zip format for this sort of purpose. So no real comment there.

Berendi found a very clever solution: use the existing fat library to access it recursively!
The implementation is quite simple, and after extensive testing, I'd like to post the code to use FatFs recursively and the commands used for single file fat generation.
First, lets generate a 100Mb FAT32 file:
dd if=/dev/zero of=fat.fs bs=1024 count=102400
mkfs.vfat -F 32 -r 112 -S 512 -v fatfile.fs
Create/push content into it:
echo HelloWorld on Virtual FAT >> helloworld.txt
mcopy -i fatfile.fs helloworld.txt ::/
Change the diskio.c file, to add Berendi's code but also:
DSTATUS disk_status ()
{
DSTATUS status = STA_NOINIT;
switch (pdrv)
{
case FATFS_DRIVE_VIRTUAL:
printf("disk_status: FATFS_DRIVE_VIRTUAL\r\n" );
case FATFS_DRIVE_ATA: /* SD CARD */
status = FATFS_SD_SDIO_disk_status();
}
}
Dont forget to add the enum for the drive name, and the number of volumes:
#define _VOLUMES 2
Then mount the virtual FAT, and access it:
f_mount(&VirtualFAT, (TCHAR const*)"1:/", 1);
f_open(&file, "1:/test.txt", FA_READ);
Thanks a lot for your help.

Reverse enginering a Linux based USB camera

I bought an IP camera on which is installed proprietary software (no HTTP server).
This prevents me to integrate it into my home network.
I want to replace the software (ELF closed source) by the motion package I already use and add some features.
I have no particular system competence and it's been over a week since I travel the net to learn but I can not get out.
I have access to the U-boot console (USB-TTL adapter) and telnet (root).
The webcam has a SD card reader that I could use if I need space.
I started by making a backup of the three partitions (with dd).
I unzipped the file mtdblock2 (binwalk -e). Which generates a classical Linux tree with links to Busybox, some binary system and proprietary software.
I tried to unzip mtdblock1 which generates zImage.
The decompression zImage generates two directories and one file (console).
Yet I need the kernel modules that are in it. What to do?
I also want to get the kernel compilation settings, is this possible?
I unpacked the firmware available on the manufacturer's website.
It contains only updating the ELF, one .so file and some Bash scripts.
At first I thought the three partitions directly migrate to Qemu.
But if I understand this is not possible because the memory addresses are hard-coded into the kernel.
I understand good?
So I think I have one solution: build a new kernel and rebuild a rootfs from scratch.
Is this only solution?
I started playing with Buildroot but I can not find the configuration file for board based on Hisilicon Hi3518.
I looked bad or is it useless?
For my first test I used board/qemu/arm-versatile. This is the right choice?
This will not prevent me from migrating to the physical machine?
For testing, if I managed to rebuild a kernel and rootfs I would install these partitions on the SD not to break anything.
For this, it is "sufficient" to modify kernel parameters (in bootargs variable) is that right?
So I don't need to rebuild a U-boat partition for my device?
In short, you guessed I ask myself a lot of questions (yet others but "one" thing at a time).
I need advice about whether I take the right road.
Please, if I am talking nonsense feel free to correct me.
If you have ideas or subjects of reflection I'm interested.
# cat /proc/cpuinfo
Processor : ARM926EJ-S rev 5 (v5l)
BogoMIPS : 218.72
Features : swp half thumb fastmult edsp java
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant : 0x0
CPU part : 0x926
CPU revision : 5
Hardware : hi3518
Revision : 0000
Serial : 0000000000000000
# cat /proc/mtd
dev: size erasesize name
mtd0: 00100000 00010000 "boot"
mtd1: 00300000 00010000 "kernel"
mtd2: 00c00000 00010000 "rootfs"
# binwalk mtdblock0
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
122044 0x1DCBC CRC32 polynomial table, little endian
# binwalk mtdblock1
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
0 0x0 uImage header, header size: 64 bytes, header CRC: 0x853F419E, created: 2014-07-22 02:45:04, image size: 2890840 bytes, Data Address: 0x80008000, Entry Point: 0x80008000, data CRC: 0xB24E77CA, OS: Linux, CPU: ARM, image type: OS Kernel Image, compression type: none, image name: "Linux-3.0.8"
22608 0x5850 gzip compressed data, maximum compression, from Unix, NULL date:
# binwalk zImage
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
113732 0x1BC44 ASCII cpio archive (SVR4 with no CRC), file name: "dev", file name length: "0x00000004", file size: "0x00000000"
113848 0x1BCB8 ASCII cpio archive (SVR4 with no CRC), file name: "dev/console", file name length: "0x0000000C", file size: "0x00000000"
113972 0x1BD34 ASCII cpio archive (SVR4 with no CRC), file name: "root", file name length: "0x00000005", file size: "0x00000000"
114088 0x1BDA8 ASCII cpio archive (SVR4 with no CRC), file name: "TRAILER!!!", file name length: "0x0000000B", file size: "0x00000000"
1903753 0x1D0C89 Certificate in DER format (x509 v3), header length: 4, sequence length: 1284
4188800 0x3FEA80 Linux kernel version "3.0.8 (cwen#ubuntu) (gcc version 4.4.1 (Hisilicon_v100(gcc4.4-290+uclibc_0.9.32.1+eabi+linuxpthread)) ) #1 Tue Jul 22 10:45:00 H"
4403540 0x433154 CRC32 polynomial table, little endian
5053435 0x4D1BFB Unix path: /mtd/devices/hisfc350/hisfc350_spi_gd25qxxx.c
5054731 0x4D210B Unix path: /mtd/devices/hisfc350/hisfc350.c
5058939 0x4D317B Unix path: /net/wireless/rt2x00/rt2x00dev.c
5059323 0x4D32FB Unix path: /net/wireless/rt2x00/rt2x00config.c
5060683 0x4D384B Unix path: /net/wireless/rt2x00/rt2x00usb.c
5060851 0x4D38F3 Unix path: /net/wireless/rt2x00/rt2x00.h
5061171 0x4D3A33 Unix path: /net/wireless/rt2x00/rt73usb.c
5081107 0x4D8813 Unix path: /S70/S75/505V/F505/F707/F717/P8
5102399 0x4DDB3F Unix path: /mmc/host/himciv100/himci.c
5141264 0x4E7310 Neighborly text, "NeighborSolicits/ipv6/inet6_hashtables.c"
5141284 0x4E7324 Neighborly text, "NeighborAdvertisementses.c"
# binwalk mtdblock2
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
0 0x0 JFFS2 filesystem, little endian
722980 0xB0824 JFFS2 filesystem, little endian
732282 0xB2C7A Zlib compressed data, compressed
737031 0xB3F07 Zlib compressed data, compressed
738287 0xB43EF Zlib compressed data, compressed
.... most other lines in the same genre
IP Camera QQZM N5063
http://www.zmvideo.com/product/detail.php?id=60
Firmware
http://bbs.zmmcu.com/forum.php?mod=attachment&aid=MzU2fDBiY2M4NDdjfDE0MTkxMTEzODl8MzQ4fDIwMzc%3D

First of all, you do not want to replace U-Boot as this may render your device unbootable. On the U-Boot console, check if you can boot from the SD card mmc rescan 0; fatload mmc 0 ${loadaddr} uImage or from the network dhcp ${loadaddr} ${serverip}:uImage. You'll need to look for documentation for these commands to get more help.
But perhaps you don't even need to replace the kernel. You already know it's a 3.0.8 kernel, so you can build a userspace for this kernel version. And any proprietary modules that are used by it can be lifted from the jffs2 filesystem. On your telnet session, do lsmod to find out which modules are loaded. You can mount an SD card and copy them to it. The modules are located in /lib/modules/3.0.8.
So you probably don't even need to build a kernel in buildroot, only the rootfs. First, check in the telnet session which filesystems are supported: cat /proc/filesystems. Then choose the appropriate filesystem in the buildroot configuration. For the target architecture, choose arm926t. And select the 3.0 kernel headers in the toolchain configuration, or choose the Arago ARMv5 2011.09 external toolchain (it has old kernel headers).
As remarked by artless noise, you don't need to test it in qemu, since the SD card is safe.

Linux programming: which device a file is in

I would like to know which entry under /dev a file is in. For example, if /dev/sdc1 is mounted under /media/disk, and I ask for /media/disk/foo.txt, I would like to get /dev/sdc as response.
Using stat system call on that file I will get its partition major and minor numbers (8 and 33, for sdc1). Now I need to get the "root" device (sdc) or its major/minor from that. Is there any syscall or library function I could use to link a partition to its main device? Or even better, to get that device directly from the file?
brw-rw---- 1 root floppy 8, 32 2011-04-01 20:00 /dev/sdc
brw-rw---- 1 root floppy 8, 33 2011-04-01 20:00 /dev/sdc1
Thanks in advance!

The quick and dirty version: df $file | awk 'NR == 2 {print $1}'.
Programmatically... well, there's a reason I started with the quick and dirty version. There's no portable way to programmatically get the list of mounted filesystems. (getmntent() gets fstab entries, which is not the same thing.) Moreover, you can't even parse the output of mount(8) reliably; on different Unixes, the mountpoint may be the first or the last item. The most portable way to do this ends up being... parsing df output (And even that is iffy, as you noticed with the partition number.). So you're right back to the quick and dirty shell solution anyway, unless you want to traverse /dev and look for block devices with matching major(st_rdev) (major() being from sys/types.h).
If you restrict this to Linux, you can use /proc/mounts to get the list of mounted filesystems. Other specific Unixes can similarly be optimized: for example, on OS X and I think FreeBSD, you can use sysctl() on the vfs tree to get mountpoints. At worst you can find and use the appropriate header file to decipher whatever the mount table file is (and yes, even that varies: on Solaris it's /etc/mnttab, on many other systems it's /etc/mtab, some systems put it in /var/run instead of /etc, and on many Linuxes it's either nonexistent or a symlink to /proc/mounts). And its format is different on pretty much every Unix-like OS.

The information you want exists in sysfs which exposes the linux device tree. This models the relationships between the devices on the system and since you are trying to determine a parent disk device from a partition, this is the place to look. I don't know if there are any hard and fast rules you can rely on to stop your code breaking with future versions of the kernel, but the kernel developers do try to maintain sysfs as a stable interface.
If you look at /sys/dev/block/<major>:<minor>, you'll see it is a symlink with the tail components being block/<disk-device-name>/<partition-device-name>. If you were to perform a readlink(2) system call on that, you could parse the link destination to get the disk device name. In shell (since it's easier to express this way, but doing it in C will be pretty easy):
$ echo $(basename $(dirname $(readlink /sys/dev/block/8:33)))
sdc
Alternatively, you could take advantage of the nesting of partition directories in the disk directories (again in shell, but from C, its an open(2), read(2), and close(2)):
$ cat /sys/dev/block/8:33/../dev
8:32
That assumes your starting major:minor is actually for a partition, not some other sort of non-nested device.

What you looking for is impossible - there is no 1:1 connection between a block device file and the partition it is describing.
Consider:
You can create multiple block device files with different names (but the same major and minor numbers) and they are indistinguishable (N:1)
You can use a block device file as an argument to mount to mount a partition and then delete the block device file leaving the partition mounted. (0:1)
So there is no way to do what you want except in a few specific and narrow cases.

Major number will tell you which device it is: 3 - IDE on 1st controller, 22 - IDE on 2nd controller and 8 for SCSI.
Minor number will tell you partition number and - for IDE devices - if it's primary or secondary drive. This calculation is different for IDE and SCSI.
For IDE it is: x*64 + p, x is drive number on the controller (0 or 1) and p is partition
For SCSI it is: y*16 + p, where y is drive number and p is partition

Not a syscall, but:
df -h /path/to/my/file
From https://unix.stackexchange.com/questions/128471/determine-what-device-a-directory-is-located-on
So you could look at df's source code and see what it does.
I realize this post is old, but this question was the 2nd result in my search and no one has mentioned df -h

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight