Compress a file chunk by chunk - miniz - c

I'm building a program with the help of the [miniz][1] library to compress files up to 3 GB in size. The computer that will run this program also runs another (heavy) application, so I want the compressing program to keep its RAM usage low: load one chunk of a file (maximum chunk size = 0.5 GB), compress that chunk, and then proceed to the next chunk until all files are compressed.
Right now this does not work as I want. For example, if a file named problem.txt is divided into 10 chunks, I get 10 files named problem.txt in my zip archive. Obviously I want the chunks to be merged into one entry instead of being split up in the zip.
Is this possible to do with miniz?
The following text is written in the library file (the library consists of only one file), so I guess it is not possible, but I ask anyway to see if anyone has a solution or another approach so that the program does not eat all the memory.
The ZIP archive API's where designed with simplicity and efficiency in mind, with just enough abstraction to
get the job done with minimal fuss. There are simple API's to retrieve file information, read files from
existing archives, create new archives, append new files to existing archives, or clone archive data from
one archive to another. It supports archives located in memory or the heap, on disk (using stdio.h),
or you can specify custom file read/write callbacks.
The program crashes with files larger than 0.9 GB.
[1]: https://code.google.com/p/miniz/
Please note that the program stores the whole file in the std::vector filesData. Each element is a chunk of data. In the final version, only one chunk shall be read and held in memory at a time. The problem in this version is that the library creates many files with the same name in the .zip, as described above.
Am I using the library wrongly right now? I open the files myself and store the data in the vector because I could not figure out how to make the library function open the file itself.
for (i = 0; i < filesData.size(); ++i)
{
sprintf(data.at(i), filesData.at(i) );
sprintf(archive_filename, fileNames.at(i) );
// Add a new file to the archive. Note this is an IN-PLACE operation, so if it fails your archive is probably hosed (its central directory may not be complete) but it should be recoverable using zip -F or -FF. So use caution with this guy.
// A more robust way to add a file to an archive would be to read it into memory, perform the operation, then write a new archive out to a temp file and then delete/rename the files.
// Or, write a new archive to disk to a temp file, then delete/rename the files. For this test this API is fine.
status = mz_zip_add_mem_to_archive_file_in_place(s_Test_archive_filename, archive_filename, data.at(i), strlen(data.at(i)) + 1, s_pComment, (uint16)strlen(s_pComment), MZ_BEST_COMPRESSION);
if (!status)
{
printf("mz_zip_add_mem_to_archive_file_in_place failed! 2\n");
return EXIT_FAILURE;
}
}
I made a small test program (which fails).
#include "miniz.c"
#if defined(__GNUC__)
// Ensure we get the 64-bit variants of the CRT's file I/O calls
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif
#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE 1
#endif
#endif
typedef unsigned char uint8;
typedef unsigned short uint16;
typedef unsigned int uint;
int main(int argc, char *argv[])
{
mz_zip_archive zip_archive;
const char *s_Test_archive_filename = "__mz_example2_test__.zip";
const char *s_pComment = "This is a comment";
remove(s_Test_archive_filename);
printf (argv[1] );
bool status = mz_zip_writer_add_file( (&zip_archive), s_Test_archive_filename, argv[1], s_pComment, (uint16)strlen(s_pComment), MZ_BEST_COMPRESSION );
if (!status)
{
printf("mz_zip_reader_init_file() failed!\n");
return EXIT_FAILURE;
}
else
{
printf("success\n");
}
return 0;
}


I am implementing a file system in C and get the error shown below; please help. I have pasted the assignment question, my code, and the error below.

question:
You will develop and implement a small file system (“FS”). It is similar to some
of the basics of Unix as well as CP/M file systems.
Your file system will not be part of an operating system, but, similar to most
modern file systems, it will run in several different operating systems to
provide a “portable” file system.
Details:
Your FS will use a file (for example “disk01”), rather than directly using
a physical flash or disk, to store data.
You may have several disk-like files (for example: disk01, disk02),
used to store data.
The data stored in disk01 may be a user’s programs, text files,
other data files, or any type of binary information. In addition to the data
stored, your FS will need to store other, meta-information, such as free space
(blocks), directory details, and possibly other information. The FS directory is flat
(one level) fixed sized, has a user name associated with each file, and has fixed
sized “blocks” (entries).
You should use fixed size blocks (similar to disk blocks) of size 256 (or 1KB,
your choice, examples based on 256) bytes to store files and all meta-data
in your “disk”.
(Your “disk” (for example “disk01” is logically divided into a number of “sectors”,
which are fixed size blocks. Everything that is stored (persistent) is in these
blocks)
Your program (the “FS” executable) should provide the following operations:
(These operations deal with the “disk”, not individual user files)
Createfs #ofblocks – creates a filesystem (disk) with #ofblocks size, each 256 bytes
For example Createfs 250 creates a “virtual disk”, a file that will be initialized to
250 blocks of 256 bytes each. It is created in memory, and initialized.
Formatfs #filenames #DABPTentries
For example Formatfs 64 48 reserves space for 64 file names and 48 file meta data,
Note that some file names may “point” to the same file metadata, in this example
there can only be 48 unique files.
Savefs name– save the “disk” image in a file “name”
Openfs name- use an existing disk image
For example Savefs disk01 or Openfs disk01
These commands save the memory “image” (contents) to an external file,
in this example, it is called disk01, but can be called anything, the openfs
command retrieves the image/contents from the file and puts into memory.
List – list files (and other meta-information) in a FS
List what is in “your” directory
Remove name –remove named file from fs
Delete a user file, should reclaim the directory entry and file sectors
Rename oldname newname – rename a file stored in the FS
Just change user file name
Put ExternalFile – put (store) Host OS file into the disk
Get ExternalFile – get disk file, copy from “disk” to host OS file system
These operations put and get a user file from “outside” to and from your file system
User name – show/change name of user who owns this file
Link/Unlink – Unix style file linking
These are some more, common, meta operations, only changes something in directory,
not the data file contents
Bonus: Set/Use file permissions for r/w/x, implement subdirectories, “check disk”
Implement in either the “Go” or “Rust” programming language (20 to 75 point bonus)
Implementation:
(Note: these names and acronyms are hints, there are other methods and data structures
that may also work.)
Your FS should have 4 (or more, if easier to implement) sections:
A FileNameTable (FNT), a directory and a disk attribute/block pointer table (DABPT),
and the data blocks.
The FNT should be of size allocated, each entry should contain a 50 char
(maximum) file name and an inode pointer (index to DABPT)(blocks).
The DABPT should be allocated from disk blocks, 4 entries per block, where each entry
should contain a file meta-information (FileSize, last time+date (secs), pointers to
data blocks), user name
The Block Pointer Table has direct pointers to data blocks, and one additional
pointer to another entry in the Table, if needed (for big files), these may be
chained for very large files. (Similar to CP/M extents)
Since disks (and some meta-information) are fixed size, many small or one
large file might not fit on the “disk”. File names, file attributes and other file
information stored in FS are restrictive (for example, file creation time).
code:
#define FILE_SIZE 56
#define SIZE_OF_BLOCK 256
#define MAX_LINK 10
#define TIME_LENGTH 100
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <string.h>
typedef struct Table_function
{
//name of the file being stored
char fileName[FILE_SIZE];
// pointer to iNode
Table_bp iNode;
} Table_function;
typedef struct Table_bp
{
//The size of the file
int size_file;
//The variable for the index to dataBlock
int dataBlockPointer;
//for checking when last modified
char DTmodified[TIME_LENGTH];
} Table_bp;
struct Link
{
char linkName[FILE_SIZE];
Table_bp *linkedFile;
} Link;
struct FileSystem
{
//File system name
char name_of_fs[FILE_SIZE];
//Maps data slot for ABPT
int *ABPTMap;
//variable for the total number of Data blocks in System
int number_of_blocks;
//Maps data slot for FNT
int *FNTMap;
//variable to keep track of available FNT blocks
int FNT;
//Keep track of available ABPT blocks
int ABPT;
//Maps data slot for datablock CPM style
int *dataBlockMap;
//Structure for holding initial files
struct Table_bp *pointer_table;
struct Table_function *files;
char **dataBlocks;
struct Link link[MAX_LINK];
} FileSystem;
struct FileSystem FS;
void formatFS(char name[FILE_SIZE],int pointer_entries_num,int FNT)
{
printf(" File System created with \n name:%s\n no. of pointers: %d\n no of files:%d \n",name,pointer_entries_num,FNT);
// number of pointer entries
FS.ABPT=pointer_entries_num;
//file name system storing
strcpy(FS.name_of_fs,name);
// number of files
FS.FNT=FNT;
//initialization
FS.files=malloc(FNT*sizeof(struct Table_function));
FS.pointer_table=malloc(pointer_entries_num*sizeof(struct Table_bp));
FS.FNTMap= malloc(FNT*sizeof(int*));
FS.ABPTMap= malloc(pointer_entries_num*sizeof(int*));
}
void createFS(int number_of_blocks)
{
int j;
char **d_holder;
int i;
printf("Selected Datablocks: %d\n",number_of_blocks);
FS.number_of_blocks=number_of_blocks;
d_holder=(char**) malloc(SIZE_OF_BLOCK*sizeof(char*));
for(i=0;i<number_of_blocks;i++)
{
d_holder[i]=(char*)malloc(number_of_blocks*sizeof(char));
}
FS.dataBlockMap= malloc(number_of_blocks*sizeof(int*));
FS.dataBlocks=d_holder;
}
//main function
void execute()
{
char name_of_fs[FILE_SIZE];
int choice=-1;
char trasher[FILE_SIZE];
char deleter[FILE_SIZE];
while(1)
{
printf("1) Exit\n");
printf("2) Create FileSystem\n");
printf("3) Format FileSystem\n");
printf("4) Delete a file\n");
printf("5) Trash a file\n");
printf("Choice?: ");
scanf("%d",&choice);
printf("\n");
switch(choice)
{
case 1: // exit if not selected 1 or 2
exit(0);
// creating the file system
case 2:
printf("Enter the number of data blocks in the system: \n");
int block_num;
scanf("%d",&block_num);
// the below call will create the file system with user specified number of blocks
createFS(block_num);
// success message od disk created successfully
printf("***Disk Created***\n\n");
break;
case 3: // formatting the file system
printf("*** formatting of File system in progress...\n");
printf("File System Name: \n");
// file system to be formatted
scanf("%s",name_of_fs);
printf("Block Pointers Number?: \n");
int numBlockPointers;
int numFiles;
scanf("%d",&numBlockPointers);
printf("Number of files?: \n");
scanf("%d",&numFiles);
// format the file system with the specified parameters
formatFS(name_of_fs,numBlockPointers,numFiles);
printf("***Disk Formatted***\n\n"); // formatted successfully
break;
case 4:
printf("File name?");
scanf("%s",deleter);
printf("%s File deleted\n\n\n",deleter);
break;
case 5:
printf("File name?");
scanf("%s",trasher);
printf("%s File Trashed\n\n\n",trasher);
break;
}
}
}
int main()
{
execute();
return 0;
}
error:
main.c:18:5: error: unknown type name ‘Table_bp’
18 | Table_bp iNode
You are using the name Table_bp in this structure definition
typedef struct Table_function
{
//name of the file being stored
char fileName[FILE_SIZE];
// pointer to iNode
Table_bp iNode;
} Table_function;
though the name Table_bp is not declared yet.
Place this typedef declaration with the structure Table_bp definition
typedef struct Table_bp
{
//The size of the file
int size_file;
//The variable for the index to dataBlock
int dataBlockPointer;
//for checking when last modified
char DTmodified[TIME_LENGTH];
} Table_bp;
above the definition of the structure struct Table_function.
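A minimal sketch of the reordered declarations, reusing the type and field names from the question. The inode_size helper is hypothetical (it is not in the original code) and is only there to show that both types can now be used together:

```c
#include <string.h>

#define FILE_SIZE 56
#define TIME_LENGTH 100

/* Table_bp comes first, so the name is known when Table_function uses it. */
typedef struct Table_bp
{
    int size_file;                 /* size of the file */
    int dataBlockPointer;          /* index of the data block */
    char DTmodified[TIME_LENGTH];  /* last-modified timestamp */
} Table_bp;

typedef struct Table_function
{
    char fileName[FILE_SIZE];      /* name of the stored file */
    Table_bp iNode;                /* embedded metadata entry */
} Table_function;

/* Hypothetical helper: fills one entry and returns the size stored
   in its embedded Table_bp. */
int inode_size(const char *name, int size)
{
    Table_function entry;

    memset(&entry, 0, sizeof entry);
    strncpy(entry.fileName, name, FILE_SIZE - 1);
    entry.iNode.size_file = size;
    return entry.iNode.size_file;
}
```

If the member were a pointer (Table_bp *iNode), a forward declaration such as typedef struct Table_bp Table_bp; placed before Table_function would also be enough, because a pointer does not need the complete type.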

Is it possible to embed binary data into DOS EXEs made in Turbo C?

I've set up a DOSBox development environment with Turbo C++ intending to make a game with my friends.
I'm using C code, and am wondering how I'd link binary data into the EXE. (All my previous experience with C is libGBA, sorry if that's not actually possible in the way I think it'd be.)
If it isn't possible, then what would be an alternative option for embedding binary data? (I don't really want to need a bunch of binary files in the game directory...)
Can't find much third party documentation for Turbo C, especially considering I'm using the other, supported, but not main language for my IDE which was last updated in the early 2000s after moving to another OS entirely.
An easy solution used by programs such as self-extracting .zip files, is to simply append the data onto the end of the .exe file. The size of the .exe can be calculated from values in the header, which will give you the offset where the appended data begins. Here is an example C program that compiles with Borland Turbo C v2.01 (available as freeware) - note that I have omitted error checking for clarity:
#include <stdio.h>
int main(int argc, const char *argv[])
{
FILE *fSelf;
unsigned short lenFinalBlock, numBlocks;
unsigned long offEnd;
char trailing[256];
int len;
/* Open our own .exe file */
fSelf = fopen(argv[0], "rb");
/* Read part of the .exe header */
fseek(fSelf, 2, SEEK_SET);
fread(&lenFinalBlock, 2, 1, fSelf);
fread(&numBlocks, 2, 1, fSelf);
/* Calculate the size of the .exe from the header values */
/* Widen first: int is 16 bits in Turbo C, so numBlocks * 512 would overflow */
offEnd = (unsigned long)numBlocks * 512UL;
if (lenFinalBlock) offEnd -= 512 - lenFinalBlock;
/* Jump to the end of the .exe and read what's there */
fseek(fSelf, offEnd, SEEK_SET);
/* Just read it as a string - you'd presumably be using
some custom data format here instead */
len = fread(trailing, 1, 256, fSelf);
trailing[len] = 0;
printf("Trailing data (%d bytes # 0x%lX):\n%s", len, offEnd, trailing);
fclose(fSelf);
return 0;
}
Once compiled to trailing.exe you can use it like this:
C:\>trailing
Trailing data (0 bytes # 0x2528):
I'm on Linux so I will append some example data using the shell:
$ echo Hello >> trailing.exe
Running it again shows it picking up the trailing data:
C:\>trailing
Trailing data (6 bytes # 0x2528):
Hello
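The size calculation is the only subtle part of that program, so here it is isolated as a small function. This is a sketch, not part of the original answer: the name mz_image_size is made up, and it assumes the two 16-bit header words have already been read from offsets 2 and 4 of the .exe as in the program above.

```c
/* Size in bytes of the .exe image described by its header.
   last_block_len: 16-bit word at offset 2 (bytes used in the final 512-byte block,
                   0 meaning the final block is full).
   num_blocks:     16-bit word at offset 4 (number of 512-byte blocks). */
unsigned long mz_image_size(unsigned short last_block_len, unsigned short num_blocks)
{
    /* Cast before multiplying so the product is computed in 32 bits
       even on compilers where int is 16 bits wide. */
    unsigned long size = (unsigned long)num_blocks * 512UL;

    if (last_block_len != 0)
        size -= 512UL - last_block_len;
    return size;
}
```

For example, header values of 0x13 blocks with 0x128 bytes in the final block (values chosen here to match the sample output, not read from the real file) give 0x13 * 512 - (512 - 0x128) = 0x2528, the offset printed above.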
It should be possible to use the BGIOBJ.EXE utility which is included with Turbo C++ to achieve what you want.
BGIOBJ can convert a binary file to an .obj file which then can be linked into the .exe file. Its primary purpose is to include BGI drivers and fonts in the .exe, but it shouldn't put any restrictions on the file (except for size).
Unfortunately I can't tell you exactly how to get the memory address where the file is loaded at run-time, but that shouldn't be too difficult. BGIOBJ supports parameters for public name, segment name and segment class so you can refer to these.

Fast Creation of a very large File in Debian Linux

I am currently working on a project that involves transferring a very large file (about 6GB) from one Linux Server to another. The servers run on Debian Squeeze.
In order to achieve my main goal, I initially send the file's name and size to the destination machine, and I create an empty file for storing the data blocks that I progressively receive from the source machine.
My problem is that creating a 6GB file takes too long on my server. To make this clearer, I use the following C routine to create the new file:
void create_file(char* f_name, long long f_size) {
char* bs, *of, *s_f_size, *count;
if((pid = fork()) < 0) {
perror("fork() failed.");
return;
}
if(pid == 0) {
//Call execl
of = (char*) malloc(sizeof(char)*(strlen("of=") + strlen(f_name) + 1));
s_f_size = (char*) malloc(sizeof(char)*32);
sprintf(s_f_size, "%lld", file_size);
count = (char*) malloc(sizeof(char)*(strlen("count=") + strlen(s_f_size) + 1));
strcpy(of, "of=");
strcat(of, f_name);
strcpy(count, "count=");
strcat(count, s_f_size);
ret = execl("/bin/dd", "dd", "if=/dev/zero", of, "bs=1", count, (char*) 0);
if(ret < 0) {
perror("execl() failed");
free(s_f_size);
free(of);
free(count);
return;
}else {
free(s_f_size);
free(of);
free(count);
return;
}
}else {
status = 0;
wpid = wait(&status);
}
}
I used the Linux dd command because I thought that it would be the quickest way to create an empty 6GB file. However, it takes about 15 minutes to complete. Is there a way to create the empty file faster? What am I doing wrong?
Thank you for your time.
Sincerely,
Nick
In addition to what Joachim Pileborg suggested, you can also use posix_fallocate() to pre-allocate space for your file.
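A minimal sketch of the posix_fallocate() route, assuming a POSIX system. The function name preallocate_file and the 0644 mode are choices made for this example:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t, needed for sizes above 2 GB on 32-bit builds */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates (or opens) path and reserves size bytes of disk space for it.
   Returns 0 on success or an errno-style error number on failure. */
int preallocate_file(const char *path, off_t size)
{
    int err;
    int fd = open(path, O_WRONLY | O_CREAT, 0644);

    if (fd < 0)
        return errno;
    /* posix_fallocate() returns the error number directly; it does not set errno. */
    err = posix_fallocate(fd, 0, size);
    if (close(fd) != 0 && err == 0)
        err = errno;
    return err;
}
```

Unlike the dd call with bs=1, this does not issue one write per byte. How fast it is in practice depends on whether the filesystem supports preallocation natively.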
First create the file, then lseek to the wanted end and write a dummy byte. This is a very quick way to create an arbitrarily large but sparse file.
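The seek-and-write idea can be sketched as follows. The function name create_sparse_file is hypothetical, and size is assumed to be greater than zero:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t for files larger than 2 GB */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates path as a file of exactly size bytes (size > 0) by writing only
   its last byte. Returns 0 on success, -1 on failure (errno is set). */
int create_sparse_file(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    /* Seek to the last byte and write a single zero there; everything
       before it becomes a hole if the filesystem supports sparse files. */
    if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1 || write(fd, "", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```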
If you don't want the file to be sparse, then find out the block size of the drive (can be found out using stat on most POSIX platforms). Create a buffer of that size, and write it to the file until the wanted size.
If the stat structure doesn't have the st_blksize member, then most filesystems have a blocksize of 4 or 8 kB. You can probably make this buffer larger, but not too large. Experiment and benchmark!
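For the non-sparse variant, a sketch along these lines writes one zero-filled buffer of st_blksize bytes at a time. The function name create_zeroed_file and the 4096-byte fallback are assumptions made for this example, and error handling is kept minimal (an interrupted write is treated as a failure):

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t for files larger than 2 GB */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates path and fills it with size zero bytes, one block at a time.
   Returns 0 on success, -1 on failure. */
int create_zeroed_file(const char *path, off_t size)
{
    struct stat st;
    size_t blk;
    char *buf;
    off_t left = size;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    /* Use the preferred I/O block size, falling back to 4 kB. */
    blk = (fstat(fd, &st) == 0 && st.st_blksize > 0) ? (size_t)st.st_blksize : 4096;
    buf = calloc(1, blk);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    while (left > 0) {
        size_t want = left < (off_t)blk ? (size_t)left : blk;
        ssize_t n = write(fd, buf, want);

        if (n <= 0) {
            free(buf);
            close(fd);
            return -1;
        }
        left -= n;   /* a short write simply leaves more for the next round */
    }
    free(buf);
    return close(fd);
}
```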
If you're using kernel v2.6.31+ and if filesystem supports it, consider using fallocate:
fallocate -l 6GB hugefile
It preallocates blocks to a file.
Creating large files takes long because there's a lot on the file system the OS has to do. Only in case of sparse files this can be skipped (see Joachim Pileborg's answer for that). A sparse file is a file containing "holes" (large chunks of zero bytes). Such a file does not use as much space as it is large. Creating such a file beforehand will produce the file with the correct size very fast.
In case you want to reserve the disk space to avoid running out of it before the transmission is complete, a sparse file won't do. You will then have to write at least one byte into each block to avoid the holes of a sparse file. I'm not sure this will be faster than simply dumping zeros into the file until it has the desired size, as you already do.
As I remember, I used the open system call to create an empty file and then dumped the data into it.
In case of a partial write, keep track of the seek position and continue dumping from there. If the file already exists, reuse it and overwrite its data.
With respect to performance, this approach was quite good.

C: theory on how to extract files from an archived file

In C I have created a program which can archive multiple files into an archive file via the command line.
e.g.
$echo 'file1/2' > file1/2.txt
$./archive file1.txt file2.txt archivedfile
$cat archivedfile
file1
file2
How do I create a process so that in my archivedfile I have:
header
file1
end
header
file2
end
The files are all stored in the archive one after another. I know that some kind of header is probably needed (containing the file name, the length of the file name, and the start and end of the file) to extract these files back into their original form, but how would I go about doing this?
I am stuck on where and how to start.
Please could someone help me on some logic as to how to approach extracting files back out of an archived file.
As has been mentioned before, start with the algorithm. You already have most of the details.
There are a few approaches you can take:
Random access archive.
Sequential access archive.
Random Access Archive
For this to work, the header needs to act as an index (like the card indexes at a library), indicating: (a) where to find the start of each file; and (b) the length of each file. The algorithm to write the archive file might look like:
Get a list of all the files from the command line.
Create a structure to hold the meta data about each file: name (255 char), size (64-bit int), date and time, and permissions.
For each file, get its stats.
Store the stats of each file within an array of structures.
Open the archive for writing.
Write the header structure.
For each file, append its content to the archive file.
Close the archive file.
(The header might have to include the number of files, too.)
Next, the algorithm for extracting files:
Get an archive file from the command line.
Get a file name to extract, also from the command line.
Create memory for a structure to read meta data about each file.
Read all the meta data from the archive file.
Search for the file name to extract throughout list of the meta data.
Calculate the offset into the archive file for the start of the matching file name.
Seek to the offset.
Read the file content and write it out to a new file.
Close the new file.
Close the archive.
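The "calculate the offset" step can be sketched like this. The layout is an assumption made for the example, not something the steps above prescribe: a 32-bit file count, then one fixed-size ArchiveEntry per file, then the file contents in index order. The names ArchiveEntry and find_file_offset are likewise hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* One index record per archived file (assumed layout). */
typedef struct {
    char     name[256];   /* NUL-terminated file name */
    uint64_t size;        /* length of the file content in bytes */
} ArchiveEntry;

/* Returns the byte offset, from the start of the archive, at which the
   content of the named file begins, or -1 if the name is not in the index.
   On success *size_out (if not NULL) receives the file's length. */
int64_t find_file_offset(const ArchiveEntry *index, uint32_t count,
                         const char *name, uint64_t *size_out)
{
    /* Content starts right after the count field and the index itself. */
    uint64_t offset = sizeof(uint32_t) + (uint64_t)count * sizeof(ArchiveEntry);
    uint32_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(index[i].name, name) == 0) {
            if (size_out != NULL)
                *size_out = index[i].size;
            return (int64_t)offset;
        }
        offset += index[i].size;   /* skip over this file's content */
    }
    return -1;
}
```

The extractor would then fseek to the returned offset and copy size bytes into the new file. Writing the struct to disk as raw bytes ties the archive to one compiler and byte order, which is acceptable for an exercise but not for a portable format.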
Sequential Access
This is easier. You can do it yourself: think through the steps.
About Programming
It is easy to get caught up in the details of how something should work. I suggest that you take a step back -- something your teacher should discuss in class -- and try to think about the problem at a level above coding, because:
the algorithm you create will be language independent;
fixing mistakes in an algorithm, before code is written, is trivial;
you will have a better understanding of what you need to do before coding;
it will take less time to implement the solution;
you can identify areas that can be implemented in parallel;
you will see any potential roadblocks ahead of time; and
you will be on your way to management positions in no time. ;-)
I would think that the header would need to have information needed to identify the file and how big it is within the archive - for example, file name, original directory, and size in either lines or bytes, depending on which is more useful in your context. You'd then need routines to create a header, add a file to an archive (create a header and append the file data), extract a file from an archive (follow the headers until the correct entry is found and copy the data from the archive to a separate file), and delete a file (start reading the archive, copying data for all entries except the one you want to delete to a new file, then delete the old archive and rename the new one to the old name).
Share and enjoy.
One approach is to imitate the ZIP format: http://en.wikipedia.org/wiki/ZIP_file_format
It uses a directory structure at the end of the file, which contains pointers to the offsets of the files in the archive. The big benefit of this structure is that you can find a given file without having to read the entire archive -- as long as you know the start of the directory and have the ability to randomly access the file.
An alternative is the TAR file format: http://en.wikipedia.org/wiki/Tar_file_format
This is designed for streaming media ("tape archive"), so each entry contains its own metadata. You have to scan the entire file for an entry, but the normal use case is to pack/unpack entire directory trees, so this isn't too bad a penalty.
Doing it in a streaming fashion, like tar, is probably the easiest implementation. First, write a magic number out so you can identify that this is your archive format. I'd then suggest using stat(2) (that's man syntax for the stat man page, section 2) to get the size of the file to be archived. Actually, look closely at the stat fields available to you, there may be some interesting information there you'd want to keep.
Write out the information you need in a tag=value fashion, one per line. For example:
FileName=file1.txt
FileSize=10
FileDir=./blah/blah
FilePerms=0700
End your header with two newlines so you know when to start pushing out FileSize bytes to disk. You don't need a beginning-of-header marker, because you know the file size to write out, so you know when to start parsing your header again.
I'm suggesting you use a text format for your header information because then you don't have to worry about byte ordering, etc. that you'd need to worry about if you write a raw binary struct out to disk.
When reading your archive, parse the header lines one by one and populate a local struct to hold that information. Then write out the file to disk, and set any file properties that need updating based on the header info you extracted.
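A rough sketch of writing and parsing such a text header follows. The struct EntryHeader and both function names are made up for this example, and FileDir is left out to keep it short. Unknown tags are simply ignored.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char     name[256];
    long     size;
    unsigned perms;
} EntryHeader;

/* Writes the header as tag=value lines followed by an empty line.
   Returns 0 on success, -1 on a write error. */
int write_header(FILE *out, const EntryHeader *h)
{
    int n = fprintf(out, "FileName=%s\nFileSize=%ld\nFilePerms=%04o\n\n",
                    h->name, h->size, h->perms);
    return n < 0 ? -1 : 0;
}

/* Reads header lines up to the empty line that ends the header.
   Returns 0 on success, -1 on EOF or if no header lines were found. */
int read_header(FILE *in, EntryHeader *h)
{
    char line[512];
    int seen = 0;

    memset(h, 0, sizeof *h);
    while (fgets(line, sizeof line, in) != NULL) {
        if (line[0] == '\n')               /* empty line: end of header */
            return seen ? 0 : -1;
        line[strcspn(line, "\n")] = '\0';  /* strip the trailing newline */
        if (strncmp(line, "FileName=", 9) == 0)
            snprintf(h->name, sizeof h->name, "%s", line + 9);
        else if (strncmp(line, "FileSize=", 9) == 0)
            h->size = strtol(line + 9, NULL, 10);
        else if (strncmp(line, "FilePerms=", 10) == 0)
            h->perms = (unsigned)strtoul(line + 10, NULL, 8);  /* octal */
        seen = 1;
    }
    return -1;
}
```

After read_header returns 0, the stream is positioned at the first content byte, so the caller can copy exactly h->size bytes and then call read_header again for the next entry.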
Hope that helps. Good luck.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct{
int size;
char name[20];
}Header;
void packfiles(char* archive_file, int numfiles, char** filenames){
FILE* fp = fopen(archive_file, "wb");
if(!fp){
perror("Error opening archive file");
exit(1);
}
for(int i = 0; i < numfiles; i++){
FILE* infile = fopen(filenames[i], "rb");
if(!infile){
perror("Error opening input file");
exit(1);
}
//Get file size
fseek(infile, 0, SEEK_END);
int fsize = ftell(infile);
rewind(infile);
//Create header
Header header;
header.size = fsize;
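strncpy(header.name, filenames[i], sizeof(header.name) - 1);
header.name[sizeof(header.name) - 1] = '\0'; //strncpy does not terminate when it truncates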
//Write header and file content to archive
fwrite(&header, sizeof(Header), 1, fp);
for(int j = 0; j < fsize; j++){
fputc(fgetc(infile), fp);
}
//Add padding if necessary
if(fsize % 4 != 0){
for(int j = 0; j < 4-(fsize % 4); j++){
fputc(0, fp);
}
}
fclose(infile);
}
fclose(fp);
}
void unpackfiles(char* archive_file){
FILE* fp = fopen(archive_file, "rb");
if(!fp){
perror("Error opening archive file");
exit(1);
}
while(1){
//Read header
Header header;
int read = fread(&header, sizeof(Header), 1, fp);
if(read == 0){
//EOF
break;
}
else if(read != 1){
perror("Error reading header");
exit(1);
}
//Create output file
FILE* outfile = fopen(header.name, "wb");
if(!outfile){
perror("Error creating output file");
exit(1);
}
//Write file content to output file
for(int i = 0; i < header.size; i++){
fputc(fgetc(fp), outfile);
}
//Skip padding (it is only written when the size is not a multiple of 4)
if(header.size % 4 != 0){
fseek(fp, 4-(header.size % 4), SEEK_CUR);
}
fclose(outfile);
}
fclose(fp);
}
int main(int argc, char** argv){
if(argc < 3){
fprintf(stderr, "Usage: %s <archive_file> <file1> [<file2>...]\n", argv[0]);
exit(1);
}
packfiles(argv[1], argc-2, argv+2);
unpackfiles(argv[1]);
return 0;
}

Retrieve filename from file descriptor in C

Is it possible to get the filename of a file descriptor (Linux) in C?
You can use readlink on /proc/self/fd/NNN where NNN is the file descriptor. This will give you the name of the file as it was when it was opened — however, if the file was moved or deleted since then, it may no longer be accurate (although Linux can track renames in some cases). To verify, stat the filename given and fstat the fd you have, and make sure st_dev and st_ino are the same.
Of course, not all file descriptors refer to files, and for those you'll see some odd text strings, such as pipe:[1538488]. Since all of the real filenames will be absolute paths, you can determine which these are easily enough. Further, as others have noted, files can have multiple hardlinks pointing to them - this will only report the one it was opened with. If you want to find all names for a given file, you'll just have to traverse the entire filesystem.
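A minimal, Linux-only sketch of that approach, including the st_dev/st_ino check. The function name fd_to_path is hypothetical:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux only. Fills buf with the path that fd was opened with.
   Returns 0 if that path still refers to the same file, -1 otherwise
   (not a regular path, file moved or deleted, buffer too small, ...). */
int fd_to_path(int fd, char *buf, size_t bufsize)
{
    char proc_path[64];
    struct stat by_fd, by_name;
    ssize_t n;

    if (bufsize < 2)
        return -1;
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
    n = readlink(proc_path, buf, bufsize - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';                 /* readlink() does not NUL-terminate */

    if (buf[0] != '/')             /* e.g. "pipe:[1538488]" */
        return -1;
    /* Verify that the name still leads to the file behind the descriptor. */
    if (fstat(fd, &by_fd) != 0 || stat(buf, &by_name) != 0)
        return -1;
    if (by_fd.st_dev != by_name.st_dev || by_fd.st_ino != by_name.st_ino)
        return -1;
    return 0;
}
```

A path longer than the buffer is silently truncated by readlink(), in which case the stat() check fails and the function reports -1 rather than returning a wrong name.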
I had this problem on Mac OS X. We don't have a /proc virtual file system, so the accepted solution cannot work.
We do, instead, have a F_GETPATH command for fcntl:
F_GETPATH Get the path of the file descriptor Fildes. The argument must be a buffer of size MAXPATHLEN or greater.
So to get the file associated to a file descriptor, you can use this snippet:
#include <sys/syslimits.h>
#include <fcntl.h>
char filePath[PATH_MAX];
if (fcntl(fd, F_GETPATH, filePath) != -1)
{
// do something with the file path
}
Since I never remember where MAXPATHLEN is defined, I thought PATH_MAX from syslimits would be fine.
In Windows, with GetFileInformationByHandleEx, passing FileNameInfo, you can retrieve the file name.
As Tyler points out, there's no way to do what you require "directly and reliably", since a given FD may correspond to 0 filenames (in various cases) or > 1 (multiple "hard links" is how the latter situation is generally described). If you do still need the functionality with all the limitations (on speed AND on the possibility of getting 0, 2, ... results rather than 1), here's how you can do it: first, fstat the FD -- this tells you, in the resulting struct stat, what device the file lives on, how many hard links it has, whether it's a special file, etc. This may already answer your question -- e.g. if 0 hard links you will KNOW there is in fact no corresponding filename on disk.
If the stats give you hope, then you have to "walk the tree" of directories on the relevant device until you find all the hard links (or just the first one, if you don't need more than one and any one will do). For that purpose, you use readdir (and opendir &c of course) recursively opening subdirectories until you find in a struct dirent thus received the same inode number you had in the original struct stat (at which time if you want the whole path, rather than just the name, you'll need to walk the chain of directories backwards to reconstruct it).
If this general approach is acceptable, but you need more detailed C code, let us know, it won't be hard to write (though I'd rather not write it if it's useless, i.e. you cannot withstand the inevitably slow performance or the possibility of getting != 1 result for the purposes of your application;-).
Before writing this off as impossible I suggest you look at the source code of the lsof command.
There may be restrictions but lsof seems capable of determining the file descriptor and file name. This information exists in the /proc filesystem so it should be possible to get at from your program.
You can use fstat() to get the file's inode by struct stat. Then, using readdir() you can compare the inode you found with those that exist (struct dirent) in a directory (assuming that you know the directory, otherwise you'll have to search the whole filesystem) and find the corresponding file name.
Nasty?
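For the case where the directory is known, the search might look like the sketch below. The function name find_name_in_dir is hypothetical. Instead of relying on the d_ino field of struct dirent alone, this variant calls lstat() on every entry and compares both st_dev and st_ino, which costs one extra system call per entry; d_ino could serve as a cheap pre-filter on filesystems where it matches st_ino.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Searches dir (not its subdirectories) for an entry that is the same
   file as fd. On success copies the entry name into name and returns 0;
   returns -1 if nothing matches or on error. */
int find_name_in_dir(int fd, const char *dir, char *name, size_t namesize)
{
    struct stat target, st;
    struct dirent *ent;
    char path[4096];
    int found = -1;
    DIR *d;

    if (fstat(fd, &target) != 0)
        return -1;
    d = opendir(dir);
    if (d == NULL)
        return -1;
    while ((ent = readdir(d)) != NULL) {
        snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        /* Same device and same inode number means the same file. */
        if (lstat(path, &st) == 0 &&
            st.st_dev == target.st_dev && st.st_ino == target.st_ino) {
            snprintf(name, namesize, "%s", ent->d_name);
            found = 0;
            break;                 /* stop at the first hard link found */
        }
    }
    closedir(d);
    return found;
}
```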
There is no official API to do this on OpenBSD, but with some very convoluted workarounds it is still possible using the following code. Note that you need to link with -lkvm and -lc. The code that uses FTS to traverse the filesystem is from this answer.
#include <string>
#include <vector>
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
#include <sys/sysctl.h>
#include <kvm.h>
using std::string;
using std::vector;
string pidfd2path(int pid, int fd) {
string path; char errbuf[_POSIX2_LINE_MAX];
static kvm_t *kd = nullptr; kinfo_file *kif = nullptr; int cntp = 0;
kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf); if (!kd) return "";
if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, pid, sizeof(struct kinfo_file), &cntp))) {
for (int i = 0; i < cntp; i++) {
if (kif[i].fd_fd == fd) {
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
root.push_back(nullptr); // fts_open() expects a NULL-terminated array of paths
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link;
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (child->fts_statp->st_dev == kif[i].va_fsid) {
if (child->fts_statp->st_ino == kif[i].va_fileid) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
finish:
fts_close(file_system);
}
}
}
}
kvm_close(kd);
return path;
}
int main(int argc, char **argv) {
if (argc == 3) {
printf("%s\n", pidfd2path((int)strtoul(argv[1], nullptr, 10),
(int)strtoul(argv[2], nullptr, 10)).c_str());
} else {
printf("usage: \"%s\" <pid> <fd>\n", argv[0]);
}
return 0;
}
If the function fails to find the file, (for example, because it no longer exists), it will return an empty string. If the file was moved, in my experience when moving the file to the trash, the new location of the file is returned instead if that location wasn't already searched through by FTS. It'll be slower for filesystems that have more files.
The deeper the search goes into the directory tree of your entire filesystem without finding the file, the more likely you are to hit a race condition, though it is still very unlikely given how fast the traversal is. I'm aware my OpenBSD solution is C++ and not C. Feel free to port it to C; most of the code logic will stay the same. If I have time I'll try to rewrite it in C. Like the macOS solution, this one gets a hard link at random (citation needed), for portability with Windows and other platforms which can only get one hard link. If you don't care about being cross-platform and want all the hard links, you could remove the goto that exits the search loop and return a vector instead.

DragonFly BSD and NetBSD have the same solution (the exact same code) as the macOS solution on the current question, which I verified manually. If a macOS user wishes to get a path from a file descriptor opened by any process, by plugging in a process ID, rather than being limited to the calling one, while also potentially getting all hard links instead of a random one, see this answer. It should be a lot more performant than traversing your entire filesystem, similar to how fast it is on Linux and with other solutions that are more straightforward. FreeBSD users can get what they are looking for in this question, because the OS-level bug mentioned in that question has since been resolved for newer OS versions.
Here's a more generic solution, which can only retrieve the path of a file descriptor opened by the calling process. It should work on most Unix-likes out of the box, with all the same concerns as the former solution regarding hard links and race conditions, although it performs slightly faster because it has fewer conditionals and loops:
#include <string>
#include <vector>
#include <cstring>
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>

using std::string;
using std::vector;

// Returns a path for descriptor fd of the calling process, or "" if none was found.
string fd2path(int fd) {
    string path;
    struct stat info;
    // Look up the descriptor's device and inode once, before the traversal.
    if (fstat(fd, &info) || S_ISSOCK(info.st_mode)) return path;
    FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
    // fts_open() expects a NULL-terminated array of root paths.
    vector<char *> root; char buffer[2]; strcpy(buffer, "/");
    root.push_back(buffer); root.push_back(nullptr);
    file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
    if (file_system) {
        while ((parent = fts_read(file_system))) {
            // Compare every child of the current entry, including the first one.
            for (child = fts_children(file_system, 0); child; child = child->fts_link) {
                if (!S_ISSOCK(child->fts_statp->st_mode)) {
                    if (child->fts_statp->st_dev == info.st_dev) {
                        if (child->fts_statp->st_ino == info.st_ino) {
                            path = child->fts_path + string(child->fts_name);
                            goto finish; // stop at the first matching hard link
                        }
                    }
                }
            }
        }
        finish:
        fts_close(file_system);
    }
    return path;
}
An even quicker solution, also limited to the calling process, is to wrap all your calls to fopen() and open() in a helper function that stores each file descriptor together with the absolute version of the path passed to the wrapper, in whatever C equivalent there is to a std::unordered_map. (The same goes for the Windows-only equivalents which won't work on UWP, like _wfopen_s() and _wsopen_s() and all that nonsense to support Unicode.) The absolute path can be obtained with realpath() on Unix-likes, or GetFullPathNameW() (the *W variant, for Unicode support) on Windows. realpath() also resolves symbolic links (which aren't nearly as commonly used on Windows), and both functions convert the relative path of the file you opened, if it is one, to an absolute path. In C you will likely have to write the map yourself, using malloc()'d and eventually free()'d arrays of ints and C strings.

This will be faster than any solution that does a dynamic search of your filesystem, but it has a different and unappealing limitation: it will not notice files which were moved around on your filesystem. You can at least check whether the file was deleted, using your own code to test for existence, but you cannot tell whether the file was replaced after you opened it and stored its path in memory, so the results are potentially outdated. Let me know if you would like to see a code example of this, though because files can change location I do not recommend this solution.
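For anyone who does want to see it, here is a minimal sketch of the wrapper idea for Unix-likes. It is written in C++, so it can use std::unordered_map directly instead of a hand-written C map. The names tracked_open, tracked_close and tracked_path are made up for this example, it only wraps open() (not fopen()), and it is not thread-safe.

```cpp
#include <cstdlib>
#include <string>
#include <unordered_map>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical registry: descriptor -> absolute path recorded at open time.
static std::unordered_map<int, std::string> g_fd_paths;

// Wrapper around open() that remembers the resolved path on success.
int tracked_open(const char *path, int flags, mode_t mode = 0644) {
    int fd = open(path, flags, mode);
    if (fd < 0) return fd;
    // realpath(path, nullptr) returns a malloc()'d absolute path with
    // symbolic links resolved (POSIX.1-2008), or nullptr on failure.
    if (char *resolved = realpath(path, nullptr)) {
        g_fd_paths[fd] = resolved;
        free(resolved);
    }
    return fd;
}

// Wrapper around close() that forgets the descriptor again, so a reused
// descriptor number never reports a stale path.
int tracked_close(int fd) {
    g_fd_paths.erase(fd);
    return close(fd);
}

// Returns the path recorded for fd, or "" if fd was not opened via the wrapper.
std::string tracked_path(int fd) {
    auto it = g_fd_paths.find(fd);
    return it == g_fd_paths.end() ? std::string() : it->second;
}
```

The lookup is a hash-table access rather than a filesystem walk, which is where the speed comes from. As said above, the stored path is a snapshot taken at open time and is not updated if the file is later moved, replaced or deleted.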
Impossible. A file descriptor may have multiple names in the filesystem, or it may have no name at all.
Edit: Assuming you are talking about a plain old POSIX system, without any OS-specific APIs, since you didn't specify an OS.

Resources