I must note that developing code for windows platform is a mystery for me at the moment.
I have an issue with going through a directory and reading up files, where polish characters are present. Following code snippets work on windows in english and polish version, but not in german. What's the reason? I've tried with POSIX opendir() / readdir() and with winapi FindFirstFile() / FindNextFile() functions. They gave same results.
Consider following program. It reads up files ( in POSIX and winapi way ) within test directory and prints them out.
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <dirent.h>
int main(void)
{
char* name;
name = _getcwd(NULL, 0);
char* tmp = realloc(name, strlen(name)*2);
if (tmp != NULL) {
name = tmp;
}
else {
perror("realloc");
free(name);
exit(1);
}
strcat(name, "\\test");
DIR* dir = opendir(name);
struct dirent* s;
while ((s = readdir(dir)) != NULL) {
printf("%s\n", s->d_name);
}
closedir(dir);
free(s);
strcat(name, "\\*");
HANDLE dir_w;
WIN32_FIND_DATA s_w;
if ((dir_w = FindFirstFile(name, &s_w)) != INVALID_HANDLE_VALUE) {
while (FindNextFile(dir_w, &s_w) != 0) {
printf("%s\n", s_w.cFileName);
}
}
FindClose(dir_w);
free(name);
return 0;
}
ISSUE : on windows with german language polish characters are interpreted as ASCII characters. If, for example, file has name:
ąęćś.txt
then on english language windows it is read up correctly, but on german one is read as
aecs.txt
but such file does not exist. Should I go with wchar_t type and functions specific to it (FindNextFilew) ? I don't have german windows so it is hard for me to reproduce the error. I've been trying to change code page with command chcp 1250 but that didn't help. Setting locale to pl also did not help.
Compiler: gcc 5.0.0
The C standard library functions you call will be using ANSI encoded text which as I'm sure you know cannot handle international text.
So you'll want to use the native Unicode Windows APIs rather than the legacy ANSI versions. That means wchar_t based functions with W prefixes like FindFirstFileW. For a start you should enable the Unicode conditionals so that the FindFirstFile macro expands to FindFirstFileW.
In a "C" code I would like to list all the files in a directory and delete the oldest one. How do I do that?
Can I use popen for that or we have any other solutions??
Thanks,
From the tag, I assume that you want to do this in a POSIX compliant system. In this case a code snippet for listing files in a folder would look like this:
#include <dirent.h>
#include <sys/types.h>
#include <stdio.h>
DIR* dp;
struct dirent* ep;
char* path = "/home/mydir";
dp = opendir(path);
if (dp != NULL)
{
printf("Dir content:\n");
while(ep = readdir(dp))
{
printf("%s\n", ep->d_name);
}
}
closedir(dp);
To check file creation or modification time, use stat (man 2 stat). For removing file, just use function remove(const char* path)
On Linux (and indeed, any POSIX system), you read a directory by calling opendir() / readdir() / closedir(). You can then call stat() on each directory entry to determine if it's a file, and what its access / modification / status-change times are.
If your definition of "oldest" depends on the creation time of the file, then you're on shaky ground - traditionally UNIX didn't record the creation time. On Linux, some recent filesystems do provide it through the extended attribute file.crtime (which you access using getxattr() from sys/xattr.h), but you'll have to handle the common case where that attribute doesn't exist.
You can scan the directory using readdir and opendir
or, if you want to traverse (recursively) a file hierarchy fts or nftw. Don't forget to ignore the entries for the current directory "." and the parent ".." one. You probably want to use the stat syscall too.
I have a C program that, at one point in the program has this:
system("rm -rf foo");
Where foo is a directory. I decided that, rather than calling system, it would be better to do the recursive delete right in the code. I assumed a piece of code to do this would be easy to find. Silly me. Anyway, I ended up writing this:
#include <stdio.h>
#include <sys/stat.h>
#include <dirent.h>
#include <libgen.h>
int recursiveDelete(char* dirname) {
DIR *dp;
struct dirent *ep;
char abs_filename[FILENAME_MAX];
dp = opendir (dirname);
if (dp != NULL)
{
while (ep = readdir (dp)) {
struct stat stFileInfo;
snprintf(abs_filename, FILENAME_MAX, "%s/%s", dirname, ep->d_name);
if (lstat(abs_filename, &stFileInfo) < 0)
perror ( abs_filename );
if(S_ISDIR(stFileInfo.st_mode)) {
if(strcmp(ep->d_name, ".") &&
strcmp(ep->d_name, "..")) {
printf("%s directory\n",abs_filename);
recursiveDelete(abs_filename);
}
} else {
printf("%s file\n",abs_filename);
remove(abs_filename);
}
}
(void) closedir (dp);
}
else
perror ("Couldn't open the directory");
remove(dirname);
return 0;
}
This seems to work, but I'm too scared to actually use it in production. I'm sure I've done something wrong. Does anyone know of a C library to do recursive delete I've missed, or can someone point out any mistakes I've made?
Thanks.
POSIX has a function called ftw(3) (file tree walk) that
walks through the directory tree that is located under the directory dirpath, and calls fn() once for each entry in the tree.
kudos for being scared to death, that's a healthy attitude to have in a case like this.
I have no library to suggest in which case you have two options:
1) 'run' this code exhaustively
a) not on a machine; on paper, with pencil. take an existing directory tree, list all the elements and run the program through each step, verify that it works
b) compile the code but replace all of the deletion calls with a line that does a printf - verify that it does what it should do
c) re-insert the deletion calls and run
2) use your original method (call system())
I would suggest one additional precaution that you can take.
Almost always when you delete multiple files and/or directories it would be a good idea to chroot() into the dir before executing anything that can destroy your data outside this directory.
I think you will need to call closedir() before recursiveDelete() (because you don't want/need all the directories open as you step into them. Also closedir() before calling remove() because remove() will probably give an error on the open directory. You should step through this once carefully to make sure that readdir() does not pickup the '..'. Also be wary of linked directories, you probably wouldn't want to recurse into directories that are
symbolic or hard links.
I'm trying to solve exercise from K&R; it's about reading directories. This task is system dependent because it uses system calls. In the book example authors say that their example is written for Version 7 and System V UNIX systems and that they used the directory information in the header < sys/dir.h>, which looks like this:
#ifndef DIRSIZ
#define DIRSIZ 14
#endif
struct direct { /* directory entry */
ino_t d_ino; /* inode number */
char d_name[DIRSIZ]; /* long name does not have '\0' */
};
On this system they use 'struct direct' combined with 'read' function to retrieve a directory entry, which consist of file name and inode number.
.....
struct direct dirbuf; /* local directory structure */
while(read(dp->fd, (char *) &dirbuf, sizeof(dirbuf)
== sizeof(dirbuf) {
.....
}
.....
I suppose this works fine on UNIX and Linux systems, but what I want to do is modify this so it works on Windows XP.
Is there some structure in Windows like 'struct direct' so I can use it
with 'read' function and if there is what is the header name where it is
defined?
Or maybe Windows requires completely different approach?
There's nothing like that in windows. If you want to enumerate a directory in Windows, you must use the FindFirstFile/FindNextFile API.
Yes, this works on Linux/Unix only. However if you are just playing around you may use Cygwin to build programs on Windows that use this Unix API.
It is interesting to note that you are using K&R 1st Edition, from 1978, and not the second edition. The second edition has different structures, etc, at that point in the book.
That code from the first edition does not work on many Unix-like systems any more. There are very few Unix machines left with file systems that restrict file names to the 14-character limit, and that code only works on those systems. Specifically, it does not work on MacOS X (10.6.2) or Solaris (10) or Linux (SuSE Linux Enterprise Edition 10, kernel 2.6.16.60-0.21-smp). With the test code shown below, the result is:
read failed: (21: Is a directory)
Current editions of POSIX explicitly permit the implementation to limit what you can do with a file descriptor opened on a directory. Basically, it can be used in the 'fchdir()' system calls and maybe a few relatives, but that is all.
To read the contents of the directory, you have to use the opendir() family of functions, and then readdir() etc. The second edition of K&R goes on to use these system calls instead of raw open() and read() etc.
All-in-all, it is not dreadfully surprising that code from over 30 years ago doesn't quite work the same as it used to.
On Windows, you can either use the POSIX sub-system or an emulation of that such as Cygwin or MingW, and in either case you will need to use the opendir() and readdir() family of function calls, not direct open() and read() on the directory file descriptor.
Or you can use the native Windows API that BillyONeal references, FindFirstFile and relatives.
Test code:
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>
int main()
{
int fd = open(".", O_RDONLY);
if (fd != -1)
{
char buffer[256];
ssize_t n = read(fd, buffer, sizeof(buffer));
if (n < 0)
{
int errnum = errno;
printf("read failed: (%d: %s)\n", errnum, strerror(errnum));
}
else
printf("read OK: %d bytes (%s)\n", (int)n, buffer);
close(fd);
}
return(0);
}
boost::filesystem::directory_iterator provides a portable equivalent of Windows FindFirstFile/FindNextFile API and POSIX readdir_r() API. See this tutorial.
Note that this is C++ not plain C.
Is it possible to get the filename of a file descriptor (Linux) in C?
You can use readlink on /proc/self/fd/NNN where NNN is the file descriptor. This will give you the name of the file as it was when it was opened — however, if the file was moved or deleted since then, it may no longer be accurate (although Linux can track renames in some cases). To verify, stat the filename given and fstat the fd you have, and make sure st_dev and st_ino are the same.
Of course, not all file descriptors refer to files, and for those you'll see some odd text strings, such as pipe:[1538488]. Since all of the real filenames will be absolute paths, you can determine which these are easily enough. Further, as others have noted, files can have multiple hardlinks pointing to them - this will only report the one it was opened with. If you want to find all names for a given file, you'll just have to traverse the entire filesystem.
I had this problem on Mac OS X. We don't have a /proc virtual file system, so the accepted solution cannot work.
We do, instead, have a F_GETPATH command for fcntl:
F_GETPATH Get the path of the file descriptor Fildes. The argu-
ment must be a buffer of size MAXPATHLEN or greater.
So to get the file associated to a file descriptor, you can use this snippet:
#include <sys/syslimits.h>
#include <fcntl.h>
char filePath[PATH_MAX];
if (fcntl(fd, F_GETPATH, filePath) != -1)
{
// do something with the file path
}
Since I never remember where MAXPATHLEN is defined, I thought PATH_MAX from syslimits would be fine.
In Windows, with GetFileInformationByHandleEx, passing FileNameInfo, you can retrieve the file name.
As Tyler points out, there's no way to do what you require "directly and reliably", since a given FD may correspond to 0 filenames (in various cases) or > 1 (multiple "hard links" is how the latter situation is generally described). If you do still need the functionality with all the limitations (on speed AND on the possibility of getting 0, 2, ... results rather than 1), here's how you can do it: first, fstat the FD -- this tells you, in the resulting struct stat, what device the file lives on, how many hard links it has, whether it's a special file, etc. This may already answer your question -- e.g. if 0 hard links you will KNOW there is in fact no corresponding filename on disk.
If the stats give you hope, then you have to "walk the tree" of directories on the relevant device until you find all the hard links (or just the first one, if you don't need more than one and any one will do). For that purpose, you use readdir (and opendir &c of course) recursively opening subdirectories until you find in a struct dirent thus received the same inode number you had in the original struct stat (at which time if you want the whole path, rather than just the name, you'll need to walk the chain of directories backwards to reconstruct it).
If this general approach is acceptable, but you need more detailed C code, let us know, it won't be hard to write (though I'd rather not write it if it's useless, i.e. you cannot withstand the inevitably slow performance or the possibility of getting != 1 result for the purposes of your application;-).
Before writing this off as impossible I suggest you look at the source code of the lsof command.
There may be restrictions but lsof seems capable of determining the file descriptor and file name. This information exists in the /proc filesystem so it should be possible to get at from your program.
You can use fstat() to get the file's inode by struct stat. Then, using readdir() you can compare the inode you found with those that exist (struct dirent) in a directory (assuming that you know the directory, otherwise you'll have to search the whole filesystem) and find the corresponding file name.
Nasty?
There is no official API to do this on OpenBSD, though with some very convoluted workarounds, it is still possible with the following code, note you need to link with -lkvm and -lc. The code using FTS to traverse the filesystem is from this answer.
#include <string>
#include <vector>
#include <cstdio>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
#include <sys/sysctl.h>
#include <kvm.h>
using std::string;
using std::vector;
string pidfd2path(int pid, int fd) {
string path; char errbuf[_POSIX2_LINE_MAX];
static kvm_t *kd = nullptr; kinfo_file *kif = nullptr; int cntp = 0;
kd = kvm_openfiles(nullptr, nullptr, nullptr, KVM_NO_FILES, errbuf); if (!kd) return "";
if ((kif = kvm_getfiles(kd, KERN_FILE_BYPID, pid, sizeof(struct kinfo_file), &cntp))) {
for (int i = 0; i < cntp; i++) {
if (kif[i].fd_fd == fd) {
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link;
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (child->fts_statp->st_dev == kif[i].va_fsid) {
if (child->fts_statp->st_ino == kif[i].va_fileid) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
finish:
fts_close(file_system);
}
}
}
}
kvm_close(kd);
return path;
}
int main(int argc, char **argv) {
if (argc == 3) {
printf("%s\n", pidfd2path((int)strtoul(argv[1], nullptr, 10),
(int)strtoul(argv[2], nullptr, 10)).c_str());
} else {
printf("usage: \"%s\" <pid> <fd>\n", argv[0]);
}
return 0;
}
If the function fails to find the file, (for example, because it no longer exists), it will return an empty string. If the file was moved, in my experience when moving the file to the trash, the new location of the file is returned instead if that location wasn't already searched through by FTS. It'll be slower for filesystems that have more files.
The deeper the search goes in the directory tree of your entire filesystem without finding the file, the more likely you are to have a race condition, though still very unlikely due to how performant this is. I'm aware my OpenBSD solution is C++ and not C. Feel free to change it to C and most of the code logic will be the same. If I have time I'll try to rewrite this in C hopefully soon. Like macOS, this solution gets a hardlink at random (citation needed), for portability with Windows and other platforms which can only get one hard link. You could remove the break in the while loop and return a vector if you want don't care about being cross-platform and want to get all the hard links. DragonFly BSD and NetBSD have the same solution (the exact same code) as the macOS solution on the current question, which I verified manually. If a macOS user wishes to get a path from a file descriptor opened any process, by plugging in a process id, and not be limited to just the calling one, while also getting all hard links potentially, and not being limited to a random one, see this answer. It should be a lot more performant that traversing your entire filesystem, similar to how fast it is on Linux and other solutions that are more straight-forward and to-the-point. FreeBSD users can get what they are looking for in this question, because the OS-level bug mentioned in that question has since been resolved for newer OS versions.
Here's a more generic solution which can only retrieve the path of a file descriptor opened by the calling process, however it should work for most Unix-likes out-of-the-box, with all the same concerns as the former solution in regards to hard links and race conditions, although performs slightly faster due to less if-then, for-loops, etc:
#include <string>
#include <vector>
#include <cstring>
#include <sys/stat.h>
#include <fts.h>
using std::string;
using std::vector;
string fd2path(int fd) {
string path;
FTS *file_system = nullptr; FTSENT *child = nullptr; FTSENT *parent = nullptr;
vector<char *> root; char buffer[2]; strcpy(buffer, "/"); root.push_back(buffer);
file_system = fts_open(&root[0], FTS_COMFOLLOW | FTS_NOCHDIR, nullptr);
if (file_system) {
while ((parent = fts_read(file_system))) {
child = fts_children(file_system, 0);
while (child && child->fts_link) {
child = child->fts_link; struct stat info = { 0 };
if (!S_ISSOCK(child->fts_statp->st_mode)) {
if (!fstat(fd, &info) && !S_ISSOCK(info.st_mode)) {
if (child->fts_statp->st_dev == info.st_dev) {
if (child->fts_statp->st_ino == info.st_ino) {
path = child->fts_path + string(child->fts_name);
goto finish;
}
}
}
}
}
}
finish:
fts_close(file_system);
}
return path;
}
An even quicker solution which is also limited to the calling process, but should be somewhat more performant, you could wrap all your calls to fopen() and open() with a helper function which stores basically whatever C equivalent there is to an std::unordered_map, and pair up the file descriptor with the absolute path version of what is passed to your fopen()/open() wrappers (and the Windows-only equivalents which won't work on UWP like _wopen_s() and all that nonsense to support UTF-8), which can be done with realpath() on Unix-likes, or GetFullPathNameW() (*W for UTF-8 support) on Windows. realpath() will resolve symbolic links (which aren't near as commonly used on Windows), and realpath() / GetFullPathNameW() will convert your existing file you opened from a relative path, if it is one, to an absolute path. With the file descriptor and absolute path stored an a C equivalent to a std::unordered_map (which you likely will have to write yourself using malloc()'d and eventually free()'d int and c-string arrays), this will again, be faster than any other solution that does a dynamic search of your filesystem, but it has a different and unappealing limitation, which is it will not make note of files which were moved around on your filesystem, however at least you can check whether the file was deleted using your own code to test existence, it also won't make note of the file in whether it was replaced since the time you opened it and stored the path to the descriptor in memory, thus giving you outdated results potentially. Let me know if you would like to see a code example of this, though due to files changing location I do not recommend this solution.
Impossible. A file descriptor may have multiple names in the filesystem, or it may have no name at all.
Edit: Assuming you are talking about a plain old POSIX system, without any OS-specific APIs, since you didn't specify an OS.