Recursive file delete in C on Linux

I have a C program that, at one point, does this:
system("rm -rf foo");
Where foo is a directory. I decided that, rather than calling system, it would be better to do the recursive delete right in the code. I assumed a piece of code to do this would be easy to find. Silly me. Anyway, I ended up writing this:
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <dirent.h>
#include <libgen.h>
int recursiveDelete(char* dirname) {
    DIR *dp;
    struct dirent *ep;
    char abs_filename[FILENAME_MAX];
    dp = opendir (dirname);
    if (dp != NULL)
    {
        while ((ep = readdir (dp))) {
            struct stat stFileInfo;
            snprintf(abs_filename, FILENAME_MAX, "%s/%s", dirname, ep->d_name);
            if (lstat(abs_filename, &stFileInfo) < 0) {
                perror ( abs_filename );
                continue; /* stFileInfo is not valid if lstat() failed */
            }
            if(S_ISDIR(stFileInfo.st_mode)) {
                if(strcmp(ep->d_name, ".") &&
                   strcmp(ep->d_name, "..")) {
                    printf("%s directory\n",abs_filename);
                    recursiveDelete(abs_filename);
                }
            } else {
                printf("%s file\n",abs_filename);
                remove(abs_filename);
            }
        }
        (void) closedir (dp);
    }
    else
        perror ("Couldn't open the directory");
    remove(dirname);
    return 0;
}
This seems to work, but I'm too scared to actually use it in production. I'm sure I've done something wrong. Does anyone know of a C library to do recursive delete I've missed, or can someone point out any mistakes I've made?
Thanks.

POSIX has a function called ftw(3) (file tree walk) that "walks through the directory tree that is located under the directory dirpath, and calls fn() once for each entry in the tree."
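A minimal sketch of that tree walk applied to deletion. It uses nftw(), the extended form of ftw(), rather than ftw() itself, because its FTW_DEPTH flag visits a directory's contents before the directory and FTW_PHYS keeps the walk from following symbolic links. The names remove_entry and remove_tree are illustrative and not part of any library.

```c
#define _XOPEN_SOURCE 700   /* expose nftw() and its flags in <ftw.h> */
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Called by nftw() once per entry. With FTW_DEPTH, the entries inside a
   directory are reported before the directory itself, so remove() always
   sees an empty directory. */
static int remove_entry(const char *path, const struct stat *sb,
                        int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)typeflag;
    (void)ftwbuf;
    if (remove(path) != 0) {
        perror(path);
        return -1;          /* a non-zero return stops the walk */
    }
    return 0;
}

/* Hypothetical helper: delete root and everything below it.
   Returns 0 on success, non-zero on the first failure. */
int remove_tree(const char *root)
{
    /* 16 is an assumed limit on simultaneously open directories. */
    return nftw(root, remove_entry, 16, FTW_DEPTH | FTW_PHYS);
}
```

Under these assumptions, remove_tree("foo") would stand in for system("rm -rf foo"), except that it reports the first failure instead of continuing past it.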

Kudos for being scared to death; that's a healthy attitude to have in a case like this.
I have no library to suggest, in which case you have two options:
1) "Run" this code exhaustively:
a) Not on a machine; on paper, with a pencil. Take an existing directory tree, list all the elements, run the program through each step, and verify that it works.
b) Compile the code, but replace all of the deletion calls with a printf line, and verify that it does what it should.
c) Re-insert the deletion calls and run.
2) Use your original method (call system()).

I would suggest one additional precaution that you can take.
Almost always, when you delete multiple files and/or directories, it is a good idea to chroot() into the directory first, so that nothing you execute can destroy data outside it.

I think you will need to call closedir() before recursiveDelete(), because you don't want or need all the directories open as you step into them. Also call closedir() before calling remove(), because remove() will probably give an error on the open directory. You should step through this once carefully to make sure that readdir() does not pick up the '..' entry. Also be wary of linked directories; you probably don't want to recurse into directories that are symbolic or hard links.

Related

using threading and mutex locks to search directories

I am new to threading, and I believe I understand the concept. Locks are a necessary tool for threading, but I find them confusing and cannot seem to get them right. The idea here is to search through directories to find CSV files. (More work will be done on the CSVs, but that is not relevant here.)
I have an algorithm to search through directories that works fine without threading. (Keep in mind that searching through directories is a task well suited to recursion: you search a directory to find another directory, and when you find one you want to search it too.) Since I need to start a thread for each newly found directory, I have the same algorithm set up twice: once in main, where it finds directories and then calls a function (through threading) to search each one. Again, without threading I have zero problems, but with threading the arguments I send to the function are overwritten. This happens even if I lock the entire function. Clearly I am not using locks and threading correctly, but where I'm going wrong eludes me.
I have test directories to verify whether it is working. I have 3 directories in the "." directory and then subdirectories beyond that. It finds the first three directories (in main) fine, but when it passes them to the threaded function, it searches three times, usually searching the same directory more than once. In other words, the path name seems to be overwritten. I'll post code so you can see what I'm doing. Thank you in advance.
Links to complete code: sorter.h https://pastebin.com/0vQZbrmh, sorter.c https://pastebin.com/9wd8aa74, dirWorker.c https://pastebin.com/Jd4i1ecr
In sorter.h
#define MAXTHREAD 255
extern pthread_mutex_t lock;
typedef struct _dir_proc
{
    char* path;    //the path to the new found directory
    char* colName; //related to the other work that must be done
} dir_proc;
In sorter.c
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <assert.h>
#include <dirent.h>
#include "sorter.h"
pthread_mutex_t lock;
int main(int argc, char* argv[])
{
    int err = 0;
    int count = 0; //number of threads created so far
    pthread_t threads[MAXTHREAD];
    DIR *dirPointer;
    char* searchedDirectory = ".";
    struct dirent *directEntry;
    dir_proc *dir_proc_args = malloc(sizeof(struct _dir_proc));
    assert(dir_proc_args != NULL);
    dir_proc_args->path = (char*) malloc(256 * sizeof(char));
    assert(dir_proc_args->path != NULL);
    dir_proc_args->colName = (char*) malloc(256 * sizeof(char));
    assert(dir_proc_args->colName != NULL);
    pthread_mutex_init(&lock, NULL);
    //dir_proc_args->colName is saved here
    if(!(dirPointer = opendir(searchedDirectory)))
    {
        fprintf(stderr, "opening of directory has failed");
        exit(1);
    }
    while((directEntry = readdir(dirPointer)) != NULL)
    {
        //do stuff here to ensure it is a directory
        //ensure that the dir we are looking at is not current or parent dir
        //copy path of found directory to dir_proc_args->path
        err = pthread_create(&threads[count++], NULL, &CSVFinder, (void*)dir_proc_args);
        if(err != 0)
            printf("can't create thread");
    }
    int i;
    for(i=0; i < count; ++i)
    {
        pthread_join(threads[i], NULL);
    }
    pthread_mutex_destroy(&lock);
}
In the CSVFinder function
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <pthread.h>
#include "sorter.h"
#include <dirent.h>
void *CSVFinder(void *args)
{
    pthread_mutex_lock(&lock); //I have locked the entire function to see if I can get it to work. this makes no sense to actually do
    DIR *dirPointer;
    struct dirent *directEntry;
    dir_proc *funcArgs = (struct _dir_proc*)args;
    char path[255];
    strncpy(path, funcArgs->path, sizeof(path));
    path[sizeof(path) - 1] = '\0'; //strncpy does not terminate on truncation
    if(!(dirPointer = opendir(funcArgs->path)))
    {
        fprintf(stderr, "opening of directory has failed");
        exit(1);
    }
    while((directEntry = readdir(dirPointer)) != NULL)
    {
        if(directEntry->d_type == DT_DIR) //if we are looking at a directory
        {
            //make sure the dir we are looking at is not current or parent dir
            //funcArgs->path was allocated with 256 bytes in main
            snprintf(funcArgs->path, 256, "%s/%s", path, directEntry->d_name);
            //I would like to be able to do a recursive call here
            //to search for more directories but one thing at a time
        }
    }
    closedir(dirPointer);
    pthread_mutex_unlock(&lock);
    return(NULL);
}
I hope I have not left out any relevant code. I tried to keep the code to a minimum while not leaving anything necessary out.
It's not clear to me why you want to create a thread to simply traverse a directory structure. However, I will point out a few issues I see.
One minor issue is that in the CSVFinder function, you call readder, not readdir.
But one glaring issue to me is that you do not initialize dirPointer in main or in the CSVFinder() function. I would expect to see a call like
dirPointer = opendir("/");
in the main() function before the while loop.
Then I would expect to see CSVFinder() initialize its dirPointer with a call to opendir(path) where path is a name to a subdirectory found in the main loop.
For a good reference on how to traverse a directory structure, see:
https://www.lemoda.net/c/recursive-directory/

How to determine files and directories in parent/other directories

I found the answer to another question here to be very helpful.
There seems to be a limitation of the sys/stat.h library as when I tried to look in other directories everything was seen as a directory.
I was wondering if anyone knew of another system function or why it sees anything outside the current working directory as only a directory.
I appreciate any help anyone has to offer as this is perplexing me and various searches have turned up no help.
The code I made to test this is:
#include <sys/stat.h>
#include <dirent.h>
#include <stdio.h>
int main(void) {
    int status;
    struct stat st_buf;
    struct dirent *dirInfo;
    DIR *selDir;
    selDir = opendir("../");
    // ^ or wherever you want to look
    while ((dirInfo = readdir(selDir))) {
        status = stat (dirInfo->d_name, &st_buf);
        if (S_ISREG (st_buf.st_mode)) {
            printf ("%s is a regular file.\n", dirInfo->d_name);
        }
        if (S_ISDIR (st_buf.st_mode)) {
            printf ("%s is a directory.\n", dirInfo->d_name);
        }
    }
    return 0;
}
You need to check the status of the stat call; it is failing.
The trouble is that you're looking for a file the_file in the current directory when it actually exists only as ../the_file. The readdir() function gives you the name relative to the directory being read, but stat() resolves it relative to the current directory.
To make it work, you'd have to do the equivalent of:
char fullname[1024];
snprintf(fullname, sizeof(fullname), "%s/%s", "..", dirInfo->d_name);
if (stat(fullname, &st_buf) == 0) {
    /* ...report on success... */
} else {
    /* ...report on failure... */
}
If you printed the return value of stat(), you'd notice there's an error (file not found).
This is because stat() takes the path to the file, but you're providing only the file name.
You then call S_ISREG on garbage values.
So, suppose you have a file ../test.txt.
You call stat() on test.txt, which looks for ./test.txt. That file doesn't exist, but you still print out the results from S_ISREG.
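Both answers come down to the same two steps: build the path from the directory name plus the entry name, and look at stat()'s return value before trusting st_mode. A minimal sketch of that, where classify_dir is a hypothetical helper name and the 4096-byte path buffer is an assumed upper bound:

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: report each entry of dir and return the number of
   regular files found, or -1 if dir cannot be opened. */
int classify_dir(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL) {
        perror(dir);
        return -1;
    }
    int regular = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        char full[4096];
        struct stat st;
        /* d_name is relative to dir, not to the current directory. */
        int n = snprintf(full, sizeof full, "%s/%s", dir, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof full)
            continue;                 /* path would not fit; skip it */
        if (stat(full, &st) != 0) {
            perror(full);             /* st is not valid here */
            continue;
        }
        if (S_ISREG(st.st_mode)) {
            printf("%s is a regular file.\n", full);
            regular++;
        } else if (S_ISDIR(st.st_mode)) {
            printf("%s is a directory.\n", full);
        }
    }
    closedir(d);
    return regular;
}
```

Calling classify_dir("..") would then classify the parent directory's entries no matter what the current working directory is.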

Reading multiple text files in C

What is the correct way to read and extract data from text files when you know that there will be many in a directory? I know that you can use fopen() to get the pointer to the file, and then do something like while(fgets(..) != NULL){} to read the entire file, but then how could I read from another file? I want to loop through every file in the directory.
Sam, you can use opendir/readdir as in the following little function.
#include <stdio.h>
#include <dirent.h>
static void scan_dir(const char *dir)
{
    struct dirent * entry;
    DIR *d = opendir( dir );
    if (d == 0) {
        perror("opendir");
        return;
    }
    while ((entry = readdir(d)) != 0) {
        printf("%s\n", entry->d_name);
        //read your file here
    }
    closedir(d);
}
int main(int argc, char ** argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s directory\n", argv[0]);
        return 1;
    }
    scan_dir(argv[1]);
    return 0;
}
This just opens a directory named on the command line and prints the names of all files it contains. But instead of printing the names, you can process the files as you like...
Typically a list of files is provided to your program on the command line, and thus are available in the array of pointers passed as the second parameter to main(). i.e. the invoking shell is used to find all the files in the directory, and then your program just iterates through argv[] to open and process (and close) each one.
See p. 162 in "The C Programming Language", Kernighan and Ritchie, 2nd edition, for an almost complete template for the code you could use. Substitute your own processing for the filecopy() function in that example.
If you really need to read a directory (or directories) directly from your program, then you'll want to read up on the opendir(3) and related functions in libc. Some systems also offer a library function called ftw(3) or fts(3) that can be quite handy too.
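To connect the directory scan with the fopen()/fgets() loop from the question, here is a rough sketch that opens every regular file in a directory in turn. Counting lines stands in for whatever per-file processing is needed; count_lines and count_lines_in_dir are hypothetical names, and the buffer sizes are arbitrary.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Count newline-terminated lines in one file; -1 if it cannot be opened. */
static long count_lines(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char buf[1024];
    long lines = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (strchr(buf, '\n') != NULL)   /* a long line may span several reads */
            lines++;
    }
    fclose(fp);
    return lines;
}

/* Hypothetical helper: total line count over all regular files in dir,
   or -1 if dir cannot be opened. Subdirectories are skipped. */
long count_lines_in_dir(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    long total = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        char full[4096];
        struct stat st;
        int n = snprintf(full, sizeof full, "%s/%s", dir, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof full)
            continue;
        if (stat(full, &st) != 0 || !S_ISREG(st.st_mode))
            continue;                    /* also skips "." and ".." */
        long lines = count_lines(full);
        if (lines >= 0)
            total += lines;
    }
    closedir(d);
    return total;
}
```

Each file is opened, read to the end, and closed before the next readdir() call, so only one FILE pointer is open at a time.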

listing the files in a directory and delete them in C/C++

In C code, I would like to list all the files in a directory and delete the oldest one. How do I do that?
Can I use popen for that, or are there other solutions?
Thanks,
From the tag, I assume that you want to do this in a POSIX compliant system. In this case a code snippet for listing files in a folder would look like this:
#include <dirent.h>
#include <sys/types.h>
#include <stdio.h>
int main(void)
{
    DIR* dp;
    struct dirent* ep;
    char* path = "/home/mydir";
    dp = opendir(path);
    if (dp != NULL)
    {
        printf("Dir content:\n");
        while ((ep = readdir(dp)))
        {
            printf("%s\n", ep->d_name);
        }
        closedir(dp); /* only close a directory that was actually opened */
    }
    return 0;
}
To check file creation or modification time, use stat (man 2 stat). To remove a file, just use the function remove(const char* path).
On Linux (and indeed, any POSIX system), you read a directory by calling opendir() / readdir() / closedir(). You can then call stat() on each directory entry to determine if it's a file, and what its access / modification / status-change times are.
If your definition of "oldest" depends on the creation time of the file, then you're on shaky ground - traditionally UNIX didn't record the creation time. On Linux, some recent filesystems do provide it through the extended attribute file.crtime (which you access using getxattr() from sys/xattr.h), but you'll have to handle the common case where that attribute doesn't exist.
You can scan the directory using opendir and readdir or, if you want to traverse a file hierarchy recursively, fts or nftw. Don't forget to ignore the entries for the current directory "." and the parent directory "..". You probably want to use the stat syscall too.

Delete files while reading directory with readdir()

My code is something like this:
DIR* pDir = opendir("/path/to/my/dir");
struct dirent* pFile = NULL;
while ((pFile = readdir(pDir))) {
    // Check if it is a .zip file
    if (strstr(pFile->d_name, ".zip")) {
        // It is a .zip file, delete it, and the matching log file
        char zipname[200];
        snprintf(zipname, sizeof(zipname), "/path/to/my/dir/%s", pFile->d_name);
        unlink(zipname);
        // Strip off .zip and append .log
        char logname[200];
        snprintf(logname, sizeof(logname), "%.*s.log", (int)(strlen(zipname) - 4), zipname);
        unlink(logname);
    }
}
closedir(pDir);
(this code is untested and purely an example)
The point is: Is it allowed to delete a file in a directory while looping through the directory with readdir()?
Or will readdir() still find the deleted .log file?
Quote from POSIX readdir:
If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified.
So, my guess is ... it depends.
It depends on the OS, on the time of day, on the relative order of the files added/deleted, ...
And, as a further point, between the time the readdir() function returns and you try to unlink() the file, some other process could have deleted that file and your unlink() fails.
Edit
I tested with this program:
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
int main(void) {
    struct dirent *de;
    DIR *dd;
    /* create files `one.zip` and `one.log` before entering the readdir() loop */
    printf("creating `one.log` and `one.zip`\n");
    system("touch one.log"); /* assume it worked */
    system("touch one.zip"); /* assume it worked */
    dd = opendir("."); /* assume it worked */
    while ((de = readdir(dd)) != NULL) {
        printf("found %s\n", de->d_name);
        if (strstr(de->d_name, ".zip")) {
            char logname[1200];
            size_t i;
            if (*de->d_name == 'o') {
                /* create `two.zip` and `two.log` when the program finds `one.zip` */
                printf("creating `two.zip` and `two.log`\n");
                system("touch two.zip"); /* assume it worked */
                system("touch two.log"); /* assume it worked */
            }
            printf("unlinking %s\n", de->d_name);
            if (unlink(de->d_name)) perror("unlink");
            strcpy(logname, de->d_name);
            i = strlen(logname);
            logname[i-3] = 'l';
            logname[i-2] = 'o';
            logname[i-1] = 'g';
            printf("unlinking %s\n", logname);
            if (unlink(logname)) perror("unlink");
        }
    }
    closedir(dd); /* assume it worked */
    return 0;
}
On my computer, readdir() finds deleted files and does not find files created between opendir() and readdir(). But it may be different on another computer; it may be different on my computer if I compile with different options; it may be different if I upgrade the kernel; ...
I'm testing my new Linux reference book, The Linux Programming Interface by Michael Kerrisk, and it says the following:
SUSv3 explicitly notes that it is unspecified whether readdir() will return a filename that has been added to or removed from the directory since the last call to opendir() or rewinddir(). All filenames that have been neither added nor removed since the last such call are guaranteed to be returned.
I think that what is unspecified is what happens to dirents not yet scanned. Once an entry has been returned, it is 100% guaranteed that it will not be returned again, whether or not you unlink the current dirent.
Also note the guarantee provided by the second sentence. Since you are leaving the other files alone and only unlinking the current entry for the zip file, SUSv3 guarantees that all the other files will be returned. What happens to the log file is unspecified: it may or may not be returned by readdir(), but in your case that shouldn't be harmful.
The reason I explored this question is to find an efficient way to close file descriptors in a child process before exec().
The suggested way in APUE from Stevens is to do the following:
int max_open = sysconf(_SC_OPEN_MAX);
for (int i = 0; i < max_open; ++i)
    close(i);
but I am thinking of using code similar to that in the original question to scan the /dev/fd/ directory, so that I know exactly which fds I need to close. (Special note to myself: skip over the dirfd contained in the DIR handle.)
I found the following page, which describes a solution to this problem.
https://support.apple.com/kb/TA21420
