Recursively find subdirectories and files - c

I want to retrieve all the files, directories and subdirectories contained within a given path, recursively. But I have a problem when my code reaches the second level (a directory within a directory): instead of opening the inner directory to search its contents, it throws an error. Here is what I have done:
void getFile(char *path)
{
DIR *dir;
struct dirent *ent;
if ((dir = opendir(path)) != NULL) {
/* print all the files and directories within directory */
while ((ent = readdir(dir)) != NULL) {
if((strcmp(ent->d_name,"..") != 0) && (strcmp(ent->d_name,".") != 0)){
printf ("%s", ent->d_name);
if(ent->d_type == DT_DIR){
printf("/\n");
getFile(ent->d_name);
}
else{
printf("\n");
}
} // end of if condition
} // end of while loop
closedir (dir);
}
}

Use the ftw(3) library function to recursively walk a file tree. It is specified by POSIX, although POSIX.1-2008 marks it obsolescent in favor of nftw(3).
You may also look into nftw and the musl libc source code for it, which is quite readable.
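As a rough illustration of the nftw route, here is a minimal sketch. The names walk_tree, print_entry and entries_seen are made up for this example, and the limit of 16 open directories is an arbitrary choice. The _XOPEN_SOURCE define is what glibc needs to expose nftw; other C libraries may not require it.

```c
#define _XOPEN_SOURCE 700   /* asks glibc to declare nftw(); must precede the includes */
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

static int entries_seen;    /* hypothetical counter for this sketch, not part of nftw */

/* Callback that nftw() invokes once for every entry in the tree. */
static int print_entry(const char *fpath, const struct stat *sb,
                       int typeflag, struct FTW *ftwbuf)
{
    (void)sb;   /* the stat data is available here but unused in this sketch */

    /* ftwbuf->level is the depth below the start; ftwbuf->base is the
     * offset of the file name inside fpath. */
    printf("%*s%s%s\n", ftwbuf->level * 2, "", fpath + ftwbuf->base,
           typeflag == FTW_D ? "/" : "");
    entries_seen++;
    return 0;   /* returning non-zero would stop the walk */
}

/* Walks the tree rooted at path. Returns the number of entries visited
 * (including path itself), or -1 if nftw() reports an error. */
int walk_tree(const char *path)
{
    entries_seen = 0;
    /* 16: how many directories nftw may keep open at once (arbitrary).
     * FTW_PHYS: do not follow symbolic links. */
    if (nftw(path, print_entry, 16, FTW_PHYS) == -1)
        return -1;
    return entries_seen;
}
```

Calling walk_tree(".") from main would print the current directory's tree, indented by depth. With nftw there is no path concatenation to get wrong, because the library hands the callback a ready-made path.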

When you recursively call getFile, you pass it only the name of the directory you just read. That is not the full path, which is what opendir needs. You have to build the full path yourself.
Something like this:
if(ent->d_type == DT_DIR)
{
if ((strlen(path) + strlen(ent->d_name) + 1) > PATH_MAX)
{
printf("Path too long\n");
closedir(dir); // don't leak the open directory on the early return
return;
}
char fullpath[PATH_MAX + 1];
strcpy(fullpath, path);
strcat(fullpath, "/");
strcat(fullpath, ent->d_name); // corrected
getFile(fullpath);
}

Related

Readdir() in a sequential behavior

int indent = 0;
int listDir(const char* dirname){
DIR* dir;
struct dirent* d;
if(!(dir = opendir(dirname)))
return -1;
while((d = readdir(dir)) != NULL){
if(strcmp(d->d_name, ".") == 0 || strcmp(d->d_name, "..") == 0 ){
continue;
}
else if(d->d_type != DT_DIR){ // Any except folders.
printf("%*s- %s:%ld\n", indent, "", d->d_name, d->d_ino);
}
else if(d->d_type == DT_DIR){ // Folders only
printf("%*s[%s]\n", indent, "", d->d_name);
char path[1024];
snprintf(path, sizeof(path), "%s/%s", dirname, d->d_name);
indent +=2;
listDir(path);
indent -=2;
}
}
closedir(dir);
return 0;
}
This function works just fine, but the only thing is that it outputs the following result as an example:
I need the output to be the container folder, files and then folders. The folders should be at the end of the list. For example, the above output should be:
I'd say you have two options:
1. Insert all of the readdir() results into a sorted data structure (or just put them in some array and sort it). Here "sorted" only means "file < directory", with no other ordering.
2. Read all of the entries twice: the first time, ignore all the subdirectories and print just the files; then call rewinddir() and read all of the entries again, ignoring the regular files and printing only the subdirectories.
Option 2 is probably simpler - but with more library calls and system calls.

Segfault when deleting non-empty directory using C

I am trying to delete a non-empty directory without system calls and without using extensive libraries. My code so far is...
int rmrf(char *path) {
char* path_copy = (char *) malloc(1024 * sizeof(char));
strcpy(path_copy, path);
DIR *directory = opendir(path_copy);
struct dirent *entry = readdir(directory);
while (entry != NULL) {
if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, "..")) { //skip /. and /..
} else if (entry->d_type == DT_DIR) { //directory recurse
strcat(path_copy, "/");
strcat(path_copy, entry->d_name);
rmrf(path_copy);
remove(path);
} else { //file delete
strcat(path_copy, "/");
strcat(path_copy, entry->d_name);
remove(path_copy);
}
entry = readdir(directory);
}
closedir(directory);
return 0;
}
my current file structure looks something like this...
Who
|---Region 1
|---County 1
|---SubCounty 1
|---County 2
|---Region 2
|---County 1
|---Region 3
Currently I am getting seg faults, but in different places as the day progresses. Earlier today I would get about two levels of recursion deep and then seg fault out, but as of now I can't even make it past a full level down. I can't figure out what is wrong, and when I use gdb to look into the problem I get...
malloc.c: No such file or directory.
Any help would be appreciated!
UPDATE:
I have taken suggestions from paxdiablo and came up with the resulting function...
int rmrf(char *path) {
char* path_copy = malloc(1024);
DIR *directory = opendir(path);
struct dirent *entry = readdir(directory);
while (entry != NULL) {
if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, "..")) { //skip /. and /..
} else if (entry->d_type == DT_DIR) { //directory recurse
strcpy(path_copy, path);
strcat(path_copy, "/");
strcat(path_copy, entry->d_name);
rmrf(path_copy);
remove(path);
} else { //file delete
strcpy(path_copy, path);
strcat(path_copy, "/");
strcat(path_copy, entry->d_name);
remove(path_copy);
}
entry = readdir(directory);
}
closedir(directory);
free(path_copy);
return 0;
}
However, I am still getting a seg fault, though it is getting further in the recursion. The gdb output for the seg fault is as follows...
Program received signal SIGSEGV, Segmentation fault.
_int_malloc (av=av@entry=0x7ffff7dd1b20 <main_arena>, bytes=bytes@entry=32816) at malloc.c:3802
3802 malloc.c: No such file or directory.
(gdb) where
#0 _int_malloc (av=av@entry=0x7ffff7dd1b20 <main_arena>, bytes=bytes@entry=32816) at malloc.c:3802
#1 0x00007ffff7a91184 in __GI___libc_malloc (bytes=32816) at malloc.c:2913
#2 0x00007ffff7ad51ba in __alloc_dir (statp=0x7fffffffe190, flags=0, close_fd=true, fd=6) at ../sysdeps/posix/opendir.c:247
#3 opendir_tail (fd=6) at ../sysdeps/posix/opendir.c:145
#4 __opendir (name=<optimized out>) at ../sysdeps/posix/opendir.c:200
#5 0x0000000000401bca in rmrf ()
#6 0x0000000000401c8d in rmrf ()
#7 0x0000000000401c8d in rmrf ()
#8 0x0000000000402380 in main ()
Thoughts?
For your initial code, you do this once when entering the function:
strcpy(path_copy, path);
Then you do this for each file or directory in the current directory:
strcat(path_copy, "/");
strcat(path_copy, entry->d_name);
That means, if you have the files a, b and c in your current directory /xx, the path_copy variable will cycle through:
/xx/a /xx/a/b /xx/a/b/c
rather than the correct:
/xx/a /xx/b /xx/c
With a sufficiently large number of files, you will easily blow out the 1024 bytes allocated for the path.
If you want to fix that then you should start the variable from scratch each time:
if ((strcmp(entry->d_name, ".") != 0) && (strcmp(entry->d_name, "..") != 0)) {
if (entry->d_type == DT_DIR) {
strcpy(path_copy, path);
strcat(path_copy, "/");
strcat(path_copy, entry->d_name);
rmrf(path_copy);
remove(path);
} else {
sprintf(path_copy, "%s/%s", path, entry->d_name);
remove(path_copy);
}
}
You'll note that I've modified your initial condition a little so that it makes more sense (only do the inner bit if the file is neither . nor ..).
I've also shown, in the else clause, a shorter way of constructing the string to delete using sprintf rather than a set of strcpy/strcat calls. Feel free to do that in the if clause as well if you wish, I've left it using the old method so you can see all you needed to do was add the initial path.
And just a few extra points, applicable to your first and/or second code snippet:
You should also make sure you free the memory you allocate at each level, immediately before returning from the function, between closedir() and return.
You never need to cast the return value of malloc since a void * is implicitly converted to any other object pointer type. In fact, it's dangerous to do so since it can hide certain subtle errors.
Similarly, you never need to multiply by sizeof(char) - that is, by definition, always one.
You can move the creation of path_copy to before the file/directory check since it's common to both parts.
And, finally, you're going to have troubles if the directory you're processing doesn't actually exist since opendir will return NULL and you will immediately try to pass that to readdir.
With all that in mind, I'd start with the following program which actually walks the tree and prints out all the files it finds. Once you're happy with that, you can add back in the bit that deletes:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
int rmrf(char *path) {
char *path_copy = malloc(1024);
DIR *directory = opendir(path);
if (directory != NULL) {
struct dirent *entry = readdir(directory);
while (entry != NULL) {
if ((strcmp(entry->d_name, ".") != 0) && (strcmp(entry->d_name, "..") != 0)) {
sprintf(path_copy, "%s/%s", path, entry->d_name);
if (entry->d_type == DT_DIR) {
rmrf(path_copy);
puts(path);
} else {
puts(path_copy);
}
}
entry = readdir(directory);
}
closedir(directory);
}
free(path_copy);
return 0;
}
The main code is just a driver to ensure things are set up correctly. Just make sure, before running, you don't have (in your current directory) a paxjunk or paxjunk2 file or directory you want to keep around.
int main(void) {
system("rm -rf paxjunk");
system("mkdir paxjunk");
system("touch paxjunk/0.txt");
system("mkdir paxjunk/1");
system("touch paxjunk/1/1.txt");
system("mkdir paxjunk/2");
system("touch paxjunk/2/2.txt");
rmrf("paxjunk");
puts("===");
system("rm -rf paxjunk2");
rmrf("paxjunk2");
puts("===");
system("rm -rf paxjunk");
return 0;
}
When you run this, you should see it working okay:
paxjunk/0.txt
paxjunk/1/1.txt
paxjunk
paxjunk/2/2.txt
paxjunk
===
===

Given a path to a directory, how do I check if a certain file exists therein?

So I'm given a /path/to/a/directory/ and should check if index.php exists therein. If it does, I should return /path/to/a/directory/index.php. If index.html exists instead, I should return that; otherwise NULL.
Currently I am using fopen(file, "r"), but I don't think it does what I want it to do. I have also been looking into the functions stat() and scandir(), but I am clueless about how to use them (even after reading the man pages over and over again).
/**
* Checks, in order, whether index.php or index.html exists inside of path.
* Returns path to first match if so, else NULL.
*/
char* indexes(const char* path)
{
char* newPath = malloc(strlen(path) + strlen("/index.html") + 1);
strcpy(newPath, path);
if(access( path, F_OK ) == 0 )
{
printf("access to path SUCCESS\n");
if( fopen( "index.php", "r" ))
{
strcat( newPath, "index.php" );
}
else if( fopen( "index.html", "r"))
{
strcat( newPath, "index.html" );
}
else
{
return NULL;
}
}
else
{
return NULL;
}
return newPath;
}
The main problem I see here is that I don't think my function looks for the files to fopen() inside the desired path. Where exactly does it look for the files? My root folder?
Any input would be greatly appreciated.
What about opendir:
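char* indexes(const char* path)
{
    DIR *dir;
    struct dirent *entry;
    char* newPath = NULL;
    dir = opendir(path);
    if (dir == NULL)
        return NULL; /* path is missing or is not a directory */
    /* Note: this returns whichever of the two names readdir() yields first. */
    while ((entry = readdir(dir)) != NULL) {
        if (!strcmp(entry->d_name, "index.php") || !strcmp(entry->d_name, "index.html")) {
            /* +2: one byte for the '/' separator, one for the terminating NUL */
            newPath = malloc(strlen(path) + strlen(entry->d_name) + 2);
            if (newPath != NULL)
                sprintf(newPath, "%s/%s", path, entry->d_name);
            break;
        }
    }
    closedir(dir);
    return newPath;
}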
Here you open the directory and scan it with readdir, which returns a structure describing each entry inside it (for more details, see the man pages for opendir and readdir).
Using fopen is discouraged because it is heavy on the system, which has to actually open each file; when the directory contains thousands of files or more, it will be very slow.
Your basic idea seems to be OK. Before you call fopen(), construct the full path/file name for each file. Allocate memory for the longest candidate and use it with, e.g., sprintf() to build the path. When fopen() succeeds you can return that pointer (after closing the file with fclose()); if not, don't forget to free() the memory.
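The "build the full path first" idea can be sketched as follows, using access() rather than fopen() so that no file handle has to be closed afterwards. The name find_index and the candidates array are assumptions for this example. The candidates are tried in the order the question asks for, index.php before index.html.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns a malloc'd "<path>/<name>" for the first of index.php and
 * index.html that exists inside path, or NULL if neither exists.
 * The caller owns the returned string and must free() it. */
char *find_index(const char *path)
{
    static const char *const candidates[] = { "index.php", "index.html" };

    /* Large enough for the longest candidate, the '/' and the NUL. */
    size_t size = strlen(path) + strlen("/index.html") + 1;
    char *full = malloc(size);
    if (full == NULL)
        return NULL;

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        snprintf(full, size, "%s/%s", path, candidates[i]);
        if (access(full, F_OK) == 0)   /* existence check on the full path */
            return full;
    }

    free(full);   /* nothing matched, so release the buffer */
    return NULL;
}
```

If path already ends in a slash, as in /path/to/a/directory/, the result contains a double slash, which the system still resolves to the same file.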

writing ls from scratch recursively

I am working on a simple project to implement "ls -R" from scratch. Whenever I run what I have, my program just keeps searching the root directory over and over again. What am I doing wrong?
void lsR(char dirName[]) {
/*
The recursive function call.
*/
DIR *dir;
struct dirent *directory;
struct stat fileStat;
char type;
char **nameList[MAX_RECURSIVE_FILES];
struct passwd *user;
int count = 0;
int i = 0;
printf("\n");
printf("./%s :\n", dirName);
printf("\n");
if ((dir = opendir(dirName)) == NULL) {
perror("opendir error:");
return;
}
while ((directory = readdir(dir)) != NULL) {
if (stat(directory->d_name, &fileStat) < 0) {
perror("fstat error:");
return;
}
if (fileStat.st_uid == 1) {
continue;
}
user = getpwuid(fileStat.st_uid);
printf("%s ", directory->d_name);
fileType(&fileStat, &type);
if ((type == 'd') && (count < MAX_RECURSIVE_FILES)) {
nameList[count] = malloc(sizeof(char)*MAX_STRING_LENGTH);
strncpy(nameList[count++], directory->d_name, MAX_STRING_LENGTH);
}
}
closedir(dir);
printf("\n");
for (i=0; i<count; i++) {
printf("Calling lsR on: %s\n", nameList[i]);
lsR(nameList[i]);
}
}
When it executes, I get the following output:
"./. :
., .., ... all other files in my current working directory ....
./. :
., .., ... all other files in my current working directory...
"
Among the list of files in the current directory you've noticed . and .. The first one is a hardlink to the current directory and the second one to the parent directory. So when you recurse through your dir entries you will want to skip those two. Otherwise the first directory you will recurse into will be ., in other words the directory you've just gone through.
This is the reason for your program's current behavior, but once you fix that you will run into the issue lurker mentioned in his answer.
Additional notes :
Are you sure about the char **nameList[MAX_RECURSIVE_FILES]; variable? Seems to me you want an array of char * not an array of char **.
Are you aware you can use the S_ISDIR macro on the st_mode field of your stat struct to check whether the current file is a directory, instead of using your custom function?
You need to include the path relative to your program's current directory. Each nameList element will need to be dirName + "/" + directory->d_name.
If you started out calling lsR on ./foo, and foo has a directory named bar under it, then to open bar you need to open ./foo/bar, since your program is running from the directory represented by ".".
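Combining these points (skip "." and "..", build dirName + "/" + d_name, and test with S_ISDIR), a stripped-down recursive listing could look like the sketch below. The names ls_recursive, join_path and MAX_PATH_LEN are assumptions for this example. It uses lstat rather than stat so that symbolic links to directories are not followed, which is a design choice of this sketch and not something the question requires.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_PATH_LEN 1024   /* assumed limit for this sketch */

/* Builds "dirName/name" into out. Returns 0 on success, -1 if it does not fit. */
static int join_path(char *out, size_t size, const char *dirName, const char *name)
{
    int n = snprintf(out, size, "%s/%s", dirName, name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Lists dirName in the spirit of "ls -R". Returns the number of
 * subdirectories visited, or -1 if dirName could not be opened. */
int ls_recursive(const char *dirName)
{
    DIR *dir = opendir(dirName);
    if (dir == NULL) {
        perror("opendir error");
        return -1;
    }

    struct dirent *entry;
    printf("\n%s:\n", dirName);
    while ((entry = readdir(dir)) != NULL)      /* pass 1: print the names */
        printf("%s  ", entry->d_name);
    printf("\n");

    int visited = 0;
    rewinddir(dir);
    while ((entry = readdir(dir)) != NULL) {    /* pass 2: recurse */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;                           /* never recurse into these */

        char path[MAX_PATH_LEN];
        struct stat fileStat;
        if (join_path(path, sizeof path, dirName, entry->d_name) != 0)
            continue;
        if (lstat(path, &fileStat) < 0)         /* stat the full path, not d_name */
            continue;
        if (S_ISDIR(fileStat.st_mode)) {
            int below = ls_recursive(path);
            visited += 1 + (below > 0 ? below : 0);
        }
    }
    closedir(dir);
    return visited;
}
```

Starting it with ls_recursive(".") lists the current directory and everything below it. Because every recursive call receives the joined path, the stat and opendir calls keep working no matter how deep the tree goes.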

How can I search the files in the current dir and in the directories under it?

The function searches the files in the current directory. If it comes across a directory, it enters it and searches for files again, skipping the current '.' and the parent '..' directories. But it doesn't work the way I want: it does not enter the next directory.
int foo(char *currDir)
{
struct dirent *direntp;
DIR *dirp;
char currentDir[250];
if ((dirp = opendir(currDir)) == NULL)
{
perror ("Failed to open directory");
return 1;
}
//By Sabri Mev at GYTE
while ((direntp = readdir(dirp)) != NULL)
{
printf("%s\n", direntp->d_name);
if(direntp->d_type == DT_DIR)
{
if(strcmp(direntp->d_name,".") !=0 && strcmp(direntp->d_name,"..") != 0)
foo(direntp->d_name); //Recursive!
}
}
getcwd(currentDir,250);
printf("curr Dir : %s\n",currentDir );
while ((closedir(dirp) == -1) && (errno == EINTR)) ;
return 0;
}
Because the path you pass is wrong.
Try this:
if(direntp->d_type == DT_DIR)
{
if(strcmp(direntp->d_name,".") !=0 && strcmp(direntp->d_name,"..") != 0)
{
snprintf(currentDir, sizeof currentDir, "%s/%s", currDir, direntp->d_name); // bounded: cannot overflow currentDir
foo(currentDir); //Recursive!
}
}
When you do the recursive call to foo() inside the loop, notice that what direntp->d_name contains is not the full path, but just the subdirectory name. You have to concatenate it with currDir and use the result to call foo().
For instance, if you're starting with foo("/home") and the first subdir is "root", you're calling recursively foo("root") when it should be foo("/home/root").
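direntp->d_name gives you only the entry's own name within the directory; it does not contain the whole path.
Also, Microsoft's compiler flags the POSIX name getcwd as deprecated and suggests _getcwd instead; on POSIX systems (which this dirent.h-based code targets) getcwd is the standard function and is fine to use.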
