using threading and mutex locks to search directories

using threading and mutex locks to search directories - c

I am new to threading and I believe I understand the concept. As locks are a necessary tool to use threading but are (or at least to me) confusing on how to use I need to use them but cannot seem to get them correct. The idea here is to search through directories to find CSV files. (more work will be done on CSVs but that is not relevant here) I have an algorithm to search through directories that works fine without the use of threading. (keep in mind that searching through directories is the kind of task that is perfect for recursion because you need to search through a directory to find another directory and when you find the new directory you want to search that directory) Since I need to use threading on each instance of finding new directory I have the same algorithm set up twice. Once in main where it finds directories and the calls a function (through threading) to search the found directories. Again, if I use this method without threading I have zero problems but with threading the arguments I send in to the function are overwritten. This happens even if I lock the entire function. Clearly I am not using locks and threading correctly but where I'm going wrong eludes me. I have test directories to verify that it is (or is not) working. I have 3 directories in the "." directory and then sub directories beyond that. It finds the first three directories (in main) fine then when it passes those into the threaded function it will search three different times but usually with searching the same directory more than once. In other words the path name seems to be overwritten. I'll post code so you can see what I'm doing. I thank you in advance. Links to complete code:sorter.h https://pastebin.com/0vQZbrmh sorter.c https://pastebin.com/9wd8aa74 dirWorker.c https://pastebin.com/Jd4i1ecr
In sorter.h
#define MAXTHREAD 255
extern pthread_mutex_t lock;
typedef
struct _dir_proc
{
char* path; //the path to the new found directory
char* colName; //related to the other work that must be done
} dir_proc;
In sorter.c
#include <pthread.h>
#include <assert.h>
#include <dirent.h>
#include "sorter.h"
pthread_mutex_t lock;
int main(int argc, char* argv[])
{
int err = 0;
pthread_t threads[MAXTHREAD];
DIR *dirPointer;
char* searchedDirectory = ".";
struct dirent *directEntry;
dir_proc *dir_proc_args = malloc(sizeof(struct _dir_proc));
assert(dir_proc_args != NULL);
dir_proc_args->path = (char*) malloc(256 * (sizeof(char));
assert(dir_proc_args->path != NULL);
dir_proc_args->colName = (char*) malloc(256 * sizeof(char));
assert(dir_proc_args->colName != NULL);
pthread_mutex_init(&lock, NULL)
//dir_proc_args->colName is saved here
if(!(dirPointer = opendir(searchedDirectory)))
{
fprintf(stderr, "opening of directory has failed");
exit(1);
}
while((directEntry = readdir(dirPointer)) != NULL)
{
//do stuff here to ensure it is a directory
//ensure that the dir we are looking at is not current or parent dir
//copy path of found directory to dir_proc_args->path
err = pthread_create(&threads[count++], NULL, &CSVFinder, (void*)dir_proc_args);
if(err != 0)
printf("can't create thread);
}
int i;
for(i=0; i < count; ++i)
{
pthread_join(threads[i], NULL);
}
pthread_mutex_destroy(&lock);
}
in CSVFinder function
#include <assert.h>
#include <pthread.h>
#include "sorter.h"
#include <dirent.h>
void *CSVFinder(void *args)
{
pthread_mutex_lock(&lock); //I have locked the entire function to see I can get it to work. this makes no sense to actually do
DIR *dirPointer;
struct dirent *directEntry;
dir_proc *funcArgs = (struct _dir_proc*)args;
char path[255];
strncpy(path, funcArgs->path, sizeof(path));
if(!(dirPointer = opendir(funcArgs->path)))
{
fprintf(stderr, "opening of directory has failed");
exit(1);
}
while((directEntry = readdir(dirPointer)) != NULL)
{
if(directEntry->d_type == DT_DIR) //if we are looking at a directory
{
//make sure the dir we are looking at is not current or parent dir
snprintf(funcArgs->path, (sizeof(path) + sizeof(directEntry->d_name)), "%s/%s", path, directEntry->d_name);
//I would like to be able to do a recursive call here
//to search for more directories but one thing at a time
}
}
closedir(dirPointer);
pthread_mutex_unlock(&lock);
return(NULL);
}
I hope I have not left out any relevant code. I tried to keep the code to a minimum while not leaving anything necessary out.

It's not clear to me why you want to create a thread to simply traverse a directory structure. However, I will point out a few issues I see.
One minor issue is you in the CSVFinder function, you call readder, not readdir.
But one glaring issue to me is that you do not initialize dirPointer in main or in the CSVFinder() function. I would expect to see a call like
dirPointer = opendir("/");
in the main() function before the while loop.
Then I would expect to see CSVFinder() initialize its dirPointer with a call to opendir(path) where path is a name to a subdirectory found in the main loop.
For a good reference to how to traverse a directory structure go here...
https://www.lemoda.net/c/recursive-directory/

Related

Create a directory in filesystem in C

I am doing a project in C, and I'm stuck on one thing, i need to check if the directoy "/var/log/PROJECT " exist, if not, my program must create it, the aplication will always run on super user, this is what im doing:
struct stat st = {0};
if (stat("/var/log/project/PROJECT", &st) == -1) {
printf("im creating the folder\n");
mode_t process_mask = umask(0);
int result_code = mkdir("/var/log/project/PROJECT", 0777);
umask(process_mask);
}else{
printf("exist\n");
}
Sorry for asking to "do my homework" but im really stuck...

Well, I'm going to run with my suspicion. If the problem is that the parent directory of the directory you're trying to create does not exist, the solution is to create the parent directory before it. This is not terribly difficult to do with recursion, thankfully. Try this:
#include <errno.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>
int create_directory_with_parents(char const *pathname, mode_t modus) {
struct stat st;
if(stat(pathname, &st) == -1) {
// If the directory does not yet exist:
//
// Make sure the parent directory is there. dirname() gives us the name of
// the parent directory, then we call this very function recursively because
// we are, after all, in a function that makes sure directories exist.
char *path_cpy = strdup(pathname);
char *dir = dirname(path_cpy);
int err = create_directory_with_parents(dir, modus);
free(path_cpy);
// If we were able to make sure the parent directory exists,
// make the directory we really want.
if(err == 0) {
mode_t process_mask = umask(0);
int err = mkdir(pathname, modus);
umask(process_mask);
}
// err is 0 if everything went well.
return err;
} else if(!S_ISDIR(st.st_mode)) {
// If the "directory" exists but is not a directory, complain.
errno = ENOTDIR;
return -1;
}
// If the directory exists and is a directory, there's nothing to do and
// nothing to complain about.
return 0;
}
int main(void) {
if(0 != create_directory_with_parents("/var/log/project/PROJECT", 0777)) {
perror("Could not create directory or parent of directory: ");
}
}
The recursion ends when the first parent directory is found that exists; that is with / at the latest.
One limitation of this implementation is that all parent directories will have the same access rights as the leaf directory, which may or may not be what you want. If this is not what you want, you'll have to change the modus parameter in the recursive call to create_directory_with_parents. How to pass several modus parameters for several layers of parent directories that may have to be created is something of a design question that depends on what exactly your needs are, so I cannot give a general answer to it.

Why not just execute the mkdir command anyway, and simply ignore the error it will produce if the directory already exists? This will save you playing around with stat.
Is there a reason you need the file permissions to be 777? If not, you can drop the umask bit too.

C Test For File Existence Before Calling execvp

I'm writing a UNIX minishell on ubuntu, and am trying to add built-in commands at this point. When it's not a built-in command I fork and then the child executes it, however for built-in commands I'll just execute it in the current process.
So, I need a way to see if the files exist(if they do it's not a built-in command), however execvp uses the environment PATH variable to automatically look for them, so I have no idea how I would manually check beforehand.
So, do you guys know how I could test an argument to see if it's a built-in command simply by supplying the name?
Thanks guys.

I have tested the answer by Tom
It contained a number of problems. I have fixed them here and provided a test program.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>
int is_file(const char* path) {
struct stat buf;
stat(path, &buf);
return S_ISREG(buf.st_mode);
}
/*
* returns non-zero if the file is a file in the system path, and executable
*/
int is_executable_in_path(char *name)
{
char *path = getenv("PATH");
char *item = NULL;
int found = 0;
if (!path)
return 0;
path = strdup(path);
char real_path[4096]; // or PATH_MAX or something smarter
for (item = strtok(path, ":"); (!found) && item; item = strtok(NULL, ":"))
{
sprintf(real_path, "%s/%s", item, name);
// printf("Testing %s\n", real_path);
if ( is_file(real_path) && !(
access(real_path, F_OK)
|| access(real_path, X_OK))) // check if the file exists and is executable
{
found = 1;
}
}
free(path);
return found;
}
int main()
{
if (is_executable_in_path("."))
puts(". is executable");
if (is_executable_in_path("echo"))
puts("echo is executable");
}
Notes
the test for access return value was reversed
the second strtok call had the wrong delimiter
strtok changed the path argument. My sample uses a copy
there was nothing to guarantee a proper path separator char in the concatenated real_path
there was no check whether the matched file was actually a file (directories can be 'executable' too). This leads to strange things like . being recognized as an external binary

What you can do is you can change the path to the particular directory and then use #include<dirent.h> header file and its readdir and scandir functions to walk through the directory or stat structure to see if the file exists in the directory or not.

You can iterate yourself through the PATH directories, and for each entry in PATH (You will have to split PATH with :, probably using strtok) concatenate at the end of each path the name of the command called. When you have create this path, check if the file exists and if it is executable using access.
int is_built_in(char *path, char *name)
{
char *item = strtok(path, ":");
do {
char real_path[4096] = strcat(item, name); // you would normally alloc exactly the size needed but lets stick to it for the sake of the example
if (!access(real_path, F_OK) && !access(real_path, X_OK)) // check if the file exists and is executable
return 0;
} while ((item = strtok(NULL, ":")) != NULL);
return 1;
}

Why do you want to test before calling execvp? That's the wrong approach. Just call execvp and it will tell you if the program does not exist.

Traversing file system according to a given root place by using threads by using C for unix

I wanna traverse inside the file system by using threads and processes.My program has to assume the first parameter is either given as "-p" which offers a multi-process application or "-t" which runs in a multi-threaded way. The second parameter is the
pathname of a file or directory. If my program gets the path of a file, it should print out the size of the file in bytes. If my program gets the path of a directory, it should, in the same way, print out the directory name, then process all the entries in the
directory except the directory itself and the parent directory. If my program is given a directory, it must display the entire hierarchy rooted at the specified directory. I wrote something but I got stuck in.I can not improve my code.Please help me.
My code is as following:
include
include
include
include
include
include
include
int funcThread(DIR *D);
int main(int argc, char * argv[])
{
pthread_t thread[100];
DIR *dirPointer;
struct stat object_file;
struct dirent *object_dir;
int counter;
if(opendir(argv[1])==NULL)
{
printf("\n\nERROR !\n\n Please enter -p or -t \n\n");
return 0;
}
if((dirPointer=opendir(argv[1]))=="-t")
{
if ((object_dir = opendir(argv[2])) == NULL)
{
printf("\n\nERROR !\n\nPlease enter the third argument\n\n");
return 0;.
}
else
{
counter=0;
while ((object_dir = readdir(object_dir)) != NULL)
{
pthread_create(&thread[counter],NULL,funcThread,(void *) object_dir);
counter++;
}
}
}
return 0;
}
int funcThread(DIR *dPtr)
{
DIR *ptr;
struct stat oFile;
struct dirent *oDir;
int num;
if(ptr=readdir(dPtr)==NULL)
rewinddir(ptr);
if(S_ISDIR(oFile.st_mode))
{
ptr=readdir(dPtr);
printf("\t%s\n",ptr);
return funcThread(ptr);
}
else
{
while(ptr=readdir(dPtr)!=NULL)
{
printf("\n%s\n",oDir->d_name);
stat(oDir->d_name,&oFile);
printf("\n%f\n",oFile.st_size);
}
rewinddir(ptr);
}
}

This line:
if((dirPointer=opendir(argv[1]))=="-t")
dirPointer is a pointer DIR* so how can it be equal to a literal string pointer?

I spotted a few errors:
Why are you using opendir() to check your arguments? You should use something like strcmp for that.
You're passing struct dirent* to funcThread() but funcThread() takes a DIR*.
You're using oFile on funcThread() before you initialize it (by calling stat()).
What is the purpose of calling rewinddir()? I guess you're blindly trying to get readdir() to work with a struct dirent*.
You're using oDir but it's never initialized.
You're calling printf() from multiple threads with no means to synchronize the output so it would be completelly out of order or garbled.
I suggest you read and understand the documentation of all those functions before using them (google "posix function_name") and get familiar with the basics of C. And before bringing threads into the equation try to get it working on a single threaded program. Also you won't see an improvement in performance by using that many threads unless you have close to that many cores, it will actually decrease performance and increase resource usage.

if(ptr=readdir(dPtr)==NULL){}
The = operator has lower precedence than ==
[this error is repeated several times]

Recursive file delete in C on Linux

I have a C program that, at one point in the program has this:
system("rm -rf foo");
Where foo is a directory. I decided that, rather than calling system, it would be better to do the recursive delete right in the code. I assumed a piece of code to do this would be easy to find. Silly me. Anyway, I ended up writing this:
#include <stdio.h>
#include <sys/stat.h>
#include <dirent.h>
#include <libgen.h>
int recursiveDelete(char* dirname) {
DIR *dp;
struct dirent *ep;
char abs_filename[FILENAME_MAX];
dp = opendir (dirname);
if (dp != NULL)
{
while (ep = readdir (dp)) {
struct stat stFileInfo;
snprintf(abs_filename, FILENAME_MAX, "%s/%s", dirname, ep->d_name);
if (lstat(abs_filename, &stFileInfo) < 0)
perror ( abs_filename );
if(S_ISDIR(stFileInfo.st_mode)) {
if(strcmp(ep->d_name, ".") &&
strcmp(ep->d_name, "..")) {
printf("%s directory\n",abs_filename);
recursiveDelete(abs_filename);
}
} else {
printf("%s file\n",abs_filename);
remove(abs_filename);
}
}
(void) closedir (dp);
}
else
perror ("Couldn't open the directory");
remove(dirname);
return 0;
}
This seems to work, but I'm too scared to actually use it in production. I'm sure I've done something wrong. Does anyone know of a C library to do recursive delete I've missed, or can someone point out any mistakes I've made?
Thanks.

POSIX has a function called ftw(3) (file tree walk) that
walks through the directory tree that is located under the directory dirpath, and calls fn() once for each entry in the tree.

kudos for being scared to death, that's a healthy attitude to have in a case like this.
I have no library to suggest in which case you have two options:
1) 'run' this code exhaustively
a) not on a machine; on paper, with pencil. take an existing directory tree, list all the elements and run the program through each step, verify that it works
b) compile the code but replace all of the deletion calls with a line that does a printf - verify that it does what it should do
c) re-insert the deletion calls and run
2) use your original method (call system())

I would suggest one additional precaution that you can take.
Almost always when you delete multiple files and/or directories it would be a good idea to chroot() into the dir before executing anything that can destroy your data outside this directory.

I think you will need to call closedir() before recursiveDelete() (because you don't want/need all the directories open as you step into them. Also closedir() before calling remove() because remove() will probably give an error on the open directory. You should step through this once carefully to make sure that readdir() does not pickup the '..'. Also be wary of linked directories, you probably wouldn't want to recurse into directories that are
symbolic or hard links.

Delete files while reading directory with readdir()

My code is something like this:
DIR* pDir = opendir("/path/to/my/dir");
struct dirent pFile = NULL;
while ((pFile = readdir())) {
// Check if it is a .zip file
if (subrstr(pFile->d_name,".zip") {
// It is a .zip file, delete it, and the matching log file
char zipname[200];
snprintf(zipname, sizeof(zipname), "/path/to/my/dir/%s", pFile->d_name);
unlink(zipname);
char* logname = subsstr(zipname, 0, strlen(pFile->d_name)-4); // Strip of .zip
logname = appendstring(&logname, ".log"); // Append .log
unlink(logname);
}
closedir(pDir);
(this code is untested and purely an example)
The point is: Is it allowed to delete a file in a directory while looping through the directory with readdir()?
Or will readdir() still find the deleted .log file?

Quote from POSIX readdir:
If a file is removed from or added to
the directory after the most recent
call to opendir() or rewinddir(),
whether a subsequent call to readdir()
returns an entry for that file is
unspecified.
So, my guess is ... it depends.
It depends on the OS, on the time of day, on the relative order of the files added/deleted, ...
And, as a further point, between the time the readdir() function returns and you try to unlink() the file, some other process could have deleted that file and your unlink() fails.
Edit
I tested with this program:
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
int main(void) {
struct dirent *de;
DIR *dd;
/* create files `one.zip` and `one.log` before entering the readdir() loop */
printf("creating `one.log` and `one.zip`\n");
system("touch one.log"); /* assume it worked */
system("touch one.zip"); /* assume it worked */
dd = opendir("."); /* assume it worked */
while ((de = readdir(dd)) != NULL) {
printf("found %s\n", de->d_name);
if (strstr(de->d_name, ".zip")) {
char logname[1200];
size_t i;
if (*de->d_name == 'o') {
/* create `two.zip` and `two.log` when the program finds `one.zip` */
printf("creating `two.zip` and `two.log`\n");
system("touch two.zip"); /* assume it worked */
system("touch two.log"); /* assume it worked */
}
printf("unlinking %s\n", de->d_name);
if (unlink(de->d_name)) perror("unlink");
strcpy(logname, de->d_name);
i = strlen(logname);
logname[i-3] = 'l';
logname[i-2] = 'o';
logname[i-1] = 'g';
printf("unlinking %s\n", logname);
if (unlink(logname)) perror("unlink");
}
}
closedir(dd); /* assume it worked */
return 0;
}
On my computer, readdir() finds deleted files and does not find files created between opendir() and readdir(). But it may be different on another computer; it may be different on my computer if I compile with different options; it may be different if I upgrade the kernel; ...

I'm testing my new Linux reference book. The Linux Programming Interface by Michael Kerrisk and it says the following:
SUSv3 explicitly notes that it is unspecified whether readdir() will return a filename that has been added to or removed from since the last since the last call to opendir() or rewinddir(). All filenames that have been neither added nor removed since the last such call are guaranteed to be returned.
I think that what is unspecified is what happens to dirents not yet scanned. Once an entry has been returned, it is 100% guaranteed that it will not be returned anymore whether or not you unlink the current dirent.
Also note the guarantee provided by the second sentence. Since you are leaving alone the other files and only unlinking the current entry for the zip file, SUSv3 guarantees that all the other files will be returned. What happens to the log file is undefined. it may or may not be returned by readdir() but in your case, it shouldn't be harmful.
The reason why I have explored the question it is to find an efficient way to close file descriptors in a child process before exec().
The suggested way in APUE from Stevens is to do the following:
int max_open = sysconf(_SC_OPEN_MAX);
for (int i = 0; i < max_open; ++i)
close(i);
but I am thinking using code similar to what is found in the OP to scan /dev/fd/ directory to know exactly which fds I need to close. (Special note to myself, skip over dirfd contained in the DIR handle.)

I found the following page describe the solution of this problem.
https://support.apple.com/kb/TA21420

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

using threading and mutex locks to search directories - c

Related

Create a directory in filesystem in C

C Test For File Existence Before Calling execvp

Traversing file system according to a given root place by using threads by using C for unix

Recursive file delete in C on Linux

Delete files while reading directory with readdir()

Categories

Resources