Joining threads confusion - c

I'm doing my homework, what I have to accomplish is count the directories and files of a given directory, but each directory that I found should be counted aswell with another thread of my process, this is what I have so far:
void *dirCounter(void *param){
queue<pthread_t> queue;
dir_ptr dir = (dir_ptr)param;
dir->countDir = 0;
DIR* dirName = dir->name;
struct dirent *curr;
off_t dsp;
dsp= telldir(dirName);
while(dsp!= -1){
curr = readdir(dirName);
if(curr == NULL){
break;
}
if(!strcmp(curr->d_name,".")|!strcmp(curr->d_name,"..")) { //To avoid counting . and ..
dsp = telldir(dirName); //Actual position asociated to the stream
continue; //Executes the beginning of the while
}
if(curr->d_type == DT_DIR){
dir->countDir++; //counts directories in the first level
//For each directory found, create another thread and add it to the queue:
pthread_attr_t attr1;
pthread_t tid1;
pthread_attr_init(&attr1);
dir_ptr par1 = (dir_ptr)malloc(sizeof(directorio));
par1->name = opendir(curr->d_name);
par1->countDir = par1->countFile = 0;
pthread_create(&tid1,&attr1, dirCounter, par1);
//queue.push(tid1);
}
if(curr->d_type == DT_REG){
dir->countFile++; //Counts files
}
dsp = telldir(dirName);
}
//pthread_join(tid1, NULL);
//while(!queue.empty()){
//pthread_join(queue.front(), NULL);
// queue.pop();
//}
printf("Dirs: %d Files: %d\n", dir->countDir, dir->countFile);
pthread_exit(NULL);
}
So far the code does count the current files and dirs of the "first level" if the join is commented, and then it just gives a segmentation fault, if the line is uncommented it gives just an output line and then dies with the segmentation fault.
The idea was to create a thread whenever I found a directory and then join all them at the end creating a semi-recursive routine.
Modifications:
char str[256];
strcpy(str, "./");
strcat(str, curr->d_name);
//strcat(str, "\"");
puts(str);
par1->name = opendir(str);
par1->countDir = par1->countFile = 0;
pthread_create(&tid1,&attr1, dirCounter, par1);
queue.push(tid1);
What it does after the modification:
Prints ALL the directories, however it does give segmentation fault and some threads do not complete it's task.

The proximate cause of your problem is that dir->name is NULL in the additional threads created, because opendir(curr->d_name); is failing. This is because the directory curr->d_name is not an absolute pathname - opendir() will look in the current working directory for the directory you're trying to open, but that directory is actually within the directory you're currently working on.
I suggest that instead of passing the DIR * value to the thread, you instead simply pass the pathname of the directory, and let the thread do the opendir() itself. It should then test the return value, and only proceed to call readdir() if opendir() returned non-NULL.
When you find a directory entry that is a directory, you need to construct a pathname to pass to the new thread by concatenating "/" and curr->d_name onto the pathname of the directory being processed.
Note that you do not need the dsp variable and the calls to telldir() at all. If you have a valid DIR *dir, you can loop over it simply with:
while (curr = readdir(dir)) {
/* Do something with curr */
}

I see a few bugs. I'm not sure if this explains your crash.
You allocated an instance of "directorio" for each directory and corresponding thread. But you never free it. Memory leak.
Is it the intent to print the total number of directories and files of the whole file system? Or just a individual directory and file count for each directory? If the former, you aren't adding the results back up. I would even suggest having all threads share the same integer pointers for dirCount and fileCount. (And use a lock to serialize access or just use __sync_add_and_fetch). You could also just use a set of global variables for the integer dir and file counts.
If the latter case (each thread prints it's own summation of child files), just pass a directory name (string) as the thread parameter, and let the thread use local variables off the stack for the counters. (The thread would call opendir on the string passed in. It would still need to free the allocated string passed in.)
You don't need to pass a pthread_attr_t instance into pthread_create. You can pass NULL as the second parameter and get the same effect.
You aren't checking the return value of pthread_create. If it were to fail (unlikely), then tid1 could be a garbage value.
Hope this helps.

Related

How to make several threads read several files without interference?

I am studying mutexes and I am stuck in an exercise. For each file in a given directory, I have to create a thread to read it and display its contents (no problem if order is not correct).
So far, the threads are running this function:
void * reader_thread (void * arg)
{
char * file_path = (char*)arg;
FILE * f;
char temp[20];
int value;
f=fopen(file_path, "r");
printf("Opened %s.\n",file_path);
while (fscanf(f, "%s",temp)!=EOF)
if (!get_number (temp, &value)) /*Gets int value from given string (if numeric)*/
printf("Thread %lu -> %s: %d\n", pthread_self(), file_path, value );
fclose(f);
pthread_exit(NULL);
}
Being called by a function that receives a DIR pointer, previously created by opendir().
(I have omitted some error checking here to make it cleaner, but I get no error at all.)
int readfiles (DIR * dir, char * path)
{
struct dirent * temp = NULL;
char * file_path;
pthread_t thList [MAX_THREADS];
int nThreads=0, i;
memset(thList, 0, sizeof(pthread_t)*MAX_THREADS);
file_path=malloc((257+strlen(path))*sizeof(char));
while((temp = readdir (dir))!=NULL && nThreads<MAX_THREADS) /*Reads files from dir*/
{
if (temp->d_name[0] != '.') /*Ignores the ones beggining with '.'*/
{
get_file_path(path, temp->d_name, file_path); /*Computes rute (overwritten every iteration)*/
printf("Got %s.\n", file_path);
pthread_create(&thList[nThreads], NULL, reader_thread, (void * )file_path)
nThreads++;
}
}
printf("readdir: %s\n", strerror (errno )); /*Just in case*/
for (i=0; i<nThreads ; i++)
pthread_join(thList[i], NULL)
if (file_path)
free(file_path);
return 0;
}
My problem here is that, although paths are computed perfectly, the threads don't seem to receive the correct argument. They all read the same file. This is the output I get:
Got test/testB.
Got test/testA.
readdir: Success
Opened test/testA.
Thread 139976911939328 -> test/testA: 3536
Thread 139976911939328 -> test/testA: 37
Thread 139976911939328 -> test/testA: -38
Thread 139976911939328 -> test/testA: -985
Opened test/testA.
Thread 139976903546624 -> test/testA: 3536
Thread 139976903546624 -> test/testA: 37
Thread 139976903546624 -> test/testA: -38
Thread 139976903546624 -> test/testA: -985
If I join the threads before the next one begins, it works OK. So I assume there is a critical section somewhere, but I don't really know how to find it. I have tried mutexing the whole thread function:
void * reader_thread (void * arg)
{
pthread_mutex_lock(&mutex_file);
/*...*/
pthread_mutex_unlock(&mutex_file);
}
And also, mutexing the while loop in the second function. Even both at the same time. But it won't work in any way. By the way, mutex_file is a global variable, which is init'd by pthread_mutex_init() in main().
I would really appreciate a piece of advice with this, as I don't really know what I'm doing wrong. I would also appreciate some good reference or book, as mutexes and System V semaphores are feeling a bit difficult to me.
Thank you very much.
Well, you are passing exactly the same pointer as file path to both threads. As a result, they read file name from the same string and end up reading the same file. Actually, you get a little bit lucky here because in reality you have a race condition — you update the contents of the string pointer by file_path while firing up threads that read from that pointer, so you may end up with a thread reading that memory while it is being changed. What you have to do is allocate an argument for each thread separately (i.e. call malloc and related logic in your while loop), and then free those arguments once thread is exited.
Looks like you're using the same file_path buffer for all threads, just loading it over and over again with the next name. You need to allocate a new string for each thread, and have each thread delete the string after using it.
edit
Since you already have an array of threads, you could just make a parallel array of char[], each holding the filename for the corresponding thread. This would avoid malloc/free.

Looping through directories in proc

I have a function which loops through the directories in the proc file system. This function then greps a process name to find its PID and returns this PID to the calling function.
The function seems to work fine but fails in one or two cases while opening some directory(corresponding to a process).
This is what I am doing.
dr = readdir(dp);
Loop through dr
Check dr type for directory and process name
compare the process name with a string.
Return PID in case of a match
dr = readdir(dp);
end loop
main() {
DIR *d;
struct dirent *e;
e=malloc(sizeof(struct dirent));
d=opendir("/proc");
while ((e = readdir(d)) != NULL) {
printf("%d %s\n", e->d_type, e->d_name);
}
closedir(d);
}
Presumably the problem is that directories are disappearing before you get to check out the files inside. This would mean that a process that was running when you go the directory listing is no longer running when you go to read its process information. This is normal and something you'll have to handle (ideally silently) in your application.
Also, the code snippet you provided definitely does not do what you described above it. Presumably you edited it for simplicity, but in doing so you removed any clues as to what you might be doing wrong.

Custom shell glob problem

I have to write a shell program in c that doesn't use the system() function. One of the features is that we have to be able to use wild cards. I can't seem to find a good example of how to use glob or this fnmatch functions that I have been running into so I have been messing around and so far I have a some what working blog feature (depending on how I have arranged my code).
If I have a glob variable declared as a global then the function partially works. However any command afterwards produces in error. example:
ls *.c
produce correct results
ls -l //no glob required
null passed through
so I tried making it a local variable. This is my code right now:
int runCommand(commandStruct * command1) {
if(!globbing)
execvp(command1->cmd_path, command1->argv);
else{
glob_t globbuf;
printf("globChar: %s\n", globChar);
glob(globChar, GLOB_DOOFFS, NULL, &globbuf);
//printf("globbuf.gl_pathv[0]: %s\n", &globbuf.gl_pathv[0]);
execvp(command1->cmd_path, &globbuf.gl_pathv[0]);
//globfree(&globbuf);
globbing = 0;
}
return 1;
}
When doing this with the globbuf as a local, it produces a null for globbuf.gl_path[0]. Can't seem to figure out why. Anyone with a knowledge of how glob works know what might be the cause? Can post more code if necessary but this is where the problem lies.
this works for me:
...
glob_t glob_buffer;
const char * pattern = "/tmp/*";
int i;
int match_count;
glob( pattern , 0 , NULL , &glob_buffer );
match_count = glob_buffer.gl_pathc;
printf("Number of mathces: %d \n", match_count);
for (i=0; i < match_count; i++)
printf("match[%d] = %s \n",i,glob_buffer.gl_pathv[i]);
globfree( &glob_buffer );
...
Observe that the execvp function expects the argument list to end with a NULL pointer, i.e. I think it will be the easiest to create your own char ** argv copy with all the elements from the glob_buffer.gl_pathv[] and a NULL pointer at the end.
You are asking for GLOB_DOOFFS but you did not specify any number in globbuf.gl_offs saying how many slots to reserve.
Presumably as a global variable it gets initialized to 0.
Also this: &globbuf.gl_pathv[0] can simply be globbuf.gl_pathv.
And don't forget to run globfree(globbuf).
I suggest running your program under valgrind because it probably has a number of memory leaks, and/or access to uninitialized memory.
If you don't have to use * style wildcards I've always found it simpler to use opendir(), readdir() and strcasestr(). opendir() opens a directory (can be ".") like a file, readdir() reads an entry from it, returns NULL at the end. So use it like
struct dirent *de = NULL;
DIR *dirp = opendir(".");
while ((de = readdir(dirp)) != NULL) {
if ((strcasestr(de->d_name,".jpg") != NULL) {
// do something with your JPEG
}
}
Just remember to closedir() what you opendir(). A struct dirent has the d_type field if you want to use it, most files are type DT_REG (not dirs, pipes, symlinks, sockets, etc.).
It doesn't make a list like glob does, the directory is the list, you just use criteria to control what you select from it.

Traversing file system according to a given root place by using threads by using C for unix

I wanna traverse inside the file system by using threads and processes.My program has to assume the first parameter is either given as "-p" which offers a multi-process application or "-t" which runs in a multi-threaded way. The second parameter is the
pathname of a file or directory. If my program gets the path of a file, it should print out the size of the file in bytes. If my program gets the path of a directory, it should, in the same way, print out the directory name, then process all the entries in the
directory except the directory itself and the parent directory. If my program is given a directory, it must display the entire hierarchy rooted at the specified directory. I wrote something but I got stuck in.I can not improve my code.Please help me.
My code is as following:
include
include
include
include
include
include
include
int funcThread(DIR *D);
int main(int argc, char * argv[])
{
pthread_t thread[100];
DIR *dirPointer;
struct stat object_file;
struct dirent *object_dir;
int counter;
if(opendir(argv[1])==NULL)
{
printf("\n\nERROR !\n\n Please enter -p or -t \n\n");
return 0;
}
if((dirPointer=opendir(argv[1]))=="-t")
{
if ((object_dir = opendir(argv[2])) == NULL)
{
printf("\n\nERROR !\n\nPlease enter the third argument\n\n");
return 0;.
}
else
{
counter=0;
while ((object_dir = readdir(object_dir)) != NULL)
{
pthread_create(&thread[counter],NULL,funcThread,(void *) object_dir);
counter++;
}
}
}
return 0;
}
int funcThread(DIR *dPtr)
{
DIR *ptr;
struct stat oFile;
struct dirent *oDir;
int num;
if(ptr=readdir(dPtr)==NULL)
rewinddir(ptr);
if(S_ISDIR(oFile.st_mode))
{
ptr=readdir(dPtr);
printf("\t%s\n",ptr);
return funcThread(ptr);
}
else
{
while(ptr=readdir(dPtr)!=NULL)
{
printf("\n%s\n",oDir->d_name);
stat(oDir->d_name,&oFile);
printf("\n%f\n",oFile.st_size);
}
rewinddir(ptr);
}
}
This line:
if((dirPointer=opendir(argv[1]))=="-t")
dirPointer is a pointer DIR* so how can it be equal to a literal string pointer?
I spotted a few errors:
Why are you using opendir() to check your arguments? You should use something like strcmp for that.
You're passing struct dirent* to funcThread() but funcThread() takes a DIR*.
You're using oFile on funcThread() before you initialize it (by calling stat()).
What is the purpose of calling rewinddir()? I guess you're blindly trying to get readdir() to work with a struct dirent*.
You're using oDir but it's never initialized.
You're calling printf() from multiple threads with no means to synchronize the output so it would be completelly out of order or garbled.
I suggest you read and understand the documentation of all those functions before using them (google "posix function_name") and get familiar with the basics of C. And before bringing threads into the equation try to get it working on a single threaded program. Also you won't see an improvement in performance by using that many threads unless you have close to that many cores, it will actually decrease performance and increase resource usage.
if(ptr=readdir(dPtr)==NULL){}
The = operator has lower precedence than ==
[this error is repeated several times]

Delete files while reading directory with readdir()

My code is something like this:
DIR* pDir = opendir("/path/to/my/dir");
struct dirent pFile = NULL;
while ((pFile = readdir())) {
// Check if it is a .zip file
if (subrstr(pFile->d_name,".zip") {
// It is a .zip file, delete it, and the matching log file
char zipname[200];
snprintf(zipname, sizeof(zipname), "/path/to/my/dir/%s", pFile->d_name);
unlink(zipname);
char* logname = subsstr(zipname, 0, strlen(pFile->d_name)-4); // Strip of .zip
logname = appendstring(&logname, ".log"); // Append .log
unlink(logname);
}
closedir(pDir);
(this code is untested and purely an example)
The point is: Is it allowed to delete a file in a directory while looping through the directory with readdir()?
Or will readdir() still find the deleted .log file?
Quote from POSIX readdir:
If a file is removed from or added to
the directory after the most recent
call to opendir() or rewinddir(),
whether a subsequent call to readdir()
returns an entry for that file is
unspecified.
So, my guess is ... it depends.
It depends on the OS, on the time of day, on the relative order of the files added/deleted, ...
And, as a further point, between the time the readdir() function returns and you try to unlink() the file, some other process could have deleted that file and your unlink() fails.
Edit
I tested with this program:
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
int main(void) {
struct dirent *de;
DIR *dd;
/* create files `one.zip` and `one.log` before entering the readdir() loop */
printf("creating `one.log` and `one.zip`\n");
system("touch one.log"); /* assume it worked */
system("touch one.zip"); /* assume it worked */
dd = opendir("."); /* assume it worked */
while ((de = readdir(dd)) != NULL) {
printf("found %s\n", de->d_name);
if (strstr(de->d_name, ".zip")) {
char logname[1200];
size_t i;
if (*de->d_name == 'o') {
/* create `two.zip` and `two.log` when the program finds `one.zip` */
printf("creating `two.zip` and `two.log`\n");
system("touch two.zip"); /* assume it worked */
system("touch two.log"); /* assume it worked */
}
printf("unlinking %s\n", de->d_name);
if (unlink(de->d_name)) perror("unlink");
strcpy(logname, de->d_name);
i = strlen(logname);
logname[i-3] = 'l';
logname[i-2] = 'o';
logname[i-1] = 'g';
printf("unlinking %s\n", logname);
if (unlink(logname)) perror("unlink");
}
}
closedir(dd); /* assume it worked */
return 0;
}
On my computer, readdir() finds deleted files and does not find files created between opendir() and readdir(). But it may be different on another computer; it may be different on my computer if I compile with different options; it may be different if I upgrade the kernel; ...
I'm testing my new Linux reference book. The Linux Programming Interface by Michael Kerrisk and it says the following:
SUSv3 explicitly notes that it is unspecified whether readdir() will return a filename that has been added to or removed from since the last since the last call to opendir() or rewinddir(). All filenames that have been neither added nor removed since the last such call are guaranteed to be returned.
I think that what is unspecified is what happens to dirents not yet scanned. Once an entry has been returned, it is 100% guaranteed that it will not be returned anymore whether or not you unlink the current dirent.
Also note the guarantee provided by the second sentence. Since you are leaving alone the other files and only unlinking the current entry for the zip file, SUSv3 guarantees that all the other files will be returned. What happens to the log file is undefined. it may or may not be returned by readdir() but in your case, it shouldn't be harmful.
The reason why I have explored the question it is to find an efficient way to close file descriptors in a child process before exec().
The suggested way in APUE from Stevens is to do the following:
int max_open = sysconf(_SC_OPEN_MAX);
for (int i = 0; i < max_open; ++i)
close(i);
but I am thinking using code similar to what is found in the OP to scan /dev/fd/ directory to know exactly which fds I need to close. (Special note to myself, skip over dirfd contained in the DIR handle.)
I found the following page describe the solution of this problem.
https://support.apple.com/kb/TA21420

Resources