How to make several threads read several files without interference?


I am studying mutexes and I am stuck on an exercise. For each file in a given directory, I have to create a thread to read it and display its contents (the order of the output doesn't matter).
So far, the threads are running this function:
void * reader_thread (void * arg)
{
    char * file_path = (char*)arg;
    FILE * f;
    char temp[20];
    int value;
    f=fopen(file_path, "r");
    printf("Opened %s.\n",file_path);
    while (fscanf(f, "%19s",temp)!=EOF) /*%19s: never write more than temp can hold*/
        if (!get_number (temp, &value)) /*Gets int value from given string (if numeric)*/
            printf("Thread %lu -> %s: %d\n", pthread_self(), file_path, value );
    fclose(f);
    pthread_exit(NULL);
}
It is called by a function that receives a DIR pointer previously created by opendir().
(I have omitted some error checking here to keep it cleaner, but I get no errors at all.)
int readfiles (DIR * dir, char * path)
{
    struct dirent * temp = NULL;
    char * file_path;
    pthread_t thList [MAX_THREADS];
    int nThreads=0, i;
    memset(thList, 0, sizeof(pthread_t)*MAX_THREADS);
    file_path=malloc((257+strlen(path))*sizeof(char));
    while((temp = readdir (dir))!=NULL && nThreads<MAX_THREADS) /*Reads files from dir*/
    {
        if (temp->d_name[0] != '.') /*Ignores the ones beginning with '.'*/
        {
            get_file_path(path, temp->d_name, file_path); /*Computes route (overwritten every iteration)*/
            printf("Got %s.\n", file_path);
            pthread_create(&thList[nThreads], NULL, reader_thread, (void * )file_path);
            nThreads++;
        }
    }
    printf("readdir: %s\n", strerror (errno )); /*Just in case*/
    for (i=0; i<nThreads ; i++)
        pthread_join(thList[i], NULL);
    if (file_path)
        free(file_path);
    return 0;
}
My problem here is that, although paths are computed perfectly, the threads don't seem to receive the correct argument. They all read the same file. This is the output I get:
Got test/testB.
Got test/testA.
readdir: Success
Opened test/testA.
Thread 139976911939328 -> test/testA: 3536
Thread 139976911939328 -> test/testA: 37
Thread 139976911939328 -> test/testA: -38
Thread 139976911939328 -> test/testA: -985
Opened test/testA.
Thread 139976903546624 -> test/testA: 3536
Thread 139976903546624 -> test/testA: 37
Thread 139976903546624 -> test/testA: -38
Thread 139976903546624 -> test/testA: -985
If I join each thread before creating the next one, it works OK. So I assume there is a critical section somewhere, but I don't really know how to find it. I have tried locking a mutex around the whole thread function:
void * reader_thread (void * arg)
{
pthread_mutex_lock(&mutex_file);
/*...*/
pthread_mutex_unlock(&mutex_file);
}
I have also tried locking the mutex around the while loop in the second function, and even both at the same time, but it doesn't work either way. By the way, mutex_file is a global variable, which is initialized with pthread_mutex_init() in main().
I would really appreciate some advice on this, as I don't really know what I'm doing wrong. I would also appreciate a good reference or book, as I find mutexes and System V semaphores a bit difficult.
Thank you very much.

Well, you are passing exactly the same pointer as the file path to both threads. As a result, they read the file name from the same string and end up reading the same file. Actually, you get a little bit lucky here, because in reality you have a race condition: you update the contents of the string pointed to by file_path while firing up threads that read from that pointer, so a thread may end up reading that memory while it is being changed. What you have to do is allocate an argument for each thread separately (i.e. call malloc and the related logic inside your while loop), and then free each argument once its thread is done with it.
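A minimal sketch of that approach, assuming POSIX threads: every path gets its own malloc'd buffer, the thread frees it when it is done, and the caller frees it only if pthread_create() fails. start_reader is a hypothetical helper, and to keep the example checkable the thread just counts the bytes in the file and returns that count through pthread_join() instead of parsing numbers with the question's get_number().

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The thread owns its argument: it uses the path and then frees it. */
static void *reader_thread(void *arg)
{
    char *file_path = arg;              /* private to this thread */
    intptr_t bytes = -1;                /* -1 means "could not open" */
    FILE *f = fopen(file_path, "r");
    if (f != NULL) {
        bytes = 0;
        while (fgetc(f) != EOF)
            bytes++;
        fclose(f);
    }
    free(file_path);                    /* release the per-thread argument */
    return (void *)bytes;
}

/* Hypothetical helper: starts one reader for dir/name. Returns 0 on success. */
static int start_reader(pthread_t *tid, const char *dir, const char *name)
{
    size_t len = strlen(dir) + strlen(name) + 2;   /* room for '/' and '\0' */
    char *file_path = malloc(len);                 /* fresh buffer for each thread */
    if (file_path == NULL)
        return -1;
    snprintf(file_path, len, "%s/%s", dir, name);
    if (pthread_create(tid, NULL, reader_thread, file_path) != 0) {
        free(file_path);                /* no thread took ownership, so free it here */
        return -1;
    }
    return 0;
}
```

Because the main loop never touches a buffer after handing it over, it can build the next path immediately, and the threads can be joined in any order.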

It looks like you're using the same file_path buffer for all threads, just loading it over and over again with the next name. You need to allocate a new string for each thread, and have each thread free the string after using it.
Edit:
Since you already have an array of threads, you could just make a parallel array of char[], each holding the file name for the corresponding thread. This would avoid malloc/free.
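The parallel-array idea might look like the sketch below. MAX_THREADS, thPath and start_readers are assumed names; the PATH_MAX fallback is an assumption for systems where <limits.h> does not define it. As before, the thread only counts bytes so the result can be checked. All threads must be joined before start_readers() is called again, because that call rewrites thPath.

```c
#include <limits.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_THREADS 16                 /* assumption: plays the question's MAX_THREADS role */
#ifndef PATH_MAX
#define PATH_MAX 4096                  /* fallback if <limits.h> does not provide it */
#endif

static pthread_t thList[MAX_THREADS];
static char thPath[MAX_THREADS][PATH_MAX];   /* thPath[i] belongs to thList[i] */

static void *reader_thread(void *arg)
{
    const char *file_path = arg;       /* points at thPath[i]; nothing rewrites it meanwhile */
    intptr_t bytes = -1;
    FILE *f = fopen(file_path, "r");
    if (f != NULL) {
        bytes = 0;
        while (fgetc(f) != EOF)
            bytes++;
        fclose(f);
    }
    return (void *)bytes;              /* no free(): the buffer is a static array slot */
}

/* Starts up to MAX_THREADS readers for dir/names[i]; returns how many started.
   Names longer than PATH_MAX are truncated by snprintf(). */
static int start_readers(const char *dir, const char *const names[], int n)
{
    int nThreads = 0;
    for (int i = 0; i < n && nThreads < MAX_THREADS; i++) {
        snprintf(thPath[nThreads], PATH_MAX, "%s/%s", dir, names[i]);
        if (pthread_create(&thList[nThreads], NULL, reader_thread, thPath[nThreads]) == 0)
            nThreads++;                /* on failure the slot is simply reused */
    }
    return nThreads;
}
```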

Related

C code stack corruption changing variable

I'm hoping that someone can help me out. I have not written much C in over a decade and just picked it back up 2 days ago, so please bear with me, as I am rusty. THANK YOU!
What:
I'm working on creating a very simple thread pool for an application. This code is written in C on CodeBlocks using GNU GCC for the compiler. It is built as a command line application. No additional files are linked or included.
The code should create X threads (in this case I have it set to 10) each of which sits and waits while watching an array entry (identified by the threads thread index or count) for any incoming data it might need to process. Once a given child has processed the data coming in via the array there is no need to pass the data back to the main thread; rather the child should simply reset that array entry to 0 to indicate that it is ready to process another input. The main thread will receive requests and will dole them out to whatever thread is available. If none are available then it will refuse to handle that input.
For simplicity's sake, the code below is a complete and working, but trimmed-down, version that DOES exhibit the stack overflow I am trying to track down. It compiles fine and initially runs fine, but after a few passes the threadIndex value in the child thread function (workerThread) becomes corrupt and jumps to weird values, generally the number of milliseconds I passed to the 'Sleep' function.
What I have checked:
The threadIndex variable is not a global or shared variable.
All arrays are plenty big enough to handle the max number of threads I am creating.
All loops have the loopvariable reset to 0 before running.
I have not named multiple variables with the same name.
I use atomic_load to make sure I don't write to the same global array element from two different threads at once. (Please note I am rusty... I may be misunderstanding how this part works.)
I have placed test cases all over to see where the variable goes nuts and I am stumped.
Best Guess
All of my research confirms what I recall from years back: I am likely going out of bounds somewhere and causing stack corruption. I have looked at numerous similar problems on Google as well as on Stack Overflow, and while all of them point me to the same conclusion, I have been unable to figure out what specifically is wrong in my code.
#include<stdio.h>
//#include<string.h>
#include<pthread.h>
#include<stdlib.h>
#include<conio.h>
//#include<unistd.h>
#define ESCAPE 27
int maxThreads = 10;
pthread_t tid[21];
int ret[21];
int threadIncoming[21];
int threadRunning[21];
struct arg_struct {
char* arg1;
int arg2;
};
//sick of the stupid upper/lowercase nonsense... boom... fixed
void* sleep(int time){Sleep(time);}
void* workerThread(void *arguments)
{
//get the stuff passed in to us
struct arg_struct *args = (struct arg_struct *)arguments;
char *address = args -> arg1;
int threadIndex = args -> arg2;
//hold how many we have processed - we are unlikely to ever hit the max so no need to round robin this number at this point
unsigned long processedCount = 0;
//this never triggers so it IS coming in correctly
if(threadIndex > 20){
printf("INIT ERROR! ThreadIndex = %d", threadIndex);
sleep(1000);
}
unsigned long x = 0;
pthread_t id = pthread_self();
//as long as we should be running
while(__atomic_load_n (&threadRunning[threadIndex], __ATOMIC_ACQUIRE)){
//if and only if we have something to do...
if(__atomic_load_n (&threadIncoming[threadIndex], __ATOMIC_ACQUIRE)){
//simulate us doing something
//for(x=0; x<(0xFFFFFFF);x++);
sleep(2001);
//the value going into sleep is CLEARLY somehow ending up in index because you can change that to any number you want
//and next thing you know the next line says "First thread processing done on (the value given to sleep)
printf("\n First thread processing done on %d\n", threadIndex);
//all done doing something so clear the incoming so we can reuse it for our next one
//this error should not EVER be able to get thrown but it is.... something is corrupting our stack and going into memory that it shouldn't
if(threadIndex > 20){ printf("ERROR! ThreadIndex = %d", threadIndex); }
else{ __atomic_store_n (&threadIncoming[threadIndex], 0, __ATOMIC_RELEASE); }
//increment the processed count
++processedCount;
}
else{Sleep(10);}
}
//no need to do atomocity I don't think for this as it is only set on the exit and not read till after everything is done
ret[threadIndex] = processedCount;
pthread_exit(&ret[threadIndex]);
return NULL;
}
int main(void)
{
int i = 0;
int err;
int *ptr[21];
int doLoop = 1;
//initialize these all to set the threads to running and the status on incoming to NOT be processing
for(i=0;i < maxThreads;i++){
threadIncoming[i] = 0;
threadRunning[i] = 1;
}
//create our threads
for(i=0;i < maxThreads;i++)
{
struct arg_struct args;
args.arg1 = "here";
args.arg2 = i;
err = pthread_create(&(tid[i]), NULL, &workerThread, (void *)&args);
if (err != 0){ printf("\ncan't create thread :[%s]", strerror(err)); }
}
//loop until we hit escape
while(doLoop){
//see if we were pressed escape
if(kbhit()){ if(getch() == ESCAPE){ doLoop = 0; } }
//just for testing - actual version would load only as needed
for(i=0;i < maxThreads;i++){
//make sure we synchronize so we don't end up pointing into a garbage address or half loading when a thread accesses us or whatever was going on
if(!__atomic_load_n (&threadIncoming[i], __ATOMIC_ACQUIRE)){
__atomic_store_n (&threadIncoming[i], 1, __ATOMIC_RELEASE);
}
}
}
//exiting...
printf("\n'Esc' pressed. Now exiting...\n");
//call to end them all...
for(i=0;i < maxThreads;i++){ __atomic_store_n (&threadRunning[i], 0, __ATOMIC_RELEASE); }
//join them all back up - if we had an actual worthwhile value here we could use it
for(i=0;i < maxThreads;i++){
pthread_join(tid[i], (void**)&(ptr[i]));
printf("\n return value from thread %d is [%d]\n", i, *ptr[i]);
}
return 0;
}
Output
Here is the output I get. Note that how long it runs before going wrong varies somewhat between runs, but not much.
[Screenshot: output screen with the error]

I don't trust your handling of args; there seems to be a race condition. What if you create N threads before the first of them gets to run? Then the first thread created will probably see the args for the Nth thread rather than its own, and so on.
I don't believe there's any guarantee that an automatic variable declared inside a loop like that gets a non-overlapping location on each iteration; after all, it goes out of scope at the end of every iteration, so the pointer a thread receives may already be dangling by the time the thread reads it.
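A minimal sketch of one way to remove that race, assuming POSIX threads (the question's Windows-only pieces such as Sleep() and kbhit() are left out): give every thread its own argument slot that outlives the creation loop. thread_args, seen_index and start_workers are illustrative names, not from the question; seen_index exists only so the example can show what each thread actually received.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 10

struct arg_struct {
    char *arg1;
    int arg2;
};

/* One argument slot per thread, at file scope, so every slot stays valid
   (and is never rewritten) for as long as its thread might read it. */
static struct arg_struct thread_args[MAX_THREADS];
static pthread_t tid[MAX_THREADS];
static int seen_index[MAX_THREADS];   /* demo only: the index each thread received */

static void *workerThread(void *arguments)
{
    const struct arg_struct *args = arguments;   /* this thread's own slot */
    int threadIndex = args->arg2;
    seen_index[threadIndex] = threadIndex;       /* each thread writes only its own entry */
    return NULL;
}

/* Starts n workers (n <= MAX_THREADS); returns how many were started. */
static int start_workers(int n)
{
    for (int i = 0; i < n; i++) {
        thread_args[i].arg1 = "here";
        thread_args[i].arg2 = i;
        if (pthread_create(&tid[i], NULL, workerThread, &thread_args[i]) != 0)
            return i;
    }
    return n;
}
```

Allocating one struct per thread with malloc() and freeing it inside the thread would work just as well; what matters is that no two threads share, and no loop iteration reuses, the memory a thread is still going to read.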

Writing to global file with threads in C

I'm having an issue with being able to write to a file I've created globally, initialized in main (successfully), and writing to in a function used by multiple threads (on Linux).
#includes
FILE *f;
main(){
// Create threads successfully
f = fopen("fileName.txt", "w");
// Make sure the file was able to be created
if(f = NULL){
printf("Unable to create file");
exit(1);
}
// This much works, the check indicates the file was created
// successfully when I run it
while(1){
// loops for a while, getting input from user to direct threads
// When end is determined, waits for all the threads to finish,
// clears allocated memory, and closes file then returns
fclose(f);
return;
}
}
void *threadProcess(){
// Do stuff
// This printf works fine using the values i give the function, as is here
// The values are determined in 'Do stuff'
printf("%d trying to write \"%d BAL %d TIME %d.%06d %d.%06d\" to the file\n", cid, tmp->reqNum, balance, tmp->seconds, tmp->useconds, endTime.tv_sec, endTime.tv_usec);
fflush(stdout);
// There appears to be a Segmentation fault here
fprintf(f, "%d BAL %d TIME %d.%06d %d.%06d\n", tmp->reqNum, balance, tmp->seconds, tmp->useconds, endTime.tv_sec, endTime.tv_usec);
// Never gets here
}
What am I doing wrong here? As I said, the printf statement right before the fprintf statement works and outputs the correct values.
Am I wrong to assume that this ensures I don't have any pointer issues for fprintf?
Thanks

It was my if(reqLog = NULL) check... I was assigning, not comparing. Sorry to have wasted your time, haha. – tompon
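For reference, a minimal sketch of the corrected check, assuming the log is opened once before the worker threads start. open_log, log_request and the shortened record format are illustrative, not the asker's code. POSIX requires each stdio call on a FILE * to lock the stream internally, so writing one record per fprintf() call keeps records from different threads from interleaving.

```c
#include <stdio.h>
#include <stdlib.h>

static FILE *reqLog;                  /* opened once in main, shared by all threads */

/* Returns 0 on success. Note '==': 'reqLog = NULL' would store NULL in reqLog
   and evaluate to false, so the error branch never runs and a later
   fprintf(reqLog, ...) is handed a null stream, which typically crashes. */
static int open_log(const char *name)
{
    reqLog = fopen(name, "w");
    if (reqLog == NULL) {
        perror(name);
        return -1;
    }
    return 0;
}

/* One fprintf() per record, so each record is written as a unit. */
static void log_request(int reqNum, int balance)
{
    fprintf(reqLog, "%d BAL %d\n", reqNum, balance);
}
```

Compiling with GCC's -Wall enables -Wparentheses, which warns about an assignment used as a truth value and would have pointed at this line.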

How to check if a file is already opened in C

I am working on a multithreaded system where a file can be shared among different threads based on the file access permissions.
How can I check if file is already opened by another thread?

To find out if a named file is already opened on linux, you can scan the /proc/self/fd directory to see if the file is associated with a file descriptor. The program below sketches out a solution:
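/* Needs <ctype.h>, <dirent.h>, <limits.h>, <stdio.h>, <string.h>, <unistd.h>.
   file_of_interest must be an absolute path with symlinks resolved
   (e.g. from realpath()), because that is the form readlink() reports. */
int file_is_open(const char *file_of_interest)
{
    DIR *d = opendir("/proc/self/fd");
    int found = 0;
    if (d) {
        struct dirent *entry;
        /* readdir_r() is deprecated in glibc; plain readdir() is fine
           because this DIR stream is not shared with other threads. */
        while ((entry = readdir(d)) != NULL) {
            if (isdigit((unsigned char)entry->d_name[0])) {
                char path[NAME_MAX+1];
                char buf[PATH_MAX];
                snprintf(path, sizeof(path), "/proc/self/fd/%s", entry->d_name);
                ssize_t bytes = readlink(path, buf, sizeof(buf) - 1);
                if (bytes < 0)
                    continue;          /* descriptor closed meanwhile, or unreadable */
                buf[bytes] = '\0';     /* readlink() does not NUL-terminate */
                if (strcmp(file_of_interest, buf) == 0) {
                    found = 1;
                    break;
                }
            }
        }
        closedir(d);
    }
    return found ? FILE_IS_FOUND : FILE_IS_NOT_FOUND;
}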
From your comment, it seems what you want to do is to retrieve an existing FILE * if one has already been created by a previous call to fopen() on the file. There is no mechanism provided by the standard C library to iterate through all currently opened FILE *. If there was such a mechanism, you could derive its file descriptor with fileno(), and then query /proc/self/fd/# with readlink() as shown above.
This means you will need to use a data structure to manage your open FILE *s. Probably a hash table using the file name as the key would be the most useful for you.
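Given a FILE * you already hold (for example, one stored in such a table), the fileno()/readlink() step described above could be sketched like this. It is Linux-specific because it relies on /proc/self/fd; stream_is_file is a hypothetical helper, and comparing against realpath() of the name is an assumption that makes relative paths and symlinks compare correctly.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Returns 1 if the stream fp is open on file_of_interest, 0 if it is not
   (or file_of_interest cannot be resolved), -1 on error. */
static int stream_is_file(FILE *fp, const char *file_of_interest)
{
    char link[64], target[PATH_MAX], wanted[PATH_MAX];
    int fd = fileno(fp);                           /* descriptor behind the FILE * */
    if (fd < 0)
        return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, target, sizeof target - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';                              /* readlink() does not NUL-terminate */
    if (realpath(file_of_interest, wanted) == NULL)
        return 0;                                  /* e.g. no such file */
    return strcmp(target, wanted) == 0;
}
```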

If you want to do it from the shell, you can simply use lsof $filename.

You can use int flock(int fd, int operation); to mark a file as locked and also to check if it is locked.
Apply or remove an advisory lock on the open file specified by fd.
The argument operation is one of the following:
LOCK_SH Place a shared lock. More than one process may hold a
shared lock for a given file at a given time.
LOCK_EX Place an exclusive lock. Only one process may hold an
exclusive lock for a given file at a given time.
LOCK_UN Remove an existing lock held by this process.
flock should work in a threaded app if you open the file separately in each thread (see the question "multiple threads able to get flock at the same time").
There's more information about flock and its potential weaknesses here.
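A small sketch of the "check if it is locked" part, assuming Linux/BSD flock(): probe with a non-blocking exclusive lock through a separate open() of the file. is_locked is a hypothetical helper. flock() locks belong to the open file description, so a lock taken through one open() conflicts with a probe made through another, even within the same process; that is also why opening the file separately in each thread makes flock() usable between threads.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns 1 if some other open file description currently holds a flock()
   lock (shared or exclusive) on path, 0 if none does, -1 on error.
   The answer is only a snapshot: it can change right after we return. */
static int is_locked(const char *path)
{
    int fd = open(path, O_RDONLY);       /* our own open file description */
    int result;
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        flock(fd, LOCK_UN);              /* we got it, so nobody else held a lock */
        result = 0;
    } else if (errno == EWOULDBLOCK) {
        result = 1;                      /* a conflicting lock is held elsewhere */
    } else {
        result = -1;
    }
    close(fd);
    return result;
}
```

Because the result can be stale, code that needs exclusive access should simply take the lock (blocking or LOCK_NB) and keep it, rather than test first and lock later.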

I don't know much about multithreading on Windows, but you have a lot of options if you're on Linux. Here is a FANTASTIC resource. You might also take advantage of any file-locking features offered inherently or explicitly by the OS (e.g. fcntl). More on Linux locks here. Creating and manually managing your own mutexes gives you more flexibility than you would otherwise have. user814064's comment about flock() looks like the perfect solution, but it never hurts to have options!
Added a code example:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

FILE *fp;
int counter;
pthread_mutex_t fmutex = PTHREAD_MUTEX_INITIALIZER;

void *foo(void *arg) {
    (void)arg;  // unused
    // pthread_mutex_trylock() checks if the mutex is
    // locked without blocking
    //int busy = pthread_mutex_trylock(&fmutex);
    // this blocks until the lock is released
    pthread_mutex_lock(&fmutex);
    fprintf(fp, "counter = %d\n", counter);
    printf("counter = %d\n", counter);
    counter++;
    pthread_mutex_unlock(&fmutex);
    return NULL;
}

int main() {
    counter = 0;
    fp = fopen("threads.txt", "w");
    if (fp == NULL) {
        perror("threads.txt");
        return 1;
    }
    pthread_t thread1, thread2;
    if (pthread_create(&thread1, NULL, &foo, NULL)) {
        printf("Error creating thread 1\n");
        return 1;
    }
    if (pthread_create(&thread2, NULL, &foo, NULL)) {
        printf("Error creating thread 2\n");
        return 1;
    }
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
    fclose(fp);
    return 0;
}
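The commented-out pthread_mutex_trylock() line in foo() is the non-blocking way to ask "is another thread using the file right now?". A sketch reusing the example's fp, counter and fmutex; try_write_counter is a hypothetical name. Instead of waiting, it returns EBUSY when the mutex is held (POSIX specifies this for a default mutex even when the caller itself holds it).

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static FILE *fp;
static int counter;
static pthread_mutex_t fmutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 if a line was written, or the pthread_mutex_trylock() error
   (EBUSY when another holder has the file); nothing is written then. */
static int try_write_counter(void)
{
    int rc = pthread_mutex_trylock(&fmutex);
    if (rc != 0)
        return rc;                     /* typically EBUSY: the file is in use */
    fprintf(fp, "counter = %d\n", counter);
    counter++;
    pthread_mutex_unlock(&fmutex);
    return 0;
}
```

Like any "is it busy?" test, the EBUSY result is only a snapshot; a caller that gets it can retry later or do other work in the meantime.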

If you need to determine whether another thread opened a file instead of knowing that a file was already opened, you're probably doing it the wrong way.
In a multithreaded application, you want to manage resources used in common in a list accessible by all the threads. That list needs to be managed in a multithread safe manner. This just means you need to lock a mutex, do things with the list, then unlock the mutex. Further, reading/writing to the files by more than one thread can be very complicated. Again, you need locking to do that safely. In most cases, it's much easier to mark the file as "busy" (a.k.a. a thread is using that file) and wait for the file to be "ready" (a.k.a. no thread is using it).
So assuming you have a form of linked list implementation, you can have a search of the list in a way similar to:
my_file *my_file_find(const char *filename)
{
my_file *l, *result = NULL;
pthread_mutex_lock(&fmutex);
l = my_list_of_files;
while(l != NULL)
{
if(strcmp(l->filename, filename) == 0)
{
result = l;
break;
}
l = l->next;
}
pthread_mutex_unlock(&fmutex);
return result;
}
If the function returns NULL, then no other threads had the file open while searching (since the mutex was unlocked, another thread could have opened the file before the function executed the return). If you need to open the file in a safe manner (i.e. only one thread can open file filename) then you need to have a my_file_open() function which locks, searches, adds a new my_file if not found, then return that new added my_file pointer. If the file already exists, then the my_file_open() probably returns NULL meaning that it could not open a file which that one thread can use (i.e. another thread is already using it).
Just remember that you can't unlock the mutex between the search and the add. So you can't use the my_file_find() function above without first getting a lock on your mutex (in which case you probably want to have recursive mutexes).
In other words, you can search the existing list, grow the existing list, and shrink it (a.k.a. close a file) only if you first lock the mutex, do ALL THE WORK, then unlock the mutex.
This is valid for any kind of resources, not just files. It could be memory buffers, a graphical interface widget, a USB port, etc.
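A minimal sketch of the my_file_open() described above, assuming a singly linked list guarded by one mutex. The my_file fields, my_list_of_files and my_file_close() are hypothetical (a real entry would probably also hold the FILE * or a busy flag); the point is that the search and the insert happen under a single lock, so two threads cannot both "open" the same name.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct my_file {
    char *filename;
    struct my_file *next;
} my_file;

static my_file *my_list_of_files;
static pthread_mutex_t fmutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns a new entry, or NULL if another thread already has filename
   (or memory ran out). Search and insert share ONE critical section. */
static my_file *my_file_open(const char *filename)
{
    my_file *l, *result = NULL;
    pthread_mutex_lock(&fmutex);
    for (l = my_list_of_files; l != NULL; l = l->next)
        if (strcmp(l->filename, filename) == 0)
            break;
    if (l == NULL) {                            /* not in use: register it */
        result = malloc(sizeof *result);
        size_t n = strlen(filename) + 1;
        if (result != NULL && (result->filename = malloc(n)) != NULL) {
            memcpy(result->filename, filename, n);
            result->next = my_list_of_files;
            my_list_of_files = result;
        } else {
            free(result);
            result = NULL;
        }
    }
    pthread_mutex_unlock(&fmutex);
    return result;
}

/* Unlink an entry (e.g. when its thread closes the file), again under the lock. */
static void my_file_close(my_file *f)
{
    pthread_mutex_lock(&fmutex);
    for (my_file **pp = &my_list_of_files; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == f) {
            *pp = f->next;
            break;
        }
    }
    pthread_mutex_unlock(&fmutex);
    free(f->filename);
    free(f);
}
```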

MULTITHREADING c - read several files in the same file

I'm new at multithreading and I'm trying to simulate banking transactions on the same current account using multithreading.
Each thread reads the actions to perform from a file. The file contains one operation per line, consisting of an integer. The main program has to create as many threads as there are files in the path.
int main(int argc,char*argv[]){
DIR *buff;
struct dirent *dptr = NULL;
pthread_t hiloaux[MAX_THREADS];
int i=0,j=0, nthreads=0;
char *pathaux;
memset(hiloaux,0,sizeof(pthread_t)*MAX_THREADS);
diraux=malloc((267+strlen(argv[1]))*sizeof(char));
buff=opendir(argv[1]);
while((dptr = readdir(buff)) != NULL && nthreads<MAX_THREADS)//read files in the path
{
if (dptr->d_name[0]!='.'){
pthread_mutex_lock(&mutex_path);//mutual exclusion
strcpy(pathaux,argv[1]);
strcat (pathaux,"/");
strcat (pathaux,dptr->d_name);//makes the route (ex:path/a.txt)
pthread_create(&hiloaux[nthreads],NULL,readfile,(void *)pathaux);
//creates a thread for each file in the path
nthreads++;
}
}
for (j=0;j<nthreads;j++){
pthread_join(hiloaux[j],NULL);
}
closedir(buff);
return 0;
}
My problem is that the threads don't seem to receive the correct path argument. Even though I have added a mutex (mutex_path), they all read the same file. I unlock this mutex inside the function readfile().
void *readfile(void *arg){
FILE *fichero;
int x=0,n=0;
int suma=0;
int cuenta2=0;
char * file_path = (char*)arg;
n=rand() % 5+1; //random number to sleep the program each time I read a file line
pthread_mutex_unlock(&mutex_path);//unlock the mutex
fichero = fopen(file_path, "r");
while (fscanf (fichero, "%d", &x)!=EOF){
pthread_mutex_lock(&mutex2);//mutual exclusion to protect variables(x,cuenta,cuenta2)
cuenta2+=x;
if (cuenta2<0){
printf("count discarded\n");
}
else cuenta=cuenta2;
pthread_mutex_unlock(&mutex2);
printf("sum= %d\n",cuenta);
sleep(n); //Each time i read a line,i sleep the thread and let other thread read other fileline
}
pthread_exit(NULL);
fclose(fichero);
}
When I run the program i get this output
alberto#ubuntu:~/Escritorio/practica3$ ./ejercicio3 path
read file-> path/fb
read -> 2
sum= 2
read file-> path/fb
read -> 2
sum= 2
read file-> path/fb
read -> 4
sum= 6
read file-> path/fb
read -> 4
sum= 6
read file-> path/fb
read -> 6
sum= 12
read file-> path/fb
read -> 6
sum= 12
It seems to work well: it reads a line and sleeps for a while, and during that time another thread does its work. The problem is that both threads open the same file (path/fb).
As I said before, I think the problem is the path argument; it is as if mutex_path were not doing its job.
I would really appreciate a little help with this, as I don't really know what's wrong.
Thank you very much.

In your "readfile" function
the line
char * file_path = (char*)arg;
just copies a pointer to the string memory, not the memory itself.
So the string can (and will) still be altered by the main thread while the worker thread continues.
Make a copy of the memory there.
Or, even better, store the arguments for each thread in distinct memory in the main thread already, so you won't need the first mutex at all.
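If you want to keep the single pathaux buffer, the "make a copy there" option needs a hand-off in which the worker copies the path before main overwrites it. The sketch below swaps the question's mutex_path for a POSIX semaphore: POSIX leaves it undefined for a thread to unlock a default mutex it did not lock, which is what unlocking mutex_path inside readfile() does, whereas any thread may sem_post() a semaphore. start_readfile and PATH_LEN are assumptions, and the worker returns its copy of the path only so the hand-off can be checked. Call sem_init(&path_copied, 0, 0) once, and start readers from main only.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PATH_LEN 512                  /* assumption: long enough for the paths used */

static char pathaux[PATH_LEN];        /* one buffer, rebuilt by main for every file */
static sem_t path_copied;             /* posted once a worker holds its own copy */

static void *readfile(void *arg)
{
    char file_path[PATH_LEN];
    /* Take a private copy BEFORE telling main it may reuse pathaux. */
    snprintf(file_path, sizeof file_path, "%s", (const char *)arg);
    sem_post(&path_copied);
    /* ... fopen(file_path, "r") and process the file here ... */
    return strdup(file_path);         /* demo only: hand the copy back via pthread_join() */
}

/* Builds dir/name in pathaux, starts a worker, and blocks until that worker
   has copied the path. Returns 0 on success, -1 if no thread was created. */
static int start_readfile(pthread_t *tid, const char *dir, const char *name)
{
    snprintf(pathaux, sizeof pathaux, "%s/%s", dir, name);
    if (pthread_create(tid, NULL, readfile, pathaux) != 0)
        return -1;
    while (sem_wait(&path_copied) == -1 && errno == EINTR)
        ;                             /* retry if a signal interrupted the wait */
    return 0;
}
```

Compared with giving each thread its own buffer, this serializes thread start-up slightly (main waits for each copy), but it needs no per-thread allocation.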

First, I do not see where you allocate memory for pathaux (the malloc() result is stored in diraux instead). I am surprised strcpy and strcat work at all rather than causing a segmentation fault. Try compiling with a C++ compiler; it may complain.
As for the problem, you are passing the same pointer, so every thread points to the same location.
The correct approach, inside the readdir loop, would be:
1. Allocate memory and copy the path into it. (Note that you want to allocate new memory on every iteration of the loop.)
2. Pass this memory to the new thread.
If you do it this way:
a. you do not need mutex_path at all;
b. call free at the end of the readfile function.

Joining threads confusion

I'm doing my homework: I have to count the directories and files in a given directory, but each directory I find should also be counted by another thread of my process. This is what I have so far:
void *dirCounter(void *param){
queue<pthread_t> queue;
dir_ptr dir = (dir_ptr)param;
dir->countDir = 0;
DIR* dirName = dir->name;
struct dirent *curr;
off_t dsp;
dsp= telldir(dirName);
while(dsp!= -1){
curr = readdir(dirName);
if(curr == NULL){
break;
}
if(!strcmp(curr->d_name,".")|!strcmp(curr->d_name,"..")) { //To avoid counting . and ..
dsp = telldir(dirName); //Actual position asociated to the stream
continue; //Executes the beginning of the while
}
if(curr->d_type == DT_DIR){
dir->countDir++; //counts directories in the first level
//For each directory found, create another thread and add it to the queue:
pthread_attr_t attr1;
pthread_t tid1;
pthread_attr_init(&attr1);
dir_ptr par1 = (dir_ptr)malloc(sizeof(directorio));
par1->name = opendir(curr->d_name);
par1->countDir = par1->countFile = 0;
pthread_create(&tid1,&attr1, dirCounter, par1);
//queue.push(tid1);
}
if(curr->d_type == DT_REG){
dir->countFile++; //Counts files
}
dsp = telldir(dirName);
}
//pthread_join(tid1, NULL);
//while(!queue.empty()){
//pthread_join(queue.front(), NULL);
// queue.pop();
//}
printf("Dirs: %d Files: %d\n", dir->countDir, dir->countFile);
pthread_exit(NULL);
}
So far, with the join commented out, the code counts the files and directories of the "first level" and then just gives a segmentation fault; with that line uncommented, it prints a single output line and then dies with a segmentation fault.
The idea was to create a thread whenever I found a directory and then join them all at the end, creating a semi-recursive routine.
Modifications:
char str[256];
strcpy(str, "./");
strcat(str, curr->d_name);
//strcat(str, "\"");
puts(str);
par1->name = opendir(str);
par1->countDir = par1->countFile = 0;
pthread_create(&tid1,&attr1, dirCounter, par1);
queue.push(tid1);
What it does after the modification:
It prints ALL the directories, but it still gives a segmentation fault and some threads do not complete their task.

The proximate cause of your problem is that dir->name is NULL in the additional threads created, because opendir(curr->d_name); is failing. This is because the directory curr->d_name is not an absolute pathname - opendir() will look in the current working directory for the directory you're trying to open, but that directory is actually within the directory you're currently working on.
I suggest that instead of passing the DIR * value to the thread, you instead simply pass the pathname of the directory, and let the thread do the opendir() itself. It should then test the return value, and only proceed to call readdir() if opendir() returned non-NULL.
When you find a directory entry that is a directory, you need to construct a pathname to pass to the new thread by concatenating "/" and curr->d_name onto the pathname of the directory being processed.
Note that you do not need the dsp variable and the calls to telldir() at all. If you have a valid DIR *dir, you can loop over it simply with:
while ((curr = readdir(dir)) != NULL) {
/* Do something with curr */
}
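A sketch of that advice, assuming POSIX: each thread receives a malloc'd pathname, opens the directory itself, builds "path/name" for every entry, starts one thread per subdirectory and joins them all before returning. Two choices are assumptions rather than part of the answer: lstat() on the built path replaces d_type (some file systems report DT_UNKNOWN, and lstat() does not follow symlinked directories), and each thread returns a malloc'd struct counts for its subtree so the parent can add the totals up. MAX_CHILDREN is an assumed cap on threads per directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_CHILDREN 64   /* assumption: beyond this a subdirectory is counted, not walked */

struct counts { long dirs; long files; };

/* arg: malloc'd pathname, owned (and freed) by this thread.
   Returns a malloc'd struct counts for the whole subtree, or NULL. */
static void *dirCounter(void *arg)
{
    char *path = arg;
    struct counts *total = calloc(1, sizeof *total);
    pthread_t children[MAX_CHILDREN];
    int nchildren = 0;
    DIR *dir = opendir(path);                  /* each thread opens its own directory */

    if (dir != NULL && total != NULL) {
        struct dirent *curr;
        while ((curr = readdir(dir)) != NULL) {
            if (strcmp(curr->d_name, ".") == 0 || strcmp(curr->d_name, "..") == 0)
                continue;
            /* Build "path/name": d_name alone would be looked up in the cwd. */
            size_t len = strlen(path) + strlen(curr->d_name) + 2;
            char *child = malloc(len);
            struct stat st;
            if (child == NULL)
                continue;
            snprintf(child, len, "%s/%s", path, curr->d_name);
            if (lstat(child, &st) != 0) {      /* vanished or unreadable: skip it */
                free(child);
                continue;
            }
            if (S_ISREG(st.st_mode))
                total->files++;
            if (S_ISDIR(st.st_mode)) {         /* symlinks are neither, so not followed */
                total->dirs++;
                if (nchildren < MAX_CHILDREN &&
                    pthread_create(&children[nchildren], NULL, dirCounter, child) == 0) {
                    nchildren++;
                    continue;                  /* the child thread now owns 'child' */
                }
            }
            free(child);
        }
    }
    if (dir != NULL)
        closedir(dir);
    /* Join every child and add its subtree totals to ours. */
    for (int i = 0; i < nchildren; i++) {
        void *res = NULL;
        pthread_join(children[i], &res);
        struct counts *c = res;
        if (c != NULL) {
            total->dirs += c->dirs;
            total->files += c->files;
            free(c);
        }
    }
    free(path);
    return total;
}
```

To start the walk, malloc a copy of the root path, pass it to pthread_create() with dirCounter, join that thread, and read (then free) the returned struct counts.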

I see a few bugs. I'm not sure if this explains your crash.
You allocated an instance of "directorio" for each directory and corresponding thread. But you never free it. Memory leak.
Is the intent to print the total number of directories and files in the whole file system? Or just an individual directory and file count for each directory? If the former, you aren't adding the results back up. I would even suggest having all threads share the same integer pointers for dirCount and fileCount (and use a lock to serialize access, or just use __sync_add_and_fetch). You could also just use a set of global variables for the integer dir and file counts.
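For the "former" case, a sketch of the shared-counter option with the GCC/Clang __sync_add_and_fetch builtin mentioned above. dirCount and fileCount are the global totals, count_entry is an assumed helper each walking thread would call per entry, and fake_walker is a hypothetical stand-in for a directory-walking thread so the totals can be checked.

```c
#include <pthread.h>
#include <stdlib.h>

/* Shared totals for the whole walk; every thread adds into them. */
static long dirCount;
static long fileCount;

/* __sync_add_and_fetch performs the increment atomically, so concurrent
   calls from different threads never lose an update and need no mutex. */
static void count_entry(int is_dir)
{
    if (is_dir)
        __sync_add_and_fetch(&dirCount, 1);
    else
        __sync_add_and_fetch(&fileCount, 1);
}

/* Hypothetical stand-in for a walker thread: reports 1000 entries,
   alternating directory / regular file. */
static void *fake_walker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        count_entry(i % 2 == 0);
    return NULL;
}
```

Reading the totals is only safe once every thread that updates them has been joined, which the main thread has to do anyway.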
If the latter (each thread prints its own count of child entries), just pass a directory name (string) as the thread parameter, and let the thread use local variables on the stack for the counters. (The thread would call opendir() on the string passed in. It would still need to free the allocated string passed in.)
You don't need to pass a pthread_attr_t instance into pthread_create. You can pass NULL as the second parameter and get the same effect.
You aren't checking the return value of pthread_create. If it were to fail (unlikely), then tid1 could be a garbage value.
Hope this helps.
