I want to use condition variables to launch at most N threads to process all the files in one huge directory (1M files).
The code seems to work, but after some time it blocks in the main thread. Below is the frustrating code:
void* run(void* ctx)
{
clientCtx* client = (clientCtx*)ctx;
printf("New file from thread %d: %s\n", client->num, client->filename);
free(client->filename);
pthread_mutex_lock(&clientFreeMutex);
client->state = IDLE_STATE;
pthread_cond_signal(&clientFreeCond);
printf("Thread %d is free\n", client->num);
pthread_mutex_unlock(&clientFreeMutex);
return NULL;
}
int main(int argc, char** argv)
{
pthread_t client[MAX_CLIENT] = {0};
clientCtx ctx[MAX_CLIENT] = {0};
DIR* directory = NULL;
struct dirent* element = NULL;
/* Initialize condition variable for max clients */
pthread_mutex_init(&clientFreeMutex, NULL);
pthread_cond_init(&clientFreeCond, NULL);
/* Initialize contexts for clients */
for (int cnt = 0; cnt < MAX_CLIENT; cnt ++)
{
ctx[cnt].state = IDLE_STATE;
ctx[cnt].num = cnt;
}
directory = opendir(argv[1]);
while((element = readdir(directory)) != NULL)
{
pthread_mutex_lock(&clientFreeMutex);
int cnt;
for (cnt = 0; cnt < MAX_CLIENT; cnt++)
{
if(ctx[cnt].state == IDLE_STATE)
{
ctx[cnt].filename = strdup(element->d_name);
ctx[cnt].state = BUSY_STATE;
pthread_create(&client[cnt], NULL, run, &(ctx[cnt]));
break;
}
}
/* No free client */
if (cnt == MAX_CLIENT)
{
printf("No free thread. Waiting.\n");
pthread_cond_wait(&clientFreeCond, &clientFreeMutex);
}
pthread_mutex_unlock(&clientFreeMutex);
}
closedir(directory);
exit(EXIT_SUCCESS);
}
What is the problem? thanks for your help :)
Warning: you use the value returned by readdir in separate threads without any protection against concurrent access, so when you (try to) printf client->file->d_name you may at the same time be calling readdir in the main thread, which modifies the saved result. This is undefined behavior.
You need, for example, to save a strdup of element->d_name in main and store that string in the clientCtx rather than the struct dirent *, and of course free it in run.
Note also that a closedir is missing at the end of main. In this case it is not a real problem (just something to remember for your other programs).
I finally found the problem: the launched threads were never joined, and pthread_create eventually returned an error code whose message was "Could not allocate memory". Because no new thread was started, the signal was never sent and the main thread blocked forever.
I fixed this by creating a new state for already launched threads and adding a join in the main loop.
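A minimal sketch of that fix, under stated assumptions: DONE_STATE, submit(), drain() and the processed counter are hypothetical names that do not appear in the original code, and the per-file work is left as a comment. A slot in DONE_STATE has finished but has not been joined yet; the main thread joins it before reusing the slot, and the return value of pthread_create is checked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CLIENT 4

/* DONE_STATE (hypothetical): the thread has finished but is not joined yet. */
enum { IDLE_STATE, BUSY_STATE, DONE_STATE };

typedef struct {
    pthread_t tid;
    int state;
    char *filename;
} clientCtx;

static pthread_mutex_t clientFreeMutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t clientFreeCond = PTHREAD_COND_INITIALIZER;
static clientCtx ctx[MAX_CLIENT];   /* zero-initialized: every slot starts IDLE */
static long processed;              /* protected by clientFreeMutex */

static void *run(void *arg)
{
    clientCtx *client = arg;
    /* ... process client->filename here ... */
    free(client->filename);
    pthread_mutex_lock(&clientFreeMutex);
    assert(client->state == BUSY_STATE);
    client->state = DONE_STATE;
    processed++;
    pthread_cond_signal(&clientFreeCond);
    pthread_mutex_unlock(&clientFreeMutex);
    return NULL;
}

/* Hand one file name to a worker. Returns 0, or an error number. */
static int submit(const char *name)
{
    int cnt, err, must_join;

    pthread_mutex_lock(&clientFreeMutex);
    for (;;) {
        for (cnt = 0; cnt < MAX_CLIENT; cnt++)
            if (ctx[cnt].state != BUSY_STATE)
                break;
        if (cnt < MAX_CLIENT)
            break;
        /* Re-check after waking up: wakeups can be spurious. */
        pthread_cond_wait(&clientFreeCond, &clientFreeMutex);
    }
    must_join = (ctx[cnt].state == DONE_STATE);
    ctx[cnt].state = BUSY_STATE;
    pthread_mutex_unlock(&clientFreeMutex);

    if (must_join)
        pthread_join(ctx[cnt].tid, NULL);   /* releases the old thread's resources */

    ctx[cnt].filename = strdup(name);
    err = ctx[cnt].filename
        ? pthread_create(&ctx[cnt].tid, NULL, run, &ctx[cnt])
        : ENOMEM;
    if (err) {
        /* Nothing was started, so give the slot back. */
        free(ctx[cnt].filename);
        pthread_mutex_lock(&clientFreeMutex);
        ctx[cnt].state = IDLE_STATE;
        pthread_mutex_unlock(&clientFreeMutex);
    }
    return err;
}

/* Join every thread that was started and not joined yet. */
static void drain(void)
{
    int cnt, launched;

    for (cnt = 0; cnt < MAX_CLIENT; cnt++) {
        pthread_mutex_lock(&clientFreeMutex);
        launched = (ctx[cnt].state != IDLE_STATE);
        pthread_mutex_unlock(&clientFreeMutex);
        if (launched) {
            pthread_join(ctx[cnt].tid, NULL);
            ctx[cnt].state = IDLE_STATE;
        }
    }
}
```

In the readdir loop, main would call submit(element->d_name) for each entry and drain() once after closedir. Only the main thread calls submit() and drain() in this sketch.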
I am working on an assignment which requires me to use threads to process and synchronize fetching data from a file. My professor told me that I can change my data to a void pointer to pass it to my function and then cast it back. I am trying to do this with file IO.
pthread_create(&th1, NULL, processing, (void *)&fp);
In my processing function I am trying to cast it back to a FILE pointer with this:
FILE driveOne = (FILE *)file;
This clearly doesn't work, so can someone explain this to me?
Here's a more complete example.
Let's say your worker function needs a file handle. For simplicity, let's say it reads each char from it, and returns the number of chars read, cast to a pointer:
void *worker(void *data)
{
FILE *handle = (FILE *)data;
uintptr_t count = 0;
if (handle && !ferror(handle)) {
/* handle is a valid file handle */
while (getc(handle) != EOF)
count++;
}
return (void *)count;
}
If count were of some other type than intptr_t or uintptr_t (declared in <stdint.h>, which is typically included by including <inttypes.h>), you'd need to cast it first to that type, and then to void pointer, i.e. (void *)(uintptr_t)count.
Because such worker threads don't need much stack (almost none, to be precise), and default thread stack sizes are huge (megabytes), we can save some memory (and allow many more threads if needed, especially on 32-bit architectures) by creating a pthread attribute that instructs pthread_create() to use a smaller stack. This attribute is not "consumed" by the call; it is more like a configuration block.
Let's say you have three streams, FILE *in[3];, and you wish to use three threads to check their lengths. Using a pthread attribute to use a smaller stack (2*PTHREAD_STACK_MIN, as defined in <limits.h>, is a good, safe value for worker threads that don't use alloca() or local arrays.):
pthread_t worker_id[3];
uintptr_t length[3];
pthread_attr_t attrs;
void *retptr;
int i, result;
/* Create a pthread attribute set, defining smaller stack size. */
pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, 2*PTHREAD_STACK_MIN);
/* Create the three worker threads. */
for (i = 0; i < 3; i++) {
result = pthread_create(&(worker_id[i]), &attrs, worker, (void *)in[i]);
if (result) {
fprintf(stderr, "Cannot create thread: %s.\n", strerror(result));
exit(EXIT_FAILURE);
}
}
/* pthread attributes are no longer needed. */
pthread_attr_destroy(&attrs);
/*
... This thread can do something else here ...
*/
/* Reap the threads, and collect their return values. */
for (i = 0; i < 3; i++) {
result = pthread_join(worker_id[i], &retptr);
if (result) {
fprintf(stderr, "Cannot reap thread: %s.\n", strerror(result));
exit(EXIT_FAILURE);
}
length[i] = (uintptr_t)retptr;
}
for (i = 0; i < 3; i++)
printf("in[%d] contained %llu chars.\n", i, (unsigned long long)length[i]);
The same pattern can be used when you want to pass multiple parameters to the thread function. You first construct a structure to hold those parameters, and create them. You can allocate them dynamically, declare them as global variables, or declare them as local variables in main() -- any scope that exists for the full duration when the worker thread exists, works.
For example, let's say your worker function calculates a histogram of each unsigned char value it reads from the stream:
struct work {
pthread_t id; /* Thread identifier */
FILE *in; /* File handle to read from */
size_t count[UCHAR_MAX + 1]; /* Histogram */
};
void *worker(void *data) {
struct work *const work = (struct work *)data;
int c;
if (!work || !work->in) {
/* Invalid data, or invalid file handle. */
return (void *)(intptr_t)(EINVAL);
}
if (ferror(work->in)) {
/* Stream is in error state. */
return (void *)(intptr_t)(EIO);
}
/* Read the stream. */
while ((c = getc(work->in)) != EOF) {
/* Update histogram. */
work->count[(unsigned char)c]++;
}
/* Did the reading stop due to an I/O error? */
if (ferror(work->in))
return (void *)(intptr_t)(EIO);
/* No errors, all done. */
return (void *)0;
}
Note that struct work *const work = ... initializes a constant pointer work, not a pointer to constant. The const there is just an optimization that tells the C compiler that we won't try to modify work pointer itself. The data it points to, is modifiable.
(To read pointer declarations, read them from right to left, replacing each * with "is a pointer to", to get the proper sense of it.)
The code to create these workers is very similar, except that we allocate the work dynamically:
struct work *work[3];
pthread_attr_t attrs;
void *retptr;
int i, result;
/* Create and initialize the three pointers. */
for (i = 0; i < 3; i++) {
/* Allocate a work structure. */
work[i] = malloc(sizeof *(work[i]));
if (!work[i]) {
fprintf(stderr, "Out of memory.\n");
exit(EXIT_FAILURE);
}
/* Copy the handle to read from, */
work[i]->in = in[i];
/* and clear the histogram part. */
memset(work[i]->count, 0, sizeof work[i]->count);
}
/* Create a pthread attribute set, defining smaller stack size. */
pthread_attr_init(&attrs);
pthread_attr_setstacksize(&attrs, 2*PTHREAD_STACK_MIN);
/* Create the three worker threads. */
for (i = 0; i < 3; i++) {
result = pthread_create(&(work[i]->id), &attrs, worker, (void *)work[i]);
if (result) {
fprintf(stderr, "Cannot create thread: %s.\n", strerror(result));
exit(EXIT_FAILURE);
}
}
/* pthread attributes are no longer needed. */
pthread_attr_destroy(&attrs);
/*
... This thread can do something else here ...
*/
/* Reap the threads, and collect their return values. */
for (i = 0; i < 3; i++) {
result = pthread_join(work[i]->id, &retptr);
if (result) {
fprintf(stderr, "Cannot reap thread: %s.\n", strerror(result));
exit(EXIT_FAILURE);
}
/* If the thread reported a failure, print the corresponding
error message (but do not exit). */
if (retptr)
fprintf(stderr, "Thread %d of 3: %s.\n", i+1, strerror((intptr_t)retptr));
/* ... print the histogram here? ... */
}
/* Free the work structures. */
for (i = 0; i < 3; i++)
free(work[i]);
If you don't want to abort the program when an error occurs, it is useful to note that free(NULL) is safe and does nothing; and that struct work *pointerarray[SIZE] = {0}; declares an array of SIZE pointers to struct work, and initializes them all to zero. For example, if an allocation or thread creation fails at some point, you can just free() each pointer, whether or not its allocation was successful.
That is, if you want to allocate three different types of structures (struct atype *a;, struct btype *b;, and struct ctype *c;), you can do
a = malloc(sizeof *a);
b = malloc(sizeof *b);
c = malloc(sizeof *c);
if (!a || !b || !c) {
free(c);
free(b);
free(a);
return ALLOCATION_FAILED;
}
/* Allocation was successful */
instead of allocating each one and testing for failure separately.
You need to declare driveOne to be FILE *, not FILE.
FILE *driveOne = (FILE *)file;
In addition, assuming that fp was initially declared as FILE *, your call to pthread_create should not have & before fp, like so:
pthread_create(&th1, NULL, processing, (void *)fp);
What is the best way to have multiple threads read a file at the same time ?
For example, if I tell my program to run with 4 threads and the file is 12 characters long, I want each thread to read 3 chars at the same time.
This is what I have so far :
thread function :
void *thread(void *arg) {
// can't seem to find the right solution to make it work here...
}
main function (thread_count is the number of threads and text_size the size of text) :
// Number of characters each thread should read
uint16_t thread_chars_num = (text_size / thread_count);
pthread_t threads[thread_count];
for (int i = 0; i < thread_count; i++) {
if(i == thread_count - 1) { // last thread might have more work
thread_chars_num += (text_size % thread_count);
}
if (pthread_create(&threads[i], NULL, thread, &thread_chars_num) != 0) {
fprintf(stderr, "pthread_create failed!\n");
return EXIT_FAILURE;
}
}
I was thinking of giving to the thread function a struct with index to start reading and index to stop reading, but it's really confusing and I can't seem to find the right solution.
Assuming you have a struct like:
struct ft
{
char* file_name;
int start_index;
int end_index;
};
Then in your thread:
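void *thread(void *arg) {
int i;
int c;
struct ft* fi = (struct ft*)arg;
/* fopen needs a mode argument; each thread opens its own stream */
FILE* file = fopen(fi->file_name, "r");
if (file == NULL)
return NULL;
fseek (file , fi->start_index, SEEK_SET);
for(i = 0; i < fi->end_index - fi->start_index; i++)
{
c = getc(file);
if (c == EOF)
break;
//do something
}
fclose(file);
return NULL;
}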
Also, don't forget to do pthread_join in your main thread, which will make it wait for the other threads to finish.
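For the main side, here is a sketch of how each thread could get its own range. It assumes a variant of the struct above: struct chunk, read_chunk(), read_in_parallel(), MAX_THREADS and the chars_read field are hypothetical names chosen for the illustration. Each thread receives a pointer to its own structure, so no thread sees a value that main is still changing, and the last thread takes the remainder.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 16   /* arbitrary limit for this sketch */

struct chunk {
    const char *file_name;
    long start_index;    /* first byte this thread reads */
    long end_index;      /* one past the last byte this thread reads */
    long chars_read;     /* result, filled in by the thread */
};

static void *read_chunk(void *arg)
{
    struct chunk *ck = arg;
    FILE *file = fopen(ck->file_name, "rb");   /* one stream per thread */
    long i;

    assert(ck->end_index >= ck->start_index);
    ck->chars_read = 0;
    if (file == NULL)
        return NULL;
    if (fseek(file, ck->start_index, SEEK_SET) == 0) {
        for (i = ck->start_index; i < ck->end_index; i++) {
            if (getc(file) == EOF)
                break;
            ck->chars_read++;   /* "do something" with the character here */
        }
    }
    fclose(file);
    return NULL;
}

/* Split text_size bytes of path over thread_count threads.
   chunks must have room for thread_count entries.
   Returns 0 on success, -1 on failure. */
static int read_in_parallel(const char *path, long text_size,
                            int thread_count, struct chunk *chunks)
{
    pthread_t threads[MAX_THREADS];
    long per_thread;
    int i, started = 0, err = 0;

    if (thread_count < 1 || thread_count > MAX_THREADS)
        return -1;
    per_thread = text_size / thread_count;

    for (i = 0; i < thread_count; i++) {
        chunks[i].file_name = path;
        chunks[i].start_index = i * per_thread;
        /* The last thread also takes the remainder. */
        chunks[i].end_index = (i == thread_count - 1)
                            ? text_size : (i + 1) * per_thread;
        err = pthread_create(&threads[i], NULL, read_chunk, &chunks[i]);
        if (err)
            break;
        started++;
    }
    /* Wait for every thread that was actually started. */
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return err ? -1 : 0;
}
```

With a 14-character file and 4 threads, the first three threads read 3 characters each and the last one reads 5.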
I am trying to write a C program that calculates the size of a directory tree using threads for my assignment.
My code works fine when there is only one subdirectory, however whenever I have 2 or more subdirectories, I am getting a Segmentation Fault error. I was reading a lot about it and was not able to find a reason for my code to fail.
In my global scope:
pthread_mutex_t mutex;
int total_size = 0; // Global, to accumulate the size
main():
int main(int argc, char *argv[])
{
pthread_t thread;
...
if (pthread_mutex_init(&mutex, NULL) < 0)
{
perror("pthread_mutex_init");
exit(1);
}
pthread_create(&thread, NULL, dirsize, (void*)dirpath);
pthread_join(thread, NULL);
printf("\nTotal size: %d\n\n", total_size);
...
}
My dirsize() function:
void* dirsize(void* dir)
{
...
pthread_t tid[100];
int threads_created = 0;
dp=opendir(dir);
chdir(dir);
// Loop over files in directory
while ((entry = readdir(dp)) != NULL)
{
...
// Check if directory
if (S_ISDIR(statbuf.st_mode))
{
// Create new thread & call itself recursively
pthread_create(&tid[threads_created], NULL, dirsize, (void*)entry->d_name);
threads_created++;
}
else
{
// Add filesize
pthread_mutex_lock(&mutex);
total_size += statbuf.st_size;
pthread_mutex_unlock(&mutex);
}
}
for (i = 0; i < threads_created; i++)
{
pthread_join(tid[i], NULL);
}
}
What am I doing wrong here? Would greatly appreciate if you could point me to the right direction.
Here is what I'm getting through gdb: http://pastebin.com/TUkHspHH
Thank you in advance!
What's the value of NUM_THREADS?
// Check if directory
if (S_ISDIR(statbuf.st_mode))
{
// Create new thread & call itself recursively
pthread_create(&tid[threads_created], NULL, dirsize, (void*)entry->d_name);
threads_created++;
}
Here you should check whether threads_created is equal to NUM_THREADS and, if so, increase the size of the tid array (which I would malloc at the beginning of the function and free at the end, by the way).
Moreover, you should allocate a copy of the directory name (malloc + strcpy) and pass that copy to the thread instead of entry->d_name, then free the copy at the end of the function.
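Both suggestions can be sketched together. The names below (struct thread_list, spawn_with_copy(), join_all(), name_length_worker(), total_length) are hypothetical, and the worker only adds up name lengths as a stand-in for the real dirsize work. The point is that each thread owns a private copy of the name, and the tid array grows on demand instead of being fixed at 100.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct thread_list {
    pthread_t *tid;
    size_t count;
    size_t capacity;
};

static pthread_mutex_t total_mutex = PTHREAD_MUTEX_INITIALIZER;
static size_t total_length;   /* protected by total_mutex */

/* Stand-in worker: owns its argument and frees it when done. */
static void *name_length_worker(void *arg)
{
    char *name = arg;

    pthread_mutex_lock(&total_mutex);
    total_length += strlen(name);
    pthread_mutex_unlock(&total_mutex);
    free(name);
    return NULL;
}

/* Start fn on a private copy of name. Returns 0, or an error number. */
static int spawn_with_copy(struct thread_list *list,
                           void *(*fn)(void *), const char *name)
{
    char *copy;
    int err;

    assert(list != NULL && fn != NULL && name != NULL);

    /* Grow the array before it overflows. */
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? 2 * list->capacity : 8;
        pthread_t *grown = realloc(list->tid, new_cap * sizeof *grown);
        if (grown == NULL)
            return ENOMEM;
        list->tid = grown;
        list->capacity = new_cap;
    }

    /* readdir may overwrite entry->d_name, so the thread gets its own copy. */
    copy = strdup(name);
    if (copy == NULL)
        return ENOMEM;

    err = pthread_create(&list->tid[list->count], NULL, fn, copy);
    if (err) {
        free(copy);
        return err;
    }
    list->count++;
    return 0;
}

/* Join every started thread and release the array. */
static void join_all(struct thread_list *list)
{
    size_t i;

    for (i = 0; i < list->count; i++)
        pthread_join(list->tid[i], NULL);
    free(list->tid);
    list->tid = NULL;
    list->count = list->capacity = 0;
}
```

In this variant the thread frees its own copy, which avoids keeping a second array of pointers in the creating function; freeing the copies after the joins, as described above, works as well.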
I'm writing a variant of the producer-consumer problem with multi threading. I'm trying to use a queue to store the "produced" items until they get "consumed" later on. My problem is that when the consumer thread runs, it only processes the most recent item added to the queue (rather than the oldest item on the queue). Further, it processes that item repeatedly (up to the number of items on the queue itself).
I think that my problem might be that I need to allocate some memory when I push an item onto the queue (not sure about this, though). But then, I need a way to refer to this memory when that item is about to be consumed.
Anyway, here is a pared-down version of my program. I realize that what I am posting here is incomplete (this is an infinite loop), but I'm just trying to show the part that is relevant to this issue. The functions queue_push() and queue_pop() are well tested, so I don't think that the problem lies there. I'll post more if needed.
Can anyone see why my consumer thread only processes the newest queue item? Thank you!
sem_t mutex;
queue q;
FILE* inputFPtr[10];
char host_in[BUFFERSIZE];
char host_out[BUFFERSIZE];
void* p(void* inputFile) {
while (fscanf(inputFile, INPUTFS, host_in) > 0)
{
sem_wait(&mutex);
queue_push(&q, host_in); //this function pushes the hostname onto the back of the queue
fprintf(stdout, "Produced: %d) %s\n", i, host_in);
sem_post(&mutex);
}
fclose (inputFile);
}
void* c() {
while (TRUE)
{
sem_wait(&mutex);
sprintf(host_out, "%s", (char *) queue_pop(&q));
printf("%s\n", host_out);
sem_post(&mutex);
}
}
int main (int argc, char* argv[]) {
int i;
pthread_t *th_in[argc-2];
pthread_t *th_out[2];
for (i = 0; i < (argc-2); i++) {
th_in[i] = (pthread_t *) malloc(sizeof(pthread_t));
inputFPtr[i] = fopen(argv[i+1], "r");
pthread_create (th_in[i], NULL, p, inputFPtr[i]);
}
for (i = 0; i < 2; i++) {
th_out[i] = (pthread_t *) malloc(sizeof(pthread_t));
pthread_create (th_out[i], NULL, c, NULL);
}
for (i = 0; i < (argc - 2); i++) {
pthread_join(*th_in[i], 0);
free(th_in[i]);
}
for (i = 0; i < (2); i++) {
pthread_join(*th_out[i], 0);
free(th_out[i]);
}
return EXIT_SUCCESS;
}
You forgot to post your code. However, from your description, it seems that all the queue members point to the same memory block. This is why all your pops return the same item.
The answer to your question is YES. You need to allocate memory for each of the items and free it after it has been "consumed".
Try to post some code for more specific answers...
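To make the allocation point concrete, here is a sketch built on a hypothetical pointer queue (struct ptr_queue, pq_push(), pq_pop() and push_copy() are made up for the illustration; the real queue_push() and queue_pop() were not posted). It is single-threaded on purpose: the locking around push and pop stays as in the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_CAP 16

/* Hypothetical FIFO that stores pointers only, not the strings themselves. */
struct ptr_queue {
    void *item[QUEUE_CAP];
    int head;
    int count;
};

static int pq_push(struct ptr_queue *q, void *p)
{
    if (q->count == QUEUE_CAP)
        return -1;
    q->item[(q->head + q->count) % QUEUE_CAP] = p;
    q->count++;
    return 0;
}

static void *pq_pop(struct ptr_queue *q)
{
    void *p;

    if (q->count == 0)
        return NULL;
    assert(q->count <= QUEUE_CAP);
    p = q->item[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return p;
}

/* Push a private copy of host. Pushing the shared buffer itself would
   make every queue entry point at the same, most recent, string. */
static int push_copy(struct ptr_queue *q, const char *host)
{
    char *copy = strdup(host);

    if (copy == NULL)
        return -1;
    if (pq_push(q, copy) != 0) {
        free(copy);
        return -1;
    }
    return 0;
}
```

The consumer then owns whatever it pops and calls free() on it once the item has been processed.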
I'm trying to write a simple multi-threaded consumer/producer, where multiple reader and writer threads read from a file into a buffer and then from the buffer back into a file. It should be thread safe. However, it is not performing as I expected: it halts halfway through, but on a different line every time.
Please help me understand what I am doing wrong.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
//TODO Define global data structures to be used
#define BUF_SIZE 5
FILE *fr;
FILE *to; /* declare the file pointer */
struct _data {
pthread_mutex_t mutex;
pthread_cond_t cond_read;
pthread_cond_t cond_write;
int condition;
char buffer[BUF_SIZE];
int datainbuffer;
}dc1 = {
PTHREAD_MUTEX_INITIALIZER,PTHREAD_COND_INITIALIZER,PTHREAD_COND_INITIALIZER,0,{0},0
};
void *reader_thread(void *arg) {
//TODO: Define set-up required
struct _data *d = (struct _data *)arg;
int killreaders = 0;
while(1) {
//TODO: Define data extraction (queue) and processing
pthread_mutex_lock(&d->mutex);
while (d->condition == 0 || d->datainbuffer<=0){
pthread_cond_wait( &d->cond_read, &d->mutex );
if(killreaders == 1){
pthread_mutex_unlock(&d->mutex);
pthread_cond_signal(&d->cond_read);
pthread_cond_signal(&d->cond_write);
return NULL;
}
}
d->condition = 0;
int i;
char res;
//if the buffer is not full, that means the end of file is reached and it time to kill the threads remaining.
if(d->datainbuffer!=BUF_SIZE)
killreaders = 1;
for (i=0; i<(sizeof d->datainbuffer); i++) {
res = d->buffer[i];
printf("to file:%c",res);
fputc(res, to);
}
d->datainbuffer = 0;
pthread_mutex_unlock(&d->mutex);
pthread_cond_signal( &d->cond_write );
}
return NULL;
}
void *writer_thread(void *arg) {
//TODO: Define set-up required
struct _data *d = (struct _data *)arg;
char * pChar;
int killwriters = 0;
while(1){
pthread_mutex_lock(&d->mutex);
while( d->condition == 1 || d->datainbuffer>0){
pthread_cond_wait( &d->cond_write, &d->mutex );
if(killwriters==1){
pthread_mutex_unlock(&d->mutex);
pthread_cond_signal(&d->cond_write);
pthread_cond_signal(&d->cond_read);
return NULL;
}
}
d->condition = 1;
int i;
char rc;
for (i = 0; i < BUF_SIZE; i++){
if((rc = getc(fr)) == EOF){
killwriters = 1;
pthread_mutex_unlock(&d->mutex);
pthread_cond_signal(&d->cond_read);
return NULL;
}
d->datainbuffer = i+1;
d->buffer[i] = rc;
printf("%c",rc);
}
int m = 0;
pthread_mutex_unlock(&d->mutex);
pthread_cond_signal(&d->cond_read);
}
return NULL;
}
#define M 10
#define N 20
int main(int argc, char **argv) {
struct _data dc=dc1;
fr = fopen ("from.txt", "rt"); /* open the file for reading */
if (fr == NULL)
{
printf("Could not open file!");
return 1;
}
to = fopen("to.txt", "wt");
int i;
pthread_t readers[N];
pthread_t writers[M];
for(i = 0; i < N; i++) {
pthread_create(&readers[i], NULL, reader_thread, (void*)&dc);
}
for(i = 0; i < M; i++) {
pthread_create(&writers[i], NULL, writer_thread, (void*)&dc);
}
fclose(fr);
fclose(to);
return 0;
}
any suggestion is appreciated!
Your threads are reading from and writing to files, which you open & close in main. But main doesn't explicitly wait for the threads to finish before closing those files.
In addition to the problem pointed out by Scott Hunter, your readers and writers do all their "real work" while holding the mutex, defeating the point of having more than one thread in the first place.
Readers should operate as follows:
1) Acquire mutex.
2) Block on the condition variable until work is available.
3) Remove work from queue, possibly signal condition variable.
4) Release mutex.
5) Process the work.
6) Go to step 1.
Writers should operate as follows:
1) Get the information we need to write.
2) Acquire the mutex.
3) Block on the condition variable until there is space on the queue.
4) Place information in the queue, possibly signal condition variable.
5) Release the mutex.
6) Go to step 1.
Notice both threads do the "real work" without holding the mutex? Otherwise, why have multiple threads if only one of them can do work at a time?
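The two step lists above can be sketched as a small bounded queue. Everything here is an assumption made for the illustration: the queue carries int values rather than file data, and struct bqueue, bq_put(), bq_get(), bq_close(), producer() and consumer() are hypothetical names. The mutex is held only while the queue itself is touched; the consumer does its processing after bq_get() has returned.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define QUEUE_SIZE 5

struct bqueue {
    pthread_mutex_t mutex;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
    int item[QUEUE_SIZE];
    int head, count, closed;
};

static struct bqueue bq = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
    PTHREAD_COND_INITIALIZER, {0}, 0, 0, 0
};

/* Writer steps 2 to 5: lock, wait for space, enqueue, signal, unlock. */
static void bq_put(struct bqueue *q, int value)
{
    pthread_mutex_lock(&q->mutex);
    while (q->count == QUEUE_SIZE)
        pthread_cond_wait(&q->not_full, &q->mutex);
    q->item[(q->head + q->count) % QUEUE_SIZE] = value;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mutex);
}

/* Reader steps 1 to 4. Returns 0 once the queue is closed and empty. */
static int bq_get(struct bqueue *q, int *value)
{
    pthread_mutex_lock(&q->mutex);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->mutex);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->mutex);
        return 0;
    }
    assert(q->count > 0);
    *value = q->item[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->mutex);
    return 1;
}

/* Tell waiting readers that no more items will arrive. */
static void bq_close(struct bqueue *q)
{
    pthread_mutex_lock(&q->mutex);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->mutex);
}

static void *producer(void *arg)
{
    struct bqueue *q = arg;
    int i;

    for (i = 1; i <= 100; i++)
        bq_put(q, i);   /* the value is ready before the lock is taken */
    return NULL;
}

static void *consumer(void *arg)
{
    struct bqueue *q = arg;
    intptr_t sum = 0;
    int value;

    while (bq_get(q, &value))
        sum += value;   /* the "real work" runs with the mutex released */
    return (void *)sum;
}
```

main would start the consumers and producers, join the producers, call bq_close(), and only then join the consumers and close any files. That also addresses the earlier point about main closing the files too early.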
I'm not sure whether my answer is going to help you or not.. but I'm going to give my best by giving you some reference code.
I have written a similar program (except that it does not write to a file; instead it displays the queued, produced and consumed items on stdout). It can be found here: https://github.com/sangeeths/pc . I have separated the command-line processing and the queue logic into separate files.
Hope this helps!