I want to use condition variables to launch at most N threads to process all the files in one huge directory (1M files).
The code seems to work, but after some time it blocks in the main thread. Below is the frustrating code:
void* run(void* ctx)
{
clientCtx* client = (clientCtx*)ctx;
printf("New file from thread %d: %s\n", client->num, client->filename);
free(client->filename);
pthread_mutex_lock(&clientFreeMutex);
client->state = IDLE_STATE;
pthread_cond_signal(&clientFreeCond);
printf("Thread %d is free\n", client->num);
pthread_mutex_unlock(&clientFreeMutex);
return NULL;
}
int main(int argc, char** argv)
{
pthread_t client[MAX_CLIENT] = {0};
clientCtx ctx[MAX_CLIENT] = {0};
DIR* directory = NULL;
struct dirent* element = NULL;
/* Initialize condition variable for max clients */
pthread_mutex_init(&clientFreeMutex, NULL);
pthread_cond_init(&clientFreeCond, NULL);
/* Initialize contexts for clients */
for (int cnt = 0; cnt < MAX_CLIENT; cnt ++)
{
ctx[cnt].state = IDLE_STATE;
ctx[cnt].num = cnt;
}
directory = opendir(argv[1]);
while((element = readdir(directory)) != NULL)
{
pthread_mutex_lock(&clientFreeMutex);
int cnt;
for (cnt = 0; cnt < MAX_CLIENT; cnt++)
{
if(ctx[cnt].state == IDLE_STATE)
{
ctx[cnt].filename = strdup(element->d_name);
ctx[cnt].state = BUSY_STATE;
pthread_create(&client[cnt], NULL, run, &(ctx[cnt]));
break;
}
}
/* No free client */
if (cnt == MAX_CLIENT)
{
printf("No free thread. Waiting.\n");
pthread_cond_wait(&clientFreeCond, &clientFreeMutex);
}
pthread_mutex_unlock(&clientFreeMutex);
}
closedir(directory);
exit(EXIT_SUCCESS);
}
What is the problem? thanks for your help :)
Warning: you use the value returned by readdir in separate threads without any protection against concurrent access. While a thread is printing client->file->d_name, the main thread may be calling readdir at the same time and modifying the saved result, which is undefined behavior.
You need, for example, to strdup element->d_name in main and store that string in the clientCtx rather than the struct dirent *, and of course free it in run.
Note also that a closedir is missing at the end of main. In this case it is not a real problem, but it is worth remembering for your other programs.
I finally found the problem: the launched threads were never joined, so pthread_create eventually returned an error code whose message was "Could not allocate memory". The thread that should have been created never ran, so the signal was never sent and the main thread blocked forever in pthread_cond_wait.
I fixed this by adding a new state for already-launched threads and joining them in the main loop.
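For anyone landing here later, this is a minimal sketch of that fix, not the exact code I ended up with. The names process_all, MAX_WORKERS and the slot struct are made up for the example. It keeps a third state for threads that have finished but are not joined yet, re-checks the condition in a while loop around pthread_cond_wait, and checks the return value of pthread_create.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 4   /* hypothetical limit, plays the role of MAX_CLIENT */

enum slot_state { SLOT_IDLE, SLOT_BUSY, SLOT_DONE };

struct slot {
    pthread_t tid;
    enum slot_state state;  /* protected by lock */
    int started;            /* touched by the launching thread only */
    const char *name;       /* item handed to the worker */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t slot_free = PTHREAD_COND_INITIALIZER;
static size_t processed;    /* protected by lock */

static void *worker(void *arg)
{
    struct slot *s = arg;
    /* ... the real work on s->name would go here, outside the lock ... */
    pthread_mutex_lock(&lock);
    assert(s->state == SLOT_BUSY);
    processed++;
    s->state = SLOT_DONE;               /* finished, but not joined yet */
    pthread_cond_signal(&slot_free);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns the index of a slot that is not busy, or -1. Caller holds lock. */
static int find_slot(const struct slot *slots)
{
    for (int i = 0; i < MAX_WORKERS; i++)
        if (slots[i].state != SLOT_BUSY)
            return i;
    return -1;
}

/* Processes n names with at most MAX_WORKERS threads alive at once.
   Returns the number of names that were processed. */
size_t process_all(const char **names, size_t n)
{
    struct slot slots[MAX_WORKERS] = {0};   /* every slot starts SLOT_IDLE */
    processed = 0;

    for (size_t k = 0; k < n; k++) {
        int i;
        pthread_mutex_lock(&lock);
        while ((i = find_slot(slots)) < 0)  /* re-check after every wakeup */
            pthread_cond_wait(&slot_free, &lock);
        slots[i].state = SLOT_BUSY;
        slots[i].name = names[k];
        pthread_mutex_unlock(&lock);

        if (slots[i].started)               /* reclaim the finished thread */
            pthread_join(slots[i].tid, NULL);
        slots[i].started =
            (pthread_create(&slots[i].tid, NULL, worker, &slots[i]) == 0);
        if (!slots[i].started)
            break;      /* give up; threads already running are joined below */
    }

    for (int i = 0; i < MAX_WORKERS; i++)   /* drain what is still running */
        if (slots[i].started)
            pthread_join(slots[i].tid, NULL);
    return processed;
}
```

Calling process_all(names, count) from main replaces the readdir loop body. Creating the threads detached would be another way to avoid the join, but then main needs some other way to know when the last one has finished.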
What is the best way to have multiple threads read a file at the same time?
For example, if I tell my program to run with 4 threads and the file is 12 characters long, I want each thread to read 3 chars at the same time.
This is what I have so far :
thread function :
void *thread(void *arg) {
// can't seem to find the right solution to make it work here...
}
main function (thread_count is the number of threads and text_size the size of text) :
// Number of characters each thread should read
uint16_t thread_chars_num = (text_size / thread_count);
pthread_t threads[thread_count];
for (int i = 0; i < thread_count; i++) {
if(i == thread_count - 1) { // last thread might have more work
thread_chars_num += (text_size % thread_count);
}
if (pthread_create(&threads[i], NULL, thread, &thread_chars_num) != 0) {
fprintf(stderr, "pthread_create failed!\n");
return EXIT_FAILURE;
}
}
I was thinking of giving the thread function a struct with the index to start reading at and the index to stop reading at, but it's really confusing and I can't seem to find the right solution.
Assuming you have a struct like:
struct ft
{
char* file_name;
int start_index;
int end_index;
};
Then in your thread:
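void *thread(void *arg) {
    int i;
    int c;
    struct ft* fi = (struct ft*)arg;
    FILE* file = fopen(fi->file_name, "r"); // fopen needs a mode argument
    if (file == NULL)
        return NULL;
    fseek(file, fi->start_index, SEEK_SET); // jump to this thread's first byte
    for(i = 0; i < fi->end_index - fi->start_index; i++)
    {
        c = getc(file);
        //do something
    }
    fclose(file);
    return NULL;
}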
Also, don't forget to do pthread_join in your main thread, which will make it wait for the other threads to finish.
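To tie this back to the loop in the question, here is a rough sketch of how main could build one struct per thread instead of passing the same &thread_chars_num to all of them. The names read_parallel, struct slice and read_slice are invented for the example, and I am assuming each thread simply copies its bytes into a shared output buffer at its own offset.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct slice {
    const char *path;   /* file to read */
    char *out;          /* shared output buffer, at least `size` bytes */
    size_t start;       /* first byte of this thread's range */
    size_t end;         /* one past the last byte */
    int ok;             /* set to 1 by the thread on success */
};

static void *read_slice(void *arg)
{
    struct slice *s = arg;
    assert(s->end >= s->start);
    FILE *f = fopen(s->path, "rb");     /* each thread opens its own FILE */
    if (f == NULL)
        return NULL;
    size_t len = s->end - s->start;
    if (fseek(f, (long)s->start, SEEK_SET) == 0 &&
        fread(s->out + s->start, 1, len, f) == len)
        s->ok = 1;
    fclose(f);
    return NULL;
}

/* Reads the first `size` bytes of `path` into `out` using `n` threads.
   Returns 0 on success, -1 on any failure. */
int read_parallel(const char *path, char *out, size_t size, size_t n)
{
    if (n == 0)
        return -1;
    pthread_t *tids = malloc(n * sizeof *tids);
    struct slice *slices = calloc(n, sizeof *slices);
    if (tids == NULL || slices == NULL) {
        free(tids);
        free(slices);
        return -1;
    }
    size_t chunk = size / n, started = 0;
    int result = 0;
    for (size_t i = 0; i < n; i++) {
        slices[i].path = path;
        slices[i].out = out;
        slices[i].start = i * chunk;
        /* the last thread also takes the remainder */
        slices[i].end = (i == n - 1) ? size : (i + 1) * chunk;
        if (pthread_create(&tids[i], NULL, read_slice, &slices[i]) != 0) {
            result = -1;
            break;
        }
        started++;
    }
    for (size_t i = 0; i < started; i++) {  /* wait for every started thread */
        pthread_join(tids[i], NULL);
        if (!slices[i].ok)
            result = -1;
    }
    free(tids);
    free(slices);
    return result;
}
```

The important part is that every thread gets its own struct, which stays alive until after the join. Passing the address of one variable that the loop keeps modifying, as in the question, means the threads may all see the last value.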
I am using an array with 2 threads. One writes to it and the other reads from it. You can assume that reading is slower than writing. The reader and the writer are separate pthreads. As far as I know, sharing an array as a global variable between those threads is safe.
So overall picture is like:
char ** array;
void writer(void){
for (unsigned long i = 0; i < maxArraySize; i++){
array[i] = malloc(someSize);
array[i] = someCharArray;
}
void reader(void){
for (unsigned long i = 0; i < maxArraySize; i++){
if(array[i] == NULL){ // in case reader is faster
i--;
continue;
}
useData(array[i]);
free(array[i]); // HERE IS MY QUESTION..
}
main(){
array = malloc(maxArraySize);
pthread_t reader, writer;
pthread_create( &reader, NULL, reader, NULL);
pthread_create( &writer, NULL, writer, NULL);
}
My question is about the line where I free the i'th element of the array. Is it safe to do that? When I free the i'th element, the writer is writing to the array at the same time. Can there be a case where the writer gets a wrong address because it loses the head pointer?
No, it is not safe. If you read during a write without any synchronization, the result is undefined. You could get any value, though it is unlikely that you will see anything other than NULL or the value you assigned.
As others have mentioned in the comments, the uninitialized array may contain anything (its contents are indeterminate), though the memory was likely zeroed before the kernel gave it to you.
If you want safety you need a synchronization mechanism such as a semaphore (http://man7.org/linux/man-pages/man3/sem_init.3.html).
char ** array;
// Allows access while non zero
sem_t sem;
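void *writer(void *arg){
for (unsigned long i = 0; i < maxArraySize; i++){
array[i] = malloc(someSize);
// Copy the data into the new block; assigning the pointer would leak it.
// Assumes someCharArray is a NUL-terminated string shorter than someSize.
strcpy(array[i], someCharArray);
// Increment semaphore.
sem_post(&sem);
}
return NULL;
}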
void *reader(void *arg){
for (unsigned long i = 0; i < maxArraySize; i++){
// sem_trywait returns -1 (with errno set to EAGAIN) if the semaphore is zero.
// It returns 0 if the semaphore is greater than zero, and decrements it.
if(sem_trywait(&sem)){ // in case reader is faster
i--;
continue;
}
useData(array[i]);
free(array[i]); // HERE IS MY QUESTION..
}
return NULL;
}
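int main(void){
// Initialize semaphore to zero
sem_init(&sem, 0 , 0);
// Initialize array to have maxArraySize elements.
array = malloc(maxArraySize * sizeof(*array));
// Named differently from the functions so they are not shadowed
pthread_t readerThread, writerThread;
pthread_create( &readerThread, NULL, reader, NULL);
pthread_create( &writerThread, NULL, writer, NULL);
// Wait for both threads before main returns
pthread_join(readerThread, NULL);
pthread_join(writerThread, NULL);
return 0;
}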
This should be fast, but it will spin your CPU doing a lot of nothing at the sem_trywait. Use sem_wait instead if you can wait a little longer and do not need the spinning; it blocks until the writer posts the semaphore.
I also corrected the bug in your malloc statement because it was not allocating enough space for maxArraySize char * items.
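For comparison, here is a small self-contained sketch of the blocking variant with sem_wait. It is only an illustration under my own assumptions: the names slots, filled, ITEMS and run_pipeline are made up, the "data" is just a short string per slot, and it relies on unnamed semaphores (sem_init) as available on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>

#define ITEMS 1000              /* hypothetical stand-in for maxArraySize */

static char *slots[ITEMS];
static sem_t filled;            /* counts slots written but not yet read */
static size_t consumed;         /* written by the reader, read after join */

static void *writer(void *arg)
{
    (void)arg;
    for (size_t i = 0; i < ITEMS; i++) {
        char *s = malloc(32);
        assert(s != NULL);      /* sketch: treat allocation failure as fatal */
        snprintf(s, 32, "item %zu", i);
        slots[i] = s;           /* store the pointer first ... */
        sem_post(&filled);      /* ... then tell the reader it is there */
    }
    return NULL;
}

static void *reader(void *arg)
{
    (void)arg;
    for (size_t i = 0; i < ITEMS; i++) {
        /* blocks instead of spinning; retry if interrupted by a signal */
        while (sem_wait(&filled) == -1 && errno == EINTR)
            continue;
        /* slot i now belongs to the reader; the writer never touches it again */
        free(slots[i]);
        slots[i] = NULL;
        consumed++;
    }
    return NULL;
}

/* Runs one writer and one reader to completion; returns the items consumed. */
size_t run_pipeline(void)
{
    pthread_t r, w;
    consumed = 0;
    sem_init(&filled, 0, 0);
    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    sem_destroy(&filled);
    return consumed;
}
```

Because the writer only ever writes slot i once and the reader only touches slot i after the matching sem_post, freeing array[i] in the reader is safe in this arrangement.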
I'm supposed to have two threads that search for the minimum element in an array: the first one searches the first half, and the second thread searches the other half. However, when I run my code, it seems that it chooses a thread randomly. I'm not sure what I'm doing wrong, but it probably has to do with the "mid" part. I tried dividing an array into two, finding the midpoint and then writing the conditions from there, but I probably went wrong somewhere. I also tried putting array[i] in the conditions, but in that case only thread2 executes.
EDIT: I'm really trying my best here, but I'm not getting anywhere. I edited the code in a way that made sense to me, and I probably typecasted "min" wrong but now it doesn't even execute it just gives me an error, even though it compiles just fine. I'm just a beginner, and while I do understand everything you guys are talking about, I have a hard time implementing the ideas, so really, any help with fixing this is appreciated!
EDIT2: Okay so the previous code made no sense at all, I do apologize but I was exhausted while writing it. Anyway, I came up with something else that works partially! I split the array into two halves, however only the first element is accessible when using the pointer. But would it work if the whole array was being accessed and if so how can I fix that then?
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#define size 20
void *smallest(void *arg);
pthread_t th, th2;
int array[size], i, min;
int main(int argc, char *argv[]) {
srand ( time(NULL) );
for(i = 0; i < size; i++)
{
array[i] = (rand() % 100)+1;
printf("%d ", array[i]);
}
int *array1 = malloc(10 * sizeof(int));
int *array2 = malloc(10 * sizeof(int));
memcpy(array1, array, 10 * sizeof(int));
memcpy(array2, array + 10, 10 * sizeof(int));
printf("\nFirst half gives %d \n", *array1);
printf("Second half gives %d \n", *array2);
pthread_create(&th, NULL, smallest, (void*) array1);
pthread_create(&th2, NULL, smallest, (void*) array2);
pthread_join(th, NULL);
pthread_join(th2, NULL);
//printf("\nFirst half gives %d\n", array1);
//printf("Second half gives %d\n", array2);
if (*array1 < *array2) {
printf("\nThread1 finds the number\n");
printf("The smallest element is %i\n", *array1);
}
else {
printf("\nThread2 finds the number\n");
printf("The smallest element is %i\n", *array2);
}
return 0;
}
void *smallest(void* arg){
int *array = (int*)arg;
min = array[0];
for (i = 0; i < size; i++) {
if (array[i] < min) {
min = array[i];
}
}
pthread_exit(NULL);
}
The code you've set up never runs more than one thread. Notice that if you run the first branch of the if statement, you fire off one thread to search half the array, wait for it to finish, then continue onward, and if the else branch executes, the same thing happens in the second half of the array. Fundamentally, you probably want to rethink your strategy here by having the code always launch two threads and join each of them only after both threads have started running.
The condition within your if statement also seems like it's mistaken. You're asking whether the middle element of the array is greater than its index. I assume this isn't what you're trying to do.
Finally, the code you have in each thread always looks at the entire array, not just a half of it. I would recommend rewriting the thread routine so that its argument represents the start and end indices of the range to take the minimum of. You would then update the code in main so that when you fire off the thread, you specify which range to search.
I would structure things like this:
1. Fire off a thread to find the minimum of the first half of the array.
2. Fire off a thread to find the minimum of the second half of the array.
3. Join both threads.
4. Use the results from each thread to find the minimum.
As one final note, since you'll have two different threads each running at the same time, you'll need to watch for data races as both threads try to read or write the minimum value. Consider having each thread use its exit code to signal where the minimum is and then resolving the true minimum back in main. This eliminates the race condition. Alternatively, have one global minimum value, but guard it with a mutex.
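A minimal sketch of that structure might look like the following. The names struct range, range_min and parallel_min are invented for the example, and instead of the exit code each thread writes its result into its own struct, which avoids the shared variable in the same way.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct range {
    const int *begin;   /* first element of this thread's half */
    const int *end;     /* one past the last element */
    int min;            /* result, written only by the owning thread */
};

static void *range_min(void *arg)
{
    struct range *r = arg;
    r->min = *r->begin;
    for (const int *p = r->begin + 1; p < r->end; ++p)
        if (*p < r->min)
            r->min = *p;
    return NULL;
}

/* Returns the minimum of a[0..n-1] using two threads; requires n >= 2. */
int parallel_min(const int *a, size_t n)
{
    assert(n >= 2);
    struct range lo = { a, a + n / 2, 0 };
    struct range hi = { a + n / 2, a + n, 0 };
    pthread_t t1, t2;

    /* start both threads before joining either one */
    int started1 = (pthread_create(&t1, NULL, range_min, &lo) == 0);
    int started2 = (pthread_create(&t2, NULL, range_min, &hi) == 0);
    if (!started1)
        range_min(&lo);     /* fall back to scanning inline */
    if (!started2)
        range_min(&hi);

    if (started1)
        pthread_join(t1, NULL);
    if (started2)
        pthread_join(t2, NULL);

    /* both results are final once the joins have returned */
    return lo.min < hi.min ? lo.min : hi.min;
}
```

Comparing lo.min and hi.min in main also tells you which thread found the overall minimum, which is what the printf calls in the question are after.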
1) You're redeclaring the global variables in the main function, so there's actually no point in declaring i, low, high, min globally:
int array[size], i, low, high, min;
The problem you're having is with the scope of the variables: when you redeclare the variables in the main function, the global ones with the same name become "invisible":
int *low = array;
int *high = array + (size/2);
int mid = (*low + *high) / 2;
So when you run the threads, all the values of your global variables (low, high, min) are 0. This is because they are never actually modified by main, and because globals start at 0 by default (startup code, etc.).
Anyway, I wouldn't really recommend using global variables (it's really frowned upon) unless it's a really small project for personal use.
2) Another crucial problem is that you're ignoring the main idea behind threads, which is running both simultaneously:
if (array[mid] > mid) {
pthread_create(&th, NULL, &smallest, NULL);
pthread_join(th, NULL);
printf("\nThread1 finds the number\n");
}
else if (array[mid] < mid) {
pthread_create(&th2, NULL, &smallest, NULL);
pthread_join(th2, NULL);
printf("\nThread2 finds the number\n");
}
You're actually only running one thread when executing.
Try something like this:
pthread_create(&th, NULL, &smallest, NULL);
pthread_create(&th2, NULL, &smallest, NULL);
pthread_join(th2, NULL);
pthread_join(th, NULL);
3) You are trying to have two threads access the same variable. This can result in undefined behaviour; you MUST use a mutex so that a value written by one thread is not lost.
This guide is pretty complete regarding mutexes, but if you need any help please let me know.
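In case it helps, here is a rough sketch of point 3, assuming you keep one global minimum. The names min_lock, global_min, struct half and shared_min are mine, not from your code. Each thread scans its half into a local variable and only takes the mutex once, for the final compare and store.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t min_lock = PTHREAD_MUTEX_INITIALIZER;
static int global_min;      /* protected by min_lock while threads run */

struct half {
    const int *data;        /* first element of this thread's part */
    size_t len;             /* number of elements in it */
};

static void *update_min(void *arg)
{
    const struct half *h = arg;
    int local = INT_MAX;
    for (size_t i = 0; i < h->len; i++)     /* scan without holding the lock */
        if (h->data[i] < local)
            local = h->data[i];

    pthread_mutex_lock(&min_lock);          /* one short critical section */
    if (local < global_min)
        global_min = local;
    pthread_mutex_unlock(&min_lock);
    return NULL;
}

/* Returns the minimum of a[0..n-1], computed by two threads. */
int shared_min(const int *a, size_t n)
{
    struct half h1 = { a, n / 2 };
    struct half h2 = { a + n / 2, n - n / 2 };
    pthread_t t1, t2;
    int rc;

    global_min = INT_MAX;                   /* no threads are running yet */
    rc = pthread_create(&t1, NULL, update_min, &h1);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, update_min, &h2);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return global_min;                      /* safe: both threads were joined */
}
```

Locking inside the loop on every comparison would also be correct, but the two threads would then spend most of their time waiting for each other.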
This is a single threaded version of what you are asking.
#include <stdio.h>
#include <stdlib.h>
/*
I can not run pthread on my system.
So this is some code that should kind of work the same way
*/
typedef int pthread_t;
typedef int pthread_attr_t;
typedef void*(*threadfunc)(void*);
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void*), void *arg)
{
start_routine(arg);
return 0;
}
int pthread_join(pthread_t thread, void **value_ptr)
{
return 0;
}
//the typedef lets plain C code write "context" without the struct keyword
typedef struct context
{
int* begin;
int* end;
int* result;
} context;
//the function has to be castable to the threadfunction type
//that way you do not have to worry about casting the argument.
//be careful though - if something does not match these errors may be hard to track
void * smallest(context * c) //signature needed for start routine
{
c->result = c->begin;
for (int* current = c->begin; current < c->end; ++current)
{
if (*current < *c->result)
{
c->result = current;
}
}
return 0; // not needed with the way the argument is set up.
}
int main(int argc, char *argv[])
{
pthread_t t1, t2;
#define size 20
int array[size];
srand(0);
for (int i = 0; i < size; ++i)
{
array[i] = (rand() % 100) + 1;
printf("%d ", array[i]);
}
//prepare data
//one surefire way of messing up in multithreading is sharing data between threads.
//even a simple approach like storing in a variable who is accessing will not solve the issues
//to properly lock data you would have to dive into the memory model.
//either lock with mutexes or memory barriers or just don't share data between threads.
context c1;
context c2;
c1.begin = array;
c1.end = array + (size / 2);
c2.begin = c1.end; //end is exclusive, so the second half starts exactly here
c2.end = array + size;
//start threads - here your threads would go
//note the casting - you may want to wrap this in its own function
//there is error potential here, especially due to maintenance etc...
pthread_create(&t1, 0, (void*(*)(void*))smallest, &c1); //without typedef
pthread_create(&t2, 0, (threadfunc)smallest, &c2); //with typedef
pthread_join(t1, 0);//instead of zero you could have a return value here
pthread_join(t2, 0);//as far as i read 0 throws the return value away
//return value could be useful for error handling
//evaluate
if (*c1.result < *c2.result)
{
printf("\nThread1 finds the number\n");
printf("The smallest element is %i\n", *c1.result);
}
else
{
printf("\nThread2 finds the number\n");
printf("The smallest element is %i\n", *c2.result);
}
return 0;
}
Edit:
I edited some stubs in to give you an idea of how to use multithreading.
I never used pthread but this should likely work.
I used this source for prototype information.
I need buffers that I will use in multiple different types of threads, so the array needs to be global.
Buffer size and number of buffers are given as input to the program.
As an alternative, I could maybe implement a linked list.
What is the best way to implement such buffers? Can you provide a sample?
Any help is appreciated!
I don't understand what you mean by "without knowing length": if you pass the size of each buffer and the number of buffers as input parameters, then you know every required length.
Maybe this is not the best, but that would be my way.
First declare global buffer and threads.
static void ** buffer;
pthread_t tid[2];
Here is how the threads will work. The first thread fills the first two sub-buffers with data. The second does the same with the other two.
void *assignBuffer(void *threadid) {
pthread_t id = pthread_self();
if (pthread_equal(id, tid[0])) {
strcpy(buffer[0], "foo");
strcpy(buffer[1], "bar");
} else {
strcpy(buffer[2], "oof");
strcpy(buffer[3], "rab");
}
return NULL;
}
1. Convert the program arguments from strings to integers.
2. Allocate the global buffer as an array of pointers to data of unknown type.
3. Allocate each buffer with its size in bytes.
4. Finally, create the working threads. The important thing is that they will run simultaneously.
5. Wait until all the threads have done their job.
6. Simply print the buffer contents.
Ok, here is the code.
int main(int argc, char **argv) {
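//1
if (argc < 3) {
fprintf(stderr, "usage: %s bufferSize buffersAmount\n", argv[0]);
return 1;
}
int bufferSize = atoi(argv[1]);
int buffersAmount = atoi(argv[2]);
// the threads write 4-byte strings ("foo" plus terminator) into buffers 0..3
if (bufferSize < 4 || buffersAmount < 4)
return 1;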
//2
buffer = malloc(sizeof(void *)*buffersAmount);
//3
int i;
for (i = 0; i < buffersAmount; ++i) {
buffer[i] = malloc(bufferSize);
}
//4
i = 0;
while (i < 2) {
pthread_create(&tid[i], NULL, &assignBuffer, NULL);
++i;
}
//5
for (i = 0; i < 2; i++)
pthread_join(tid[i], NULL);
//6
for (i = 0; i < 4; ++i) {
printf("%d %s\n", i, (char*)buffer[i]);
}
for (i = 0; i < buffersAmount; ++i) {
free(buffer[i]);
}
free(buffer); // the array of pointers was malloc'ed too
return 0;
}
Feel free to ask if you don't understand something. Also, sorry for my English; it is not my native language.
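One caveat about the code above: assignBuffer decides which thread it is by comparing pthread_self() with tid[0], but the new thread may start running before pthread_create has stored the ID in tid[0]. A variant that avoids this, sketched here under my own assumptions (the names make_buffers, free_buffers, struct job and fill are invented), passes each thread the range of buffers it owns through the argument pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static char **buffers;      /* global so every kind of thread can reach it */
static size_t buffer_size;  /* size of each buffer in bytes */

struct job {
    size_t first;           /* first buffer index owned by the thread */
    size_t last;            /* one past the last index */
};

static void *fill(void *arg)
{
    const struct job *j = arg;
    for (size_t i = j->first; i < j->last; i++)
        snprintf(buffers[i], buffer_size, "buf%zu", i);  /* never overruns */
    return NULL;
}

/* Frees `count` buffers and the pointer array itself. */
void free_buffers(size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(buffers[i]);   /* free(NULL) is a no-op */
    free(buffers);
    buffers = NULL;
}

/* Allocates `count` buffers of `size` bytes and fills them with two threads.
   Returns 0 on success, -1 on failure. */
int make_buffers(size_t size, size_t count)
{
    if (size == 0 || count == 0)
        return -1;
    buffers = calloc(count, sizeof *buffers);
    if (buffers == NULL)
        return -1;
    buffer_size = size;
    for (size_t i = 0; i < count; i++) {
        buffers[i] = malloc(size);
        if (buffers[i] == NULL) {
            free_buffers(count);
            return -1;
        }
    }

    /* each thread gets its own half-open range, so no buffer is shared */
    struct job jobs[2] = { { 0, count / 2 }, { count / 2, count } };
    pthread_t tid[2];
    for (int t = 0; t < 2; t++) {
        int rc = pthread_create(&tid[t], NULL, fill, &jobs[t]);
        assert(rc == 0);    /* sketch: treat thread creation failure as fatal */
        (void)rc;
    }
    for (int t = 0; t < 2; t++)
        pthread_join(tid[t], NULL);
    return 0;
}
```

After make_buffers(bufferSize, buffersAmount) returns 0, buffers[i] can be read from main or handed to other threads, and free_buffers(buffersAmount) releases everything at the end.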