Where is my message queue producing a segmentation fault?

The message queue simply stops working when dealing with a large number of threads; it only seems to work reliably with around 10 threads, for example. GDB tells me:
Program received signal SIGSEGV, Segmentation fault.
__GI_____strtol_l_internal (nptr=0x0, endptr=endptr@entry=0x0, base=base@entry=10, group=group@entry=0, loc=0x7ffff78b0060 <_nl_global_locale>)
at ../stdlib/strtol_l.c:298
298 ../stdlib/strtol_l.c: No such file or directory.
But I have no idea what this means. The same code works fine on Windows, but not on Linux, which confuses me even more.
You can see below how this queue works. It is a singly linked list with locking while receiving messages. Please help me find where I messed up.
typedef struct Message {
unsigned type;
unsigned code;
void *data;
} Message;
typedef struct MessageQueueElement {
Message message;
struct MessageQueueElement *next;
} MessageQueueElement;
typedef struct MessageQueue {
MessageQueueElement *first;
MessageQueueElement *last;
} MessageQueue;
MessageQueue mq;
pthread_mutex_t emptyLock, sendLock;
pthread_cond_t emptyCond;
void init() {
mq.first = malloc(sizeof(MessageQueueElement));
mq.last = mq.first;
pthread_mutex_init(&emptyLock, NULL);
pthread_mutex_init(&sendLock, NULL);
pthread_cond_init(&emptyCond, NULL);
}
void clean() {
free(mq.first);
pthread_mutex_destroy(&emptyLock);
pthread_mutex_destroy(&sendLock);
pthread_cond_destroy(&emptyCond);
}
void sendMessage(MessageQueue *this, Message *message) {
pthread_mutex_lock(&sendLock);
if (this->first == this->last) {
pthread_mutex_lock(&emptyLock);
this->last->message = *message;
this->last = this->last->next = malloc(sizeof(MessageQueueElement));
pthread_cond_signal(&emptyCond);
pthread_mutex_unlock(&emptyLock);
} else {
this->last->message = *message;
this->last = this->last->next = malloc(sizeof(MessageQueueElement));
}
pthread_mutex_unlock(&sendLock);
}
int waitMessage(MessageQueue *this, int (*readMessage)(unsigned, unsigned, void *)) {
pthread_mutex_lock(&emptyLock);
if (this->first == this->last) {
pthread_cond_wait(&emptyCond, &emptyLock);
}
pthread_mutex_unlock(&emptyLock);
int n = readMessage(this->first->message.type, this->first->message.code, this->first->message.data);
MessageQueueElement *temp = this->first;
this->first = this->first->next;
free(temp);
return n;
}
Some test code:
#define EXIT_MESSAGE 0
#define THREAD_MESSAGE 1
#define JUST_A_MESSAGE 2
#define EXIT 0
#define CONTINUE 1
int readMessage(unsigned type, unsigned code, void *data) {
if (type == THREAD_MESSAGE) {
printf("message from thread %d: %s\n", code, (char *)data);
free(data);
} else if (type == JUST_A_MESSAGE) {
puts((char *)data);
free(data);
} else if (type == EXIT_MESSAGE) {
puts("ending the program");
return EXIT;
}
return CONTINUE;
}
int nThreads;
int counter = 0;
void *worker(void *p) {
double pi = 0.0;
for (int i = 0; i < 1000000; i += 1) {
pi += (4.0 / (8.0 * i + 1.0) - 2.0 / (8.0 * i + 4.0) - 1.0 / (8.0 * i + 5.0) - 1.0 / (8.0 * i + 6.0)) / pow(16.0, i);
}
char *s = malloc(100);
sprintf(s, "pi equals %.8f", pi);
sendMessage(&mq, &(Message){.type = THREAD_MESSAGE, .code = (int)(intptr_t)p, .data = s});
counter += 1;
char *s2 = malloc(100);
sprintf(s2, "received %d message%s", counter, counter == 1 ? "" : "s");
sendMessage(&mq, &(Message){.type = JUST_A_MESSAGE, .data = s2});
if (counter == nThreads) {
sendMessage(&mq, &(Message){.type = EXIT_MESSAGE});
}
return NULL;
}
int main(int argc, char **argv) {
clock_t timer = clock();
init();
nThreads = atoi(argv[1]);
pthread_t threads[nThreads];
for (int i = 0; i < nThreads; i += 1) {
pthread_create(&threads[i], NULL, worker, (void *)(intptr_t)i);
}
while (waitMessage(&mq, readMessage));
for (int i = 0; i < nThreads; i += 1) {
pthread_join(threads[i], NULL);
}
clean();
timer = clock() - timer;
printf("%.2f\n", (double)timer / CLOCKS_PER_SEC);
return 0;
}
--- EDIT ---
Okay, I managed to fix the problem by changing the program a bit to use semaphores. The waitMessage function doesn't have to be locked, since it is accessed by only one thread and the values it modifies do not clash with sendMessage.
MessageQueue mq;
pthread_mutex_t mutex;
sem_t sem;
void init() {
mq.first = malloc(sizeof(MessageQueueElement));
mq.last = mq.first;
pthread_mutex_init(&mutex, NULL);
sem_init(&sem, 0, 0);
}
void clean() {
free(mq.first);
pthread_mutex_destroy(&mutex);
sem_destroy(&sem);
}
void sendMessage(MessageQueue *this, Message *message) {
pthread_mutex_lock(&mutex);
this->last->message = *message;
this->last = this->last->next = malloc(sizeof(MessageQueueElement));
pthread_mutex_unlock(&mutex);
sem_post(&sem);
}
int waitMessage(MessageQueue *this, int (*readMessage)(unsigned, unsigned, void *)) {
sem_wait(&sem);
int n = readMessage(this->first->message.type, this->first->message.code, this->first->message.data);
MessageQueueElement *temp = this->first;
this->first = this->first->next;
free(temp);
return n;
}

Your waitMessage function is modifying this->first outside of any locking. This is a bad thing.
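As a minimal sketch of one way to close that race (not the poster's code: it assumes sendMessage is also changed to update last while holding emptyLock), dequeue under the lock and re-check the condition in a while loop:

int waitMessage(MessageQueue *this, int (*readMessage)(unsigned, unsigned, void *)) {
    pthread_mutex_lock(&emptyLock);
    while (this->first == this->last) {    /* while, not if: re-check after wakeup */
        pthread_cond_wait(&emptyCond, &emptyLock);
    }
    MessageQueueElement *temp = this->first;
    this->first = this->first->next;       /* now modified under the lock */
    pthread_mutex_unlock(&emptyLock);
    /* only this consumer can see the dequeued node, so no lock is needed here */
    int n = readMessage(temp->message.type, temp->message.code, temp->message.data);
    free(temp);
    return n;
}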
It's often not worth recreating things that are already provided for you by an OS. You're effectively trying to set up a pipe of Message structures. You could simply use an anonymous pipe instead (see here for Linux, or here for Windows) and write/read Message structures to/from it. There are also POSIX message queues, which are probably a bit more efficient.
In your case, with multiple worker threads, you'd need a supplementary mutex or semaphore to control which worker is reading from the pipe or message queue at any moment.
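For illustration, a rough sketch of the anonymous-pipe variant on Linux (assumes the Message struct from the question; error handling omitted). Writes of up to PIPE_BUF bytes are atomic, so several producer threads can write whole structs without an extra lock, and a single consumer just blocks in read():

#include <unistd.h>

static int fds[2];                 /* fds[0]: read end, fds[1]: write end */

void init(void) {
    pipe(fds);
}

void sendMessage(Message *message) {
    /* sizeof(Message) is far below PIPE_BUF, so this write is atomic */
    write(fds[1], message, sizeof *message);
}

int waitMessage(int (*readMessage)(unsigned, unsigned, void *)) {
    Message m;
    read(fds[0], &m, sizeof m);    /* blocks until a producer has written */
    return readMessage(m.type, m.code, m.data);
}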

Related

C: Multi producer, multi consumer bounded queue

I am trying (or rather, tried) to implement a circular buffer with the following interface:
ring_buffer *ring_buffer_create(int capacity, int element_size);
void ring_buffer_destroy(ring_buffer *buffer);
const void *ring_buffer_read_acquire(ring_buffer *buffer, ring_buffer_loc *loc);
void ring_buffer_read_finish(ring_buffer *buffer, ring_buffer_loc loc);
void *ring_buffer_write_acquire(ring_buffer *buffer, ring_buffer_loc *loc);
void ring_buffer_write_finish(ring_buffer *buffer, ring_buffer_loc loc);
It should be possible to read / write multiple elements concurrently (and even in parallel). E.g.:
ring_buffer *buffer = ring_buffer_create(10, sizeof(int));
/* Write a single element */
ring_buffer_loc loc0;
int *i0 = ring_buffer_write_acquire(buffer, &loc0);
*i0 = 42; // this could be a big data structure and way more expensive
ring_buffer_write_finish(buffer, loc0);
/* Write "concurrently" */
ring_buffer_loc loc1, loc2;
int *i1 = ring_buffer_write_acquire(buffer, &loc1);
int *i2 = ring_buffer_write_acquire(buffer, &loc2);
*i1 = 1729;
*i2 = 314;
ring_buffer_write_finish(buffer, loc1);
ring_buffer_write_finish(buffer, loc2);
All "acquire"-functions should be blocking until the operation is possible.
So far, so good. I thought this was simple, so I started with a clean implementation based on a mutex. But it soon became clear that this was far too slow for my use case (100'000 writes and reads per second), so I switched over to spin-locks etc.
My implementation became quite messy, and at some point (now) I started to wonder why something "simple" like this with the desired interface doesn't already exist. It is probably not a great idea to re-implement something like this anyway.
Maybe someone knows of an implementation with such an interface that blocks when the operation is not possible? I searched the internet for quite a while but could not find a good match for my problem. Or maybe my desired interface is just "bad" or "wrong"?
Nevertheless, here is my current code. It basically assigns each "cell" (= value slot) a state, which can be NONE (not set; the cell is basically empty), WRITING (someone acquired the cell to write data), READING (someone acquired the cell to read), or SET (the cell has a value that can be read). Each cell also has a spin-lock, which is used to update the cell state.
It then works like this:
When someone acquires a read and the current cell has the state SET, the value can be read (new state: READING) and the read index is increased. In all other cases, a condition variable is used to wait until an element becomes available. When a read is finished, the cell state is changed to NONE, and if any writers are waiting, the condition variable is signaled.
The same holds when a cell is acquired for writing. The only difference is that the cell needs to be in state NONE to be usable, and waiting readers are signaled if there are any.
For some reason the code sometimes locks up, so I had to add a "dirty" timeout to my condition variable. I would already be super happy if that could be solved, because the timeout basically turns the code into polling (which is relatively ugly) and at the same time causes many context switches. Maybe someone sees the bug? The "new" code also has the disadvantage that it is sometimes really slow, which is a killer for my application. I attached both the "old" and the "new" code (the changed lines are marked).
Thank you for helping me:)!
#include <stdio.h>
#include <stdlib.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <assert.h>
#include <pthread.h>
#include <errno.h>
#include <unistd.h>
typedef int ring_buffer_loc;
enum t_ring_buffer_cell_state
{
NONE = 0,
WRITING = 1,
READING = 2,
SET = 3
};
typedef struct {
char *buffer; // data
atomic_int_fast8_t *states; // state per cell
pthread_spinlock_t *locks; // lock per cell
int capacity;
int element_size;
pthread_spinlock_t read_i_lock;
int read_i;
pthread_spinlock_t write_i_lock;
int write_i;
pthread_spinlock_t waiting_readers_lock;
int waiting_readers;
pthread_spinlock_t waiting_writers_lock;
int waiting_writers;
pthread_mutex_t value_written_lock;
pthread_mutex_t value_read_lock;
pthread_cond_t value_written;
pthread_cond_t value_read;
} ring_buffer;
ring_buffer *ring_buffer_create(int capacity, int element_size)
{
ring_buffer *res = calloc(1, sizeof(ring_buffer));
res->buffer = calloc(capacity, element_size);
res->states = calloc(capacity, sizeof(*res->states));
res->locks = malloc(capacity * sizeof(*res->locks));
for (int i = 0; i < capacity; ++i) {
pthread_spin_init(&res->locks[i], PTHREAD_PROCESS_PRIVATE);
}
pthread_spin_init(&res->write_i_lock, PTHREAD_PROCESS_PRIVATE);
pthread_spin_init(&res->read_i_lock, PTHREAD_PROCESS_PRIVATE);
pthread_spin_init(&res->waiting_readers_lock, PTHREAD_PROCESS_PRIVATE);
pthread_spin_init(&res->waiting_writers_lock, PTHREAD_PROCESS_PRIVATE);
res->capacity = capacity;
res->element_size = element_size;
return res;
}
void ring_buffer_destroy(ring_buffer *buffer)
{
free(buffer->buffer);
free(buffer->states);
free(buffer);
}
static inline void ring_buffer_inc_index(ring_buffer *buffer, int *index)
{
*index = (*index + 1) % buffer->capacity;
}
void timespec_now_plus_ms(struct timespec *result, long ms_to_add)
{
const long one_second_ns = 1000L * 1000 * 1000; /* nanoseconds in a second */
timespec_get(result, TIME_UTC);
const long nsec = result->tv_nsec + ms_to_add * 1000 * 1000;
result->tv_sec += nsec / one_second_ns;
result->tv_nsec = nsec % one_second_ns; /* assign, don't add: nsec already includes tv_nsec */
}
const void *ring_buffer_read_acquire(ring_buffer *buffer, ring_buffer_loc *loc)
{
bool is_waiting = false;
start:
pthread_spin_lock(&buffer->read_i_lock);
const int read_i = buffer->read_i;
pthread_spinlock_t *cell_lock = &buffer->locks[read_i];
pthread_spin_lock(cell_lock);
const int state = buffer->states[read_i];
if (state == NONE || state == WRITING || state == READING) {
if (!is_waiting) {
is_waiting = true;
pthread_spin_lock(&buffer->waiting_readers_lock);
++buffer->waiting_readers;
pthread_mutex_lock(&buffer->value_written_lock);
pthread_spin_unlock(&buffer->waiting_readers_lock);
} else {
pthread_mutex_lock(&buffer->value_written_lock);
}
pthread_spin_unlock(cell_lock);
pthread_spin_unlock(&buffer->read_i_lock);
// "new" code:
// struct timespec ts;
// do {
// timespec_now_plus_ms(&ts, 50);
// } while (pthread_cond_timedwait(&buffer->value_written, &buffer->value_written_lock, &ts) == ETIMEDOUT && buffer->states[read_i] == state);
// pthread_mutex_unlock(&buffer->value_written_lock);
// "old" code (which hangs quite often):
pthread_cond_wait(&buffer->value_written, &buffer->value_written_lock);
pthread_mutex_unlock(&buffer->value_written_lock);
goto start;
} else if (state == SET) {
if (is_waiting) {
pthread_spin_lock(&buffer->waiting_readers_lock);
--buffer->waiting_readers;
assert(buffer->waiting_readers >= 0);
pthread_spin_unlock(&buffer->waiting_readers_lock);
}
buffer->states[read_i] = READING;
ring_buffer_inc_index(buffer, &buffer->read_i);
pthread_spin_unlock(&buffer->read_i_lock);
pthread_spin_unlock(cell_lock);
*loc = read_i;
return &buffer->buffer[read_i * buffer->element_size];
} else {
printf("unknown state!\n");
exit(1);
}
}
void ring_buffer_read_finish(ring_buffer *buffer, ring_buffer_loc loc)
{
pthread_spinlock_t *cell_lock = &buffer->locks[loc];
pthread_spin_lock(cell_lock);
buffer->states[loc] = NONE;
pthread_spin_unlock(cell_lock);
pthread_spin_lock(&buffer->waiting_writers_lock);
if (buffer->waiting_writers > 0) {
pthread_cond_signal(&buffer->value_read);
}
pthread_spin_unlock(&buffer->waiting_writers_lock);
}
void *ring_buffer_write_acquire(ring_buffer *buffer, ring_buffer_loc *loc)
{
bool is_waiting = false;
start:
pthread_spin_lock(&buffer->write_i_lock);
const int write_i = buffer->write_i;
pthread_spinlock_t *cell_lock = &buffer->locks[write_i];
pthread_spin_lock(cell_lock);
const int state = buffer->states[write_i];
if (state == SET || state == READING || state == WRITING) {
if (!is_waiting) {
is_waiting = true;
pthread_spin_lock(&buffer->waiting_writers_lock);
++buffer->waiting_writers;
pthread_mutex_lock(&buffer->value_read_lock);
pthread_spin_unlock(&buffer->waiting_writers_lock);
} else {
pthread_mutex_lock(&buffer->value_read_lock);
}
pthread_spin_unlock(cell_lock);
pthread_spin_unlock(&buffer->write_i_lock);
// "new" code:
// struct timespec ts;
// do {
// timespec_now_plus_ms(&ts, 5);
// } while (pthread_cond_timedwait(&buffer->value_read, &buffer->value_read_lock, &ts) == ETIMEDOUT && buffer->states[write_i] == state);
// pthread_mutex_unlock(&buffer->value_read_lock);
// "old" code (which hangs quite often):
pthread_cond_wait(&buffer->value_read, &buffer->value_read_lock);
pthread_mutex_unlock(&buffer->value_read_lock);
goto start;
} else if (state == NONE) {
if (is_waiting) {
pthread_spin_lock(&buffer->waiting_writers_lock);
--buffer->waiting_writers;
assert(buffer->waiting_writers >= 0);
pthread_spin_unlock(&buffer->waiting_writers_lock);
}
buffer->states[write_i] = WRITING;
ring_buffer_inc_index(buffer, &buffer->write_i);
pthread_spin_unlock(&buffer->write_i_lock);
pthread_spin_unlock(cell_lock);
*loc = write_i;
return &buffer->buffer[write_i * buffer->element_size];
} else {
printf("unknown state!\n");
exit(1);
}
}
void ring_buffer_write_finish(ring_buffer *buffer, ring_buffer_loc loc)
{
pthread_spinlock_t *cell_lock = &buffer->locks[loc];
pthread_spin_lock(cell_lock);
buffer->states[loc] = SET;
pthread_spin_unlock(cell_lock);
pthread_spin_lock(&buffer->waiting_readers_lock);
if (buffer->waiting_readers > 0) {
pthread_cond_signal(&buffer->value_written);
}
pthread_spin_unlock(&buffer->waiting_readers_lock);
}
/* just for debugging */
void ring_buffer_dump(const ring_buffer *buffer)
{
printf("RingBuffer\n");
printf(" Capacity: %d\n", buffer->capacity);
printf(" Element size: %d\n", buffer->element_size);
printf(" Read index: %d\n", buffer->read_i);
printf(" Write index: %d\n", buffer->write_i);
printf(" Cells:\n");
for (int i = 0; i < buffer->capacity; ++i) {
printf(" [%d]: STATE = ", i);
switch (buffer->states[i]) {
case NONE:
printf("NONE");
break;
case WRITING:
printf("WRITING");
break;
case READING:
printf("READING");
break;
case SET:
printf("SET");
break;
}
printf("\n");
}
printf("\n");
}
/*
* Test run
*/
struct write_read_n_conf {
ring_buffer *buffer;
int n;
};
static void *producer_thread(void *arg)
{
struct write_read_n_conf conf = *(struct write_read_n_conf *)arg;
for (int i = 0; i < conf.n; ++i) {
ring_buffer_loc loc;
int *value = ring_buffer_write_acquire(conf.buffer, &loc);
*value = i;
ring_buffer_write_finish(conf.buffer, loc);
if (i % 1000 == 0) {
printf("%d / %d\n", i, conf.n);
}
}
return NULL;
}
static void *consumer_thread(void *arg)
{
struct write_read_n_conf conf = *(struct write_read_n_conf *)arg;
int tmp;
bool ok = true;
for (int i = 0; i < conf.n; ++i) {
ring_buffer_loc loc;
const int *value = ring_buffer_read_acquire(conf.buffer, &loc);
tmp = *value;
ring_buffer_read_finish(conf.buffer, loc);
ok = ok && (tmp == i);
}
printf("ok = %d\n", ok);
return (void *)ok;
}
void write_read_n_parallel(int n)
{
ring_buffer *buffer = ring_buffer_create(50, sizeof(int));
struct write_read_n_conf conf = {
.buffer = buffer,
.n = n
};
pthread_t consumer;
pthread_t producer;
pthread_create(&consumer, NULL, consumer_thread, &conf);
pthread_create(&producer, NULL, producer_thread, &conf);
pthread_join(producer, NULL);
void *res;
pthread_join(consumer, &res); // hacky way to pass a bool: res == NULL means false, and otherwise true
assert(res != NULL);
}
int main() {
write_read_n_parallel(10000000);
}

malloc() is returning the same address multiple times, even when I haven't used free()

EDIT: I did use free(), ignore the title.
The gist is that every time malloc() is called, the address 0x8403620 is returned, which I found out using GDB.
tellers[i] = create_teller(0, i, NULL);
I first use malloc() on line 72 to create 3 teller structures. The first address returned, visible through GDB, is 0x84003620. The second is 0x84033a0, the third 0x84034e0. Everything seems fine.
clients[i] = create_client(0, i, -1, -1);
Then I use malloc() on line 77 with the create_client() function to create 100 clients. The first address, assigned to clients[0], is ... 0x8403620. The same as tellers[0]. It gets worse. The next address returned from malloc() is 0x8403620 again when i = 1, and so on for i = 3, 4, ..., 99.
It isn't inherently the create_client() or the create_teller() functions, but
instead the malloc() function itself.
This is simply a very odd situation.
Now, I'd like to ask: am I using malloc() wrong? Or is my version of malloc() bugged, and should I somehow reinstall whatever provides it? It's most likely my code, since it works for creating the tellers, just not for the clients.
Here is the full code:
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <time.h>
#include <assert.h>
typedef struct teller teller_t;
typedef struct client client_t;
teller_t * create_teller (pthread_t thread_id, int id, client_t *assigned_client);
client_t * create_client (pthread_t thread_id, int id, int operation, int amount);
void * run_teller (void *arg);
void * run_client (void *arg);
/* types of operations */
#define DEPOSIT 0
#define WITHDRAW 1
#define NUM_TELLERS 3
#define NUM_CLIENTS 100
struct client {
pthread_t thread_id;
int id;
int operation;
int amount;
};
struct teller {
pthread_t thread_id;
int id;
bool available;
client_t *assigned_client;
};
client_t *clients[100];
teller_t *tellers[3];
/* only 2 tellers at a time can access */
sem_t safe;
/* only 1 teller at a time can access */
sem_t manager;
/* amount of tellers available, at most 3 */
sem_t line; /* rename to available? */
/* each teller waiting for a client to be assigned to them */
sem_t wait_for_client[3];
int
main (int argc, char **argv) {
(void) argc;
(void) argv;
srand(time(NULL));
/* This also tells us how many clients have been served */
int client_index = 0;
sem_init(&safe, 0, 2);
sem_init(&manager, 0, 1);
sem_init(&line, 0, 0);
for (int i = 0; i < 3; i++)
sem_init(&wait_for_client[i], 0, 0);
for (int i = 0; i < NUM_TELLERS; i++) {
tellers[i] = create_teller(0, i, NULL);
pthread_create(&tellers[i]->thread_id, NULL, run_teller, (void *) tellers[i]);
}
for (int i = 0; i < NUM_CLIENTS; i++) {
clients[i] = create_client(0, i, -1, -1);
pthread_create(&clients[i]->thread_id, NULL, run_client, (void *) clients[i]);
}
/* DEBUG
for (int i = 0; i < NUM_CLIENTS; i++) {
printf("client %d has id %d\n", i, clients[i]->id);
}
*/
// No threads should get past this point!!!
// ==------------------------------------==
// Should all of this below be handled by the clients instead of main?
while (1) {
if (client_index >= NUM_CLIENTS) {
// TODO:
// tell tellers that there are no more clients
// so they should close, then close the bank.
break;
}
sem_wait(&line);
for (int i = 0; i < 3; i++) {
if (tellers[i]->available) {
int client_id = clients[client_index]->id;
//printf("client_index = %d\n", client_index); // DEBUG
tellers[i]->assigned_client = clients[client_index++];
tellers[i]->available = false;
printf(
"Client %d goes to Teller %d\n",
client_id,
tellers[i]->id
);
sem_post(&wait_for_client[i]);
break;
}
}
//sem_post(&line); // Is this needed?
}
return EXIT_SUCCESS;
}
teller_t *
create_teller (pthread_t thread_id, int id, client_t *assigned_client) {
teller_t *t = (teller_t *) malloc(sizeof(teller_t));
if (t == NULL) {
printf("ERROR: Unable to allocate teller_t.\n");
exit(EXIT_FAILURE);
}
t->thread_id = thread_id;
t->id = id;
t->available = true;
t->assigned_client = assigned_client;
return t;
}
/* TODO: Malloc returns the same address everytime, fix this */
client_t *
create_client (pthread_t thread_id, int id, int operation, int amount) {
client_t *c = malloc(sizeof(client_t));
if (c == NULL) {
printf("ERROR: Unable to allocate client_t.\n");
exit(EXIT_FAILURE);
}
c->thread_id = thread_id;
c->id = id;
c->operation = operation;
c->amount = amount;
return c;
}
void *
run_teller (void *arg) {
teller_t *t = (teller_t *) arg;
printf("Teller %d is available\n", t->id);
while (1) {
/* tell the line that a teller is available */
sem_post(&line);
/* pass when the line assigns a client to this teller */
sem_wait(&wait_for_client[t->id]);
assert(t->assigned_client != NULL);
if (t->assigned_client->operation == WITHDRAW) {
}
else {
}
}
free(arg);
pthread_cancel(t->thread_id);
return NULL;
}
void *
run_client (void *arg) {
client_t *c = (client_t *) arg;
c->operation = rand() & 1;
printf(
"Client %d waits in line to make a %s\n",
c->id,
((c->operation == DEPOSIT) ? "Deposit" : "Withdraw")
);
free(arg);
pthread_cancel(c->thread_id);
return NULL;
}
Then I use malloc() on line 77 with the create_client() function to create 100 clients.
Not exactly: you create one object, then you spawn a thread that manages that object via run_client(), and then repeat. But run_client() basically does nothing except free() your client object! So malloc() is perfectly entitled to return the same address again, as it is free memory by then.
It just happens that your client threads are faster than your main one. Your problem is that you are freeing the objects from secondary threads while leaving dangling pointers behind in the global pointer array. If you only use that array for debugging purposes, then nothing is actually wrong here; but if you want to use the client objects at some point in the future, then you should not free them in run_client() in the first place.
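As a hedged sketch of that last point (reusing the question's globals, and assuming the free(arg) and pthread_cancel() calls are removed from run_client()), the objects can instead be released from main() after the threads are joined:

/* in main(), after the dispatch loop */
for (int i = 0; i < NUM_CLIENTS; i++) {
    pthread_join(clients[i]->thread_id, NULL);
}
for (int i = 0; i < NUM_CLIENTS; i++) {
    free(clients[i]);      /* freed exactly once, after the thread is done */
    clients[i] = NULL;     /* leave no dangling pointer in the global array */
}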

Segmentation Fault when dealing with large array size

I've been trying to use pthreads in a program that counts the 3's in a 3000000-element int array, and when it runs sequentially without pthreads, it works perfectly.
Using pthreads leads to a segmentation fault, and I can't figure out why. Execution stops when each thread reaches around 300000 iterations in the case of 4 threads, on a machine with 8 GB of RAM and a 256 KB L2 cache.
Here is my code
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <pthread.h>
#define LENGTH 3000000
#define NUM_THREADS 4
int countarr[128];
int* array;
struct Op_data{
int start_index;
int count_index;
int CHUNK;
int ID;
};
void* count3s(void* data) // you need to parallelize this
{
struct Op_data *info;
info = (struct Op_data *) data;
int count_i = info -> count_index;
int i = info->start_index;
int CHUNK = info -> CHUNK;
printf("Thread data:\nCOUNT_INDEX:\t\t%d\nSTART_INDEX:\t\t%d\nCHUNK_SIZE:\t\t%d\n",count_i,i,CHUNK);
int c = 0;
struct timeval t1, t2;
gettimeofday(&t1, NULL);
for(i;i<i+CHUNK;i++)
{
if(array[i]==3)
{
c++;
}
}
countarr[count_i] = c;
gettimeofday(&t2, NULL);
double t = (t2.tv_sec - t1.tv_sec) + (t2.tv_usec - t1.tv_usec ) / 1000000.0;
printf("execution time = %f seconds\n",t);
return NULL;
}
int main(int argc, char * argv[])
{
int i = 0;
int ok = 0;
int count=0;
array = malloc(LENGTH * sizeof(int));
// initialize the array randomly. Make sure the number of 3's doesn't exceed 500000
srand(12);
for(i= 0;i<LENGTH;i++)
{
array[i]=rand()%10;
if(array[i]==3)
{
ok++; // keep track of how many 3's are there, we will use this later to confirm the correctness of the code
if(ok>500000)
{
ok=500000;
array[i]=12;
}
}
}
pthread_t threads[NUM_THREADS];
struct Op_data* t_data[NUM_THREADS];
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
int rc;
int CHUNK = LENGTH/NUM_THREADS;
for(int t=0;t<NUM_THREADS;t++)
{
t_data[t] = (struct Op_data*) malloc(sizeof(struct Op_data));
t_data[t] -> start_index = t*CHUNK;
t_data[t] -> count_index = t*(128/NUM_THREADS);
t_data[t] -> CHUNK = CHUNK;
t_data[t] -> ID = t;
rc = pthread_create(&threads[t], &attr, count3s, (void *)t_data[t]);
if (rc) {
printf("Error:unable to create thread,%d\n",rc);
exit(-1);
}
printf("Thread (%d) has been created.\n",t);
}
for( int g = 0; g < NUM_THREADS; g++ ) {
rc = pthread_join(threads[g], NULL);
if (rc) {
printf("Error:unable to join,%d\n",rc);
exit(-1);
}
}
pthread_attr_destroy(&attr);
for(int x=0;x<NUM_THREADS;x++)
{
count += countarr[x*(128/NUM_THREADS)];
}
if( ok == count ) // check if the result is correct
{
printf("Correct Result!\n");
printf("Number of 3`s: %d\n_________________________________\n",count);
}
else
{
printf("Wrong Result! Your count:%d\n",count);
printf("The correct number of 3`s is: %d\n",ok);
}
pthread_exit(NULL);
return 0;
}
for(i;i<i+CHUNK;i++)
Since i will always be less than i + CHUNK (unless it overflows), this loop will overrun the bounds of the array.
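A minimal correction under that diagnosis (same names as the question): compute the end of the chunk once, so the upper bound no longer moves together with i.

int i = info->start_index;
const int end = i + CHUNK;   /* fixed bound; i < i + CHUNK was always true */
for (; i < end; i++) {
    if (array[i] == 3) {
        c++;
    }
}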

why are Threads loading data from array randomly?

I'm new to threads, and I'm trying to create a program where four threads do some parallel computations using values from a global array. The problem is that the threads are not loading the data in order.
#define QUARTER 64
#define RANGE_STEP 256
struct thread_data
{
unsigned start;
unsigned stop;
__m256* re_fc;
__m256* im_fc;
};
#define NUM_THREADS 4
struct thread_data thread_data_array[NUM_THREADS];
void *routine(void *thread_info)
{
int n,t;
unsigned t_start,t_stop;
__m256 *Re_fac , *Im_fac;
struct thread_data *mydata;
mydata = (struct thread_data*) thread_info;
t_start = mydata->start;
t_stop = mydata->stop;
Re_fac = mydata->re_fc;
Im_fac = mydata->im_fc;
t = t_start;
for (n = t_start; n < t_stop; n += 8)
{
// computations
RE_m256_fac = Re_fac[t];
IM_m256_fac = Im_fac[t];
// computations
t++;
}
pthread_exit(NULL);
}
int main()
{
unsigned t,i=0;
for(t=0;t<RANGE_STEP;t+=QUARTER)
{
thread_data_array[i].start = t;
thread_data_array[i].stop = t+QUARTER;
thread_data_array[i].re_fc = RE_factors;
thread_data_array[i].im_fc = IM_factors;
pthread_create(&threads[i],NULL,routine,(void *)&thread_data_array[i]);
i++;
}
for(i=0; i<NUM_THREADS; i++)
{
int rc = pthread_join(threads[i], NULL);
if (rc)
{
fprintf(stderr, "failed to join thread #%u - %s\n",i, strerror(rc));
}
}
}
The problem I am talking about occurs in the thread routine, inside the for() loop, at exactly these two load instructions: RE_m256_fac = Re_fac[t]; and IM_m256_fac = Im_fac[t];. The loaded data is not correct. I think the index t is a local variable, so no synchronization is needed, or am I wrong?
After some digging, it turned out that since I am reading from a global shared array, I have to use a mutex to ensure mutual exclusion, like so:
void *routine(void *thread_info)
{
int n,t;
unsigned t_start,t_stop;
__m256 *Re_fac , *Im_fac;
struct thread_data *mydata;
mydata = (struct thread_data*) thread_info;
t_start = mydata->start;
t_stop = mydata->stop;
Re_fac = mydata->re_fc;
Im_fac = mydata->im_fc;
t = t_start;
for (n = t_start; n < t_stop; n += 8)
{
pthread_mutex_lock(&mutex);
// computations
RE_m256_fac = Re_fac[t];
IM_m256_fac = Im_fac[t];
// computations
pthread_mutex_unlock(&mutex);
t++;
}
pthread_exit(NULL);
}
Since then, I can see that the threads are loading the values correctly from the shared array.

Segmentation fault: core dumped during execution of multi-threaded program

I have realized that my code was too lengthy and rather hard to read.
Can you check the way I construct and pass the arguments in the main body?
Essentially, provided that my implementations of the produce and consume functions are correct, I want to pass a shared circular queue, semaphores, and mutexes to each produce/consume thread.
typedef struct circularQueue
{
int *items;
int *head;
int *tail;
int numProduced;
int numConsumed;
} circularQueue;
typedef struct threadArg
{
int id;
circularQueue *queue;
pthread_mutex_t *mutex;
sem_t *spaces;
sem_t *itemAvail;
int numItems;
int bufferSize;
int numProducer;
int numConsumer;
} threadArg;
pthread_t *producerThd;
pthread_t *consumerThd;
int main(int argc, char* argv[])
{
pthread_attr_t attr;
// Info to pass to thread args
circularQueue *myQueue;
pthread_mutex_t useSharedMem;
sem_t spaces;
sem_t itemAvail;
int numItems;
int bufferSize;
int numProducer;
int numConsumer;
int i, j, k, l;
if(argc != 5)
{
printf("Enter in 4 arguments - N B P C\n");
return -1;
}
numItems = atoi(argv[1]);
bufferSize = atoi(argv[2]);
numProducer = atoi(argv[3]);
numConsumer = atoi(argv[4]);
if(numItems == 0 || bufferSize == 0 || numProducer == 0 || numConsumer == 0)
{
printf("Parameters should not be 0\n");
return -1;
}
// Initialize list of threads
producerThd = malloc(sizeof(pthread_t) * numProducer);
consumerThd = malloc(sizeof(pthread_t) * numConsumer);
// Initialize semaphores
sem_init(&spaces, 0, bufferSize);
sem_init(&itemAvail, 0, 0);
// Initialize mutex
pthread_mutex_init(&useSharedMem, NULL);
// Initialize thread attributes
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
// Initialize queue
myQueue = (circularQueue*)malloc(sizeof(circularQueue));
myQueue->items = (int*)malloc(sizeof(int)*bufferSize);
myQueue->head = myQueue->items;
myQueue->tail = myQueue->items;
myQueue->numProduced = 0;
myQueue->numConsumed = 0;
// thread arguments
for(i = 0; i < numProducer; i++)
{
// Initialize thread args
threadArg *args = (threadArg*)malloc(sizeof(threadArg));
args->queue = (circularQueue*)malloc(sizeof(circularQueue));
args->mutex = &useSharedMem;
args->spaces = &spaces;
args->itemAvail = &itemAvail;
args->numItems = numItems;
args->bufferSize = bufferSize;
args->numProducer = numProducer;
args->numConsumer = numConsumer;
args->id = i;
pthread_t thisThread = *(producerThd + i);
pthread_create(&thisThread, &attr, produce, args);
}
for(j = 0; j < numConsumer; j++)
{
// Initialize thread args
threadArg *args = (threadArg*)malloc(sizeof(threadArg));
args->queue = (circularQueue*)malloc(sizeof(circularQueue));
args->mutex = &useSharedMem;
args->spaces = &spaces;
args->itemAvail = &itemAvail;
args->numItems = numItems;
args->bufferSize = bufferSize;
args->numProducer = numProducer;
args->numConsumer = numConsumer;
args->id = j;
pthread_t thisThread = *(consumerThd + i);
pthread_create(&thisThread, &attr, consume, args);
}
for(k = 0; k < numProducer; k++)
{
pthread_join(*(producerThd+k), NULL);
}
printf("Finished waiting for producers\n");
for(l = 0; l < numConsumer; l++)
{
pthread_join(*(consumerThd+l), NULL);
}
printf("Finished waiting for consumers\n");
free(producerThd);
free(consumerThd);
free(myQueue->items);
free(myQueue);
sem_destroy(&spaces);
sem_destroy(&itemAvail);
fflush(stdout);
return 0;
}
Thank you
There are multiple sources of undefined behavior in your code; you are either compiling without warnings enabled or, what I consider worse, ignoring them.
You have the wrong printf() specifier in
printf("cid %d found this item %d as valid item %d\n", myArgs->id, thisItem, validItem);
validItem is a double, so the last specifier should be %f.
Your thread functions never return a value, even though you declare them to return void *, which is the signature required for such functions.
You are freeing and dereferencing myQueue in the main() function but you have not initialized it because that code is commented.
Your code is also hard to read because you have no consistent style and you mix declarations with statements, which makes everything very confusing, e.g. determining the scope of a variable is very difficult.
Fixing the code will not only help others read it, but will also help you fix it and find issues quickly.
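To make the first two points concrete, a hypothetical sketch (consume(), thisItem, and validItem are assumed from the full code the answer refers to, which isn't shown here):

void *consume(void *arg) {
    threadArg *myArgs = arg;
    int thisItem = 0;          /* placeholder values for illustration */
    double validItem = 0.0;
    /* ... take items from the queue ... */
    printf("cid %d found this item %d as valid item %f\n",   /* %f for the double */
           myArgs->id, thisItem, validItem);
    return NULL;               /* a function declared to return void * must return a value */
}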
