Out of memory in a thread pool in C, Linux

I need to run an infinite loop and use a thread pool with, for example, 200 threads to do the work that the loop generates.
I'm using this thread pool: https://github.com/Pithikos/C-Thread-Pool
At the same time I'm monitoring the server's resources with htop, and I see memory usage growing by about 3 megabytes every second until the kernel kills the application.
The code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include "thpool.h"
#define MAX_IPv4 256
/* Args for thread start function */
typedef struct {
int octet1;
int octet2;
int octet3;
int octet4;
} args_struct;
/* Thread task */
void task1(void *args) {
args_struct *actual_args = args;
printf("%d.%d.%d.%d\n", actual_args->octet1, actual_args->octet2, actual_args->octet3, actual_args->octet4);
/* Do some job */
sleep(1);
/* Free the args */
free(args);
}
/* Main function */
int main( void ) {
int i=0, j=0, n=0, m=0;
/* Making threadpool n threads */
threadpool thpool = thpool_init(200);
/* Infinite loop start from the certain ip*/
while (1) {
for (i=0; i < MAX_IPv4; ++i) {
for (j=0; j < MAX_IPv4; ++j) {
for (n=0; n < MAX_IPv4; ++n) {
for (m=0; m < MAX_IPv4; ++m) {
/* Heap memory for the args different for the every thread */
args_struct *args = malloc(sizeof *args);
args->octet1 = i;
args->octet2 = j;
args->octet3 = n;
args->octet4 = m;
/* Create thread */
thpool_add_work(thpool, (void*)task1, (void*)args);
}
}
}
}
/* Start from 0.0.0.0 */
i=0;
j=0;
n=0;
m=0;
}
/* Wait until the all threads are done */
thpool_wait(thpool);
/* Destroy the threadpool */
thpool_destroy(thpool);
return 0;
}
How to solve this issue?

Look at the issues for your library, especially the one about memory consumption.
There is a recommendation there to check the job queue length, threadpool.jobqueue.len.
Your code could check it after adding each job to the queue.
Unfortunately, threadpool is an opaque pointer, so you cannot access the value directly.
I would recommend adding a function for the threadpool in thpool.c:
int thpool_jobqueue_length(thpool_* thpool_p) {
return thpool_p->jobqueue.len;
}
And don't forget the declaration in thpool.h
int thpool_jobqueue_length(threadpool);
Then modify your code
const int SOME_ARBITRARY_VALUE = 400;
...
thpool_add_work(thpool, (void*)task1, (void*)args);
while( ( thpool_jobqueue_length(thpool) > SOME_ARBITRARY_VALUE ) ) {
sleep(1);
}
...
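The polling loop above wakes up once per second. A variation on the same back-pressure idea is to block the producer on a condition variable until a worker finishes. The sketch below is independent of C-Thread-Pool: job_limiter and its functions are hypothetical names, and it assumes you call limiter_acquire just before thpool_add_work and limiter_release as the last statement of the task function.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper, not part of C-Thread-Pool: caps the number of
   jobs that have been submitted but have not finished yet. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  below_limit;
    int in_flight;   /* jobs submitted and not yet completed */
    int limit;       /* maximum allowed value of in_flight */
} job_limiter;

void limiter_init(job_limiter *l, int limit) {
    assert(limit > 0);
    pthread_mutex_init(&l->lock, NULL);
    pthread_cond_init(&l->below_limit, NULL);
    l->in_flight = 0;
    l->limit = limit;
}

/* Call before submitting a job: blocks while the limit is reached. */
void limiter_acquire(job_limiter *l) {
    pthread_mutex_lock(&l->lock);
    while (l->in_flight >= l->limit)
        pthread_cond_wait(&l->below_limit, &l->lock);
    l->in_flight++;
    pthread_mutex_unlock(&l->lock);
}

/* Call at the end of the task function: frees one slot. */
void limiter_release(job_limiter *l) {
    pthread_mutex_lock(&l->lock);
    l->in_flight--;
    pthread_cond_signal(&l->below_limit);
    pthread_mutex_unlock(&l->lock);
}

/* Current number of outstanding jobs (for monitoring or tests). */
int limiter_in_flight(job_limiter *l) {
    pthread_mutex_lock(&l->lock);
    int n = l->in_flight;
    pthread_mutex_unlock(&l->lock);
    return n;
}
```

With a limit of, say, 400, the number of queued args_struct allocations stays bounded no matter how long the outer loop runs.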

Looking at the code for thpool_add_work there is some memory use per call (allocating a job record to add to a queue), so as your loop runs forever, it is not surprising that it will run out of memory at some point. You are also allocating memory inside your innermost loop, so that too will help use up all your memory.
Essentially inside your inner loop you are allocating 16 bytes (assuming int is 4) for the args_struct, and thpool_add_work is also allocating 12 bytes (possibly rounded to 16 for alignment purposes).
As you can imagine, that adds up to a lot for your 4 nested loops (which are also run infinitely).
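To put a rough number on that, here is a back-of-the-envelope sketch. It assumes about 32 bytes per queued job (16 for args_struct plus an assumed 16 for the job record) and ignores malloc's own bookkeeping, so real usage would be higher. bytes_per_pass is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes queued by one full pass over the four nested loops, assuming
   every job is still waiting in the queue (nothing consumed or freed). */
uint64_t bytes_per_pass(uint64_t per_job_bytes) {
    const uint64_t values_per_octet = 256;   /* MAX_IPv4 in the question */
    uint64_t jobs = values_per_octet * values_per_octet
                  * values_per_octet * values_per_octet;   /* 2^32 jobs */
    assert(per_job_bytes <= UINT64_MAX / jobs);            /* no overflow */
    return jobs * per_job_bytes;
}
```

bytes_per_pass(32) is 137438953472 bytes, which is 128 GiB for a single pass. The 200 workers each sleep for a second per job, so they can retire at most about 200 jobs per second, far fewer than the loop can enqueue.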

Related

Trying to understand Race Conditions/Threads in C

For starters, I am a student who wasn't a CS undergrad but am moving into a CS master's, so I welcome any and all help anyone is willing to give.
The purpose of this program is to create N threads (N between 2 and 4) and use them to convert a randomly generated array of lowercase characters to uppercase.
This needs to be done with pthreads, using the N threads (given on the command line) and dividing the work up as evenly as possible.
My main question is: have I avoided race conditions between my threads?
I am also struggling to understand how to divide the work among the threads. As I understand it (correct me if I'm wrong), which thread runs at any given moment is, in general, effectively random. So I'm assuming I need to divide the array dynamically among the N threads, so that each thread uppercases a same-sized subsection of the array?
I know there are likely a number of other problems in my code that I need to get better at, but I haven't been coding long and only started using C/C++ about a month ago.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>
#include <ctype.h>
//Global variable for threads
char randChars[60];
int j=0;
//Used to avoid race conditions
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
//Establish the threads
void* upperThread(void* argp)
{
while(randChars[j])
{
pthread_mutex_lock( &mutex1 );
putchar (toupper(randChars[j]));
j++;
pthread_mutex_unlock( &mutex1 );
}
return NULL;
}
int main(int argc, char **argv)
{
//Initialize variables and thread
int N,randNum,t;
long i;
pthread_t pth[N];
pthread_mutex_init(&mutex1, NULL);
char randChar = ' ';
//Check number of command inputs given
if(argc!=2)
{
fprintf(stderr,"usage: %s <enter a value for N>\n", argv[0]);
exit(0);
}
N = atoi(argv[1]);
//Checks command inputs for correct values
if(N<2||N>4){
printf("Please input a value between 2 and 4 for the number of threads.\n");
exit(0);
}
//Seed random to create a randomized value
srand(time(NULL));
printf("original lower case version:\n");
for (i=0; i<61; i++)
{
//Generate a random integer in lower alphabetical range
randNum = rand()%26;
randNum = randNum+97;
//Convert int to char and add to array
randChar = (char) randNum;
randChars[i] = randChar;
printf("%c", randChar);
}
//Create N threads
for (i=0; i<N; i++)
{
pthread_create(pth + i, NULL, upperThread, (void *)i);
}
printf("\n\nupper case version:\n");
//Join the threads
for(t=0; t < N; t++)
{
pthread_join(pth[t], NULL);
}
printf("\n");
pthread_exit(NULL);
return 0;
}
The example you provided is not a good multithreaded program, because your threads constantly wait for whichever one holds the lock, which basically makes your program sequential. I would change your upperThread to
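void* upperThread(void* argp){
int temp;
while(1){
pthread_mutex_lock( &mutex1 );
temp = j;
//Claim the index only if it is still inside the string
if(randChars[temp]) j++;
pthread_mutex_unlock( &mutex1 );
//Stop at the terminator; j itself is never read outside the lock
if(!randChars[temp]) break;
putchar (toupper(randChars[temp]));
}
return NULL;
}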
This way a thread holds the lock only long enough to read the value of j and increment it; it then releases the lock and does the rest of its work (the toupper and putchar) without blocking the others. Note that the characters may then be printed in a different order from the original array.
The general rule is to hold the lock only while you are in a critical section, that is, while you touch shared data; in this case that is the index into your string. It is worth reading up on critical sections and race conditions.
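Since the question also asks how to divide the work evenly, here is a minimal sketch of a different technique: static partitioning, where each thread gets its own slice of the array and no mutex is needed because no two threads touch the same element. The names slice, upper_slice and upper_parallel are hypothetical, and the sketch converts the array in place rather than printing it.

```c
#include <assert.h>
#include <ctype.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 4

/* Hypothetical per-thread argument: a half-open slice [begin, end). */
typedef struct {
    char  *text;
    size_t begin;
    size_t end;
} slice;

static void *upper_slice(void *argp) {
    slice *s = argp;
    for (size_t i = s->begin; i < s->end; i++)
        s->text[i] = (char)toupper((unsigned char)s->text[i]);
    return NULL;
}

/* Uppercases text[0..len) using nthreads threads (1..MAX_THREADS).
   Returns 0 on success, -1 if a thread could not be created. */
int upper_parallel(char *text, size_t len, int nthreads) {
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    slice     arg[MAX_THREADS];
    int started = 0, rc = 0;

    for (int t = 0; t < nthreads; t++) {
        /* Slice sizes differ by at most one element. */
        arg[t].text  = text;
        arg[t].begin = len * (size_t)t / (size_t)nthreads;
        arg[t].end   = len * (size_t)(t + 1) / (size_t)nthreads;
        if (pthread_create(&tid[t], NULL, upper_slice, &arg[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    /* Wait for every thread that was started before returning. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return rc;
}
```

After upper_parallel returns, the main thread can print the whole array in order, which also sidesteps the question of which thread prints first.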

Changing parts of arrays/structs/.. in threads without blocking the whole thing, in pure c

I want to modify some (not all) elements of an array (or fields of structs) from multiple threads, without blocking the rest of the array, since the rest of it is being modified by other threads. How is this achieved? I found some answers, but they are for C++ and I want to do it in C.
Here is the code I got so far:
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <semaphore.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#define ARRAYLENGTH 5
#define TARGET 10000
int target;
typedef struct zstr{
int* array;
int place;
int run;
pthread_mutex_t* locks;
}zstr;
void *countup(void *);
int main(int argc, char** args){
int al;
if(argc>2){
al=atoi(args[1]);
target=atoi(args[2]);
}else{
al=ARRAYLENGTH;
target=TARGET;
}
printf("%d %d\n", al, target);
zstr* t=malloc(sizeof(zstr));
t->array=calloc(al, sizeof(int));
t->locks=calloc(al, sizeof(pthread_mutex_t));
int* rua=calloc(al, sizeof(int));
pthread_t id[4*al];
for(int i=0; i<al; i++)
pthread_mutex_init(&(t->locks[i]), NULL);
for(int j=0; j<4*al; j++){
int st=j%al;
t->run=rua[st]++;
t->place=st;
pthread_create(&id[j], NULL, &countup, t);
}
for(int k=0; k<4*al; k++){
pthread_join(id[k], NULL);
}
for(int u=0; u<al; u++)
printf("%d\n", t->array[u]);
free(rua);
free(t->locks);
free(t->array);
return 0;
}
void *countup(void* table){
zstr* nu=table;
if(!nu->run){
pthread_mutex_lock(nu->locks + nu->place);
}else{
pthread_mutex_trylock(nu->locks + nu->place);
}
while(nu->array[nu->place]<target)
nu->array[nu->place]++;
pthread_mutex_unlock(nu->locks + nu->place);
return NULL;
}
Sometimes this works just fine, but at other times it calculates wrong values, and for quite short problems (like the default values) it takes very long (strangely, it worked once when I passed them in as parameters).
There isn't anything special about part of an array or structure. What matters is that the mutex or other synchronization you apply to a given value is used correctly.
In this case, it seems like you're not checking your locking function results.
The design of the countup function only allows a single thread to ever access the object, running the value all the way up to target before releasing the lock, but you don't check the trylock result.
So what's probably happening is the first thread gets the lock, and subsequent threads on the same mutex call trylock and fail to get the lock, but the code doesn't check the result. Then you get multiple threads incrementing the same value without synchronization. Given all the pointer dereferences the index and increment operations are not guaranteed to be atomic, leading to problems where the values grow well beyond target.
The moral of the story is to check function results and handle errors.
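The check described above might look like the following sketch. count_up_if_free is a hypothetical function name; the point is only that the shared value is touched on the one path where pthread_mutex_trylock returned 0.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Increments *value up to target, but only if the lock can be taken
   without blocking. Returns 1 if this caller did the work, 0 if the
   lock was already held (EBUSY), and -1 on any other error. */
int count_up_if_free(pthread_mutex_t *lock, int *value, int target) {
    assert(lock != NULL && value != NULL);
    int rc = pthread_mutex_trylock(lock);
    if (rc == EBUSY)
        return 0;          /* lock not acquired: must not touch *value */
    if (rc != 0)
        return -1;
    while (*value < target)
        (*value)++;
    pthread_mutex_unlock(lock);   /* unlock only what we actually locked */
    return 1;
}
```

A caller that gets 0 back can skip the element, retry later, or fall back to a blocking pthread_mutex_lock, depending on what the program needs.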
Sorry, don't have enough reputation to comment, yet.
Adding to Brad's comment about not checking the result of pthread_mutex_trylock, there's a misconception that shows up often with Pthreads:
You assume that the thread created by pthread_create starts immediately, receives the values passed (here the pointer t to your struct), and reads its contents atomically. That is not true. The thread might start at any later time and find that contents such as t->run and t->place have already been changed by the next iteration of the j-loop in main.
Moreover, you might want to read David Butenhof's book "Programming with POSIX Threads" (old, but still a good reference) and look up synchronization and condition variables.
It's not that good style to start that many threads in the first place ;)
As this has come up a few times and might come up again, I have restructured that a bit to issue work_items to the started threads. The code below might be amended by a function, that maps the index into array to always the same area_lock, or by adding a queue to feed the running threads with further work-item...
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>
/*
* Macros for default values. To make it more interesting, set:
* ARRAYLENGTH != THREADS
* INCREMENTS != TARGET
* NUM_AREAS != THREADS
* Please note that NUM_AREAS must be <= ARRAYLENGTH.
*/
#define ARRAYLENGTH 10
#define TARGET 100
#define INCREMENTS 10
#define NUM_AREAS 2
#define THREADS 5
/* These variables are initialized once in main, then only read... */
int array_len;
int target;
int num_areas;
int threads;
int increments;
/**
* A long array that is going to be equally split into number of areas.
* Each area is covered by a lock. The number of areas do not have to
* equal the length of the array, but must be smaller...
*/
typedef struct shared_array {
int * array;
int num_areas;
pthread_mutex_t * area_locks;
} shared_array;
/**
* A work-item a thread is assigned to upon startup (or later on).
* Then a value of { 0, any } might signal the ending of this thread.
* The thread is working on index within zstr->array, counting up increments
* (or up until the target is reached).
*/
typedef struct work_item {
shared_array * zstr;
int work_on_index;
int increments;
} work_item;
/* Local function declarations */
void * countup(void *);
int main(int argc, char * argv[]) {
int i;
shared_array * zstr;
if (argc == 1) {
array_len = ARRAYLENGTH;
target = TARGET;
num_areas = NUM_AREAS;
threads = THREADS;
increments = INCREMENTS;
} else if (argc == 6) {
array_len = atoi(argv[1]);
target = atoi(argv[2]);
num_areas = atoi(argv[3]);
threads = atoi(argv[4]);
increments = atoi(argv[5]);
} else {
fprintf(stderr, "USAGE: %s len target areas threads increments\n", argv[0]);
exit(-1);
}
assert(array_len >= num_areas);
zstr = malloc(sizeof (shared_array));
zstr->array = calloc(array_len, sizeof (int));
zstr->num_areas = num_areas;
zstr->area_locks = calloc(num_areas, sizeof (pthread_mutex_t));
for (i = 0; i < num_areas; i++)
pthread_mutex_init(&(zstr->area_locks[i]), NULL);
pthread_t * id = calloc(threads, sizeof (pthread_t));
work_item * work_items = calloc(threads, sizeof (work_item));
for (i = 0; i < threads; i++) {
work_items[i].zstr = zstr;
work_items[i].work_on_index = i % array_len;
work_items[i].increments = increments;
pthread_create(&(id[i]), NULL, &countup, &(work_items[i]));
}
// Let's just do this one work-item.
for (i = 0; i < threads; i++) {
pthread_join(id[i], NULL);
}
printf("Array: ");
for (i = 0; i < array_len; i++)
printf("%d ", zstr->array[i]);
printf("\n");
free(id);
free(work_items);
free(zstr->area_locks);
free(zstr->array);
free(zstr);
return 0;
}
void *countup(void* first_work_item) {
work_item * wi = first_work_item;
int inc;
// Extract the information from this work-item.
int idx = wi->work_on_index;
int area = idx % wi->zstr->num_areas;
pthread_mutex_t * lock = &(wi->zstr->area_locks[area]);
pthread_mutex_lock(lock);
for (inc = wi->increments; inc > 0 && wi->zstr->array[idx] < target; inc--)
wi->zstr->array[idx]++;
pthread_mutex_unlock(lock);
return NULL;
}

Compute the summation of a given interval using multiple threads

For my homework, I need to compute the sum of the squares of the integers in the interval (0,N) (e.g. (0,50)) in such a way that the load is distributed equally among the threads (e.g. 5 threads). I have been advised to take small chunks from the interval and assign them to the threads. For that, I am using a queue. Here's my code:
#include <stdio.h>
#include <pthread.h>
#define QUEUE_SIZE 50
typedef struct {
int q[QUEUE_SIZE];
int first,last;
int count;
} queue;
void init_queue(queue *q)
{
q->first = 0;
q->last = QUEUE_SIZE - 1;
q->count = 0;
}
void enqueue(queue *q,int x)
{
q->last = (q->last + 1) % QUEUE_SIZE;
q->q[ q->last ] = x;
q->count = q->count + 1;
}
int dequeue(queue *q)
{
int x = q->q[ q->first ];
q->first = (q->first + 1) % QUEUE_SIZE;
q->count = q->count - 1;
return x;
}
queue q; //declare the queue data structure
void* threadFunc(void* data)
{
int my_data = (int)data; /* data received by thread */
int sum=0, tmp;
while (q.count)
{
tmp = dequeue(&q);
sum = sum + tmp*tmp;
usleep(1);
}
printf("SUM = %d\n", sum);
printf("Hello from new thread %u - I was created in iteration %d\n",pthread_self(), my_data);
pthread_exit(NULL); /* terminate the thread */
}
int main(int argc, char* argv[])
{
init_queue(&q);
int i;
for (i=0; i<50; i++)
{
enqueue(&q, i);
}
pthread_t *tid = malloc(5 * sizeof(pthread_t) );
int rc; //return value
for(i=0; i<5; i++)
{
rc = pthread_create(&tid[i], NULL, threadFunc, (void*)i);
if(rc) /* could not create thread */
{
printf("\n ERROR: return code from pthread_create is %u \n", rc);
return(-1);
}
}
for(i=0; i<5; i++)
{
pthread_join(tid[i], NULL);
}
}
The output is not always correct. Most of the time it is correct, 40425, but sometimes the value is bigger. Is it because the threads are running in parallel and accessing the queue at the same time (the processor in my laptop is an Intel i7)? I would appreciate feedback on my concerns.
I think contrary to what some of the other people here suggested, you don't need any synchronization primitives like semaphores or mutexes at all. Something like this:
Given some array like
int values[50];
I'd create a few threads (say, 5), each of which gets a pointer to a struct holding its offset into the values array and the number of squares to compute, like
typedef struct ThreadArgs {
int *values;
size_t numSquares;
} ThreadArgs;
You can then start your threads, telling each one to process 10 numbers:
for ( i = 0; i < 5; ++i ) {
ThreadArgs *args = malloc( sizeof( ThreadArgs ) );
args->values = values + 10 * i;
args->numSquares = 10;
pthread_create( ...., threadFunc, args );
}
Each thread then simply computes the squares it was assigned, like:
void *threadFunc( void *data )
{
ThreadArgs *args = data;
size_t i;
for ( i = 0; i < args->numSquares; ++i ) {
args->values[i] = args->values[i] * args->values[i];
}
free( args );
return NULL;
}
At the end, you'd just use a pthread_join to wait for all threads to finish, after which you have your squares in the values array.
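The fragments above square the values in place. A minimal sketch of the same partitioning idea applied to the sum the question asks for could look like this. The names range_job, sum_squares_range and sum_squares are hypothetical; each thread writes only its own partial field, so no mutex is needed, and the main thread adds the partials after joining.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 5

/* Hypothetical per-thread record: input range [begin, end) and a
   partial result written only by the owning thread. */
typedef struct {
    int  begin;
    int  end;
    long partial;
} range_job;

static void *sum_squares_range(void *data) {
    range_job *job = data;
    job->partial = 0;
    for (int v = job->begin; v < job->end; v++)
        job->partial += (long)v * v;
    return NULL;
}

/* Sum of v*v for v in [0, n), split across NUM_THREADS threads.
   Returns -1 if a thread could not be created. */
long sum_squares(int n) {
    assert(n >= 0);
    pthread_t tid[NUM_THREADS];
    range_job job[NUM_THREADS];
    int started = 0;
    long total = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        job[t].begin = (int)((long)n * t / NUM_THREADS);
        job[t].end   = (int)((long)n * (t + 1) / NUM_THREADS);
        if (pthread_create(&tid[t], NULL, sum_squares_range, &job[t]) != 0)
            break;
        started++;
    }
    /* Read each partial only after its thread has been joined. */
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        total += job[t].partial;
    }
    return started == NUM_THREADS ? total : -1;
}
```

For n = 50 this yields 40425, the value the question reports as correct.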
All your threads read from the same queue, which leads to a race condition. For instance, if the number 10 is read simultaneously by two threads, your result will be off by 100. You should protect your queue with a mutex. Put the following print in the dequeue function to see which numbers are repeated:
printf("Dequeuing %d in thread %lu\n", x, (unsigned long)pthread_self());
Your code doesn't show where the results are accumulated into a single variable. You should protect that variable with a mutex as well.
Alternatively, you can pass the start number to each thread as its input parameter from the loop, so that each thread works on its own set of numbers. The first thread will work on 1-10, the second one on 11-20, and so on. With this approach, you need a mutex only around the part where the threads update the global sum variable at the end of their execution.
First you need to define what it means to be "distributed equally among threads."
If you mean that each thread does the same amount of work as the other threads, then I would create a single queue, put all the numbers in the queue, and start all the threads (which run the same code). Each thread tries to get a value from the queue, which must be protected by a mutex unless it is thread-safe, calculates the partial answer from the value taken from the queue, and adds the result to the total, which must also be protected by a mutex.
If you mean that each thread will execute an equal number of times as each of the other threads, then you need to make a priority queue and put all the numbers in the queue along with the number of the thread that should compute on each. Each thread then tries to get a value from the queue that matches its thread number.
From the thread's point of view, it should try to get a value from the queue, do the work, then try to get another value. If there are no more values to get, the thread should exit. The main program does a join on all threads, and the program exits when all threads have exited.
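A minimal sketch of the mutex-protected queue that these answers describe is shown below, assuming a fixed capacity of 50 as in the question. locked_queue and the lq_ functions are hypothetical names. The important detail is that the emptiness check and the removal happen under a single lock acquisition, unlike the question's separate while (q.count) test followed by dequeue.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_SIZE 50

typedef struct {
    int items[QUEUE_SIZE];
    int first;               /* index of the next item to remove */
    int count;               /* number of items currently stored */
    pthread_mutex_t lock;
} locked_queue;

void lq_init(locked_queue *q) {
    assert(q != NULL);
    q->first = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Returns 1 on success, 0 if the queue is full. */
int lq_enqueue(locked_queue *q, int x) {
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_SIZE) {
        q->items[(q->first + q->count) % QUEUE_SIZE] = x;
        q->count++;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Check and removal share one critical section, so two threads can
   never take the same item. Returns 1 and stores the item in *out,
   or returns 0 if the queue was empty. */
int lq_try_dequeue(locked_queue *q, int *out) {
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        *out = q->items[q->first];
        q->first = (q->first + 1) % QUEUE_SIZE;
        q->count--;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}
```

A worker would then loop with while (lq_try_dequeue(&q, &tmp)) sum += tmp * tmp; and add its local sum to the shared total once, under a second mutex, before exiting.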

one consumer multiple producer in c prevent racing when resuming after full buffer

I made a circular buffer with multiple clients writing messages of different lengths into it. The server reads them out. I based the code on the consumer/producer problem.
The problem is that when the buffer is full and the server removes all the data from it, the client is signaled to resume its writing, but instead another client (in another thread) starts writing its message into the buffer. I want the client that was already writing before the buffer filled up to resume its operation, so that the message doesn't arrive out of order.
This is my code (I removed a lot of test code):
#include <stdio.h>
#include <malloc.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>
#define BUFFER_SIZE 8
#define NUM_THREADS 4
struct cBuf{
char *buf;
int size;
int start;
int end;
pthread_mutex_t mutex;
pthread_cond_t buffer_full;
pthread_cond_t buffer_empty;
};
struct cBuf cb;
void buf_Init(struct cBuf *cb, int size) {
int i;
cb->size = size + 1;
cb->start = 0;
cb->end = 0;
cb->buf = (char *)calloc(cb->size, sizeof(char));
for (i=0;i<size;i++) cb->buf[i]='_';
}
void buf_Free(struct cBuf *cb) {
free(cb->buf);
}
int buf_IsFull(struct cBuf *cb) {
return (cb->end + 1) % cb->size == cb->start;
}
int buf_IsEmpty(struct cBuf *cb) {
return cb->end == cb->start;
}
int buf_Insert(struct cBuf *cb, char *elem) {
int i,j;
pthread_mutex_lock(&(cb->mutex));
for (i=0; i < strlen(elem); ++ i){
if (buf_IsFull(cb)==1) printf("\nProducer (buf_Insert) is waiting because of full buffer");
while(buf_IsFull(cb)){
pthread_cond_signal(&(cb->buffer_full));
pthread_cond_wait(&(cb->buffer_empty),&(cb->mutex));
}
cb->buf[cb->end] = elem[i];
cb->end = (cb->end + 1) % cb->size;
printf("%c [INPUT]",elem[i]);
}
pthread_cond_signal(&(cb->buffer_full));
pthread_mutex_unlock(&(cb->mutex));
return 0;
}
int buf_Read(struct cBuf *cb, char *out) {
int i,j;
pthread_mutex_lock(&(cb->mutex));
if (buf_IsEmpty(cb))printf("\nConsumer (buf_Read) is waiting because of empty buffer\n");
while(buf_IsEmpty(cb)){
pthread_cond_wait(&(cb->buffer_full),&(cb->mutex));
}
for (i=0;i<BUFFER_SIZE-1;i++){
printf("\n");
if (cb->start == cb->end) break;
out[i] = cb->buf[cb->start];
cb->buf[cb->start] = '_';
cb->start = (cb->start + 1) % cb->size;
printf("%c [OUTPUT]",out[i]);
}
pthread_cond_signal(&(cb->buffer_empty));
pthread_mutex_unlock(&(cb->mutex));
return 0;
}
void * client(void *cb){
pthread_detach(pthread_self());
struct cBuf *myData;
myData = (struct cBuf*) cb;
char input[]="Hello World!";
if (buf_Insert(myData, input)){
//succes on return 0
printf("\n");
}
return 0;
}
int main(void) {
char out[60];
pthread_t thread;
int i;
/* Initialise conditioners*/
pthread_cond_init(&(cb.buffer_full),NULL);
pthread_cond_init(&(cb.buffer_empty),NULL);
buf_Init(&cb, BUFFER_SIZE);
for (i = 0; i<NUM_THREADS; i++){
if(pthread_create (&thread,NULL, client, (void *) &cb) !=0){
} else {
}
}
while (1){
if (buf_Read(&cb,out)){
}
}
//empty the buffer; free the allocated memory
buf_Free(&cb);
return 0;
}
I already explained this in the comments on "Producer/consumer seems to be in deadlock when buffer is smaller than input from producer", but those are comments, so here it is as an answer:
You should never, ever have a partial message in the queue. Make sure you never write one.
You can check whether there is enough space before starting to write the message and wait for buffer_empty straight away if there's not, or you can change the queue to send shared pointers to allocated data (either pass ownership to consumer or reference-counted) or something, so each message only takes up one slot in the queue and allocated memory for the rest. What's best will depend on the exact nature of your message. Anything will do as long as there are no partial messages.
While it would be possible to record which particular writer needs to finish a message and wake just that, it would be awfully complicated. Synchronization is hard as it is, don't make it any harder by placing additional requirements on it.
In fact, unless this is homework (in the sense that you are doing it to learn how synchronization works), just look for a ready-made message queue. The SysV IPC ones or Unix-domain sockets in datagram mode are two options that come to mind, or look for a library that provides one.
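The first option (check for enough space before writing anything) could be sketched as follows. This is an illustration under stated assumptions, not a drop-in replacement for the question's code: msg_buffer, mb_insert, mb_read and CAPACITY are hypothetical names, and the capacity has to be at least as large as the longest message, which is not the case in the question (a buffer of 8 against the 12 characters of "Hello World!").

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CAPACITY 32   /* hypothetical; must hold the longest message */

typedef struct {
    char   data[CAPACITY];
    size_t start;     /* index of the oldest byte */
    size_t used;      /* number of bytes stored */
    pthread_mutex_t lock;
    pthread_cond_t  space_freed;
    pthread_cond_t  data_ready;
} msg_buffer;

void mb_init(msg_buffer *b) {
    assert(b != NULL);
    b->start = 0;
    b->used = 0;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->space_freed, NULL);
    pthread_cond_init(&b->data_ready, NULL);
}

/* Writes the whole message or nothing: the producer waits until all
   len bytes fit, so messages from different producers never interleave.
   Returns 0 on success, -1 if the message can never fit. */
int mb_insert(msg_buffer *b, const char *msg, size_t len) {
    if (len > CAPACITY)
        return -1;
    pthread_mutex_lock(&b->lock);
    while (CAPACITY - b->used < len)
        pthread_cond_wait(&b->space_freed, &b->lock);
    for (size_t i = 0; i < len; i++)
        b->data[(b->start + b->used + i) % CAPACITY] = msg[i];
    b->used += len;
    pthread_cond_signal(&b->data_ready);
    pthread_mutex_unlock(&b->lock);
    return 0;
}

/* Blocks until data is available, then drains up to max bytes into out.
   Returns the number of bytes copied. */
size_t mb_read(msg_buffer *b, char *out, size_t max) {
    pthread_mutex_lock(&b->lock);
    while (b->used == 0)
        pthread_cond_wait(&b->data_ready, &b->lock);
    size_t n = b->used < max ? b->used : max;
    for (size_t i = 0; i < n; i++)
        out[i] = b->data[(b->start + i) % CAPACITY];
    b->start = (b->start + n) % CAPACITY;
    b->used -= n;
    /* Producers may be waiting for different amounts of space. */
    pthread_cond_broadcast(&b->space_freed);
    pthread_mutex_unlock(&b->lock);
    return n;
}
```

Because a producer copies its message in one critical section, it no longer matters which waiting producer wakes up first: whichever one runs writes a complete message.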

Passing Data to Multi Threads

I am studying this code from a book:
#include <pthread.h>
#include <stdio.h>
/* Parameters to print_function. */
struct char_print_parms {
/* The character to print. */
char character;
/* The number of times to print it. */
int count;
};
/* Prints a number of characters to stderr, as given by PARAMETERS,
which is a pointer to a struct char_print_parms. */
void* char_print(void* parameters) {
/* Cast the cookie pointer to the right type. */
struct char_print_parms* p = (struct char_print_parms*) parameters;
int i;
for (i = 0; i < p->count; ++i)
fputc(p->character, stderr);
return NULL;
}
/* The main program. */
int main() {
pthread_t thread1_id;
pthread_t thread2_id;
struct char_print_parms thread1_args;
struct char_print_parms thread2_args;
/* Create a new thread to print 30,000 ’x’s. */
thread1_args.character = 'x';
thread1_args.count = 30000;
pthread_create(&thread1_id, NULL, &char_print, &thread1_args);
/* Create a new thread to print 20,000 o’s. */
thread2_args.character = 'o';
thread2_args.count = 20000;
pthread_create(&thread2_id, NULL, &char_print, &thread2_args);
usleep(20);
return 0;
}
After running this code, I see a different result each time, and sometimes a corrupted result. What is wrong, and what is the correct way to do this?
Add:
pthread_join( thread1_id, NULL);
pthread_join( thread2_id, NULL);
to the bottom of your code, before the return in main. Your process ends before your threads can complete. A sleep of 20 microseconds is not enough to let your threads finish executing. It is safer to wait for the threads to return.
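A compact sketch of the create-then-join pattern follows. The names print_job, run_job and run_two_jobs are hypothetical, and the worker counts iterations instead of calling fputc so that the result can be checked. It also shows a second reason to join: the argument structs live on the caller's stack, so the function must not return while the threads still use them.

```c
#include <assert.h>
#include <pthread.h>

/* Parameters and result for one worker (hypothetical names). */
typedef struct {
    char character;   /* character the worker would print */
    int  count;       /* how many times to process it */
    int  done;        /* set by the worker: iterations completed */
} print_job;

static void *run_job(void *parameters) {
    print_job *p = parameters;
    for (int i = 0; i < p->count; ++i)
        p->done++;    /* stands in for fputc(p->character, stderr) */
    return NULL;
}

/* Starts two workers and waits for both. Returns the total number of
   iterations completed, or -1 if a thread could not be created. */
int run_two_jobs(int count1, int count2) {
    assert(count1 >= 0 && count2 >= 0);
    pthread_t t1, t2;
    print_job a = { 'x', count1, 0 };
    print_job b = { 'o', count2, 0 };

    if (pthread_create(&t1, NULL, run_job, &a) != 0)
        return -1;
    if (pthread_create(&t2, NULL, run_job, &b) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    /* a and b live in this stack frame, so it must not be left
       until both threads have finished using them. */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return a.done + b.done;
}
```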

Resources