Random Numbers when Passing Address of Integers into pthread_create


I found a similar post, "pthread_create and passing an integer as the last argument"; however, after implementing its approach I still receive the wrong ID values (as shown in the output log at the bottom of the post).
I have this portion of code which creates the threads. It's pretty standard: I have an array of threads, and each element is assigned a thread ID which is passed into the function.
int nthreads = 7;
pthread_t tid[nthreads];
fprintf(stdout, "Creating threads.\n");
for (int i = 0; i < nthreads; i++) {
    tid[i] = i;
    if (pthread_create(&tid[i], NULL, threadRequestFile, &tid[i]) != 0) {
        fprintf(stdout, "Error pthread_create().\n");
        steque_destroy(workerQueue);
        free(workerQueue);
        workerQueue = NULL;
        exit(EXIT_FAILURE);
    }
}
fprintf(stdout, "Created %d threads.\n", nthreads);
for (int i = 0; i < nthreads; i++) {
    fprintf(stdout, "Awaiting Thread %d.\n", i);
    pthread_join(tid[i], NULL);
}
Finally, here is my thread function, which receives the address of tid[i], reads the ID stored there, and prints it to the console:
void *threadRequestFile(void *nRequests) {
    int totalRequests = *((int *)nRequests);
    fprintf(stdout, "Total Request: %d\n", totalRequests);
}
Unfortunately, despite closely following the code in the post above and other sources I found online, my console still prints weird numbers rather than 0-6. Can anyone explain why this is occurring?
Creating threads.
Total Request: -1210059008 // Should be 0
Total Request: -1218451712 // ..
Total Request: -1226844416 // ..
Total Request: -1235237120 // ..
Total Request: -1243629824 // ..
Total Request: -1252022528 // ..
Total Request: -1260415232 // 6
Created 7 threads.
Awaiting Thread 0.
Awaiting Thread 1.
Awaiting Thread 2.
Awaiting Thread 3.
Awaiting Thread 4.
Awaiting Thread 5.
Awaiting Thread 6.
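A likely cause, offered as a hedged reading of the code above rather than a confirmed diagnosis: pthread_create() stores the new thread's identifier in tid[i], overwriting the i written just before, so the thread ends up reading part of a pthread_t as an int. A minimal sketch of one way around this keeps the integer IDs in a separate array that outlives the threads. The names ids, seen, worker and run_workers are hypothetical and not from the original post.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 7

static int seen[NTHREADS];   /* seen[k] becomes 1 once the thread with ID k ran */

/* Each thread receives a pointer to its own int, not to a pthread_t. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    printf("Total Request: %d\n", id);
    seen[id] = 1;            /* every thread writes a distinct element */
    return NULL;
}

/* Starts NTHREADS threads and returns how many distinct IDs were observed,
 * or -1 if a thread could not be created. */
int run_workers(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];       /* separate storage for the IDs; valid until the joins below */
    int count = 0;

    for (int i = 0; i < NTHREADS; i++)
        seen[i] = 0;

    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        if (pthread_create(&tid[i], NULL, worker, &ids[i]) != 0) {
            for (int j = 0; j < i; j++)   /* clean up threads already started */
                pthread_join(tid[j], NULL);
            return -1;
        }
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < NTHREADS; i++)
        count += seen[i];
    return count;
}
```

Calling run_workers() from main should print each ID from 0 to 6 exactly once, in whatever order the scheduler picks.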

Related

Same function takes more time to execute in child

SHORT: My problem is that the following function, which sums the contents of a given array over a given range, does not take the same execution time for the same task when called from a child process as when called from the parent. By "same" I mean similar values: after some tests, the child takes about 40% longer to execute.
LONG: I'm studying computer science and I'm working on the final project of a course. The task is to sum all the cells of an int array using n processes that share the workload, then compare that against a single calculation made by the parent. The point is to demonstrate that, with big arrays, multiple processes can reduce the execution time.
However, the sum of all the children's times is always greater than the parent's, even with 50 million values.
Below are just the function and the struct I use to pass results.
typedef struct cResults{
double time;
int result;
}cResult;
cResult arraySum(int start, int length, int* dataset)
{
/*Time unit to calculate the computation time*/
clock_t tic, toc = 0;
cResult chunk = {.result = 0, .time =0};
tic = clock(); // Start measure time
for (int i = start; i < start + length; i++)
{
chunk.result += dataset[i];
}
toc = clock(); // Stop measure time
chunk.time = ((double) (toc - tic))/ CLOCKS_PER_SEC;
printf("B: start at: %d, length: %d, sum: %d, in: %f\n", start, length, chunk.result, chunk.time); //! DEBUG
return chunk;
}
WHAT I'VE TRIED SO FAR:
Since the array is dynamically allocated, I thought memory access could be a bottleneck. However, this question (Malloc returns same address in parent and child process after fork) removed all doubt: even though the array is heap-allocated, parent and child do not share it; each child has its own copy.
I've double-checked that the parent sums the elapsed times reported by all the children correctly and only once, and then added the printf() statement just to read the results on the terminal and sum them manually. Again, everything checks out.
I've tried moving the parent's function call from before to after all the children were done, but nothing changed. Then I tried making the parent sleep() right after fork() (counterproductive for the purpose of the project, but just to make sure) to avoid contention for resources.
The random numbers in the array are produced in a repeatable way through a seed, so I've tried the same datasets, which of course give almost identical outputs; the times change slightly, yet the single execution remains faster.
Finally, I tried forking a single child and making it calculate the same range as the parent (the whole array). The child is on average 45% slower.
Surely I'm missing something simple, but I've run out of ideas... Please be patient; I'm teaching myself as best I can.
UPDATE 1:
Since I've been asked for a full program, I've refactored mine. Because the project required a single source file, I've removed all the parts unrelated to this issue, so it should be lighter. Unfortunately, the framework that handles the pipe communication is a bit complex, I'm afraid; it served other purposes that have been removed, yet it is essential to make this work. I hope it won't be a problem.
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <limits.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
/* ----- PROTOTYPES ----- */
#define MIN 5 // Least number of process and threads
#define MAX 10 // Maximum number of process and threads
#define MSGSIZE 4 // Size of pipe messages
/* =================
*PIPE MESSAGES*
sum = Sum from the next given start to the next given roof.
end = End process
tmr = Return the computation time measured by the process
================= */
/// #brief Struct containing the process id (pid), the result and its pipe file descriptors
struct process{
pid_t pid;
long int result;
bool done;
double time;
int fd[2];
};
/// #brief Struct that contains the result of the computation
typedef struct cResults{
double time;
long int result;
}cResult;
/// #brief W = write, R = read
enum PIPECONTROLLER{
R = 0,
W = 1
};
/* ----- PROTOTYPES ----- */
cResult arraySum(int start, int length, int* dataset);
int main()
{
/* =================
*USER DEFINED*
================= */
int dataLength = 50000000; // Set the length of the dataset
int nProcess = 1; // How many child process do you want
unsigned int seed = 1; // Change the randomization seed of the data
// System
int* data;
cResult chunk; // Used for storing temporary computational values
// Task related
int taskLenght;
int remainder;
// Pipe related
char pipeMSG[MSGSIZE];
int msgCheck = 1;
/// #brief Processes dashboard
struct process processes[MAX + 1] = { {.pid = 0, .result = 0, .done = false, .time = 0} };
data = malloc(sizeof(int) * dataLength);
srand(seed);
for (int i = 0; i < dataLength; i++)
{
/*Random population between 0-100*/
data[i] = rand() % 100;
}
chunk = arraySum(0, dataLength, data);
processes[nProcess + 1].result = chunk.result;
processes[nProcess + 1].time = chunk.time;
printf("\nCHECK SUM: %ld\n", chunk.result);// ! Debug
#pragma region "START PROCESSES"
/*Calculate how to separate the overhead for the processes*/
taskLenght = dataLength / nProcess;
remainder = dataLength % nProcess;
pid_t myPid = 0;
int startPoint = 0;
processes[nProcess + 1 ].pid = getpid();
/*Open child to parent pipe*/
if (pipe(processes[nProcess + 1 ].fd) == -1)
{
printf("Failed to open pipe on parent\n");
return 1;
}
for (int i = 0; i < nProcess; i++)
{
/*Open new parent to child pipe*/
if (pipe(processes[i].fd) == -1)
{
printf("Failed to open pipe on parent\n");
return 1;
}
myPid = fork();
switch (myPid)
{
case -1: // Error on fork
printf("An error occured while forking the %d process.\n", i);
return 1;
break;
case 0: // Child case
/*Record pid in the dashboard*/
processes[i].pid = getpid();
/*Handle the pipes descriptors*/
close(processes[i].fd[W]);
close(processes[nProcess + 1 ].fd[R]);
i = nProcess;
break;
default: // Parent case
/* Record the pid process into the dashrboard and increment the starting for the next*/
processes[i].pid = myPid;
startPoint += taskLenght;
/*Handle the pipes descriptors*/
close(processes[i].fd[R]);
break;
}
}
/*=========== CHILD PROCESS HANDLER ===========*/
if(myPid == 0)
{
int myID;
bool keepAlive = true;
for(myID = 0; myID < nProcess; myID++)
{
if (processes[myID].pid == getpid())
{
break;
}
}
/*Calculate first iteration of the sum*/
cResult temp = arraySum(startPoint, taskLenght, data);
chunk.result = temp.result;
chunk.time = temp.time;
while(keepAlive)
{
/*Comunicate the id of the process*/
if (write(processes[nProcess + 1 ].fd[W], &myID, sizeof(int)) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
/*Communicate the result of the operation*/
if (write(processes[nProcess + 1 ].fd[W], &chunk.result, sizeof(int)) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
/*Communicate the time elapsed for the operation*/
if (write(processes[nProcess + 1 ].fd[W], &chunk.time, sizeof(double)) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
/*Waits for further instruction*/
msgCheck = read(processes[myID].fd[R], pipeMSG, MSGSIZE);
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
/*Sum command*/
if(!strcmp(pipeMSG, "sum"))
{
msgCheck = read(processes[myID].fd[R], &startPoint, sizeof(int));
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
msgCheck = read(processes[myID].fd[R], &taskLenght, sizeof(int));
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
/*Calculate second iteration for the remaining part*/
temp = arraySum(startPoint, taskLenght, data);
chunk.result += temp.result;
chunk.time += temp.time;
}
/*Close command*/
if(!strcmp(pipeMSG, "end"))
{
keepAlive = false;
}
}
free(data);
close(processes[myID].fd[R]);
exit(0);
}
/*=========== PARENT PROCESS HANDLER ===========*/
if(myPid != 0)
{
/*Close descriptor for writing on main pipe.*/
close(processes[nProcess + 1 ].fd[W]);
int targetProcess = nProcess + 1; // Target self
bool onGoing = true;
chunk.result = 0;
chunk.time = 0;
while(onGoing)
{
/*Listen from processes if someone ended the task*/
msgCheck = read(processes[nProcess + 1 ].fd[R], &targetProcess, sizeof(int));
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
/*Get result from child process*/
msgCheck = read(processes[nProcess + 1 ].fd[R], &processes[targetProcess].result, sizeof(int));
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
/*Get elapsed time from child process*/
msgCheck = read(processes[nProcess + 1 ].fd[R], &processes[targetProcess].time, sizeof(double));
if(msgCheck < 0)
{
printf("An error occured from the process %d while reading message from parent\n", getpid());
return 1;
}
processes[targetProcess].done = true;
/*Check if remainder to start new task*/
if(remainder != 0)
{
startPoint = taskLenght * nProcess;
processes[targetProcess].done = false;
if (write(processes[targetProcess].fd[W], "sum", MSGSIZE) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
if (write(processes[targetProcess].fd[W], &startPoint, sizeof(int)) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
if (write(processes[targetProcess].fd[W], &remainder, sizeof(int)) < 0)
{
printf("An error occured from the process %d while sending message to parent\n", getpid());
return 1;
}
remainder = 0; //Avoid looping task
}
/*Check for pending response and process final result*/
for (int i = 0; i < nProcess; i++)
{
if(processes[i].done)
{
chunk.result += processes[i].result;
chunk.time += processes[i].time;
onGoing = false;
continue;
}
onGoing = true;
/*Reset total calculations*/
chunk.result = 0;
chunk.time = 0;
break;
}
/*Reset to self target*/
targetProcess = nProcess + 1;
}
printf("Parent calculated: %ld in = %fs\n", processes[nProcess + 1].result, processes[nProcess + 1].time); //! Debug
printf("Processes calculated: %ld in = %fs\n", chunk.result, chunk.time); //! Debug
}
}
cResult arraySum(int start, int length, int* dataset)
{
/*Time unit to calculate the computation time*/
clock_t tic, toc = 0;
cResult chunk = {.result = 0, .time =0};
tic = clock(); // Start measure time
for (int i = start; i < start + length; i++)
{
chunk.result += dataset[i];
}
toc = clock(); // Stop measure time
chunk.time = ((double) (toc - tic))/ CLOCKS_PER_SEC;
printf("start at: %d, length: %d, sum: %ld, in: %f\n", start, length, chunk.result, chunk.time); //! Debug
return chunk;
}
If you want to try this out, you'll find some variables to play with under USER DEFINED.
I'll share some of my results, with seed = 1.
Times in seconds.
LENGTH         PARENT      1 CHILD     5 CHILDREN   10 CHILDREN
1'000          0.000001    0.000002    0.000003     0.000006
100'000        0.000085    0.000107    0.000115     0.000120
10'000'000     0.008693    0.015143    0.016120     0.015982
100'000'000    0.089563    0.148095    0.146698     0.149421
500'000'000    0.669474    0.801828    0.744381     0.816883
As you can see, even repeating the task in a single child process required almost twice the time in some cases.
As ArkadiuszDrabczyk pointed out, it could be a scheduling issue; still, why is the child always the slowest one?
UPDATE 2:
Since concerns about the pipes arose, I wrote another source file just to rule them out and, of course, the problem remained.
However, I wasn't convinced that the dataset[] array stored on the heap is deep-copied, and after some research I found this: What is copy-on-write?. With this newfound knowledge, I added this otherwise useless function, to be called right before arraySum():
void justCheck(int start, int length, int* dataset)
{
for (int i = start; i < start + length; i++)
{
dataset[i] = dataset[i];
}
return;
}
This little snippet managed to level out the differences between the times. At least now they are of the same order of magnitude.
Here are the results:
Times in seconds.
LENGTH         PARENT      1 CHILD     5 CHILDREN   10 CHILDREN
1'000          0.000002    0.000001    0.000004     0.000003
100'000        0.000099    0.000110    0.000124     0.000121
10'000'000     0.009496    0.008686    0.009316     0.009248
100'000'000    0.090267    0.092168    0.089862     0.093356
500'000'000    -           -           -            -
Unfortunately this edit raises another problem: with big datasets this function freezes. I know this isn't the right way to force the COW, so... any suggestions?

SHORT: It turns out that I was using the wrong approach to prove the thesis of the project. Summing the clock times taken by each process and comparing the total to the single run is like trying to prove that baking 4 loaves of bread in a row takes less time than baking them all simultaneously, by counting oven time instead of looking at my watch. I should have measured wall time instead of clock (CPU) time.
LONG: That said, regarding why the same function takes more time when called by a single child than by the parent: as I said in my second update, I managed to trim the time to a more plausible value by forcing the copy-on-write of the heap pages used by the child before calling the actual calculating function. After several experiments, it turns out to be random: sometimes the child is faster, sometimes almost equal, and sometimes slower. So I think it depends on how task scheduling works, which I neither control nor know how to control.
Regarding the other problem mentioned:
Unfortunately this edit raises another problem: with big datasets this function freezes. I know this isn't the right way to force the COW, so... any suggestions?
Well, the point is that forcing COW in each process gives every process its own copy of the array. According to Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne, about 256 million int values fit in 1 GB of memory, so the problem is the massive amount of memory my program devours. Checking with my system monitor, I could confirm that an array of 500 million values takes almost 2 GB of RAM. Once forked and copied, the usage becomes that amount multiplied by nProcess + 1.
I thought answering these questions would help posterity.
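The distinction between wall time and CPU time described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names timing, wall_now, measure and sleep_millis are hypothetical, and the workload is a plain sleep chosen because it consumes wall time while using almost no CPU. clock() reports processor time used by the calling process, whereas clock_gettime(CLOCK_MONOTONIC, ...) reports elapsed real time.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime and nanosleep */
#endif
#include <time.h>

typedef struct {
    double wall;   /* elapsed real time, in seconds */
    double cpu;    /* CPU time consumed by this process, in seconds */
} timing;

/* Reads a monotonic wall clock and returns it in seconds. */
static double wall_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs fn(arg) and reports both wall time and CPU time for the call. */
timing measure(void (*fn)(void *), void *arg)
{
    timing t;
    clock_t c0 = clock();
    double w0 = wall_now();

    fn(arg);

    t.wall = wall_now() - w0;
    t.cpu = (double)(clock() - c0) / CLOCKS_PER_SEC;
    return t;
}

/* Hypothetical workload: sleeps for *arg milliseconds. */
void sleep_millis(void *arg)
{
    long ms = *(long *)arg;
    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&req, NULL);
}
```

For the project described here, the analogous measurement would take one wall-clock reading in the parent before the first fork() and another after the last child has been waited for, instead of summing the per-child clock() values.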

CreateThread x3000 returns stdout handle

I have a problem creating thousands of threads and closing them.
Look at this code:
HANDLE threadHandles[i];
for(int i = 0; i < 1000; i++)
{
CreateThread(0, 0, &func, 0, CREATE_SUSPENDED, threadHandles[i]);
printf("%i - %i\n", i, threadHandles[i]);
CloseHandle(threadHandles[i])
}
printf("Last Error is %i", GetLastError());
It should produce this output:
0 - 236423
1 - 23456236
2 - 2373547
3 - 73521346
4 - 23456775
5 - 78543683465
...
2998 - 754752
2999 - 23462346
Last Error is 0
like this.
But actually it prints nothing. Why? Because one of the created threads had a conflict with the stdout handle.
I realized this using the following code:
HANDLE stdout = GetStdHandle(STD_OUTPUT_HANDLE);
HANDLE threadHandles[i];
for(int i = 0; i < 1000; i++)
{
CreateThread(0, 0, &func, 0, CREATE_SUSPENDED, threadHandles[i]);
printf("%i - %i\n", i, threadHandles[i]);
if(threadHandles[i] != stdout) // I have stdout == 12 on my machine
CloseHandle(threadHandles[i])
}
and it worked like this:
0 - 236423
1 - 23456236
2 - 2373547
3 - 0
4 - 23456775
5 - 78543683465
...
2998 - 0
2999 - 23462346
Last Error is 6
Where is the problem? Why is there a conflict between the created handles and the existing standard handles?

The thread handle is returned by the function! The last parameter receives the thread id, not the handle.
HANDLE threadHandles[1000] = { 0 };
UINT i;
for (i = 0; i < 1000; ++i)
{
    DWORD threadId;
    threadHandles[i] = CreateThread(0, 0, &func, 0, /*CREATE_SUSPENDED*/0, &threadId);
    if (!threadHandles[i])
    {
        printf("Failed to create thread, error %u\n", GetLastError());
        break;
    }
    printf("Thread #%u: handle=%p id=%u\n", i, threadHandles[i], threadId);
    // Not closing the handle here so the example will show unique thread handles and ids: CloseHandle(threadHandles[i]), threadHandles[i] = NULL;
}
// TODO: Wait for threads or do some other work?
for (i = 0; i < 1000; ++i)
    if (threadHandles[i])
        CloseHandle(threadHandles[i]);
On my machine this prints
Thread #0: handle=0000002C id=16424
Thread #1: handle=00000030 id=21192
Thread #2: handle=00000034 id=21180
Thread #3: handle=00000038 id=17336
Thread #4: handle=0000003C id=21184
Thread #5: handle=00000040 id=4460
...
Thread #991: handle=00000FE8 id=12280
Thread #992: handle=00000FEC id=20360
Thread #993: handle=00000FF0 id=20328
Thread #994: handle=00000FF4 id=16060
Thread #995: handle=00000FF8 id=4556
Thread #996: handle=00000FFC id=20296
Thread #997: handle=00001004 id=10316
Thread #998: handle=00001008 id=20604
Thread #999: handle=0000100C id=20264

Simple pipelining using pthreads

I have recently started working with pthreads and am trying to implement software pipelining with them. To do that I have written a toy program, a version of which would become part of my main project.
In this program the main thread creates an input and an output buffer of integer type, then creates a single master thread and passes those buffers to it. The master thread in turn creates two worker threads.
The input and output buffers passed from the main thread to the master thread are of size n x k (e.g. 5 x 10 ints). The master thread iterates over a chunk of size k (i.e. 10) n (i.e. 5) times.
There is a loop in the master thread that runs n (5 here) times. In each iteration the master thread performs some operation on a portion of the input data of size k and places it in the common buffer shared between the master and the worker threads. The master thread then signals the worker threads that the data has been placed in the common buffer.
The two worker threads wait for the master thread's signal that the common buffer is ready. The work on the common buffer is split in half between the worker threads: one worker thread works on the first half and the other works on the second half.
Once the worker threads get the signal from the master thread, each of them performs some operation on its half of the data and copies it to the output buffer. The worker threads then inform the master thread that their work on the common buffer is complete by setting flag values; an array of flags is created for this purpose. The master thread keeps checking whether all the flags are set, which means that all the worker threads have finished with the common buffer and the master thread can safely place the next data chunk into it for the workers to consume.
So essentially the master and the worker threads communicate in a pipelined fashion. At the very end I print the output buffer in the main thread, but I get no output at all. I have pasted my code below, with comments on almost every step.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/time.h>
#include <semaphore.h>
#include <unistd.h>
#include <stdbool.h>
#include <string.h>
#define MthNum 1 //Number of Master threads
#define WthNum 2 //Number of Worker threads
#define times 5 // Number of times the iteration (n in the explanation)
#define elNum 10 //Chunk size during each iteration (k in the explanation)
pthread_mutex_t mutex; // mutex variable declaration
pthread_cond_t cond_var; //conditional variarble declaration
bool completion_flag = true; //This global flag indicates the completion of the worker thread. Turned false once all operation ends
//marking the completion
int *commonBuff; //common buffer between master and worker threads
int *commFlags; //array of flags that are turned to 1 by each worker threads. So worker thread i turns commFlags[i] to 1
// the master thread turns commFlags[i] = 0 for i =0 to (WthNum - 1)
int *commFlags_s;
int counter; // This counter used my master thread to count if all the commFlags[i] that shows
//all the threads finished their work on the common buffer
// static pthread_barrier_t barrier;
// Arguments structure passed to master thread
typedef struct{
int *input; // input buffer
int *output;// output buffer
}master_args;
// Arguments structure passed to worker thread
typedef struct{
int threadId;
int *outBuff;
}worker_args;
void* worker_func(void *arguments);
void *master_func(void *);
int main(int argc,char*argv[]){
int *ipData,*opData;
int i,j;
// allocation of input buffer and initializing to 0
ipData = (int *)malloc(times*elNum*sizeof(int));
memset(ipData,0,times*elNum*sizeof(int));
// allocation of output buffer and initializing to 0
opData = (int *)malloc(times*elNum*sizeof(int));
memset(opData,0,times*elNum*sizeof(int));
pthread_t thread[MthNum];
master_args* args[MthNum];
//creating the single master thread and passing the arguments
for( i=0;i<MthNum;i++){
args[i] = (master_args *)malloc(sizeof(master_args));
args[i]->input= ipData;
args[i]->output= opData;
pthread_create(&thread[i],NULL,master_func,(void *)args[i]);
}
//joining the master thred
for(i=0;i<MthNum;i++){
pthread_join(thread[i],NULL);
}
//printing the output buffer values
for(j =0;j<times;j++ ){
for(i =0;i<elNum;i++){
printf("%d\t",opData[i+j*times]);
}
printf("\n");
}
return 0;
}
//This is the master thread function
void *master_func(void *arguments){
//copying the arguments pointer to local variables
master_args* localMasterArgs = (master_args *)arguments;
int *indataArgs = localMasterArgs->input; //input buffer
int *outdataArgs = localMasterArgs->output; //output buffer
//worker thread declaration
pthread_t Workers[WthNum];
//worker thread arguments declaration
worker_args* wArguments[WthNum];
int i,j;
pthread_mutex_init(&mutex, NULL);
pthread_cond_init (&cond_var, NULL);
counter =0;
commonBuff = (int *)malloc(elNum*sizeof(int));
commFlags = (int *)malloc(WthNum*sizeof(int));
memset(commFlags,0,WthNum*sizeof(int) );
commFlags_s= (int *)malloc(WthNum*sizeof(int));
memset(commFlags_s,0,WthNum*sizeof(int) );
for(i =0;i<WthNum;i++){
wArguments[i] = (worker_args* )malloc(sizeof(worker_args));
wArguments[i]->threadId = i;
wArguments[i]->outBuff = outdataArgs;
pthread_create(&Workers[i],NULL,worker_func,(void *)wArguments[i]);
}
for (i = 0; i < times; i++) {
for (j = 0; j < elNum; j++)
indataArgs[i + j * elNum] = i + j;
while (counter != 0) {
counter = 0;
pthread_mutex_lock(&mutex);
for (j = 0; j < WthNum; j++) {
counter += commFlags_s[j];
}
pthread_mutex_unlock(&mutex);
}
pthread_mutex_lock(&mutex);
memcpy(commonBuff, &indataArgs[i * elNum], sizeof(int));
pthread_mutex_unlock(&mutex);
counter = 1;
while (counter != 0) {
counter = 0;
pthread_mutex_lock(&mutex);
for (j = 0; j < WthNum; j++) {
counter += commFlags[j];
}
pthread_mutex_unlock(&mutex);
}
// printf("master broad cast\n");
pthread_mutex_lock(&mutex);
pthread_cond_broadcast(&cond_var);
//releasing the lock
pthread_mutex_unlock(&mutex);
}
pthread_mutex_lock(&mutex);
completion_flag = false;
pthread_mutex_unlock(&mutex);
for (i = 0; i < WthNum; i++) {
pthread_join(Workers[i], NULL);
}
pthread_mutex_destroy(&mutex);
pthread_cond_destroy(&cond_var);
return NULL;
}
void* worker_func(void *arguments){
worker_args* localArgs = (worker_args*)arguments;
//copying the thread ID and the output buffer
int tid = localArgs->threadId;
int *localopBuffer = localArgs->outBuff;
int i,j;
bool local_completion_flag=false;
while(local_completion_flag){
pthread_mutex_lock(&mutex);
commFlags[tid] =0;
commFlags_s[tid] =1;
pthread_cond_wait(&cond_var,&mutex);
commFlags_s[tid] =0;
commFlags[tid] =1;
if (tid == 0) {
for (i = 0; i < (elNum / 2); i++) {
localopBuffer[i] = commonBuff[i] * 5;
}
} else { // Thread ID 1 operating on the other half of the common buffer data and placing on the
// output buffer
for (i = 0; i < (elNum / 2); i++) {
localopBuffer[elNum / 2 + i] = commonBuff[elNum / 2 + i] * 10;
}
}
local_completion_flag=completion_flag;
pthread_mutex_unlock(&mutex);//releasing the lock
}
return NULL;
}
I have no idea where my implementation goes wrong, since logically it seems correct, yet something is definitely wrong with it. I have spent a long time trying different things to fix it, but nothing worked. Sorry for the long post; I could not identify which section is at fault, so I could not make the post more concise. If anybody could take a look at the problem and the implementation and suggest what changes are needed to make it run as intended, it would be really helpful. Thank you for your help and assistance.

There are several errors in this code.
You may start by fixing the creation of the worker threads:
wArguments[i] = (worker_args* )malloc(sizeof(worker_args));
wArguments[i]->threadId = i;
wArguments[i]->outBuff = outdataArgs;
pthread_create(&Workers[i],NULL,worker_func, (void *)wArguments);
You initialize the worker_args structs but then pass them incorrectly: (void *)wArguments is a pointer to the whole array, not a pointer to the array element you just initialized. Pass the element instead:
pthread_create(&Workers[i],NULL,worker_func, (void *)wArguments[i]);
// ^^^
Initialize counter before starting the threads that use its value:
void *master_func(void *arguments)
{
/* (...) */
pthread_mutex_init(&mutex, NULL);
pthread_cond_init (&cond_var, NULL);
counter = WthNum;
When starting the master thread, you incorrectly pass a pointer to a pointer:
pthread_create(&thread[i],NULL,master_func,(void *)&args[i]);
Please change this to:
pthread_create(&thread[i],NULL,master_func,(void *) args[i]);
All accesses to the counter variable (like any other shared memory) must be synchronized between threads.
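As an illustration of that last point, here is a minimal sketch of a master/worker handshake in which every shared variable is read and written only while holding the mutex, and every wait re-checks its predicate in a loop. It is not the original poster's design: the names generation, pending, finished, wait_idle, run_pipeline and pipeline_output are hypothetical, and unlike the original code each round is written to its own slice of the output buffer.

```c
#include <pthread.h>
#include <stdbool.h>

#define WORKERS 2
#define CHUNK   10
#define ROUNDS  5

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_ready = PTHREAD_COND_INITIALIZER;
static pthread_cond_t work_done = PTHREAD_COND_INITIALIZER;
static int generation;            /* number of chunks published so far */
static int pending;               /* workers still busy with the current chunk */
static bool finished;             /* set once no more chunks will come */
static int common_buf[CHUNK];
static int out_buf[ROUNDS * CHUNK];

static void *worker(void *arg)
{
    int id = *(int *)arg;
    int half = CHUNK / WORKERS;
    int seen = 0;                 /* last generation this worker handled */

    for (;;) {
        pthread_mutex_lock(&mtx);
        while (generation == seen && !finished)
            pthread_cond_wait(&work_ready, &mtx);  /* predicate re-checked on wakeup */
        if (generation == seen) {                  /* finished and nothing new */
            pthread_mutex_unlock(&mtx);
            return NULL;
        }
        seen = generation;
        pthread_mutex_unlock(&mtx);

        /* The master does not touch common_buf again until pending drops to 0. */
        for (int i = id * half; i < (id + 1) * half; i++)
            out_buf[(seen - 1) * CHUNK + i] = common_buf[i] * (id == 0 ? 5 : 10);

        pthread_mutex_lock(&mtx);
        if (--pending == 0)
            pthread_cond_signal(&work_done);
        pthread_mutex_unlock(&mtx);
    }
}

/* Must be called with mtx held; returns once all workers are idle. */
static void wait_idle(void)
{
    while (pending != 0)
        pthread_cond_wait(&work_done, &mtx);
}

/* Runs all rounds; returns 0 on success, -1 if a thread could not start. */
int run_pipeline(void)
{
    pthread_t th[WORKERS];
    int ids[WORKERS];
    int started = 0;

    generation = 0; pending = 0; finished = false;
    while (started < WORKERS) {
        ids[started] = started;
        if (pthread_create(&th[started], NULL, worker, &ids[started]) != 0)
            break;
        started++;
    }
    for (int r = 0; started == WORKERS && r < ROUNDS; r++) {
        pthread_mutex_lock(&mtx);
        wait_idle();
        for (int i = 0; i < CHUNK; i++)
            common_buf[i] = r + i;
        pending = WORKERS;
        generation++;
        pthread_cond_broadcast(&work_ready);
        pthread_mutex_unlock(&mtx);
    }
    pthread_mutex_lock(&mtx);
    wait_idle();
    finished = true;
    pthread_cond_broadcast(&work_ready);
    pthread_mutex_unlock(&mtx);
    for (int i = 0; i < started; i++)
        pthread_join(th[i], NULL);
    return started == WORKERS ? 0 : -1;
}

/* Returns the output element i of round r. */
int pipeline_output(int r, int i)
{
    return out_buf[r * CHUNK + i];
}
```

The generation counter is what lets a worker tell a new chunk from a spurious wakeup, so no busy-waiting on flags is needed.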

I think you should use a semaphore-based producer-consumer model like this:
https://jlmedina123.wordpress.com/2014/04/08/255/

pthread_join seems to modify my loop index

My code (see below) produces an odd behaviour. The output is:
Testing whether there are problems with concurrency ...rc is 0. i is 0
.rc is 0. i is 0
.rc is 3. i is 1
.rc is 0. i is 0
.rc is 3. i is 1
.rc is 3. i is 2
.rc is 0. i is 0
.rc is 3. i is 1
.rc is 3. i is 2
.rc is 3. i is 3
.rc is 0. i is 0
.rc is 3. i is 1
.rc is 3. i is 2
.rc is 3. i is 3
.rc is 3. i is 4
.rc is 0. i is 0
Segmentation fault (core dumped)
I tried to debug it, but only found out that i is reset to 0 right after pthread_join. This leads me to the conclusion that the modification must happen somewhere there, but I can't find anything. I feel kind of stupid, since this isn't really a hard piece of code. What did I not notice?
Operating system is Ubuntu 14.04. N_THREADS is currently set to 10, N_RUNS is 10000.
Main thread:
pthread_t threads[N_THREADS];
pthread_attr_t attr;
int i;
int rc;
int status;
printf("Testing whether there are problems with concurrency ...");
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
for (i = 0; i < N_THREADS; i++){
if (i) {
rc = pthread_create(&(threads[i]), &attr, addRemove, 0);
} else {
rc = pthread_create(&(threads[i]), &attr, readStuff, 0);
}
if (rc) return rc;
}
for(i = 0; i < N_THREADS; i++) {
rc = pthread_join(threads[i], (void*) &status);
// if(rc == 3)
printf("rc is %d. i is %d\n", rc, i);
// if (rc) return rc;
if (status) return status;
printf(".");
}
pthread_attr_destroy(&attr);
return 0;
Worker threads:
void* readStuff(void* a)
{
int i;
for (i = 0; i< N_RUNS; i++){
;
}
pthread_exit((void*)0);
}
void* addRemove(void* a)
{
int i;
for (i = 0; i< N_RUNS; i++){
;
}
pthread_exit((void*)0);
}
There are no other threads except the main thread and the ones created in the code above.
Compileable example

I think your problem is with the pthread_join. From the man page:
int pthread_join(pthread_t thread, void **retval);
...
If retval is not NULL, then pthread_join() copies the exit status of the target thread (i.e., the value that the target thread supplied to pthread_exit(3)) into the location pointed to by *retval. If the target thread was canceled, then PTHREAD_CANCELED is placed in *retval.
Note that it takes a void **, which means it overwrites the thing pointed to by retval with a void * (size 8 on 64-bit). You are passing an int * (i.e. &status), which is a pointer to an object of size 4 on most platforms.
So pthread_join writes past the end of status and overwrites adjacent memory, which here is evidently the loop index i. Instead, declare status as a void * as per the function prototype.
You are also testing status; I don't know what you are trying to achieve here.
In general, compiling with -Wall will show you these errors.
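A minimal sketch of the fix described above, assuming the thread only needs to hand back a small integer status. The names worker and join_status and the value 42 are hypothetical; the point is that the object passed to pthread_join() is a void *, and the conversion to int happens afterwards through intptr_t.

```c
#include <pthread.h>
#include <stdint.h>

/* Hands back a small integer status through the void * return value. */
static void *worker(void *arg)
{
    (void)arg;
    return (void *)(intptr_t)42;
}

/* Returns the worker's status, or -1 if creating or joining failed. */
int join_status(void)
{
    pthread_t t;
    void *retval = NULL;   /* pointer-sized, as pthread_join expects */

    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_join(t, &retval) != 0)
        return -1;
    return (int)(intptr_t)retval;   /* convert only after the join */
}
```

Because retval has the size pthread_join() writes, neighbouring variables such as a loop index are left alone.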

pthread_create() fails with EAGAIN at 291 cycle

I had this code:
int main(int argc, char** argv)
{
pthread_t thread[thr_num];
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
// just for debugging //
struct rlimit rlim;
getrlimit(RLIMIT_NPROC, &rlim);
printf ("soft = %d \n", rlim.rlim_cur);
printf ("hard = %d \n", rlim.rlim_max);
////
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
pthread_attr_destroy(&attr);
for ( i = 1 ; i <= thr_num ; i++) {
if( pthread_join(thread[i], (void**)&status ) ) {
exit(1);
}
}
return 0;
}
void* loggerThread(void* data)
{
char** sthg = ((char**)data);
pthread_exit(NULL);
}
I don't understand why, when I run this code with thr_num=291, I get this error:
pthread_create failure, i = 291, errno = 11 (EAGAIN)
With thr_num=290 it worked fine. I ran this code on Linux 2.6.27.54-0.2-default (SLES 11).
Both rlim.rlim_cur and rlim.rlim_max have the value 6906. I saw the same value with 'ulimit -a' for 'max user processes'.
Guided by the pthread_create man page, I also checked /proc/sys/kernel/threads-max (it was 13813).
I did not find any parameter with the value 290 in the 'sysctl -a' output either.
By chance I found out from this link:
pthread_create and EAGAIN
that: "Even if pthread_exit or pthread_cancel is called, the parent process still need to call pthread_join to release the pthread ID, which will then become recyclable"
so just as a try I modified my code to this:
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
if( pthread_join(thread[i], (void**)&status ) ) {
printf("pthread_join failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
pthread_attr_destroy(&attr);
and then everything worked: I didn't get the error at cycle 291.
I would like to understand why I got the error with my original code:
1. because of incorrect programming with threads
2. or because I hit some system limit that I couldn't identify
I would also like to know whether my correction is a good fix for this problem, and what hidden pitfalls I may have introduced with this solution.
Thanks!

I would like to understand why with my original code I got the error: 1. because of a wrong programing with threads 2. or I hit some system limit what I couldn't identify
You likely hit a system limit; most likely you ran out of address space. By default, each thread gets 8-10 MB of stack space on Linux. If you create 290 threads, that uses nearly 3 GB of address space, the maximum for a 32-bit process.
You get EAGAIN in such a case, since there aren't enough resources to create the thread just now (there isn't enough address space available at the time).
When a thread exits, not all of its resources are released (on Linux, the entire stack of the thread is kept around).
If the thread is in a detached state, e.g. you called pthread_detach() or specified a detached state as an attribute to pthread_create() when it was created, all resources are released when the thread exits, but you can't pthread_join() a detached thread.
If the thread is not detached, you need to call pthread_join() on it to release the resources.
Note that the modified code of yours where you call pthread_join() inside the loop will:
1. spawn a thread
2. wait for that thread to finish
3. go to 1
i.e. only one other thread is running at a time - which seems a bit pointless.
You can certainly spawn more than one thread that run concurrently - but there's a limit. On your machine, you seem to have found the limit to be around 290.
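If address space for thread stacks is indeed the limit, one option is to request a smaller stack per thread through the attribute object. The following is a sketch under that assumption, not a confirmed fix for the system in the question; idle_worker and spawn_with_stack are hypothetical names, and the stack size is a value the caller has to choose for the workload. Note also that pthread_create() returns the error number directly rather than setting errno.

```c
#include <pthread.h>
#include <stdlib.h>

/* Placeholder thread body that exits immediately. */
static void *idle_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Tries to start n joinable threads with stack_bytes of stack each,
 * then joins them. Returns how many threads were actually created. */
int spawn_with_stack(int n, size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t *tid;
    int created = 0;

    if (n <= 0)
        return 0;
    tid = malloc((size_t)n * sizeof *tid);
    if (tid == NULL)
        return 0;
    if (pthread_attr_init(&attr) != 0) {
        free(tid);
        return 0;
    }
    /* If stack_bytes is below PTHREAD_STACK_MIN this call fails with EINVAL
     * and the default stack size stays in effect. */
    pthread_attr_setstacksize(&attr, stack_bytes);

    for (int i = 0; i < n; i++) {
        /* pthread_create returns the error number (e.g. EAGAIN) on failure. */
        if (pthread_create(&tid[created], &attr, idle_worker, NULL) != 0)
            break;
        created++;
    }
    pthread_attr_destroy(&attr);

    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);   /* joining releases each thread's resources */
    free(tid);
    return created;
}
```

The loop indexes the array from 0, so tid[0] through tid[created - 1] are the only elements ever touched.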

I initially wrote this as a comment, but just in case...
Your code:
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
...
for ( i = 1 ; i <= thr_num ; i++) {
if( pthread_join(thread[i], (void**)&status ) ) {
exit(1);
}
}
In both the for() loops you check from 1 - thr_num. This means you are out of bounds in your array thread[thr_num] since arrays start at index 0. You should thus iterate from 0 to one less than thr_num:
for ( i = 0 ; i < thr_num ; i++)
I'm actually surprised you didn't get a segmentation fault before hitting 291 as thr_num.
