Why does this code work without a mutex? - c

I am trying to learn how locks work in multi-threading. When I execute the following code without lock, it worked fine even though the variable sum is declared as a global variable and multiple threads are updating it. Could anyone please explain why here threads are working perfectly fine on a shared variable without locks?
Here is the code:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define NTHREADS 100
#define ARRAYSIZE 1000000
#define ITERATIONS ARRAYSIZE / NTHREADS
double sum=0.0, a[ARRAYSIZE];
pthread_mutex_t sum_mutex;
void *do_work(void *tid)
{
int i, start, *mytid, end;
double mysum=0.0;
/* Initialize my part of the global array and keep local sum */
mytid = (int *) tid;
start = (*mytid * ITERATIONS);
end = start + ITERATIONS;
printf ("Thread %d doing iterations %d to %d\n",*mytid,start,end-1);
for (i=start; i < end ; i++) {
a[i] = i * 1.0;
mysum = mysum + a[i];
}
/* Lock the mutex and update the global sum, then exit */
//pthread_mutex_lock (&sum_mutex); //here I tried not to use locks
sum = sum + mysum;
//pthread_mutex_unlock (&sum_mutex);
pthread_exit(NULL);
}
int main(int argc, char *argv[])
{
int i, start, tids[NTHREADS];
pthread_t threads[NTHREADS];
pthread_attr_t attr;
/* Pthreads setup: initialize mutex and explicitly create threads in a
joinable state (for portability). Pass each thread its loop offset */
pthread_mutex_init(&sum_mutex, NULL);
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
for (i=0; i<NTHREADS; i++) {
tids[i] = i;
pthread_create(&threads[i], &attr, do_work, (void *) &tids[i]);
}
/* Wait for all threads to complete then print global sum */
for (i=0; i<NTHREADS; i++) {
pthread_join(threads[i], NULL);
}
printf ("Done. Sum= %e \n", sum);
sum=0.0;
for (i=0;i<ARRAYSIZE;i++){
a[i] = i*1.0;
sum = sum + a[i]; }
printf("Check Sum= %e\n",sum);
/* Clean up and exit */
pthread_attr_destroy(&attr);
pthread_mutex_destroy(&sum_mutex);
pthread_exit (NULL);
}
With and without lock I got the same answer!
Done. Sum= 4.999995e+11
Check Sum= 4.999995e+11
UPDATE: Change suggested by user3386109
for (i=start; i < end ; i++) {
a[i] = i * 1.0;
//pthread_mutex_lock (&sum_mutex);
sum = sum + a[i];
//pthread_mutex_lock (&sum_mutex);
}
EFFECT :
Done. Sum= 3.878172e+11
Check Sum= 4.999995e+11

Mutexes are used to prevent race conditions which are undesirable situations when you have two or more threads accessing a shared resource. Race conditions such as the one in your code happen when the shared variable sum is being accessed by multiple threads. Sometimes the access to the shared variable will be interleaved in such a way that the result is incorrect and sometimes the result will be correct.
For example lets say you have two threads, thread A and thread B both adding 1 to a shared value, sum, which starts at 5. If thread A reads sum and then thread B reads sum and then thread A writes a new value followed by thread B writing a new value you will can an incorrect result, 6 as opposed to 7. However it is also possible than thread A reads and then writes a value (specifically 6) followed by thread B reading and writing a value (specifically 7) and then you get the correct result. The point being that some interleavings of operations result in the correct value and some interleavings result in an incorrect value. The job of the mutex is to force the interleaving to always be correct.

Related

Synchronization problem between threads using pthread_mutex_t

Basically i need to make three threads B,C,D to work simultaneously. Thread B sums up the even indexes in a global array X , C sums up the odd indexes in X, D sums up both results while B and C are still summing. I used two mutexes to do so but its not working properly.
In the array X given in the code below the results should be: evenSum = 47,oddSum = 127 ,bothSum = 174.
any help is greatly appreciated!
Thanks!!
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#define SIZE 20
int X[SIZE] = {5,4,5,3,7,9,3,3,1,2,9,0,3,43,3,56,7,3,4,4};
int evenSum = 0;
int oddSum = 0;
int bothSum = 0;
//Initialize two mutex semaphores
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
void* sum_even_indexes(void* args){
int i;
for(i=0 ; i<20 ; i+=2){
pthread_mutex_lock(&mutex1);
evenSum+=X[i];
pthread_mutex_unlock(&mutex1);
}
pthread_exit(NULL);
}
void* sum_odd_indexes(void* args){
int i;
for(i=1 ; i<20 ; i+=2){
pthread_mutex_lock(&mutex2);
oddSum+=X[i];
pthread_mutex_unlock(&mutex2);
}
pthread_exit(NULL);
}
void* sum_of_both(void* args){
int i;
for(i=0 ; i<SIZE ; i++){
pthread_mutex_lock(&mutex1);
pthread_mutex_lock(&mutex2);
bothSum += (evenSum+oddSum);
pthread_mutex_unlock(&mutex2);
pthread_mutex_unlock(&mutex1);
}
pthread_exit(NULL);
}
int main(int argc, char const *argv[]){
pthread_t B,C,D;
/***
* Create three threads:
* thread B : Sums up the even indexes in the array X
* thread C : Sums up the odd indexes in the array X
* thread D : Sums up both B and C results
*
* Note:
* All threads must work simultaneously
*/
pthread_create(&B,NULL,sum_even_indexes,NULL);
pthread_create(&C,NULL,sum_odd_indexes,NULL);
pthread_create(&D,NULL,sum_of_both,NULL);
//Wait for all threads to finish
pthread_join(B,NULL);
pthread_join(C,NULL);
pthread_join(D,NULL);
pthread_mutex_destroy(&mutex1);
pthread_mutex_destroy(&mutex2);
//Testing Print
printf("Odds Sum = %d\n",oddSum);
printf("Evens Sum = %d\n",evenSum);
printf("Both Sum = %d\n",bothSum);
return 0;
}
The mutexes will not enforce any ordering between D and B or C.
To make such ordering, you need to run D later (after B and C joined), or make D wait on a condition, which is set, when B an C are done.
mutex1 and mutex2 are actually not needed, if you implement proper waiting. You need no mutexes if you start D later, or you need only one, if you'll use waiting on a condition variable
Also it is not clear why you need a loop in D. You have just a sum of two variables.
For real application, parallel array processing is usually done by partitioning array by ranges, not by modulo. It does not make sense at all for small sizes like 20. You'd prefer threadpool threads to avoid thread startup overhead and control count of threads. And sure for just evenSum+oddSum you wouldn't need a thread.

why is this c code causing a race condition?

I'm trying to count the number of prime numbers up to 10 million and I have to do it using multiple threads using Posix threads(so, that each thread computes a subset of 10 million). However, my code is not checking for the condition IsPrime. I'm thinking this is due to a race condition. If it is what can I do to ameliorate this issue?
I've tried using a global integer array with k elements but since k is not defined it won't let me declare that at the file scope.
I'm running my code using gcc -pthread:
/*
Program that spawns off "k" threads
k is read in at command line each thread will compute
a subset of the problem domain(check if the number is prime)
to compile: gcc -pthread lab5_part2.c -o lab5_part2
*/
#include <math.h>
#include <stdio.h>
#include <time.h>
#include <pthread.h>
#include <stdlib.h>
typedef int bool;
#define FALSE 0
#define TRUE 1
#define N 10000000 // 10 Million
int k; // global variable k willl hold the number of threads
int primeCount = 0; //it will hold the number of primes.
//returns whether num is prime
bool isPrime(long num) {
long limit = sqrt(num);
for(long i=2; i<=limit; i++) {
if(num % i == 0) {
return FALSE;
}
}
return TRUE;
}
//function to use with threads
void* getPrime(void* input){
//get the thread id
long id = (long) input;
printf("The thread id is: %ld \n", id);
//how many iterations each thread will have to do
int numOfIterations = N/k;
//check the last thread. to make sure is a whole number.
if(id == k-1){
numOfIterations = N - (numOfIterations * id);
}
long startingPoint = (id * numOfIterations);
long endPoint = (id + 1) * numOfIterations;
for(long i = startingPoint; i < endPoint; i +=2){
if(isPrime(i)){
primeCount ++;
}
}
//terminate calling thread.
pthread_exit(NULL);
}
int main(int argc, char** args) {
//get the num of threads from command line
k = atoi(args[1]);
//make sure is working
printf("Number of threads is: %d\n",k );
struct timespec start,end;
//start clock
clock_gettime(CLOCK_REALTIME,&start);
//create an array of threads to run
pthread_t* threads = malloc(k * sizeof(pthread_t));
for(int i = 0; i < k; i++){
pthread_create(&threads[i],NULL,getPrime,(void*)(long)i);
}
//wait for each thread to finish
int retval;
for(int i=0; i < k; i++){
int * result = NULL;
retval = pthread_join(threads[i],(void**)(&result));
}
//get the time time_spent
clock_gettime(CLOCK_REALTIME,&end);
double time_spent = (end.tv_sec - start.tv_sec) +
(end.tv_nsec - start.tv_nsec)/1000000000.0f;
printf("Time tasken: %f seconds\n", time_spent);
printf("%d primes found.\n", primeCount);
}
the current output I am getting: (using the 2 threads)
Number of threads is: 2
Time tasken: 0.038641 seconds
2 primes found.
The counter primeCount is modified by multiple threads, and therefore must be atomic. To fix this using the standard library (which is now supported by POSIX as well), you should #include <stdatomic.h>, declare primeCount as an atomic_int, and increment it with an atomic_fetch_add() or atomic_fetch_add_explicit().
Better yet, if you don’t care about the result until the end, each thread can store its own count in a separate variable, and the main thread can add all the counts together once the threads finish. You will need to create, in the main thread, an atomic counter per thread (so that updates don’t clobber other data in the same cache line), pass each thread a pointer to its output parameter, and then return the partial tally to the main thread through that pointer.
This looks like an exercise that you want to solve yourself, so I won’t write the code for you, but the approach to use would be to declare an array of counters like the array of thread IDs, and pass &counters[i] as the arg parameter of pthread_create() similarly to how you pass &threads[i]. Each thread would need its own counter. At the end of the thread procedure, you would write something like, atomic_store_explicit( (atomic_int*)arg, localTally, memory_order_relaxed );. This should be completely wait-free on all modern architectures.
You might also decide that it’s not worth going to that trouble to avoid a single atomic update per thread, declare primeCount as an atomic_int, and then atomic_fetch_add_explicit( &primeCount, localTally, memory_order_relaxed ); once before the thread procedure terminates.

Synchronization of p_threads using globals

I'm pretty new to C, so I'm not sure where even to start digging about my problem.
I'm trying to port python number-crunching algos to C, and since there's no GIL in C (woohoo), I can change whatever I want in memory from threads, for as long as I make sure there are no races.
I did my homework on mutexes, however, I cannot wrap my head around use of mutexes in case of continuously running threads accessing the same array over and over.
I'm using p_threads in order to split the workload on a big array a[N].
Number crunching algorithm on array a[N] is additive, so I'm splitting it using a_diff[N_THREADS][N] array, writing changes to be applied to a[N] array from each thread to a_diff[N_THREADS][N] and then merging them all together after each step.
I need to run the crunching on different versions of array a[N], so I pass them via global pointer p (in the MWE, there's only one a[N])
I'm synchronizing threads using another global array SYNC_THREADS[N_THREADS] and make sure threads quit when I need them to by setting END_THREADS global (I know, I'm using too many globals - I don't care, code is ~200 lines). My question is in regard to this synchronization technique - is it safe to do so and what is cleaner/better/faster way to achieve that?
MWEe:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define N_THREADS 3
#define N 10000000
#define STEPS 3
double a[N]; // main array
double a_diff[N_THREADS][N]; // diffs array
double params[N]; // parameter used for number-crunching
double (*p)[N]; // pointer to array[N]
// structure for bounds for crunching the array
struct bounds {
int lo;
int hi;
int thread_num;
};
struct bounds B[N_THREADS];
int SYNC_THREADS[N_THREADS]; // for syncing threads
int END_THREADS = 0; // signal to terminate threads
static void *crunching(void *arg) {
// multiple threads run number-crunching operations according to assigned low/high bounds
struct bounds *data = (struct bounds *)arg;
int lo = (*data).lo;
int hi = (*data).hi;
int thread_num = (*data).thread_num;
printf("worker %d started for bounds [%d %d] \n", thread_num, lo, hi);
int i;
while (END_THREADS != 1) { // END_THREADS tells threads to terminate
if (SYNC_THREADS[thread_num] == 1) { // SYNC_THREADS allows threads to start number-crunching
printf("worker %d working... \n", thread_num );
for (i = lo; i <= hi; ++i) {
a_diff[thread_num][i] += (*p)[i] * params[i]; // pretend this is an expensive operation...
}
SYNC_THREADS[thread_num] = 0; // thread disables itself until SYNC_THREADS is back to 1
printf("worker %d stopped... \n", thread_num );
}
}
return 0;
}
int i, j, th,s;
double joiner;
int main() {
// pre-fill arrays
for (i = 0; i < N; ++i) {
a[i] = i + 0.5;
params[i] = 0.0;
}
// split workload between workers
int worker_length = N / N_THREADS;
for (i = 0; i < N_THREADS; ++i) {
B[i].thread_num = i;
B[i].lo = i * worker_length;
if (i == N_THREADS - 1) {
B[i].hi = N;
} else {
B[i].hi = i * worker_length + worker_length - 1;
}
}
// pointer to parameters to be passed to worker
struct bounds **data = malloc(N_THREADS * sizeof(struct bounds*));
for (i = 0; i < N_THREADS; i++) {
data[i] = malloc(sizeof(struct bounds));
data[i]->lo = B[i].lo;
data[i]->hi = B[i].hi;
data[i]->thread_num = B[i].thread_num;
}
// create thread objects
pthread_t threads[N_THREADS];
// disallow threads to crunch numbers
for (th = 0; th < N_THREADS; ++th) {
SYNC_THREADS[th] = 0;
}
// launch workers
for(th = 0; th < N_THREADS; th++) {
pthread_create(&threads[th], NULL, crunching, data[th]);
}
// big loop of iterations
for (s = 0; s < STEPS; ++s) {
for (i = 0; i < N; ++i) {
params[i] += 1.0; // adjust parameters
// zero diff array
for (i = 0; i < N; ++i) {
for (th = 0; th < N_THREADS; ++th) {
a_diff[th][i] = 0.0;
}
}
p = &a; // pointer to array a
// allow threads to process numbers and wait for threads to complete
for (th = 0; th < N_THREADS; ++th) { SYNC_THREADS[th] = 1; }
// ...here threads started by pthread_create do calculations...
for (th = 0; th < N_THREADS; th++) { while (SYNC_THREADS[th] != 0) {} }
// join results from threads (number-crunching is additive)
for (i = 0; i < N; ++i) {
joiner = 0.0;
for (th = 0; th < N_THREADS; ++th) {
joiner += a_diff[th][i];
}
a[i] += joiner;
}
}
}
// join workers
END_THREADS = 1;
for(th = 0; th < N_THREADS; th++) {
pthread_join(threads[th], NULL);
}
return 0;
}
I see that workers don't overlap time-wise:
worker 0 started for bounds [0 3333332]
worker 1 started for bounds [3333333 6666665]
worker 2 started for bounds [6666666 10000000]
worker 0 working...
worker 1 working...
worker 2 working...
worker 2 stopped...
worker 0 stopped...
worker 1 stopped...
worker 2 working...
worker 0 working...
worker 1 working...
worker 1 stopped...
worker 0 stopped...
worker 2 stopped...
worker 2 working...
worker 0 working...
worker 1 working...
worker 1 stopped...
worker 2 stopped...
worker 0 stopped...
Process returned 0 (0x0) execution time : 1.505 s
and I make sure worker's don't get into each other workspaces by separating them via a_diff[thead_num][N] sub-arrays, however, I'm not sure that's always the case and that I'm not introducing hidden races somewhere...
I didn't realize what was the question :-)
So, the question is if you're thinking well with your SYNC_THREADS and END_THREADS synchronization mechanism.
Yes!... Almost. The problem is that threads are burning CPU while waiting.
Conditional Variables
To make threads wait for an event you have conditional variables (pthread_cond). These offer a few useful functions like wait(), signal() and broadcast():
wait(&cond, &m) blocks a thread in a given condition variable. [note 2]
signal(&cond) unlocks a thread waiting in a given condition variable.
broadcast(&cond) unlocks all threads waiting in a given condition variable.
Initially you'd have all the threads waiting [note 1]:
while(!start_threads)
pthread_cond_wait(&cond_start);
And, when the main thread is ready:
start_threads = 1;
pthread_cond_broadcast(&cond_start);
Barriers
If you have data dependencies between iterations, you'd want to make sure threads are executing the same iteration at any given moment.
To synchronize threads at the end of each iteration, you'll want to take a look at barriers (pthread_barrier):
pthread_barrier_init(count): initializes a barrier to synchronize count threads.
pthread_barrier_wait(): thread waits here until all count threads reach the barrier.
Extending functionality of barriers
Sometimes you'll want the last thread reaching a barrier to compute something (e.g. to increment the counter of number of iterations, or to compute some global value, or to check if execution should stop). You have two alternatives
Using pthread_barriers
You'll need to essentially have two barriers:
int rc = pthread_barrier_wait(&b);
if(rc != 0 && rc != PTHREAD_BARRIER_SERIAL_THREAD)
if(shouldStop()) stop = 1;
pthread_barrier_wait(&b);
if(stop) return;
Using pthread_conds to implement our own specialized barrier
pthread_mutex_lock(&mutex)
remainingThreads--;
// all threads execute this
executedByAllThreads();
if(remainingThreads == 0) {
// reinitialize barrier
remainingThreads = N;
// only last thread executes this
if(shouldStop()) stop = 1;
pthread_cond_broadcast(&cond);
} else {
while(remainingThreads > 0)
pthread_cond_wait(&cond, &mutex);
}
pthread_mutex_unlock(&mutex);
Note 1: why is pthread_cond_wait() inside a while block? May seem a bit odd. The reason behind it is due to the existence of spurious wakeups. The function may return even if no signal() or broadcast() was issued. So, just to guarantee correctness, it's usual to have an extra variable to guarantee that if a thread suddenly wakes up before it should, it runs back into the pthread_cond_wait().
From the manual:
When using condition variables there is always a Boolean predicate involving shared variables associated with each condition wait that is true if the thread should proceed. Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.
(...)
If a signal is delivered to a thread waiting for a condition variable, upon return from the signal handler the thread resumes waiting for the condition variable as if it was not interrupted, or it shall return zero due to spurious wakeup.
Note 2:
An noted by Michael Burr in the comments, you should hold a companion lock whenever you modify the predicate (start_threads) and pthread_cond_wait(). pthread_cond_wait() will release the mutex when called; and re-acquires it when it returns.
PS: It's a bit late here; sorry if my text is confusing :-)

Windows Threading API: Calculate PI value with multiple threads

I am currently working on this project where I need to calculate the value of PI...
When specifying only one thread works perfectly and I get 3.1416[...] but when I specify to solve the process in 2 or more threads I stop getting the 3.1416 value, this is my code:
#include <stdio.h>
#include <time.h>
#include <windows.h>
//const int numThreads = 1;
//long long num_steps = 100000000;
const int numThreads = 2;
long long num_steps = 50000000;
double x, step, pi, sum = 0.0;
int i;
DWORD WINAPI ValueFunc(LPVOID arg){
for (i=0; i<=num_steps; i++) {
x = (i + .5)*step;
sum = sum + 4.0 / (1. + x*x);
}
printf("this is %d step\n", i);
return 0;
}
int main(int argc, char* argv[]) {
int count;
clock_t start, stop;
step = 1. / (double)num_steps;
start = clock();
HANDLE hThread[numThreads];
for ( count = 0; count < numThreads; count++) {
printf("This is thread %d\n", count);
hThread[count] = CreateThread(NULL, 0, ValueFunc, NULL, 0, NULL);
}
WaitForMultipleObjects(numThreads, hThread, TRUE, INFINITE);
pi = sum*step;
stop = clock();
printf("The value of PI is %15.12f\n", pi);
printf("The time to calculate PI was %f seconds\n", ((double)(stop - start) / 1000.0));
}
I get this wrong output when specifying 2 threads:
It seems that your program, when using two threads, allows both threads to directly manipulate a global/shared resource 'sum' without any synchronization protection.
In other words, both threads can manipulate 'sum' at the same time. The value of 'sum' at any point will not be what is expected (ie: as it was with only one thread).
Your program needs to implement some sort of access synchronization between the two threads; such as semaphores, spin-locks, mutex, atomic operations, etc. If implemented properly, these features would allow the two (or more) threads to share the single task (of calculating PI).
You need to use mutexes to access data shared by multiple threads or have the data local to the particular thread and then colate the answer when all the threads have completed.

test "pthread_create" function behaviour

I am new in multi-threaded programming and I have a question about "pthread_create" behaviour
this is the code :
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define NTHREADS 10
#define ARRAYSIZE 1000000
#define ITERATIONS ARRAYSIZE / NTHREADS
double sum=0.0, a[ARRAYSIZE];
pthread_mutex_t sum_mutex;
void *do_work(void *tid)
{
int i, k=0,start, *mytid, end;
double mysum=0.0;
mytid = (int *) tid;
start = (*mytid * ITERATIONS);
end = start + ITERATIONS;
printf ("Thread %d doing iterations %d to %d\n",*mytid,start,end-1);
for (i=start; i < end ; i++) {
a[i] = i * 1.0;
mysum = mysum + a[i];
}
sum = sum + mysum;
}
int main(int argc, char *argv[])
{
int i, start, tids[NTHREADS];
pthread_t threads[NTHREADS];
pthread_attr_t attr;
for (i=0; i<NTHREADS; i++) {
tids[i] = i;
pthread_create(&threads[i], NULL/*&attr*/, do_work, (void *) &tids[i]);
}
/* Wait for all threads to complete then print global sum */
/*
for (i=0; i<NTHREADS; i++) {
pthread_join(threads[i], NULL);
}*/
printf ("Done. Sum= %e \n", sum);
sum=0.0;
for (i=0;i<ARRAYSIZE;i++){
a[i] = i*1.0;
sum = sum + a[i]; }
printf("Check Sum= %e\n",sum);
}
the result of execution is :
Thread 1 doing iterations 100000 to 199999
Done. Sum= 0.000000e+00
Thread 0 doing iterations 0 to 99999
Thread 2 doing iterations 200000 to 299999
Thread 3 doing iterations 300000 to 399999
Thread 8 doing iterations 800000 to 899999
Thread 4 doing iterations 400000 to 499999
Thread 5 doing iterations 500000 to 599999
Thread 9 doing iterations 900000 to 999999
Thread 7 doing iterations 700000 to 799999
Thread 6 doing iterations 600000 to 699999
Check Sum= 8.299952e+11
all thread are created and the execution is not sequential (remove pthread_join), but the function do_work is executed in order and depend on thread. it means iterations 0 to 99999 are done by thread 0 and iterations 100000 to 199999 are done by thread 1 etc ...
the question is here why for example iterations 0 to 99999 is not done by thread 2 ?
This is because iteration range is calculated based on thread number from 0 to N in the following line:
start = (*mytid * ITERATIONS);
And you create and pass that number in a loop as such:
for (i=0; i<NTHREADS; i++) {
tids[i] = i;
...
In other words, 2 + N will never be 0 to perform iteration over 0 to 99999 when N is non-negative.
I think you are confused as to what threads are.
Think about each thread as its own program. If you run 10 programs at the same time, they will be running "simultaneously", i.e. instructions of these 10 programs will be interleaved, but within each program all instructions are executed in the deterministic order.
Same thing with threads. You define which numbers each thread will iterate over by passing the thread id argument when creating the thread.
You are sending the address of tids[i]th variable to thread i and printing based on it. In this case Nth thread will always print from N000000 till N999999 without any change in all the trial runs.
mytid = (int *) tid;
start = (*mytid * ITERATIONS);
end = start + ITERATIONS;
For thread 2, it would behave as,
mytid = (int *) tid; // *mytid = 2
start = ( 2 * ITERATIONS); // 2000000
end = ( 2 * ITERATIONS) + ITERATIONS; // 2999999
Thus printing 2000000 to 2999999. So you can't expect thread 2 to print 0 to 99999.
It is not wise to use global variables like sum, which are shared between threads, without any kind of locking mechanisms. They would bring unexpected results. Use of pthread_mutex here would solve this problem.

Resources