I'm new to concurrent programming, so be nice. I have a basic sequential program (for homework) that I'm attempting to turn into a multithreaded program. I'm not sure whether I need a lock for my second shared variable, count. The threads should modify the variable but never read it. The only time count should be read is after the loop that spawns all of my threads has finished distributing keys.
#define ARRAYSIZE 50000
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
void binary_search(int *array, int key, int min, int max);
int count = 0; // count of intersections
int l_array[ARRAYSIZE * 2]; //array to check for intersection
int main(void)
{
    int r_array[ARRAYSIZE]; //array of keys
    int ix = 0;
    struct timeval start, stop;
    double elapsed;
    for(ix = 0; ix < ARRAYSIZE; ix++)
    {
        r_array[ix] = ix;
    }
    for(ix = 0; ix < ARRAYSIZE * 2; ix++)
    {
        l_array[ix] = ix + 500;
    }
    gettimeofday(&start, NULL);
    for(ix = 0; ix < ARRAYSIZE; ix++)
    {
        //this is where I will spawn off separate threads
        //max is an inclusive index, so the last valid one is ARRAYSIZE * 2 - 1
        binary_search(l_array, r_array[ix], 0, ARRAYSIZE * 2 - 1);
    }
    //wait for all threads to finish computation, then proceed.
    fprintf(stderr, "%d\n", count);
    gettimeofday(&stop, NULL);
    elapsed = ((stop.tv_sec - start.tv_sec) * 1000000+(stop.tv_usec-start.tv_usec))/1000000.0;
    printf("time taken is %f seconds\n", elapsed);
    return 0;
}
void binary_search(int *array, int key, int min, int max)
{
    int mid = 0;
    if (max < min) return;
    else
    {
        mid = (min + max) / 2;
        //a void function cannot "return" a call's value in standard C, so call, then return
        if (array[mid] > key) { binary_search(array, key, min, mid - 1); return; }
        else if (array[mid] < key) { binary_search(array, key, mid + 1, max); return; }
        else
        {
            //this is where I'm not sure if I need a lock or not
            count++;
            return;
        }
    }
}
As you suspect, count++; requires synchronization. This is not something you should try to "get away with" skipping. Sooner or later a second thread will read count after the first thread has read it but before the first thread has stored the incremented value. Then you will miss a count. It is impossible to predict how often this will happen: it could be once in a blue moon or thousands of times a second.
Actually, the code as you've written it does both read and modify the variable. If you were to look at the machine code that gets generated for a line like
count++
you'd see that it consists of something like
fetch count into register
increment register
store count
So yes, you should use a mutex there. (And even if you could get away without doing so, why not take the chance to practice?)
If you simply want accurate increments to count across multiple threads, these types of single-value updates are precisely what the interlocked memory-barrier functions are for.
For this I would use __sync_add_and_fetch if you're using gcc. There are a host of different interlocked operations you can use, most of them platform-specific, so check your documentation. For updating counters like this, however, they can save a heap-ton of hassle. Other examples include InterlockedIncrement on Windows, OSAtomicIncrement32 on OS X, etc.
Related
I have a question about multithreading.
This code is a simple example comparing a single thread with multiple threads
(summing 0 to 400,000,000 with a single thread vs. 4 threads).
//Single
#include<pthread.h>
#include<unistd.h>
#include<stdio.h>
#include<stdlib.h>
#include<time.h> // clock_gettime
#define NUM_THREAD 4
#define MY_NUM 100000000
void* calcThread(void* param);
double total = 0;
double sum[NUM_THREAD] = { 0, };
int main() {
    long p[NUM_THREAD] = {MY_NUM, MY_NUM * 2, MY_NUM * 3, MY_NUM * 4 };
    int i;
    long total_nstime;
    struct timespec begin, end;
    pthread_t tid[NUM_THREAD];
    pthread_attr_t attr[NUM_THREAD];
    clock_gettime(CLOCK_MONOTONIC, &begin);
    for (i = 0; i < NUM_THREAD; i++) {
        calcThread((void*)p[i]);
    }
    for (i = 0; i < NUM_THREAD; i++) {
        total += sum[i];
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    printf("total = %lf\n", total);
    total_nstime = (end.tv_sec - begin.tv_sec) * 1000000000 + (end.tv_nsec - begin.tv_nsec);
    printf("%.3fs\n", (float)total_nstime / 1000000000);
    return 0;
}
void* calcThread(void* param) {
    int i;
    long to = (long)(param);
    int from = to - MY_NUM + 1;
    int th_num = from / MY_NUM;
    for (i = from; i <= to; i++)
        sum[th_num] += i;
    return NULL; // a thread function must return a value
}
I want to use 4 threads, so I changed the code to run the calculation function in multiple threads:
...
int main() {
    ...
    //createThread
    for (i = 0; i < NUM_THREAD; i++) {
        pthread_attr_init(&attr[i]);
        pthread_create(&tid[i], &attr[i], calcThread, (void *)p[i]);
    }
    //wait
    for (i = 0; i < NUM_THREAD; i++) {
        pthread_join(tid[i], NULL);
    }
    for (i = 0; i < NUM_THREAD; i++) {
        total += sum[i];
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    ...
}
Result (on Ubuntu):
But it's slower than the single-threaded code. I know MultiThread is faster.
I have no idea what's causing this :( What's wrong?
Could you give me some advice? Thanks a lot!
"I know MultiThread is faster"
This isn't always the case: you are generally CPU-bound in some way, whether because of the core count or because of how threads are scheduled at the OS and hardware levels.
Choosing how many threads to give a process is a balance, as you may run into an old Linux problem where more time is spent scheduling the processes than actually running them.
Because this is very hardware- and OS-dependent, it is difficult to say exactly what the issue is, but make sure the appropriate microcode for your CPU is installed (it generally is by default on Ubuntu). Just in case, try:
sudo apt-get install intel-microcode
Otherwise, look at what other processes are running; a lot of other work may be running on the cores your process is allocated to.
I'm trying to count the prime numbers up to 10 million, and I have to do it with multiple POSIX threads (so that each thread handles a subset of the 10 million numbers). However, my code is not checking the isPrime condition correctly. I think this is due to a race condition. If it is, what can I do to fix this issue?
I've tried using a global integer array with k elements, but since k is not known at compile time, the compiler won't let me declare it at file scope.
I'm compiling my code with gcc -pthread:
/*
Program that spawns off "k" threads.
k is read in at the command line; each thread will compute
a subset of the problem domain (check if the number is prime).
to compile: gcc -pthread lab5_part2.c -o lab5_part2 -lm
*/
#include <math.h>
#include <stdio.h>
#include <time.h>
#include <pthread.h>
#include <stdlib.h>
typedef int bool;
#define FALSE 0
#define TRUE 1
#define N 10000000 // 10 Million
int k; // global variable k will hold the number of threads
int primeCount = 0; // it will hold the number of primes.
// returns whether num is prime
bool isPrime(long num) {
    long limit = sqrt(num);
    for (long i = 2; i <= limit; i++) {
        if (num % i == 0) {
            return FALSE;
        }
    }
    return TRUE;
}
// function to use with threads
void* getPrime(void* input) {
    // get the thread id
    long id = (long) input;
    printf("The thread id is: %ld \n", id);
    // how many iterations each thread will have to do
    int numOfIterations = N/k;
    // check the last thread, to make sure it is a whole number.
    if (id == k-1) {
        numOfIterations = N - (numOfIterations * id);
    }
    long startingPoint = (id * numOfIterations);
    long endPoint = (id + 1) * numOfIterations;
    for (long i = startingPoint; i < endPoint; i += 2) {
        if (isPrime(i)) {
            primeCount++;
        }
    }
    // terminate calling thread.
    pthread_exit(NULL);
}
int main(int argc, char** args) {
    // get the num of threads from command line
    k = atoi(args[1]);
    // make sure it is working
    printf("Number of threads is: %d\n", k);
    struct timespec start, end;
    // start clock
    clock_gettime(CLOCK_REALTIME, &start);
    // create an array of threads to run
    pthread_t* threads = malloc(k * sizeof(pthread_t));
    for (int i = 0; i < k; i++) {
        pthread_create(&threads[i], NULL, getPrime, (void*)(long)i);
    }
    // wait for each thread to finish
    int retval;
    for (int i = 0; i < k; i++) {
        int * result = NULL;
        retval = pthread_join(threads[i], (void**)(&result));
    }
    // get the time time_spent
    clock_gettime(CLOCK_REALTIME, &end);
    double time_spent = (end.tv_sec - start.tv_sec) +
        (end.tv_nsec - start.tv_nsec)/1000000000.0f;
    printf("Time tasken: %f seconds\n", time_spent);
    printf("%d primes found.\n", primeCount);
}
The current output I am getting (using 2 threads):
Number of threads is: 2
Time tasken: 0.038641 seconds
2 primes found.
The counter primeCount is modified by multiple threads and therefore must be atomic. To fix this using the standard library (which is now supported by POSIX as well), #include <stdatomic.h>, declare primeCount as an atomic_int, and increment it with atomic_fetch_add() or atomic_fetch_add_explicit().
Better yet, if you don’t care about the result until the end, each thread can store its own count in a separate variable, and the main thread can add all the counts together once the threads finish. You will need to create, in the main thread, an atomic counter per thread (so that updates don’t clobber other data in the same cache line), pass each thread a pointer to its output parameter, and then return the partial tally to the main thread through that pointer.
This looks like an exercise that you want to solve yourself, so I won’t write the code for you, but the approach to use would be to declare an array of counters like the array of thread IDs, and pass &counters[i] as the arg parameter of pthread_create() similarly to how you pass &threads[i]. Each thread would need its own counter. At the end of the thread procedure, you would write something like, atomic_store_explicit( (atomic_int*)arg, localTally, memory_order_relaxed );. This should be completely wait-free on all modern architectures.
You might also decide that it’s not worth going to that trouble to avoid a single atomic update per thread, declare primeCount as an atomic_int, and then atomic_fetch_add_explicit( &primeCount, localTally, memory_order_relaxed ); once before the thread procedure terminates.
I am trying to simulate the 1-D Ising model. The model consists of a chain of spins (100 spins), and I use Monte Carlo (Metropolis) to accept the flip of a spin if the energy of the system (with J = 1) goes down, or, if it goes up, depending on a comparison of the Boltzmann factor with a random number.
In a correct program, both the energy and the magnetization go to zero, and the results look Gaussian (plots of the energy or the magnetization versus the number of Monte Carlo steps).
I have done some work, but I think my random number generator isn't correct for this, and I don't know how or where to implement the boundary conditions: the last spin of the chain is connected to the first one.
I need help finishing it. Any help is welcome. Thank you.
Here is my C program:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h> //necessary for function time()
#define LENGTH 100 //size of the chain of spins
#define TEMP 2 // Temperature in units of J
#define WARM 200 // Thermalization
#define MCS 20000 //Monte Carlo Steps
void start( int spin[])
{
    /* starts with all the spins 1 */
    int i;
    for (i = 0 ; i < 100; i++)
    {
        spin[i] = 1;
    }
}
double energy( int spin[]) //of the exchange function J=1
{
    int i;
    double energyX=0;// because the beginning Energy = -J*sum (until 100) =-100,
    for (i = 0;i<100;i++)
        energyX=energyX-spin[i]*spin[i+1];
    return(energyX);
}
int randnum(){
    int num;
    srand(time(NULL));
    /* srand(time(NULL)) is meant to initialize the random number generator
    with the value of the function time(NULL). This is calculated as the
    total number of seconds elapsed since January 1, 1970 until the present date.
    This way, the value of the "seed" will be different for each execution.
    */
    srand(time(NULL));
    //picking one spin randomly, zero to 99
    num=rand() % 100;
    printf("num = %d ", num);
    return num;
}
void montcarlo( int spin[])
{
    int i,j,num;
    double prob;
    double energyA, energyB; // A -> old energy and B -> the new energy
    int rnum1,rnum2;
    prob=exp(-(energyB-energyA)/TEMP);
    energyA = 0;
    energyB = 0;
    for (i = 0;i<100;i++)
    {
        for (j = 0;j<100;j++)
        {
            energyA=energy(spin);
            rnum1=randnum();
            rnum2=randnum(); // i think they will give me different numbers
            spin[rnum1] = -spin[rnum1]; //flip of the randomly selected spin
            energyB = energyB-spin[j]*spin[j+1];
            if ((energyB-energyA<0)||((energyB-energyA>0)&&(rnum2>prob))){ // using rnum2 so it is not correlated, as it would be with rnum1
                spin[rnum1]=spin[rnum1];} // keep the flip
            else if((energyB-energyA>0)&&(rnum2<prob))
                spin[rnum1]=-spin[rnum1]; // unflip
        }
    }
}
int Mag_Moment( int spin[] ) // this is the magnetic moment
{
    int i;
    int mag;
    for (i = 0 ; i < 100; i++)
    {
        mag = mag + spin[i];
    }
    return(mag);
}
int main()
{
    // starting the spin's chain
    int spin[100];//the vector goes up to LENGTH=100
    int i,num,j;
    int itime;
    double mag_moment;
    start(spin);
    double energy_chain=0;
    energy_chain=energy(spin); // that will give me -100 in the beginning
    printf("energy_chain starts with %f", energy_chain);// initially it gives -100
    /*Warming makes the spins not so ordered*/
    for (i = 1 ; i <= WARM; i++)
    {
        itime = i;
        montcarlo(spin);
    }
    printf("Configuration after warming %d \n", itime);
    for (j = 0 ; j < LENGTH; j++)
    {
        printf("%d",spin[j]);
    }
    printf("\n");
    energy_chain=energy(spin); // new energy after the warming
    /*opening a file to save the values of energy and magnetic moment of the chain*/
    FILE *fp; // declaring the file for the energy
    FILE *fp2;// declaring the file for the mag moment
    fp=fopen("energy_chain.txt","w");
    fp2=fopen("mag_moment.txt","w");
    int pures;// net value of i
    int a;
    /* using Monte Carlo metropolis for the whole chain */
    for (i = (WARM + 1) ; i <= MCS; i++)
    {
        itime=i;//saving the i step for the final printf.
        pures = i-(WARM+1);
        montcarlo(spin);
        energy_chain = energy_chain + energy(spin);// the spin chain is modified by void montcarlo
        mag_moment = mag_moment + Mag_Moment(spin);
        a=pures%10000;// here i select a value to save in a txt file for 10000 steps to produce graphs
        if (a==0){
            fprintf(fp,"%.12f\n",energy_chain); // %.12f just to give a great precision
            fprintf(fp2,"%.12f\n",mag_moment);
        }
    }
    fclose(fp); // closing the files
    fclose(fp2);
    /* Finishing -- Printing */
    printf("energy_chain = %.12f\n", energy_chain);
    printf("mag_moment = %.12f \n", mag_moment);
    printf("Temperature = %d,\n Size of the system = 100 \n", TEMP);
    printf("Warm steps = %d, Montcarlo steps = %d \n", WARM , MCS);
    printf("Configuration in time %d \n", itime);
    for (j = 0 ; j < 100; j++)
    {
        printf("%d",spin[j]);
    }
    printf("\n");
    return 0;
}
You should call srand(time(NULL)); only once in your program. Every time you call it within the same second, the generator is reseeded with the same value and you get the same sequence of random numbers, so it is very likely that both calls to randnum will give you the same number.
Just add srand(time(NULL)); at the beginning of main and remove it everywhere else.
I think I see a number of bugs in this code. The first is re-seeding with srand() on every call, which has already been addressed. Many of the loops go beyond the array bounds, for example:
for (ii = 0;ii<100;ii++)
{
    energyX = energyX - spin[ii]*spin[ii+1];
}
On the last iteration this computes spin[99]*spin[100], and spin[100] is out of bounds. That pattern is peppered throughout the code. Also, I noticed that rnum2 is an int but is compared with prob as if it were a probability (a double). I think declaring rnum2 as a double and dividing it by 100 will give a reasonable probability:
rnum2 = (randnum()/100.0); // rnum2 must be declared double, otherwise this truncates to 0
The probability is computed as prob=exp(-(energyB-energyA)/TEMP); before either energy value is initialized. Maybe this is intentional, but I think it would be better to just use rand(). The Mag_Moment() function never initializes mag, so it returns garbage. Can you point me to the algorithm you are trying to reproduce? I'm just curious.
I am currently working on this project where I need to calculate the value of PI...
With only one thread it works perfectly and I get 3.1416[...], but when I run the computation with 2 or more threads I no longer get the 3.1416 value. This is my code:
#include <stdio.h>
#include <time.h>
#include <windows.h>
//const int numThreads = 1;
//long long num_steps = 100000000;
const int numThreads = 2;
long long num_steps = 50000000;
double x, step, pi, sum = 0.0;
int i;
DWORD WINAPI ValueFunc(LPVOID arg){
    for (i=0; i<=num_steps; i++) {
        x = (i + .5)*step;
        sum = sum + 4.0 / (1. + x*x);
    }
    printf("this is %d step\n", i);
    return 0;
}
int main(int argc, char* argv[]) {
    int count;
    clock_t start, stop;
    step = 1. / (double)num_steps;
    start = clock();
    HANDLE hThread[numThreads];
    for ( count = 0; count < numThreads; count++) {
        printf("This is thread %d\n", count);
        hThread[count] = CreateThread(NULL, 0, ValueFunc, NULL, 0, NULL);
    }
    WaitForMultipleObjects(numThreads, hThread, TRUE, INFINITE);
    pi = sum*step;
    stop = clock();
    printf("The value of PI is %15.12f\n", pi);
    printf("The time to calculate PI was %f seconds\n", ((double)(stop - start) / 1000.0));
}
I get this wrong output when specifying 2 threads:
It seems that your program, when using two threads, allows both threads to directly manipulate a global/shared resource 'sum' without any synchronization protection.
In other words, both threads can manipulate 'sum' at the same time, so the value of 'sum' at any point will not be what is expected (i.e., what it would be with only one thread).
Your program needs to implement some sort of access synchronization between the two threads, such as semaphores, spin-locks, mutexes, or atomic operations. If implemented properly, these features would allow the two (or more) threads to share the single task of calculating PI.
You need to use mutexes to protect data shared by multiple threads, or keep the data local to each thread and then collate the answer when all the threads have completed.
This program mimics a web page counter, counting visits to a web page. I want to ask what is wrong with this code and why its output is different:
the counter value is smaller than the number of visits.
#include <stdio.h>
#include <stdlib.h> // rand, srand
#include <time.h>   // time
#include <unistd.h>
#include <pthread.h>
// repeat 100 times to mimic 100 random visits to the page
#define RPT 100
// web page visit counter
int cnt=0;
void* counter(void* arg) {
    int cntLocalCopy;
    float r;
    cntLocalCopy = cnt;
    // mimicking the work of the server in serving the page to the browser
    r = rand() % 2000;
    usleep(r);
    cnt = cntLocalCopy + 1;
    return NULL;
}
int main () {
    int i;
    float r;
    pthread_t tid[RPT];
    // seed the random number sequence
    srand(time(NULL));
    for (i=0; i<RPT; i++) {
        // mimicking the random access to the web page
        r = rand() % 2000; usleep(r);
        // a thread to respond to a connect from a browser
        pthread_create (&tid[i], NULL, &counter, NULL);
    }
    // Wait till threads complete.
    for (i=0; i<RPT; i++) {
        pthread_join(tid[i], NULL);
    }
    // print out the counter value and the number of mimicked visits
    // the 2 values should be the same if the program is written
    // properly
    printf ("cnt=%d, repeat=%d\n", cnt, RPT);
}
This isn't a good idea at all:
cntLocalCopy = cnt;
... sleep
cnt = cntLocalCopy + 1;
As the old value of cnt is read before the sleep, the likelihood of 2 or more threads reading the same old value of cnt and then sleeping is very high. Because the sleep durations are random, a thread holding an older copy may write it back after other threads have already stored larger values, so the counter can even go backwards.
Even if you rearranged the code as follows
... sleep
cntLocalCopy = cnt;
cnt = cntLocalCopy + 1;
or even
++cnt;
Synchronization will still be needed (an atomic increment or a lock; a memory barrier on its own does not make the read-modify-write indivisible), as 2 threads could simultaneously read the same old value of cnt and both increment it to the same new value, instead of incrementing it one after the other. Have a look here for an example.
As StuartLC said, you clearly have a concurrency problem with the variable cnt.
Perhaps you should use a mutex or semaphore to create a critical region around this variable, so that while one thread is reading or updating it, no other thread can read or write it.