Busy waiting and shared memory


I am trying to implement a single C program that creates a shared memory area, forks one child, has the child write into a given position of the shared memory, and has the parent wait until the child has written to that position. I used a simple busy-waiting approach: the parent waits in a while loop until the child finishes writing. The problem is that it only works when I introduce some delay in that loop. Does anyone have any idea why this is so?
Code:
int shmid;
int *shmptr;
int i, j, ret;
key_t key = SHM_KEY;
// Create shared memory segment
if ((shmid = shmget(key, SHM_SIZE, IPC_CREAT | 0600)) < 0)
{
printf("shmget error: %s\n", strerror(errno));
return -1;
}
// Attach shared memory segment
if ((shmptr = shmat(shmid, 0, 0)) == (void *) -1)
{
puts("shmat error");
return -1;
}
shmptr[6] = '%';
ret = fork();
if (ret > 0)
{/*parent*/
/*here is the loop that implements the busy waiting approach*/
while (shmptr[6] != '^') {
sleep(1);
}
for (i = 0; i < 7; i++) printf("%c", shmptr[i]);
puts("");
int status = 0;
wait(&status);
}
else
{/*child*/
shmptr[0] = 's';
shmptr[1] = 'h';
shmptr[2] = 'a';
shmptr[3] = 'r';
shmptr[4] = 'e';
shmptr[5] = 'd';
/*tell parent process it has finished its writing*/
shmptr[6] = '^';
exit(0);
}

volatile (see the earlier comment) will probably only work in a single-core scenario. Assuming you are running on a CPU with more than one core, you will need to access every location in the shared memory region atomically. If using a C++11-compliant compiler, each location of the region would need to be treated as being of type std::atomic<int>.
Since you are probably using C, not C++, and using GCC, consider using GCC's atomic builtins.
So, your
shmptr[0] = 's';
statement should be replaced with an atomic store:
__atomic_store_n(&shmptr[0], 's', __ATOMIC_SEQ_CST);
Do the equivalent for all of the stores. Then, in the loop, read the flag with the matching atomic load, __atomic_load_n(&shmptr[6], __ATOMIC_SEQ_CST), and compare its return value (which will be the character currently stored) against '^'.
The semaphore in another answer might work, but there are no guarantees that the other locations will have made it through the CPU's write-post circuitry, through the cache controller on the source, and so on through the receiving CPU's controller, especially if the addresses being accessed span cache lines.
I would also recommend doing a sleep(0) or sched_yield() of some sort to allow other programs to get time slices on the core that the main program is running on; otherwise, you will waste CPU resources.
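
As a rough illustration of the atomic approach, here is a minimal sketch assuming GCC or Clang on Linux. The function name atomic_handoff is hypothetical, and an anonymous shared mmap region stands in for the question's shmget/shmat segment; neither comes from the original code.

```c
#define _DEFAULT_SOURCE
#include <sched.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes "shared" and then sets a flag;
 * the parent spins on the flag using atomic loads. Returns 0 on success. */
int atomic_handoff(char out[7])
{
    size_t len = 8 * sizeof(int);
    /* Assumption: an anonymous shared mapping replaces shmget/shmat. */
    int *shm = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shm == MAP_FAILED)
        return -1;
    __atomic_store_n(&shm[6], '%', __ATOMIC_SEQ_CST);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shm, len);
        return -1;
    }
    if (pid == 0) {
        const char *msg = "shared";
        for (int i = 0; i < 6; i++)
            shm[i] = msg[i];
        /* Release store: the writes above are visible before the flag is. */
        __atomic_store_n(&shm[6], '^', __ATOMIC_RELEASE);
        _exit(0);
    }

    /* Acquire load pairs with the child's release store. */
    while (__atomic_load_n(&shm[6], __ATOMIC_ACQUIRE) != '^')
        sched_yield(); /* give up the time slice instead of burning CPU */

    for (int i = 0; i < 6; i++)
        out[i] = (char)shm[i];
    out[6] = '\0';

    waitpid(pid, NULL, 0);
    munmap(shm, len);
    return 0;
}
```

Only the flag needs atomic accesses in this sketch: the release/acquire pair on shm[6] orders the plain writes to shm[0..5] before the parent's reads.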

You want to synchronise access to the shared memory (SHM).
This can, for example, be done by using a semaphore.
Before fork()ing off the child call sem_open().
Make the parent wait on sem_wait() prior to reading the SHM.
Have the child call sem_post() when done writing the SHM.
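
The three steps above could look roughly like the following sketch. The helper name semaphore_handoff and the semaphore name "/shm_ready_demo" are made up for illustration, and an anonymous shared mapping is assumed in place of the question's shmget segment.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 and fills out with the child's message. */
int semaphore_handoff(char *out, size_t outlen)
{
    const char *name = "/shm_ready_demo"; /* arbitrary name for this sketch */
    char *shm = mmap(NULL, 64, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shm == MAP_FAILED)
        return -1;

    sem_unlink(name); /* drop a stale semaphore left by an earlier run */
    /* Initial value 0: the parent blocks until the child posts. */
    sem_t *ready = sem_open(name, O_CREAT | O_EXCL, 0600, 0);
    if (ready == SEM_FAILED) {
        munmap(shm, 64);
        return -1;
    }
    sem_unlink(name); /* the name is not needed once both sides hold it */

    pid_t pid = fork();
    if (pid < 0) {
        sem_close(ready);
        munmap(shm, 64);
        return -1;
    }
    if (pid == 0) {
        strcpy(shm, "shared"); /* child writes the shared memory ... */
        sem_post(ready);       /* ... then signals that it is done */
        _exit(0);
    }

    int rc = 0;
    if (sem_wait(ready) == -1) /* parent sleeps here, no busy waiting */
        rc = -1;
    else
        snprintf(out, outlen, "%s", shm);

    waitpid(pid, NULL, 0);
    sem_close(ready);
    munmap(shm, 64);
    return rc;
}
```

On older glibc versions this needs to be linked with -pthread.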

I guess that what is happening is that the child is terminating too quickly.
You might use a non-hanging waitpid(2) and add it in your loop:
/*here is the loop that implements the busy waiting approach*/
int status= 0;
while (shmptr[6] != '^') {
if (waitpid(ret, &status, WNOHANG) == ret) break;
sleep(1);
}
However, as I commented, busy waiting is always bad in Linux user-space programs (at the very least it is stressing your system). Read sem_overview(7), or alternatively, set up a pipe(7) or an eventfd(2) or a signalfd(2) and poll(2) it. Or set up a SIGCHLD signal handler (read signal(7) carefully) which just sets a volatile sig_atomic_t flag to be tested in your loop.
You should also declare volatile int *shmptr; because the compiler might otherwise optimize away the repeated reads in the loop.

Related

How to gracefully protect/exit a multi-threaded program in C?

I have a C program where I run three branches of code: two that I start through pthread_create, plus the main thread.
I am wondering how to correctly protect it if my second thread fails to be created somehow.
Here is my code:
# include <pthread.h>
# include <stdio.h>
# include <stdlib.h>
# include <semaphore.h>
# include <errno.h>
# include <fcntl.h>
typedef struct s_philo{
sem_t *first;
sem_t *second;
sem_t *stop_A;
sem_t *stop_B;
pthread_t A_thread;
pthread_t B_thread;
} t_philo;
void sem_close_safe(sem_t *sem)
{
if (sem_close(sem) == -1)
printf("Failed to close semaphore\n");
}
int free_philo(t_philo *philo)
{
if (philo->first)
sem_close_safe(philo->first);
if (philo->second)
sem_close_safe(philo->second);
if (philo->stop_A)
sem_close_safe(philo->stop_A);
if (philo->stop_B)
sem_close_safe(philo->stop_B);
free(philo);
return (1);
}
void *check_philo(t_philo *philo)
{
void *check;
check = philo;
if (!philo->first || !philo->second || !philo->stop_A || !philo->stop_B)
check = NULL;
return (check);
}
sem_t *sem_open_new_safe(const char *name, unsigned int value)
{
sem_t *sem;
sem = sem_open(name, O_CREAT | O_EXCL, 0644, value);
if (sem == SEM_FAILED && errno == EEXIST)
{
if (sem_unlink(name) == -1)
return (NULL);
sem = sem_open(name, O_CREAT | O_EXCL, 0644, value);
}
if (sem == SEM_FAILED)
return (NULL);
if (sem_unlink(name) == -1)
{
sem_close_safe(sem);
return (NULL);
}
return (sem);
}
void *A(void *p)
{
t_philo *philo;
philo = (t_philo *) p;
sem_wait(philo->stop_A);
sem_post(philo->stop_A);
return (NULL);
}
void *B(void *p)
{
t_philo *philo;
philo = (t_philo *) p;
sem_wait(philo->stop_B);
sem_post(philo->stop_B);
return (NULL);
}
int main(void)
{
t_philo *philo;
int i;
philo = malloc(sizeof(*philo));
philo->first = sem_open_new_safe("/first", 1);
philo->second = sem_open_new_safe("/second", 1);
philo->stop_A = sem_open_new_safe("/stop_A", 0);
philo->stop_B = sem_open_new_safe("/stop_B", 0);
if (!check_philo(philo))
return (free_philo(philo));
if (pthread_create(&philo->A_thread, NULL, &A, (void *)philo))
return (free_philo(philo));
if (pthread_create(&philo->B_thread, NULL, &B, (void *)philo))
return (free_philo(philo));
i = 0;
while (i++ < 100)
{
if (sem_wait(philo->first) == -1)
sem_post(philo->stop_B);
if (sem_wait(philo->second) == -1)
sem_post(philo->stop_A);
printf("%d\n", i);
sem_post(philo->second);
sem_post(philo->first);
}
sem_post(philo->stop_B);
sem_post(philo->stop_A);
pthread_join(philo->A_thread, NULL);
pthread_join(philo->B_thread, NULL);
free_philo(philo);
return (0);
}
Both of my A and B threads wait for semaphores on their first lines of code so they will never return on their own if I do not post these semaphores.
Should I pthread_join thread A ? Should I manually post some semaphores to force thread A to continue its execution and return ? Or maybe I should use pthread_detach ? I am a bit lost.
Edit: I have been asked to post more code to make it executable, but I have a lot of lines of code and it would just drown the above one. What I am looking for (if it exists) is not a guided code-specific answer, but more of a best practice to gracefully handle pthread_create errors.
Edit 2: I added the least code I could to make it runnable

The general case looks something like this pseudocode:
if (!setup_A()) {
exit(FAILURE);
}
if (!setup_B()) {
teardown_A();
exit(FAILURE);
}
if (!setup_C()) {
teardown_B();
teardown_A();
exit(FAILURE);
}
do_successful_processing();
teardown_C();
teardown_B();
teardown_A();
and you're effectively asking how to write teardown_B().
The general solution (assuming you can't just switch to C++ to use proper destructors and RAII) does not exist. The teardown is just as specific to the details of A, B, C and your application as the setup is.
I am wondering how to correctly protect it if my second thread fails to be created somehow.
The proximate answer is to tell the thread to quit (in some application-specific way), and then to join it.
The actual semantics of requesting a shutdown are specific to your code, since you wrote the function that thread is executing. Here, it should be sufficient to sem_post the semaphore thread A is waiting on.
NB. DO NOT use pthread_kill to shut down threads, if you can possibly avoid it. It's much better to write clean shutdown handling explicitly.
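
A minimal sketch of that "tell it to quit, then join it" pattern follows. The names worker_t, worker_main and start_and_stop_workers are hypothetical, the fail_b parameter only simulates pthread_create failing for the second thread, and unnamed semaphores are assumed in place of the question's named ones.

```c
#include <pthread.h>
#include <semaphore.h>

typedef struct {
    sem_t stop;        /* posted to tell the worker to return */
    pthread_t thread;
} worker_t;

static void *worker_main(void *p)
{
    worker_t *w = p;
    sem_wait(&w->stop); /* block until asked to stop */
    return NULL;
}

/* Hypothetical helper. fail_b != 0 simulates a failed creation of B.
 * Returns 0 if both threads ran and were joined, -1 otherwise. */
int start_and_stop_workers(int fail_b)
{
    worker_t a, b;
    int rc = 0;

    if (sem_init(&a.stop, 0, 0) == -1)
        return -1;
    if (sem_init(&b.stop, 0, 0) == -1) {
        sem_destroy(&a.stop);
        return -1;
    }

    if (pthread_create(&a.thread, NULL, worker_main, &a) != 0) {
        rc = -1;
        goto destroy;
    }
    if (fail_b || pthread_create(&b.thread, NULL, worker_main, &b) != 0) {
        rc = -1;
        goto stop_a; /* B never started: only A has to be torn down */
    }

    /* normal processing would happen here */

    sem_post(&b.stop);
    pthread_join(b.thread, NULL);
stop_a:
    sem_post(&a.stop);            /* ask A to quit ... */
    pthread_join(a.thread, NULL); /* ... and wait until it has */
destroy:
    sem_destroy(&b.stop);
    sem_destroy(&a.stop);
    return rc;
}
```

The teardown labels run in reverse order of setup, which mirrors the setup_A / setup_B / setup_C pseudocode: each failure path undoes only what was already set up.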

I am wondering how to correctly protect it if my second thread fails to be created somehow.
Easiest would be to simply call exit(1) in the case that thread creation fails. That will terminate the whole process, including all its threads. All resources owned by the process will be cleaned up by the system.
That does create a possible issue if the program wants to clean up any persistent resources, such as files or named semaphores. Often that's not a major issue, for if the program fails then it may be ok for its termination to be less tidy than if it succeeds. Nevertheless, there are ways to reduce that impact.
In particular, you can minimize the program's use of modifiable persistent resources. For example, your program would be all-around cleaner if it used unnamed semaphores instead of named ones. You would not need to watch out for existing ones when you create them, and since they are being used only within a single process, you would not need to worry about failing to clean them up before terminating.
But where you want some kind of affirmative cleanup to happen when the program terminates, you always have the option to use atexit() to register an exit handler to perform that. There are some caveats, but it's good to be aware of this option.
Should I pthread_join thread A ?
In your particular example, I see no reason to do so. It might be more appropriate in other cases.
Should I manually post some semaphores to force thread A to continue its execution and return ?
If you plan to join thread A then you need to ensure that it will, in fact, terminate. In this case, it looks like yes, you could achieve that by semaphore manipulation, but cases where it actually mattered would be more complex. Ensuring A's timely termination would probably not be so simple in such cases.
Or maybe I should use pthread_detach ?
There is no advantage at all to doing that in this case. Thread A will be terminated when the process terminates, regardless of whether it is detached. And presumably that's what you want in your example, for it would not make progress if it were the only live thread left in the process.

Named semaphores instead of mutex - readers writers problem without multithreading

My goal is to solve the Readers Writers[1] problem using only isolated processes: one program for the reader and one for the writer. I should use named semaphores, so that it is possible to start additional readers and writers at any time. I also can't use shared memory, only pure synchronization.
More info:
Provide implementation of 2 programs implementing a reader and
a writer, so that it is possible to dynamically start new processes while complying with the restrictions.
Pay attention to the properties of concurrent processing: safety and liveness.
Consider also whether your program is deadlock free.
EDIT: problem is separated to 3 files
File 1. Reader:
int main(){
sem_t *mutex;
sem_t *write;
int count=0;
mutex = sem_open("/mutex", O_CREAT, 0600, 1);
write = sem_open("/write", O_CREAT, 0600, 1);
do{
sem_wait(mutex);
count++;
if (count==1){
sem_wait(write);
}
sem_post(mutex);
printf("Critical section in readers\n");
sem_wait(mutex);
count--;
if(count==0)
sem_post(write);
sem_post(mutex);
}while(1);
}
File 2. Writer
int main(){
sem_t *write;
write = sem_open("/write", O_CREAT, 0600, 1);
do{
sem_wait(write);
printf("Critical section in writer\n");
sem_post(write);
}while(1);
return 0;
}
File 3. Deleting semaphores
int main(){
sem_unlink("/mutex");
sem_unlink("/write");
printf("Semaphores deleted \n");
return 0;
}
Problem:
when I run reader or writer with gcc -pthread file_name.c I don't
get any result, as if the code wasn't doing anything - the process is
running, the cursor is blinking but nothing happens.
[1]: READERS and WRITERS : The reading room has a capacity of n
readers. Readers come to the reading room, allocate a single place, and occupy it for some time, then leave. After some time they come again and the procedure repeats. The reading room is also used by writers. However, a writer can only work when the reading room is empty, i.e. there must be no other reader nor writer. The writer occupies the room for some time, then leaves, and comes back after a while.

My goal is to solve Readers Writers problem but using only isolated processes. One process is for reader one for the writer, I should use named semaphores, so that it is possible to start subsequent reader and writers at any time - also I can't use shared memory - pure synchronization.
Judging from this limited description, you can probably solve this problem by using named pipes.
I can't use shared memory
The code treats global variables counter and cnt as if they are shared between processes. They are not: each process gets its own copy of those with the same initial value, and changes to these variables are not seen by other processes.
To use functions sem_wait and sem_post link with linker option -pthread.

You mentioned that you have to use "isolated processes", but as far as I know threads are not processes. To create a new process you have to use fork().
Differences (quoted from a linked comparison table):
A process is an active program i.e. a program that is under execution.
It is more than the program code as it includes the program counter,
process stack, registers, program code etc. Compared to this, the
program code is only the text section.
A thread is a lightweight process that can be managed independently by
a scheduler. It improves the application performance using
parallelism. A thread shares information like data segment, code
segment, files etc. with its peer threads while it contains its own
registers, stack, counter etc.
In simple words, each process can contain multiple threads ("lightweight processes").
I think you have to use fork() to create new processes, because of the word "process" that you mentioned. Also, you mentioned that you need 2 processes (one for the reader and one for the writer), so you have to fork() twice and manage these 2 processes. You can read about fork() in fork(2).
edit (semaphore implementation):
int initsem(key_t semkey, int initval)
{
int status = 0, semid;
union semun {/* should be declared according to C standards */
int val;
struct semid_ds *stat;
ushort *array;
} ctl_arg;
if ((semid = semget(semkey, 1, SEMPERM | IPC_CREAT | IPC_EXCL)) == -1) {
if (errno == EEXIST)
semid = semget(semkey, 1, 0);
}
else { /* if created */
ctl_arg.val = initval; /* set semaphore value to the initial value*/
status = semctl(semid, 0, SETVAL, ctl_arg);
}
if (semid == -1 || status == -1) { /* failure */
perror("initsem failed");
return(-1);
}
else return semid;
}
int sem_wait(int semid)
{
struct sembuf p_buf;
p_buf.sem_num = 0;
p_buf.sem_op = -1;
p_buf.sem_flg = SEM_UNDO;
if (semop(semid, &p_buf, 1) == -1) {
perror("p(semid) failed");
exit(1);
}
else return 0;
}
int sem_post(int semid)
{
struct sembuf v_buf;
v_buf.sem_num = 0;
v_buf.sem_op = 1;
v_buf.sem_flg = SEM_UNDO;
if (semop(semid, &v_buf, 1) == -1) {
perror("v(semid) failed"); exit(1);
}
else return 0;
}

Is implementing semaphore or mutex necessary for simple counter?

I tried implementing programs that calculate a sort of integral. To speed up the computation, one program creates multiple processes and the other uses multiple threads. In my program, each process adds a double value into shared memory, and each thread adds a double value through a pointer.
Here's my question. The add operation obviously loads the value from memory, adds a value to it, and stores the result back to memory. So it seems my code is quite prone to the producer-consumer problem, as many processes/threads access the same memory area. However, I couldn't find any case where somebody used semaphores or mutexes to implement a simple accumulator either.
// creating processes
while (whatever)
{
pid = fork();
if (pid == 0)
{
res = integralproc(clist, m, tmpcnt, tmpleft, tmpright);
*(createshm(shm_key)) += res;
exit(1);
}
}
// creating or retrieving shared memory
long double* createshm(int key)
{
int shm_id = -1;
void* shm_ptr = (void*)-1;
while (shm_id == -1)
{
shm_id = shmget((key_t)key, sizeof(long double), IPC_CREAT | 0777);
}
while (shm_ptr == (void*)-1)
{
shm_ptr = shmat(shm_id, (void*)0, 0);
}
return (long double*)shm_ptr;
}
// creating threads
while (whatever)
{
threadres = pthread_create(&(targs[i]->thread_handle), NULL, integral_thread, (void*)targs[i]);
}
// thread function. targ->resptr is pointer that we add the result to.
void *integral_thread(void *arg)
{
threadarg *targ = (threadarg*)arg;
long double res = integralproc(targ->clist, targ->m, targ->n, targ->left, targ->right);
*(targ->resptr) += res;
//printf("thread %ld calculated %Lf\n", targ->i, res);
pthread_exit(NULL);
}
So I implemented it this way, and so far, no matter how many processes/threads I create, the result has been correct, as if the problem never occurred.
I'm concerned that my code may still be dangerous in ways I have not noticed.
Is this code truly safe from these problems? Or am I overlooking something, and should the code be revised?

If your threads are all racing to update the same object (ie, the targ->resptr for each thread points at the same thing), then yes - you do have a data race and you can see incorrect results (likely, "lost updates" where two threads that happen to finish at the same time try to update the sum, and only one of them is effective).
You probably haven't seen this because the execution time of your integralproc() function is long, so the chances of multiple threads simultaneously getting to the point of updating *targ->resptr is low.
Nonetheless, you should still fix the problem. You can either add a mutex lock/unlock around the sum update:
pthread_mutex_lock(&result_lock);
*(targ->resptr) += res;
pthread_mutex_unlock(&result_lock);
(This shouldn't affect the efficiency of the solution, since you are only locking and unlocking once in the lifetime of each thread).
Alternatively, you can have each thread record its own partial result in its own thread argument structure:
targ->result = res;
Then, once the worker threads have all been pthread_join()ed the parent thread that created them can just go through all the thread argument structures and add up the partial results.
No extra locking is needed here because the worker threads don't access each other's result variable, and the pthread_join() provides the necessary synchronisation between the worker setting the result and the parent thread reading it.
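
The second alternative (per-thread partial results, summed after joining) can be sketched as below. The names part_t, sum_range and parallel_sum are hypothetical, and a simple sum of integers stands in for the question's integralproc().

```c
#include <pthread.h>

#define NTHREADS 4

typedef struct {
    int first, last;    /* half-open range [first, last) */
    long double result; /* written only by the owning thread */
} part_t;

/* Stand-in for integralproc(): sums the integers in [first, last). */
static void *sum_range(void *arg)
{
    part_t *p = arg;
    long double s = 0;
    for (int i = p->first; i < p->last; i++)
        s += i;
    p->result = s; /* no lock: nobody else touches this slot */
    return NULL;
}

/* Hypothetical helper: returns 0 + 1 + ... + (n - 1), or -1 on failure. */
long double parallel_sum(int n)
{
    pthread_t tid[NTHREADS];
    part_t part[NTHREADS];
    int started = 0;
    long double total = 0;

    for (int t = 0; t < NTHREADS; t++) {
        part[t].first = (int)((long)n * t / NTHREADS);
        part[t].last = (int)((long)n * (t + 1) / NTHREADS);
        part[t].result = 0;
        if (pthread_create(&tid[t], NULL, sum_range, &part[t]) != 0)
            break;
        started++;
    }
    /* pthread_join() synchronises with each worker, so reading
     * part[t].result afterwards is safe without further locking. */
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        total += part[t].result;
    }
    return (started == NTHREADS) ? total : -1;
}
```

Because each worker writes only its own slot, there is no shared accumulator to protect.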

Managing a mutex in shared memory

I'm attempting the simple task of creating a mutex in shared memory. I have the following code to declare a section of shared memory, and attach it to an int*.
int *mutex;
// allocate shared memory for mutex
if ((shmid2 = shmget(IPC_PRIVATE, 4, IPC_CREAT | 0666)) < 0) {
printf("Could not allocate shared memory for mutex: %d.\n", errno);
exit(errno);
}
if ((mutex = shmat(shmid2, NULL, 0)) == (int*)-1) {
printf("Could not attach shared memory for mutex: %d\n", errno);
exit(errno);
}
// set the mutex to one
mutex[0] = 1;
Now, I attempt to define a critical section, surrounded by locking and unlocking the mutex. (Inside of one of many child processes).
while (*mutex == 0) ;
mutex[0] = 0;
// critical section
...
// end critical section
mutex[0] = 1;
However, I'm finding that this technique does not work, and two child processes can enter the critical section simultaneously, without much issue (it happens very often). So I'm wondering what I can do to fix this, without the use of pthreads.

Your options are:
Use the operating system's semaphores instead of trying to implement them yourself with shared-memory spinlocks. See the documentation for semop(2) (System V semaphores) or sem_overview(7) (POSIX semaphores) for details.
If you must use shared-memory semaphores, you will need to use an atomic compare/exchange. Otherwise, two processes can both simultaneously see *mutex == 1 and set it to 0 at the same time, without "noticing" that the other process is doing the same thing.
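
For the second option, a compare/exchange-based lock might look like the sketch below, assuming GCC or Clang (the __atomic builtins are compiler-specific). The helper names spin_lock, spin_unlock and locked_counter are hypothetical, and an anonymous shared mapping is used in place of the question's shmget segment. It keeps the question's convention that 1 means unlocked and 0 means locked.

```c
#define _DEFAULT_SOURCE
#include <sched.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* 1 = unlocked, 0 = locked (the question's convention). */
static void spin_lock(int *m)
{
    int expected = 1;
    /* Atomically: if *m == 1, set it to 0 and succeed; otherwise fail. */
    while (!__atomic_compare_exchange_n(m, &expected, 0, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
        expected = 1;  /* the builtin stored the observed value here */
        sched_yield(); /* let the lock holder run */
    }
}

static void spin_unlock(int *m)
{
    __atomic_store_n(m, 1, __ATOMIC_RELEASE);
}

/* Hypothetical helper: nproc children each increment a shared counter
 * iters times under the lock. Returns the final count, or -1 on error. */
int locked_counter(int nproc, int iters)
{
    size_t len = 2 * sizeof(int);
    int *shm = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shm == MAP_FAILED)
        return -1;
    shm[0] = 1; /* the lock, initially unlocked */
    shm[1] = 0; /* the counter */

    int started = 0;
    for (int p = 0; p < nproc; p++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            for (int i = 0; i < iters; i++) {
                spin_lock(&shm[0]);
                shm[1]++; /* critical section */
                spin_unlock(&shm[0]);
            }
            _exit(0);
        }
        started++;
    }
    for (int p = 0; p < started; p++)
        wait(NULL);

    int total = shm[1];
    munmap(shm, len);
    return (started == nproc) ? total : -1;
}
```

The test and the set happen as one indivisible step, so two processes can no longer both observe the lock as free and both take it.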

Is it possible to fork/exec and guarantee one starts before the other?

Pretty much as the title says. I have a snippet of code that looks like this:
pid_t p;
p = fork();
if (p == 0) {
childfn();
} else if (p > 0) {
parentfn();
} else {
// error
}
I want to ensure that either the parent or the child executes (but not returns from) their respective functions before the other.
Something like a call to sleep() would probably work, but is not guaranteed by any standard, and would just be exploiting an implementation detail of the OS's scheduler...is this possible? Would vfork work?
edit: Both of the functions find their way down to a system() call, one of which will not return until the other is started. So to re-iterate: I need to ensure that either the parent or the child only calls their respective functions (but not returns, cause they won't, which is what all of the mutex based solutions below offer) before the other. Any ideas? Sorry for the lack of clarity.
edit2: Having one process call sched_yield and sleep, I seem to be getting pretty reliable results. vfork does provide the semantics I am looking for, but comes with too many restrictions on what I can do in the child process (I can pretty much only call exec). So, I have found some work-arounds that are good enough, but no real solution. vfork is probably the closest thing to what I was looking for, but all the solutions presented below would work more or less.

This problem would normally be solved by a mutex or a semaphore. For example:
// Get a page of shared memory
int pagesize = getpagesize();
void *mem = mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
if (mem == MAP_FAILED)
{
perror("mmap");
return 1;
}
// Put the semaphore at the start of the shared page. The rest of the page
// is unused.
sem_t *sem = mem;
sem_init(sem, 1, 1);
pid_t p = fork();
if (p == 0) {
sem_wait(sem);
childfn();
sem_post(sem);
} else if (p > 0) {
sem_wait(sem);
parentfn();
sem_post(sem);
int status;
wait(&status);
sem_destroy(sem);
} else {
// error
}
// Clean up
munmap(mem, pagesize);
You could also use a mutex in a shared memory region, but you need to create it with non-default attributes, with the process-shared attribute set to shared (via pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED), where attr is the pthread_mutexattr_t passed to pthread_mutex_init), in order for it to work.
This ensures that only one of childfn or parentfn will execute at any given time, but they could run in either order. If you need to have a particular one run first, start the semaphore off with a count of 0 instead of 1, and have the function that needs to run first not wait for the semaphore (but still post to it when finished). You might also be able to use a condition variable, which has different semantics.
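
To make the ordering variant concrete, here is a sketch in which the semaphore starts at 0, the child runs first without waiting, and the parent waits for the child's post. The struct layout and the name run_child_first are illustrative assumptions, and the two functions are reduced to recording who ran.

```c
#define _DEFAULT_SOURCE
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared {
    sem_t sem;
    char order[2]; /* records who ran first and second */
    int n;
};

/* Hypothetical helper: returns 0 if the child ran before the parent. */
int run_child_first(void)
{
    struct shared *sh = mmap(NULL, sizeof *sh, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sh == MAP_FAILED)
        return -1;
    sh->n = 0;
    /* pshared = 1 (shared between processes), initial count 0. */
    if (sem_init(&sh->sem, 1, 0) == -1) {
        munmap(sh, sizeof *sh);
        return -1;
    }

    pid_t p = fork();
    if (p < 0) {
        sem_destroy(&sh->sem);
        munmap(sh, sizeof *sh);
        return -1;
    }
    if (p == 0) {
        sh->order[sh->n++] = 'C'; /* stands in for childfn() */
        sem_post(&sh->sem);       /* runs first: posts without waiting */
        _exit(0);
    }

    sem_wait(&sh->sem);           /* blocks until the child has posted */
    sh->order[sh->n++] = 'P';     /* stands in for parentfn() */
    waitpid(p, NULL, 0);

    int ok = (sh->n == 2 && sh->order[0] == 'C' && sh->order[1] == 'P');
    sem_destroy(&sh->sem);
    munmap(sh, sizeof *sh);
    return ok ? 0 : -1;
}
```

If the first function never returns, as in the question's edit, the post would have to move to the point where that function is considered "started".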

A mutex should be able to solve this problem. Lock the mutex before the call to fork and have the 1st function execute as normal, while the second tries to claim the mutex. The 1st should unlock the mutex when it is done and the second will wait until it is free.
EDIT: Mutex must be in a shared memory segment for the two processes

Safest way is to use a (named) pipe or socket. One side writes to it, the other reads. The reader cannot read what has not been written yet.
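
A minimal sketch of the pipe idea, using an anonymous pipe() between parent and child; the function name pipe_ordering and the one-byte token are assumptions made for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent does not proceed until the child has
 * written a token. Returns 0 if the token arrived, -1 otherwise. */
int pipe_ordering(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        close(fd[0]);
        /* childfn() would be started around here */
        char token = 'C';
        ssize_t w = write(fd[1], &token, 1);
        _exit(w == 1 ? 0 : 1);
    }

    close(fd[1]); /* so read() sees EOF if the child dies early */
    char token = 0;
    ssize_t n = read(fd[0], &token, 1); /* blocks until the child writes */
    close(fd[0]);
    /* parentfn() would be started here, after the child's signal */
    waitpid(pid, NULL, 0);
    return (n == 1 && token == 'C') ? 0 : -1;
}
```

Closing the unused write end in the parent matters: without it, read() would block forever if the child exited before writing.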

Use a semaphore to ensure that one starts before the other.

You could use an atomic variable. Set it to zero before you fork/thread/exec, have the first process set it to one just before (or better, after) it enters the function, and have the second wait while(flag == 0).