For an assignment, I need to use sched_yield() to synchronize threads. I understand that a mutex lock or condition variables would be much more effective, but I am not allowed to use those.
The only functions we are allowed to use are sched_yield(), pthread_create(), and pthread_join(). We cannot use mutexes, locks, semaphores, or any type of shared variable.
I know sched_yield() is supposed to make the calling thread relinquish the CPU so another thread can run, so it should move the calling thread to the back of the run queue.
The code below is supposed to print 'abc' in order, then the newline after all three threads have executed. I looped sched_yield() in functions b() and c() because it wasn't working as I expected, but I'm fairly sure all that is doing is delaying the printing because the loop runs so many times, not because sched_yield() is working.
The server it needs to run on has 16 CPUs. I saw somewhere that sched_yield() may immediately reassign the thread to another CPU.
Essentially I'm unsure of how, using only sched_yield(), to synchronize these threads given everything I could find and troubleshoot with online.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sched.h>

void* a(void*);
void* b(void*);
void* c(void*);

int main( void ){
    pthread_t a_id, b_id, c_id;

    pthread_create(&a_id, NULL, a, NULL);
    pthread_create(&b_id, NULL, b, NULL);
    pthread_create(&c_id, NULL, c, NULL);

    pthread_join(a_id, NULL);
    pthread_join(b_id, NULL);
    pthread_join(c_id, NULL);

    printf("\n");
    return 0;
}

void* a(void* ret){
    printf("a");
    return ret;
}

void* b(void* ret){
    for(int i = 0; i < 10; i++){
        sched_yield();
    }
    printf("b");
    return ret;
}

void* c(void* ret){
    for(int i = 0; i < 100; i++){
        sched_yield();
    }
    printf("c");
    return ret;
}
There are 4 cases:
a) the scheduler doesn't use multiplexing (e.g. doesn't use "round robin" but uses "highest priority thread that can run does run", or "earliest deadline first", or ...) and sched_yield() does nothing.
b) the scheduler does use multiplexing in theory, but you have more CPUs than threads so the multiplexing doesn't actually happen, and sched_yield() does nothing. Note: with 16 CPUs and 2 threads, this is likely what you'd get for the "default scheduling policy" on an OS like Linux - the sched_yield() just does a "Hrm, no other thread I could use this CPU for, so I guess the calling thread can keep using the same CPU!".
c) the scheduler does use multiplexing and there's more threads than CPUs, but to improve performance (avoid task switches) the scheduler designer decided that sched_yield() does nothing.
d) sched_yield() does cause a task switch (yielding the CPU to some other task), but that is not enough to do any kind of synchronization on its own (e.g. you'd need an atomic variable or something for the actual synchronization - maybe like "while( atomic_variable_not_set_by_other_thread ) { sched_yield(); }"). Note that with an atomic variable (introduced in C11) it'd work without sched_yield() - the sched_yield() (if it does anything) merely makes busy waiting less awful/wasteful.
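A minimal sketch of the busy-wait pattern case d) alludes to, assuming C11 atomics are permitted (the flag name and its setup are hypothetical):

#include <sched.h>
#include <stdatomic.h>

atomic_bool other_thread_done = 0;   /* hypothetical flag, set to 1 by the other thread */

void wait_for_other_thread(void)
{
    /* Busy-wait, but yield on every iteration so the waiting thread
       gives up its CPU instead of spinning at full speed. */
    while (!atomic_load(&other_thread_done))
        sched_yield();
}

Note that it is still the atomic variable doing the actual synchronization; sched_yield() only makes the waiting cheaper.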
Essentially I'm unsure of how, using only sched_yield(), to synchronize these threads given everything I could find and troubleshoot with online.
That would be because sched_yield() is not well suited to the task. As I wrote in comments, sched_yield() is about scheduling, not synchronization. There is a relationship between the two, in the sense that synchronization events affect which threads are eligible to run, but that goes in the wrong direction for your needs.
You are probably looking at the problem from the wrong end. You need each of your threads to wait to execute until it is their turn, and for them to do that, they need some mechanism to convey information among them about whose turn it is. There are several alternatives for that, but if "only sched_yield()" is taken to mean that no library functions other than sched_yield() may be used for that purpose then a shared variable seems the expected choice. The starting point should therefore be how you could use a shared variable to make the threads take turns in the appropriate order.
Flawed starting point
Here is a naive approach that might spring immediately to mind:
/* FLAWED */
void *b(void *data){
    char *whose_turn = data;

    while (*whose_turn != 'b') {
        // nothing?
    }
    printf("b");
    *whose_turn = 'c';
    return NULL;
}
That is, the thread executes a busy loop, monitoring the shared variable to await it taking a value signifying that the thread should proceed. When it has done its work, the thread modifies the variable to indicate that the next thread may proceed. But there are several problems with that, among them:
1) Supposing that at least one other thread writes to the object designated by *whose_turn, the program contains a data race, and therefore its behavior is undefined. As a practical matter, a thread that once entered the body of the loop in that function might loop infinitely, notwithstanding any action by other threads.
2) Without making additional assumptions about thread scheduling, such as a fairness policy, it is not safe to assume that the thread that will make the needed modification to the shared variable will be scheduled in bounded time.
3) While a thread is executing the loop in that function, it prevents any other thread from executing on the same core, yet it cannot make progress until some other thread takes action. To the extent that we can assume preemptive thread scheduling, this is an efficiency issue and contributory to (2). However, if we assume neither preemptive thread scheduling nor the threads being scheduled each on a separate core then this is an invitation to deadlock.
Possible improvements
The conventional and most appropriate way to do that in a pthreads program is with the use of a mutex and condition variable. Properly implemented, that resolves the data race (issue 1) and it ensures that other threads get a chance to run (issue 3). If that leaves no other threads eligible to run besides the one that will modify the shared variable then it also addresses issue 2, to the extent that the scheduler is assumed to grant any CPU to the process at all.
But you are forbidden to do that, so what else is available? Well, you could make the shared variable _Atomic. That would resolve the data race, and in practice it would likely be sufficient for the wanted thread ordering. In principle, however, it does not resolve issue 3, and as a practical matter, it does not use sched_yield(). Also, all that busy-looping is wasteful.
But wait! You have a clue in that you are told to use sched_yield(). What could that do for you? Suppose you insert a call to sched_yield() in the body of the busy loop:
/* (A bit) better */
void* b(void *data){
    char *whose_turn = data;

    while (*whose_turn != 'b') {
        sched_yield();
    }
    printf("b");
    *whose_turn = 'c';
    return NULL;
}
That resolves issues 2 and 3, explicitly affording the possibility for other threads to run and putting the calling thread at the tail of the scheduler's thread list. Formally, it does not resolve issue 1 because sched_yield() has no documented effect on memory ordering, but in practice, I don't think it can be implemented without a (full) memory barrier. If you are allowed to use atomic objects then combining an atomic shared variable with sched_yield() would tick all three boxes. Even then, however, there would still be a bunch of wasteful busy-looping.
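For concreteness, here is a sketch of thread b under that combined approach, assuming the threads share a single _Atomic char seeded with 'a' (this mirrors the earlier examples; a() and c() would follow the same pattern):

/* Sketch: _Atomic turn variable plus sched_yield(), assuming C11 atomics */
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

void* b(void *data){
    _Atomic char *whose_turn = data;      /* points to the shared _Atomic char */

    while (atomic_load(whose_turn) != 'b') {
        sched_yield();                    /* let some other thread run meanwhile */
    }
    printf("b");
    atomic_store(whose_turn, 'c');        /* hand the turn to thread c */
    return NULL;
}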
Final remarks
Note well that pthread_join() is a synchronization function, thus, as I understand the task, you may not use it to ensure that the main thread's output is printed last.
Note also that I have not spoken to how the main() function would need to be modified to support the approach I have suggested. Changes would be needed for that, and they are left as an exercise.
Related
Consider the following program.
(i) What would be the output at Line A and Line B? Justify your answer.
(ii) Do you think there is a synchronization problem in updating the variable value? Justify your answer.
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>

int value = 100;

void *thread_prog(void *param);

int main(int argc, char *argv[])
{
    pthread_t tid;

    pthread_create(&tid, NULL, thread_prog, NULL);
    pthread_join(tid, NULL);
    value = value + 100;
    printf("Parent value = %d\n", value); // Line A
}

void *thread_prog(void *param)
{
    value = value + 100;
    printf("Child value = %d\n", value); // Line B
    pthread_exit(0);
}
The output will be 300 at Line A and 200 at Line B.
I don't think there is a synchronization problem, because of the pthread_join(tid, NULL);
In the posted code, the control flow is obvious, so there are no issues with before-after ordering relationships. But proper synchronization of multithreaded code requires more than just establishing proper before-after relationships.
There are two additional aspects that need to be addressed to ensure this code has no synchronization issues.
Are the changes made in the child thread visible in the main thread?
Do the semantics of the C abstract machine preclude a C compiler from assuming that the contents of the variable value do not change while the main thread is running?
This answer only addresses the first concern.
It's not sufficient to merely establish a guaranteed before-after relationship in multithreaded code to ensure a change to a variable is seen in its entirety by another thread. The Wikipedia entry on "memory barrier" provides a good explanation:
A memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.
Memory barriers are necessary because most modern CPUs employ performance optimizations that can result in out-of-order execution. This reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution, but can cause unpredictable behaviour in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent and defined by the architecture's memory ordering model. Some architectures provide multiple barriers for enforcing different ordering constraints.
In other words, a change made to a variable running on, for example, CPU 1 may not be "seen" by another thread running on, say, CPU 7 even though the code runs on CPU 7 after the code that changed the variable ran on CPU 1.
There needs to be some sort of platform-specific implementation of a guarantee that such changes are propagated through the actual hardware and visible.
And POSIX's threading model specifies those exact guarantees.
Per POSIX 4.12 Memory Synchronization (bolding mine):
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads:
[…]
pthread_create()
pthread_join()
[…]
The use of pthread_create() and pthread_join() not only establishes the before-after ordering relationship needed for proper synchronization, but per the POSIX standard they also establish the visibility requirements.
So yes, the posted code is properly synchronized in both the guaranteed before-after ordering and the visibility aspects.
Note, though, this answer does not address the question of whether or not the posted code is properly synchronized per the semantics of the C abstract machine. I'll defer to experts with better understanding of the C standard and the interpretation of its abstract machine in that regard.
You assume correctly that there is no synchronization problem: pthread_join() suspends the calling thread until the target thread terminates. If the target thread has already terminated, it returns immediately. In either case, the target thread has executed and terminated before Line A is output.
Now, as far as I know, a mutex is used for syncing all threads that share the same data, following the principle that when one thread is using that data, all other threads should be blocked from using that common resource until it is unlocked. Recently, in a blog post, I saw code explaining this concept, and some people wrote that blocking all the threads while one thread is accessing the resource is a very bad idea and goes against the concept of threading, which is true somehow. So my question is: how do you synchronize threads without blocking?
Here is the link to that blog post:
http://www.thegeekstuff.com/2012/05/c-mutex-examples/
You cannot synchronize threads without blocking, by the very definition of synchronization. However, good synchronization technique will limit the scope of where things are blocked to the absolute minimum. To illustrate, and to point out exactly why the article is wrong, consider the following:
From the article:
pthread_t tid[2];
int counter;
pthread_mutex_t lock;

void* doSomeThing(void *arg)
{
    pthread_mutex_lock(&lock);

    unsigned long i = 0;
    counter += 1;
    printf("\n Job %d started\n", counter);
    for(i=0; i<(0xFFFFFFFF);i++);
    printf("\n Job %d finished\n", counter);

    pthread_mutex_unlock(&lock);
    return NULL;
}
What it should be:
pthread_t tid[2];
int counter;
pthread_mutex_t lock;

void* doSomeThing(void *arg)
{
    unsigned long i = 0;

    pthread_mutex_lock(&lock);
    counter += 1;
    int myJobNumber = counter;
    pthread_mutex_unlock(&lock);

    printf("\n Job %d started\n", myJobNumber);
    for(i=0; i<(0xFFFFFFFF);i++);
    printf("\n Job %d finished\n", myJobNumber);
    return NULL;
}
Notice that in the article, the work being done (the pointless for loop) is done while holding the lock. This is complete nonsense, since the work is supposed to be done concurrently. The reason the lock is needed is only to protect the counter variable. Thus the threads only need to hold the lock when changing that variable as in the second example.
Mutex locks protect the critical section of code, which are those areas of code which only 1 thread at a time should touch - and all the other threads must block if trying to access the critical section at the same time. However, if thread 1 is in the critical section, and thread 2 is not, then it's perfectly fine for both to run concurrently.
The term you are looking for is lock-free data structures.
The general idea is that the state shared between threads is contorted into one of those.
Implementations vary and are often compiler- or platform-specific. For example, MSVC has a set of _Interlocked* functions to perform simple atomic operations.
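For a portable flavor of the same idea, C11's <stdatomic.h> offers equivalent primitives; here is a minimal sketch (the counter and function names are illustrative, not from any particular library):

#include <stdatomic.h>

atomic_int counter = 0;

void increment(void)
{
    atomic_fetch_add(&counter, 1);   /* lock-free read-modify-write */
}

/* Compare-and-swap, the usual building block of lock-free structures:
   retry until our update applies to an unchanged value. */
void add_if_below(atomic_int *v, int max)
{
    int old = atomic_load(v);
    while (old < max && !atomic_compare_exchange_weak(v, &old, old + 1)) {
        /* old has been reloaded by the failed CAS; just retry */
    }
}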
blocking all the threads while one thread is accessing the resources is a very bad idea and it goes against the concept of threading which is true somehow
This is a fallacy. Locks block only contending threads, allowing all non-contending threads to run concurrently. Running the work that's the most efficient to run at any particular time rather than forcing any particular ordering is not against the concept of threading at all.
Now if so many of your threads contend so badly that blocking contending threads is harming performance, there are two possibilities:
Most likely you have a very poor design and you should fix it. Don't blame the locks for a high-contention design.
You are in the rare case where other synchronization mechanisms are more appropriate (such as lock-free collections). But this requires significant expertise and analysis of the specific use case to find the best solution.
Generally, if your use case is a perfect fit for atomics, use them. Otherwise, mutexes (possibly in combination with condition variables) should be your first thought. That will cover 99% of the cases a typical multi-threaded C programmer will face.
You can use pthread_mutex_trylock() to attempt a lock. If that fails, then you know you would have blocked. You can't do what you wanted to do, but your thread is not blocked, so it can attempt to do something else. I think most of the comments on that blog are about avoiding contention between threads, though; i.e., maximising multi-threaded performance is about avoiding threads working on the same resource at the same time. If you avoid that by design, then by design you don't need locks, as you never have contention.
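A sketch of that trylock pattern (the two work functions are hypothetical placeholders):

#include <pthread.h>

void do_shared_work(void);       /* hypothetical: touches the shared resource */
void do_independent_work(void);  /* hypothetical: touches nothing shared */

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void poll_shared_work(void)
{
    if (pthread_mutex_trylock(&lock) == 0) {
        /* got the lock without blocking */
        do_shared_work();
        pthread_mutex_unlock(&lock);
    } else {
        /* trylock failed (EBUSY): the lock is contended, so do
           something else and try again later instead of blocking */
        do_independent_work();
    }
}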
There are a number of tricks that can be used to avoid concurrent bottlenecks.
Immutable data structures. The idea here is that concurrent reads are okay, but writes are not. To implement something like this, you basically need to think of business units as factories for these immutable data structures, which are then used by other business units.
Asynchronous callbacks. This is the essence of event-driven development. If you have concurrent tasks, use the observer pattern to execute some logic when a resource becomes available. Basically, we execute some code up until a shared resource is needed, then add a listener for when the resource becomes available. This typically results in less readable code and heavier strain on the stack, but you never block a thread waiting on a resource. If you have the tasks ready to keep the CPUs running hot, this pattern will do it for you.
Even with these tools, you'll never completely remove the need for some synchronization (counters come to mind).
I am new to C and wanted to know about race conditions. I found this code on the internet; the exercise asks you to find the race condition and a solution to it.
My analysis is that the race condition is in the create-thread() method, specifically in the if-else statement: while the method is being accessed, another thread could be created or removed during the check-and-act, and threads_amt would be off.
To avoid the race condition, should the if-else be locked using a mutex, semaphore, etc.?
Can anyone correct me if I am wrong, and possibly show me how to implement a mutex?
#define MAXT 255

int threads_amt = 0;

int create-thread() // create a new thread
{
    int tid;
    if (threads_amt == MAXT) return -1;
    else
    {
        threads_amt++;
        return tid;
    }
}

void release-thread()
{
    /* release thread resources */
    --threads_amt;
}
Yeah, the race condition in this case happens because you have no guarantee that the check and the manipulation of threads_amt will happen without another thread executing in between.
Three solutions off the top of my head:
1) Force mutual exclusion to that part of code using a binary semaphore (or mutex) to protect the if-else part.
2) Use a semaphore with initial value MAXT, and then, upon calling create_thread (mind, you can't use hyphens in function names!), use "wait()" (depending on the type of semaphore, it could have different names (such as sem_wait())). After that, create the thread. When calling release_thread(), simply use "signal()" (sem_post(), when using semaphore.h).
3) This is more of a "hardware" solution: you could assume that you are given an atomic function that performs the entire if-else part, and therefore avoids any race condition problem.
Of these solutions, the "easiest" one (based on the code you already have) is the first one.
Let's use semaphore.h's semaphores:
#include <semaphore.h>

#define MAXT 255

// Global semaphore
sem_t s;

int threads_amt = 0;

int main () {
    ...
    sem_init (&s, 0, 1); // init semaphore (initial value = 1)
    ...
}

int create_thread() // create a new thread
{
    int tid;
    sem_wait(&s);
    if (threads_amt == MAXT) {
        sem_post(&s); // the semaphore is now available
        return -1;
    }
    else
    {
        threads_amt++;
        sem_post(&s); // the semaphore is now available
        return tid;
    }
}

void release_thread()
{
    /* release thread resources */
    sem_wait(&s);
    --threads_amt;
    sem_post(&s);
}
This should work just fine.
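For comparison, here is a sketch of option 2 from the list above, where a counting semaphore initialized to MAXT tracks the free slots itself (I use sem_trywait() here, an assumption on my part, so the function can still return -1 when no slot is free):

#include <semaphore.h>

#define MAXT 255

sem_t slots;   /* initialize once with sem_init(&slots, 0, MAXT) */

int create_thread(void)
{
    if (sem_trywait(&slots) != 0)
        return -1;     /* no slot free: plays the role of the MAXT check */
    /* ... create the thread and return its id ... */
    return 0;
}

void release_thread(void)
{
    /* release thread resources */
    sem_post(&slots);  /* give the slot back */
}

With this variant there is no shared counter at all, so there is nothing left to race on.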
I hope it's clear. If it's not, I suggest that you study how semaphores work (use the web, or buy some Operating System book). Also, you mentioned that you are new to C: IMHO you should start with something easier than this: semaphores aren't exactly the next thing you want to learn after the 'hello world' ;-)
The race condition is not in the if() statement itself.
It is in access to the variable threads_amt, which is potentially changed and read at the same time by multiple threads.
Essentially, any thread that modifies the variable must have exclusive access to avoid a race condition. That means all code which modifies the variable or reads its value must be synchronised (e.g. grab a mutex first, release after). Readers don't necessarily need exclusive access (e.g. two threads reading at the same time won't necessarily affect each other) but writers do (so avoid reading a value while trying to change it in another thread) - such considerations can be opportunities to use synchronisation methods other than a mutex - for example, semaphores.
To use a mutex, it is necessary to create it first (e.g. during project startup). Then grab it when needed, and remember to release it when done. Every function should minimise the time that it holds the mutex, since other threads trying to grab the mutex will be forced to wait.
The trick is to make the grabbing and releasing of the mutex unconditional, wherever it occurs (i.e. avoid a function that grabs the mutex, being able to return without releasing it). That depends on how you structure each function.
The actual code for implementing depends on which threading library you're using (so you need to read the documentation) but the concepts are the same. All threading libraries have functions for creating, grabbing (or entering), and releasing mutexes, semaphores, etc etc.
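With pthreads, for instance, the create/grab/release pattern described above looks roughly like this (the shared variable is just an illustrative stand-in):

#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;  /* "create it first" */
int shared_value = 0;

void update_shared_value(int delta)
{
    pthread_mutex_lock(&lock);    /* grab: blocks while another thread holds it */
    shared_value += delta;        /* keep the time the mutex is held minimal */
    pthread_mutex_unlock(&lock);  /* release unconditionally before returning */
}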
I'm hoping someone can help me solve some unpredictable behaviour in a C program I need to fix:
I have two Xenomai real-time tasks (threads) that wait until they receive an incoming message from one of two CAN buses.
Each task calls a function checkMessageNumber(); however, I'm getting unpredictable results.
Please note that I am using a priority-based, single-threaded system. One thread has priority over the other; however, one thread could be part-way through executing when the other thread takes priority.
In the future it is possible that the hardware could be upgraded to a multi-threading system; however, this part of the program would still be confined to a single thread (one CPU core).
It is my understanding that each thread would invoke its own instance of this function, so I don't know what's happening.
int getMessageIndex(unsigned int msg_number)
{
    unsigned int i = 0;

    while(i < global_number_message_boxes)
    {
        if (global_message_box[i].id == msg_number)
            return i; // matched the msg number, so return the index number
        i++;
    }
    return -1; // no match found
}
Originally this function was highly unpredictable, and as messages streamed in and were processed by the two tasks (depending on which hardware bus the message came from), this function would sometimes return -1 even though the incoming 'msg_number' did match an 'id' in the 'global_message_box' struct.
I was able to make it work better by replacing 'global_number_message_boxes' with an integer literal, e.g. while(i < 50), however the function still sometimes returns -1 even though there should be a match.
I am only reading global variables, so why are they getting corrupted? What do I need to learn about this?
My idea is to simplify things so that the incoming 'msg_number' simply is the 'id' in the 'global_message_box'.
Each thread will then write to the struct directly without having to check which 'id' to write to.
How important is it to use a mutex? Due to the system design, each thread will never write to the same part of the struct, so I am unsure whether it matters.
Thanks.
This likely comes down to a lack of thread synchronisation around the global struct: you say this function is just reading. Sure, but what if another thread calls another function that writes global_number_message_boxes or global_message_box? In a system where you have globals and multiple threads accessing them, the safest rule is: put a lock around every access. Maybe the platform you use even supports read/write locks, so multiple threads can read at the same time as long as none is writing.
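On POSIX systems that read/write lock is pthread_rwlock_t; here is a sketch of how it could wrap the lookup, with hypothetical declarations standing in for the question's globals:

#include <pthread.h>

/* Hypothetical declarations mirroring the question's globals. */
struct message_box { unsigned int id; /* ... */ };
extern struct message_box global_message_box[];
extern unsigned int global_number_message_boxes;

pthread_rwlock_t box_lock = PTHREAD_RWLOCK_INITIALIZER;

int getMessageIndex(unsigned int msg_number)
{
    int result = -1;

    pthread_rwlock_rdlock(&box_lock);   /* many readers may hold this at once */
    for (unsigned int i = 0; i < global_number_message_boxes; i++) {
        if (global_message_box[i].id == msg_number) {
            result = (int)i;
            break;
        }
    }
    pthread_rwlock_unlock(&box_lock);
    return result;
}

Any code that modifies the globals would take the exclusive side instead: pthread_rwlock_wrlock(&box_lock), modify, then pthread_rwlock_unlock(&box_lock).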
Locks and semaphores will be your friends here. Writing data from two threads is going to cause any number of problems.
When a thread enters the function, you will have to BLOCK the other threads and UNBLOCK them at exit. This will ensure thread-safe operation and produce consistent results.
I am making a multi-threaded C program which involves sharing a global dynamic integer array between two threads. One thread will keep adding elements to it, and the other will independently scan the array and free the scanned elements.
Can anyone suggest a way to do that? What I am doing now is creating a deadlock.
Please can anyone also provide code for it, or a way to resolve this deadlock, with a full explanation?
For the threads I would use pthread. Compile it with -pthread.
#include <pthread.h>

int *array;

// return type and argument should be `void *` for pthread
void *addfunction(void *p) {
    // add to array
}

// same with this thread
void *scanfunction(void *p) {
    // scan the array
}

int main(void) {
    // pthread_t variables needed for pthread
    pthread_t addfunction_t, scanfunction_t; // the names are not important, but use the same ones for pthread_create() and pthread_join()

    // start the threads
    pthread_create(&addfunction_t, NULL, addfunction, NULL); // the third argument is the function you want to run, in this case addfunction()
    pthread_create(&scanfunction_t, NULL, scanfunction, NULL); // same for scanfunction()

    // wait until the threads finish; leave this out to continue while the threads are running
    pthread_join(addfunction_t, NULL);
    pthread_join(scanfunction_t, NULL);

    // code after pthread_join() executes once the threads aren't running anymore
}
Here is a good example/tutorial for pthread: *klick*
In cases like this, you need to look at the frequency and loading generated by each operation on the array. For instance, if the array is being scanned continually but only added to once an hour, it's worthwhile finding a really slow, latency-ridden write mechanism that eliminates the need for read locks. Locking up every access with a mutex would be very unsatisfactory in such a case.
Without details of the 'scan' operation, especially duration and frequency, it's not possible to suggest a thread communication strategy for good performance.
Another thing we don't know is the consequences of failure - it may not matter if a new addition is queued up for a while before actually being inserted, or it may.
If you want a 'Computer Science 101' answer with, quite possibly, very poor performance, lock up every access to the array with a mutex.
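As a sketch of that 'Computer Science 101' baseline, assuming one mutex guards every access to the array (the function names are hypothetical; note that with a single lock that each function grabs and releases unconditionally, the deadlock described in the question cannot occur):

#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t array_lock = PTHREAD_MUTEX_INITIALIZER;
static int *array = NULL;
static size_t array_len = 0;

/* Called by the adding thread. */
void add_element(int value)
{
    pthread_mutex_lock(&array_lock);
    int *tmp = realloc(array, (array_len + 1) * sizeof *array);
    if (tmp != NULL) {
        array = tmp;
        array[array_len++] = value;
    }
    pthread_mutex_unlock(&array_lock);
}

/* Called by the scanning thread: processes and frees everything added so far. */
void scan_and_free(void)
{
    pthread_mutex_lock(&array_lock);
    for (size_t i = 0; i < array_len; i++) {
        /* ... examine array[i] ... */
    }
    free(array);
    array = NULL;
    array_len = 0;
    pthread_mutex_unlock(&array_lock);
}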
http://www.liblfds.org
Release 6 contains a lock-free queue.
Compiles out of the box for Windows and Linux.