Keeping track of all threads in a thread pool - c

I am looking at using the Windows Threading API, and the issue it seems to have is that you cannot keep track of when all the threads are completed. You can keep track of when a work item has been completed, assuming you kept track of each one. From my research there is no direct way to query the thread pool to see if the work items submitted have all been completed.
#include <windows.h>
#include <tchar.h>
#include <stdio.h>

VOID CALLBACK MyWorkCallback(PTP_CALLBACK_INSTANCE Instance, PVOID Parameter, PTP_WORK Work) {
    DWORD threadId = GetCurrentThreadId();
    printf("%d thread\n", threadId);
}

int main() {
    TP_CALLBACK_ENVIRON CallBackEnviron;
    PTP_POOL pool = NULL;
    PTP_CLEANUP_GROUP cleanupgroup = NULL;
    PTP_WORK_CALLBACK workcallback = MyWorkCallback;
    PTP_WORK work = NULL;

    InitializeThreadpoolEnvironment(&CallBackEnviron);
    pool = CreateThreadpool(NULL);
    SetThreadpoolThreadMinimum(pool, 1);
    SetThreadpoolThreadMaximum(pool, 3); // the maximum must not be below the minimum
    SetThreadpoolCallbackPool(&CallBackEnviron, pool);

    for (int i = 0; i < 10; ++i) {
        work = CreateThreadpoolWork(workcallback, NULL, &CallBackEnviron);
        SubmitThreadpoolWork(work);
        WaitForThreadpoolWorkCallbacks(work, FALSE); // This waits for the work item to get completed.
        CloseThreadpoolWork(work);
    }
    return 1;
}
Here is a simple example. What happens is that with WaitForThreadpoolWorkCallbacks I am able to wait on that specific work item, which is no problem if I am only doing a few things. However, if I am traversing a directory and have thousands of files that need work done on them, I don't want to keep track of each individual work item. Is it possible to query the thread pool queue to see if anything is left for processing? Or to find out if any of the threads are still working?

You need to keep track of the number of active tasks (like the pendcnt suggested in the comments), +1. But this must not be a global variable; make it a member of a struct, and pass a pointer to that struct to each work item. Increment the counter before calling SubmitThreadpoolWork and decrement it from the callback, just before it exits. You also need an event: set it to the signaled state when the counter reaches 0, and wait on the event from the main thread. If your code lives in a DLL that can be unloaded, you also need to reference the DLL before SubmitThreadpoolWork and call FreeLibraryWhenCallbackReturns from the callback. It is also important that the counter starts at 1 (not 0), i.e. it holds count_of_active_callbacks + 1, and that you decrement it once before you begin waiting. If you don't do this, the counter can reach 0 too early; for instance, the first callback could exit before you submit the second.
class Task
{
    HANDLE _hEvent = 0;
    ULONG _dwThreadId = 0;
    LONG _dwRefCount = 1; // count of active callbacks + 1 for the submitting thread

public:
    ~Task()
    {
        if (_hEvent) CloseHandle(_hEvent);
    }

    ULONG Init()
    {
        if (HANDLE hEvent = CreateEvent(0, 0, 0, 0))
        {
            _hEvent = hEvent;
            return NOERROR;
        }
        return GetLastError();
    }

    void AddTask()
    {
        InterlockedIncrementNoFence(&_dwRefCount);
    }

    void EndTask()
    {
        if (!InterlockedDecrement(&_dwRefCount))
        {
            // last reference gone: wake the waiting thread,
            // unless we are the waiting thread itself
            if (_dwThreadId != GetCurrentThreadId())
            {
                if (!SetEvent(_hEvent)) __debugbreak();
            }
        }
    }

    void Wait()
    {
        _dwThreadId = GetCurrentThreadId();
        EndTask(); // drop the submitting thread's own reference
        if (_dwRefCount && WaitForSingleObject(_hEvent, INFINITE) != WAIT_OBJECT_0) __debugbreak();
    }
};
// MSVC linker pseudo-variable marking this module's base address
EXTERN_C IMAGE_DOS_HEADER __ImageBase;

VOID CALLBACK MyWorkCallback(PTP_CALLBACK_INSTANCE Instance, PVOID Parameter, PTP_WORK /*Work*/)
{
    // needed only if your code is in a DLL which can be unloaded
    FreeLibraryWhenCallbackReturns(Instance, (HMODULE)&__ImageBase);
    WCHAR sz[32];
    swprintf_s(sz, _countof(sz), L"[%x] thread", GetCurrentThreadId());
    MessageBoxW(0, 0, sz, MB_ICONINFORMATION);
    reinterpret_cast<Task*>(Parameter)->EndTask();
}
void CbDemo()
{
    Task task;
    if (task.Init() == NOERROR)
    {
        ULONG n = 2;
        do
        {
            if (PTP_WORK pwk = CreateThreadpoolWork(MyWorkCallback, &task, 0))
            {
                HMODULE hmod;
                // needed only if your code is in a DLL which can be unloaded
                if (GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, (PWSTR)&__ImageBase, &hmod))
                {
                    task.AddTask();
                    SubmitThreadpoolWork(pwk);
                }
                CloseThreadpoolWork(pwk);
            }
        } while (--n);

        MessageBoxW(0, 0, L"Main Thread", MB_ICONWARNING);
        task.Wait();
        __nop();
    }
}
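
As an addendum: the thread pool API itself also offers cleanup groups, and CloseThreadpoolCleanupGroupMembers blocks until every callback associated with the group has completed. That is another way to wait for all outstanding work without tracking individual items. A minimal sketch, reusing the pool, CallBackEnviron, and workcallback setup from the question above:

// Sketch: wait for ALL submitted work via a cleanup group. Assumes the
// pool and CallBackEnviron initialization shown in the question.
cleanupgroup = CreateThreadpoolCleanupGroup();
SetThreadpoolCallbackCleanupGroup(&CallBackEnviron, cleanupgroup, NULL);

for (int i = 0; i < 10; ++i) {
    PTP_WORK w = CreateThreadpoolWork(workcallback, NULL, &CallBackEnviron);
    SubmitThreadpoolWork(w);
}

// Blocks until every callback has finished (FALSE = do not cancel pending
// work items) and also closes them, so no per-item CloseThreadpoolWork
// calls are needed afterwards.
CloseThreadpoolCleanupGroupMembers(cleanupgroup, FALSE, NULL);
CloseThreadpoolCleanupGroup(cleanupgroup);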

Related

C: Bus Error between function returns and execution goes back to parent function

To simplify the problem as much as possible, I have two functions, a parent that calls a child. Everything executes okay until it gets to the return of the child function. After that I get a bus error.
int main() {
    game();
    // this doesn't get executed and the program fails with a bus error
    printf("Execute 2");
    return 1;
}

int game() {
    game_t GameInfo = {.level = 1, .score = 0, .playerCh = 0, .playerX = 1, .playerY = 1};
    gameLevel(&GameInfo);
    mvprintw(1, 1, "Executed");
    // code works up to here and gets executed properly
    return 1;
}
void gameLevel(game_t *GameInfo) {
    // determine the size of the game field
    int cellCols = COLS / 3;
    int cellRows = (LINES / 3) - 2;
    GameInfo->playerX = 1;
    GameInfo->playerY = 1;
    generateMaze(0);
    int solved = 0;
    int level = GameInfo->level;

    // default player position
    getPlayerDefault(GameInfo);

    pthread_t enemies_th;
    pthread_create(&enemies_th, NULL, enemies, (void *)GameInfo);
    // enemies(&level);

    while (solved == 0 && GameInfo->collision != 1) {
        printGameInfo(GameInfo);
        noecho();
        char move = getch();
        echo();
        if (GameInfo->collision != 1) {
            if (checkMoveValidity(move, GameInfo) == 1) {
                solved = movePlayer(move, GameInfo);
                if (solved == 1) {
                    break;
                }
            }
        } else {
            break;
        }
    }

    if (solved == 1) {
        pthread_cancel(enemies_th);
        GameInfo->level++;
        gameLevel(GameInfo);
    } else {
        // game over
        pthread_cancel(enemies_th);
        return;
    }
}
Now, the code is much more complicated than shown here, but I think that shouldn't have any influence on this (?), as it executes properly until the return statement. There is also ncurses and multithreading, and quite complex custom structures, but it all works up until that point. Any ideas?
I tried putting print statements after each segment of code; everything worked up until this point.
pthread_cancel() doesn't terminate the requested thread immediately. The only way to know that a cancelled thread has terminated is to call pthread_join(). If the thread is left running, it will interfere with use of the GameInfo variable in the next level of the game if the current level is solved, or may use the GameInfo variable beyond its lifetime if the current level was not solved and the main thread returns back to the main() function.
To make sure the old enemies thread has terminated, add calls to pthread_join() to the gameLevel() function as shown below:
if (solved == 1) {
    pthread_cancel(enemies_th);
    pthread_join(enemies_th, NULL);
    GameInfo->level++;
    gameLevel(GameInfo);
} else {
    // game over
    pthread_cancel(enemies_th);
    pthread_join(enemies_th, NULL);
    return;
}
The use of tail recursion in gameLevel() seems unnecessary. I recommend returning the solved value and letting the game() function start the next level:
In game():
while (gameLevel(&GameInfo)) {
    GameInfo.level++;
}
In gameLevel():
int gameLevel(game_t *GameInfo) {
    /* ... */
    pthread_cancel(enemies_th);
    pthread_join(enemies_th, NULL);
    return solved;
}

Using semaphores to create a thread-safe stack in C?

I'm trying to make a stack that I implemented thread safe using semaphores. It works when I push a single object onto the stack, but the terminal freezes up as soon as I try to push a second item onto the stack or pop an item off of the stack. This is what I have so far; I am not sure where I'm messing up. Everything compiles fine, but the terminal just freezes as previously stated.
Here's where I create the stack:
sem_t selements, sspace;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

BlockingStack *new_BlockingStack(int max_size)
{
    sem_init(&selements, 0, 0);
    sem_init(&sspace, 0, max_size);
    BlockingStack *newBlockingStack = malloc(sizeof(BlockingStack));
    if (newBlockingStack == NULL)
    {
        return NULL; // check before touching the struct
    }
    newBlockingStack->maxSize = max_size;
    newBlockingStack->stackTop = -1;
    newBlockingStack->element = malloc(max_size * sizeof(void *));
    if (newBlockingStack->element == NULL)
    {
        free(newBlockingStack);
        return NULL;
    }
    return newBlockingStack;
}
And here are the Push and Pop:
bool BlockingStack_push(BlockingStack *this, void *element)
{
    sem_wait(&sspace);
    pthread_mutex_lock(&m);
    if (this->stackTop == this->maxSize - 1)
    {
        return false;
    }
    if (element == NULL)
    {
        return false;
    }
    this->element[++this->stackTop] = element;
    return true;
    pthread_mutex_unlock(&m);
    sem_post(&selements);
}
void *BlockingStack_pop(BlockingStack *this)
{
    sem_wait(&selements);
    pthread_mutex_lock(&m);
    if (this->stackTop == -1)
    {
        return NULL;
    }
    else
    {
        return this->element[this->stackTop--];
    }
    pthread_mutex_unlock(&m);
    sem_post(&sspace);
}
SUGGESTED CHANGES:
sem_t sem;
...
BlockingStack *new_BlockingStack(int max_size)
{
    sem_init(&sem, 0, 1);
    ...

bool BlockingStack_push(BlockingStack *this, void *element)
{
    sem_wait(&sem);
    ...
    sem_post(&sem);
    ...
Specifically:
I would only initialize one semaphore object unless I was SURE I needed others.
I would use the same semaphore for push() and pop().
pshared: 0 should be sufficient for synchronizing different pthreads inside your single process.
Initialize the semaphore to 1, because the first thing you'll do for either "push" or "pop" is sem_wait().
For thread safety you already have a mutex in use (pthread_mutex_lock(&m) and pthread_mutex_unlock(&m)). Such mutual exclusion is enough for that purpose: once one thread obtains the mutex, any other thread blocks on its pthread_mutex_lock(&m) call, and only the thread currently holding the mutex can call pthread_mutex_unlock(&m). A minimal sketch of this mutex-only variant follows below.
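
To make that concrete, here is a minimal sketch of push under the mutex-only approach (assuming the BlockingStack definition and the mutex m from the question). Every return path unlocks first, and a push onto a full stack is rejected rather than blocked on:

bool BlockingStack_push(BlockingStack *this, void *element)
{
    bool ok = false;

    if (element == NULL)
        return false;              /* nothing locked yet, safe to return */

    pthread_mutex_lock(&m);
    if (this->stackTop < this->maxSize - 1)
    {
        this->element[++this->stackTop] = element;
        ok = true;
    }
    pthread_mutex_unlock(&m);      /* always unlock before returning */
    return ok;
}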
OK, I was working on this and finally cracked the answer after doing a little internet research and debugging my code. The error was that the mutex_unlock and the sem_post had to come before the return.
Take my pop for example:
void *BlockingStack_pop(BlockingStack *this)
{
    sem_wait(&selements);
    pthread_mutex_lock(&m);
    if (this->stackTop == -1)
    {
        return NULL;
    }
    else
    {
        return this->element[this->stackTop--];
    }
    pthread_mutex_unlock(&m);
    sem_post(&sspace);
}
Notice how the pthread_mutex_unlock(&m); and the sem_post(&sspace); come after the return. They actually must be placed before every return, like so:
void *BlockingStack_pop(BlockingStack *this)
{
    ...
    pthread_mutex_unlock(&m);
    sem_post(&sspace);
    return NULL;
    ...
    pthread_mutex_unlock(&m);
    sem_post(&sspace);
    return this->element[this->stackTop--];
    ...
}
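
For reference, a complete corrected pop under the same design might look like this (a sketch, assuming the BlockingStack definition and the semaphores from the question); here sem_post(&sspace) only runs on the success path, since only then has a slot actually been freed:

void *BlockingStack_pop(BlockingStack *this)
{
    void *result = NULL;

    sem_wait(&selements);          /* wait until at least one element exists */
    pthread_mutex_lock(&m);
    if (this->stackTop != -1)
        result = this->element[this->stackTop--];
    pthread_mutex_unlock(&m);      /* unlock before returning */

    if (result != NULL)
        sem_post(&sspace);         /* one more free slot */
    return result;
}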

Understanding Glib polling system for file descriptors

I'm trying to understand the glib polling system. As I understand it, polling is a technique to watch file descriptors for events. The function os_host_main_loop_wait runs in a loop; you can see that it calls glib_pollfds_fill, qemu_poll_ns and glib_pollfds_poll. I'm trying to understand what this loop does through each of these functions.
static GArray *gpollfds;

static void glib_pollfds_fill(int64_t *cur_timeout)
{
    GMainContext *context = g_main_context_default();
    int timeout = 0;
    int64_t timeout_ns;
    int n;

    g_main_context_prepare(context, &max_priority);

    glib_pollfds_idx = gpollfds->len;
    n = glib_n_poll_fds;
    do {
        GPollFD *pfds;
        glib_n_poll_fds = n;
        g_array_set_size(gpollfds, glib_pollfds_idx + glib_n_poll_fds);
        // gets the current index's address in the gpollfds array
        pfds = &g_array_index(gpollfds, GPollFD, glib_pollfds_idx);
        // fills each element of gpollfds (pfds) with a file descriptor to be polled
        n = g_main_context_query(context, max_priority, &timeout, pfds,
                                 glib_n_poll_fds);
        // g_main_context_query returns the number of records actually stored in fds, or,
        // if more than n_fds records need to be stored, the number of records that need to be stored
    } while (n != glib_n_poll_fds);

    if (timeout < 0) {
        timeout_ns = -1;
    } else {
        timeout_ns = (int64_t)timeout * (int64_t)SCALE_MS;
    }
    *cur_timeout = qemu_soonest_timeout(timeout_ns, *cur_timeout);
}
static void glib_pollfds_poll(void)
{
    GMainContext *context = g_main_context_default();
    GPollFD *pfds = &g_array_index(gpollfds, GPollFD, glib_pollfds_idx);

    if (g_main_context_check(context, max_priority, pfds, glib_n_poll_fds)) {
        g_main_context_dispatch(context);
    }
}
static int os_host_main_loop_wait(int64_t timeout)
{
    GMainContext *context = g_main_context_default();
    int ret;

    g_main_context_acquire(context);

    glib_pollfds_fill(&timeout);

    qemu_mutex_unlock_iothread();
    replay_mutex_unlock();

    ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, timeout); // resolves to: g_poll(fds, nfds, qemu_timeout_ns_to_ms(timeout));

    replay_mutex_lock();
    qemu_mutex_lock_iothread();

    glib_pollfds_poll();

    g_main_context_release(context);
    return ret;
}
So, as I understand it, g_poll polls the array of file descriptors with a timeout. What does that mean? It means it waits for the timeout. If something happens (there's data in an fd to be read, for example), I don't know what it does.
Then glib_pollfds_poll calls g_main_context_check and then g_main_context_dispatch.
According to glib's documentation, what g_main_context_check does is: "Passes the results of polling back to the main loop." What does that mean?
Then g_main_context_dispatch "dispatches all sources", which I also don't know the meaning of.
The entire source can be found here: https://github.com/qemu/qemu/blob/14e5526b51910efd62cd31cd95b49baca975c83f/util/main-loop.c
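
For readers puzzling over the same thing: the QEMU code above is essentially a hand-unrolled GMainContext iteration. A minimal sketch of the same prepare/query/poll/check/dispatch cycle using plain GLib calls (without QEMU's timeout merging) may make the structure clearer:

#include <glib.h>

/* One manual iteration of a GMainContext: the same
 * prepare/query/poll/check/dispatch cycle the QEMU loop performs. */
static void iterate_once(GMainContext *context)
{
    gint max_priority, timeout, nfds, alloc = 8;
    GPollFD *fds = g_new(GPollFD, alloc);

    g_main_context_acquire(context);
    g_main_context_prepare(context, &max_priority);

    /* ask GLib which fds to watch; grow the array until they all fit */
    while ((nfds = g_main_context_query(context, max_priority,
                                        &timeout, fds, alloc)) > alloc) {
        alloc = nfds;
        fds = g_renew(GPollFD, fds, alloc);
    }

    /* block until an fd is ready or the timeout expires; g_poll() records
     * what actually happened in each GPollFD's revents field */
    g_poll(fds, nfds, timeout);

    /* "passes the results of polling back": g_main_context_check() reads
     * the revents fields to decide which sources are now ready, and
     * g_main_context_dispatch() runs those sources' callbacks */
    if (g_main_context_check(context, max_priority, fds, nfds))
        g_main_context_dispatch(context);

    g_main_context_release(context);
    g_free(fds);
}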

Linux kernel windows-like event implementation

We need to implement a Windows-like kernel event for Linux. These functions are intended to behave like the corresponding KeInitializeEvent, KeSetEvent, KeResetEvent, KePulseEvent, and KeWaitForSingleObject from the Windows kernel. A Synchronization event is called auto-reset here, and a Notification event is called manual-reset. Here is the code:
Event.h
#define WAIT_FOREVER -1

#define event_init(event, manual_reset, initial_state) __event_init(event, manual_reset, initial_state)
#define event_set(event) __event_set(event)
#define event_reset(event) __event_reset(event)
#define event_pulse(event) __event_pulse(event)
#define event_wait(event, ms_timeout) __event_wait(event, (ms_timeout == WAIT_FOREVER) ? (WAIT_FOREVER) : ((ms_timeout * HZ) / 1000))

typedef struct _wait_t
{
    atomic_t b;
    wait_queue_head_t q;
    struct list_head list;
} wait_t;

typedef struct _event_t
{
    struct list_head Wait;
    bool AutoReset;
    bool State;
} event_t;

void __event_init_lib(void);
void __event_init(event_t *event, bool manual_reset, bool initial_state);
bool __event_set(event_t *event);
bool __event_reset(event_t *event);
bool __event_pulse(event_t *event);
status_t __event_wait(event_t *event, time_t timeout);
Event.c
wait_t g_stor[100];
spinlock_t g_lock;

void __event_init_lib(void)
{
    wait_t *ptr;
    for (int i = 0; i < ARRAY_SIZE(g_stor); ++i)
    {
        ptr = &g_stor[i];
        atomic_set(&ptr->b, 2);
        init_waitqueue_head(&ptr->q);
        INIT_LIST_HEAD(&ptr->list);
    }
    spin_lock_init(&g_lock);
}

void __event_init(event_t *event, bool manual_reset, bool initial_state)
{
    INIT_LIST_HEAD(&event->Wait);
    event->State = initial_state;
    event->AutoReset = !manual_reset;
}

status_t __event_wait(event_t *event, time_t timeout)
{
    bool b;
    wait_t *ptr;
    status_t status;

    spin_lock(&g_lock);
    if (event->State)
    {
        if (event->AutoReset) event->State = false;
        spin_unlock(&g_lock);
        return s_success;
    }
    for (int i = 0; i < ARRAY_SIZE(g_stor); ++i)
    {
        ptr = &g_stor[i];
        if (atomic_cmpxchg(&ptr->b, 2, 0) == 2) break;
    }
    list_add_tail(&ptr->list, &event->Wait); // note: we need to insert at the end of the list
    spin_unlock(&g_lock);

    if (timeout == WAIT_FOREVER) wait_event(ptr->q, b = (atomic_cmpxchg(&ptr->b, 1, 2) == 1));
    else wait_event_timeout(ptr->q, b = (atomic_cmpxchg(&ptr->b, 1, 2) == 1), timeout);

    if (b) status = s_success;
    else status = s_timeout;
    return status;
}
bool __event_set(event_t *event)
{
    bool PrevState;
    struct list_head *entry;
    wait_t *Wait;

    //if (!event->AutoReset && event->State) return true;
    spin_lock(&g_lock);
    PrevState = event->State;
    event->State = true;
    if (!PrevState && !list_empty(&event->Wait)) // we just became signaled and have waiters
    {
        if (event->AutoReset)
        {
            entry = event->Wait.next;
            Wait = container_of(entry, wait_t, list);
            atomic_set(&Wait->b, 1);
            wake_up(&(Wait->q));
            event->State = false;
            list_del(entry);
        }
        else
        {
            entry = event->Wait.next;
            while (entry != &event->Wait)
            {
                Wait = container_of(entry, wait_t, list);
                atomic_set(&Wait->b, 1);
                wake_up(&(Wait->q));
                entry = entry->next;
                list_del(entry->prev);
            }
        }
    }
    spin_unlock(&g_lock);
    return PrevState;
}

bool __event_reset(event_t *event)
{
    bool PrevState;

    spin_lock(&g_lock);
    PrevState = event->State;
    event->State = false;
    spin_unlock(&g_lock);
    return PrevState;
}

bool __event_pulse(event_t *event)
{
    bool PrevState;
    struct list_head *entry;
    wait_t *Wait;

    spin_lock(&g_lock);
    PrevState = event->State;
    if (!PrevState && !list_empty(&event->Wait)) // we just became signaled and have waiters
    {
        if (event->AutoReset)
        {
            entry = event->Wait.next;
            Wait = container_of(entry, wait_t, list);
            atomic_set(&Wait->b, 1);
            wake_up(&(Wait->q));
            list_del(entry);
        }
        else
        {
            entry = event->Wait.next;
            while (entry != &event->Wait)
            {
                Wait = container_of(entry, wait_t, list);
                atomic_set(&Wait->b, 1);
                wake_up(&(Wait->q));
                entry = entry->next;
                list_del(entry->prev);
            }
        }
    }
    event->State = false;
    spin_unlock(&g_lock);
    return PrevState;
}
I think each waiting thread needs its own condition variable, because if we had a single condition and set it to true, a new waiter might arrive and pass through wait_event unintentionally without ever going to sleep. So we need to maintain a list of conditions, and to wake up the correct thread we also need multiple wait queues. The ReactOS source also suggests that an event maintains a list of waiters.
Since we can't use thread-local storage variables in kernel mode (at least not easily), I decided to implement an array of wait blocks. When we need to insert a waiter into the list, we loop over this array in search of a free wait block. This leads me to believe that we need a single global lock, as ReactOS does (the dispatcher lock), not a separate lock for each event object.
We need the event object for our video camera driver ported from Windows. Everything seems to work fine; however, the frame rate sometimes drops from 14 fps to 10 (and the image flickers). That led me to believe there is something wrong with the implementation of the event.
If you have some suggestions, please share. Thank you.
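
One thing that may be worth measuring against the hand-rolled version above: for the manual-reset (notification) case, the kernel's own completion API already provides set/wait/re-arm semantics. A minimal sketch, where the names frame_done, frame_ready, wait_for_frame, and frame_rearm are hypothetical and purely for illustration:

#include <linux/completion.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

/* hypothetical event guarding "a frame is ready" */
static DECLARE_COMPLETION(frame_done);

/* roughly event_set() on a manual-reset event: wake every waiter */
static void frame_ready(void)
{
    complete_all(&frame_done);
}

/* roughly event_wait() with a millisecond timeout */
static int wait_for_frame(unsigned long ms)
{
    if (!wait_for_completion_timeout(&frame_done, msecs_to_jiffies(ms)))
        return -ETIMEDOUT;
    return 0;
}

/* roughly event_reset(): re-arm the event */
static void frame_rearm(void)
{
    reinit_completion(&frame_done);
}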

Writing a scheduler for a Userspace thread library

I am developing a userspace preemptive thread library (fibre) that uses context switching as the base approach. For this I wrote a scheduler. However, it's not performing as expected. Any suggestions would be appreciated.
The structure of the thread_t used is :
typedef struct thread_t {
    int thr_id;
    int thr_usrpri;
    int thr_cpupri;
    int thr_totalcpu;
    ucontext_t thr_context;
    void *thr_stack;
    int thr_stacksize;
    struct thread_t *thr_next;
    struct thread_t *thr_prev;
} thread_t;
The scheduling function is as follows:
void schedule(void)
{
    thread_t *t1, *t2;
    thread_t *newthr = NULL;
    int newpri = 127;
    struct itimerval tm;
    ucontext_t dummy;
    sigset_t sigt;

    t1 = ready_q;
    // select the thread with the highest priority
    while (t1 != NULL)
    {
        if (newpri > t1->thr_usrpri + t1->thr_cpupri)
        {
            newpri = t1->thr_usrpri + t1->thr_cpupri;
            newthr = t1;
        }
        t1 = t1->thr_next;
    }

    if (newthr == NULL)
    {
        if (current_thread == NULL)
        {
            // no more threads? (stop itimer)
            tm.it_interval.tv_usec = 0;
            tm.it_interval.tv_sec = 0;
            tm.it_value.tv_usec = 0; // zero disables the timer
            tm.it_value.tv_sec = 0;
            setitimer(ITIMER_PROF, &tm, NULL);
        }
        return;
    }
    else
    {
        // TODO: re-enabling of signals must be done.
        // switch to the new thread
        if (current_thread != NULL)
        {
            t2 = current_thread;
            current_thread = newthr;
            timeq = 0;
            sigemptyset(&sigt);
            sigaddset(&sigt, SIGPROF);
            sigprocmask(SIG_UNBLOCK, &sigt, NULL);
            swapcontext(&(t2->thr_context), &(current_thread->thr_context));
        }
        else
        {
            // no current thread? it might have terminated
            current_thread = newthr;
            timeq = 0;
            sigemptyset(&sigt);
            sigaddset(&sigt, SIGPROF);
            sigprocmask(SIG_UNBLOCK, &sigt, NULL);
            swapcontext(&(dummy), &(current_thread->thr_context));
        }
    }
}
It seems that ready_q (the head of the list of ready threads?) never changes, so the search for the highest-priority thread always finds the first suitable element. If two threads have the same priority, only the first one ever has a chance to gain the CPU. There are many algorithms you can use: some are based on dynamically changing the priority, others use a sort of rotation inside the ready queue. In your example you could remove the selected thread from its place in the ready queue and put it at the last place (it's a doubly linked list, so the operation is trivial and quite inexpensive), as sketched below.
Also, I'd suggest you consider the performance cost of the linear search through ready_q, since it may become a problem when the number of threads is large. In that case a more sophisticated structure may help, with separate lists of threads for different priority levels.
Bye!
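
A minimal sketch of that rotation, assuming ready_q is the NULL-terminated head pointer of the doubly linked list implied by the thread_t struct above:

extern thread_t *ready_q;   /* the ready-list head from the question */

static void move_to_tail(thread_t *t)
{
    thread_t *tail;

    if (t->thr_next == NULL)            /* already the last element */
        return;

    /* unlink t from its current position */
    if (t->thr_prev != NULL)
        t->thr_prev->thr_next = t->thr_next;
    else
        ready_q = t->thr_next;          /* t was the head */
    t->thr_next->thr_prev = t->thr_prev;

    /* walk to the tail and append t there */
    for (tail = ready_q; tail->thr_next != NULL; tail = tail->thr_next)
        ;
    tail->thr_next = t;
    t->thr_prev = tail;
    t->thr_next = NULL;
}

Calling move_to_tail(newthr) in schedule() right after the selection loop would give equally-prioritized threads a round-robin rotation.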
