Design of a multi-threaded server in C

I am trying to implement a simple echo server with concurrency support on Linux.
The following approach is used:
Use pthread functions to create a pool of threads, maintained in a linked list. The pool is created on process start and destroyed on process termination.
The main thread accepts connections and uses a POSIX message queue to store the accepted socket file descriptors.
Threads in the pool loop to read from the message queue and handle the requests they get; when there is no request, they block.
The program seems to be working now.
My questions are:
Is it suitable to use a message queue in the middle, and is it efficient enough?
What is the general approach for a thread pool that needs to handle concurrent requests from multiple clients?
If it is not proper to have the threads in the pool loop and block to retrieve messages from the message queue, how else can requests be delivered to the threads?

This seems unnecessarily complicated to me. The usual approach for a multithreaded server is:
Create a listen socket.
Accept the client connections in a thread.
For each accepted client connection, create a new thread, which receives the corresponding file descriptor and does the work.
The worker thread closes the client connection when it is fully handled.
I do not see much benefit in prepopulating a thread-pool here.
If you really want a thread pool:
I would just use a linked list for accepted connections and a pthread_mutex to synchronize access to it:
The listener thread enqueues client fds at the tail of the list.
The worker threads dequeue fds at the head.
If the list is empty, the workers can wait on a condition variable (pthread_cond_wait) and are notified by the listener (pthread_cond_signal) when connections are available.
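A minimal sketch of such a queue, assuming the listener accepts connections elsewhere (fd_queue and fd_node are illustrative names, not from the question):
#include <pthread.h>
#include <stdlib.h>

struct fd_node {
    int fd;
    struct fd_node *next;
};

struct fd_queue {
    struct fd_node *head, *tail;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
};

/* Listener thread: enqueue an accepted client fd at the tail. */
void fd_queue_push(struct fd_queue *q, int fd)
{
    struct fd_node *n = malloc(sizeof *n);
    n->fd = fd;
    n->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->not_empty);  /* wake one waiting worker */
    pthread_mutex_unlock(&q->lock);
}

/* Worker thread: block until a client fd is available, then dequeue it. */
int fd_queue_pop(struct fd_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)  /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->not_empty, &q->lock);

    struct fd_node *n = q->head;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    int fd = n->fd;
    free(n);
    return fd;
}
Each worker then simply loops on fd_queue_pop(), handles the request on the returned descriptor, and closes it.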
Another alternative
Depending on the complexity of handling requests, it might be an option to make the server single-threaded, i.e. handle all connections in one thread. This eliminates context switches altogether and can thus be very performant.
One drawback is that only one CPU core is used. To improve on that, a hybrid model can be used:
Create one worker-thread per core.
Each thread handles simultaneously n connections.
You would however have to implement mechanisms to distribute the work fairly amongst the workers.
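A rough sketch of that hybrid model on Linux, assuming epoll is used for multiplexing (the round-robin handoff and the names worker/dispatch are illustrative; fair distribution would need something smarter than this):
#include <pthread.h>
#include <stddef.h>
#include <sys/epoll.h>

#define NWORKERS 4  /* e.g. one worker per core */

/* One epoll instance per worker; fill with epoll_create1(0) at startup. */
static int epfds[NWORKERS];

/* Each worker multiplexes its share of the connections on its own epoll fd. */
static void *worker(void *arg)
{
    int epfd = *(int *)arg;
    struct epoll_event events[64];

    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);
        if (n < 0)
            continue;  /* e.g. interrupted by a signal */
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            /* read the request from fd and write the echo reply; on EOF:
               epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) and close(fd) */
            (void)fd;
        }
    }
    return NULL;
}

/* Listener thread: hand each accepted connection to the next worker in turn.
   epoll_ctl() may safely be called while a worker is blocked in epoll_wait(). */
static void dispatch(int client_fd)
{
    static int turn = 0;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = client_fd };
    epoll_ctl(epfds[turn++ % NWORKERS], EPOLL_CTL_ADD, client_fd, &ev);
}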

In addition to using a pthread_mutex, you will want to use a pthread_cond_t (pthread condition variable); this will allow you to put the threads in the thread pool to sleep while they are not actually doing work. Otherwise, they will waste compute cycles sitting in a loop checking for something in the work queue.
I would definitely consider using C++ instead of just pure C. The reason I suggest it is that in C++ you are able to use templates. Using a pure virtual base class (let's call it "vtask"), you can create templated derived classes that accept arguments and insert the arguments when the overloaded operator() is called, allowing for much, much more functionality in your tasks:
//============================================================================//
void* thread_pool::execute_thread()
{
    vtask* task = NULL;
    while(true)
    {
        //--------------------------------------------------------------------//
        // Try to pick a task
        m_task_lock.lock();
        //--------------------------------------------------------------------//
        // We need to put condition.wait() in a loop for two reasons:
        // 1. There can be spurious wake-ups (due to signal/EINTR)
        // 2. When the mutex is released for waiting, another thread can be
        //    woken up from a signal/broadcast and that thread can mess up
        //    the condition. So when the current thread wakes up, the
        //    condition may no longer actually be true!
        while ((m_pool_state != state::STOPPED) && (m_main_tasks.empty()))
        {
            // Wait until there is a task in the queue
            // Unlock mutex while waiting, then lock it back when signaled
            m_task_cond.wait(m_task_lock.base_mutex_ptr());
        }
        // If the thread was woken to notify process shutdown, return from here
        if (m_pool_state == state::STOPPED)
        {
            m_task_lock.unlock();
            //----------------------------------------------------------------//
            if(mad::details::allocator_list_tl::get_allocator_list_if_exists() &&
               tids.find(CORETHREADSELF()) != tids.end())
                mad::details::allocator_list_tl::get_allocator_list()
                    ->Destroy(tids.find(CORETHREADSELF())->second, 1);
            //----------------------------------------------------------------//
            CORETHREADEXIT(NULL);
        }
        task = m_main_tasks.front();
        m_main_tasks.pop_front();
        //--------------------------------------------------------------------//
        // Unlock
        m_task_lock.unlock();
        //--------------------------------------------------------------------//
        // execute the task
        run(task);
        m_task_count -= 1;
        m_join_lock.lock();
        m_join_cond.signal();
        m_join_lock.unlock();
        //--------------------------------------------------------------------//
    }
    return NULL;
}
//============================================================================//
int thread_pool::add_task(vtask* task)
{
#ifndef ENABLE_THREADING
    run(task);
    return 0;
#endif
    if(!is_alive_flag)
    {
        run(task);
        return 0;
    }
    // do outside of lock because it is thread-safe and needs to be updated as
    // soon as possible
    m_task_count += 1;
    m_task_lock.lock();
    // if the thread pool hasn't been initialized, initialize it
    if(m_pool_state == state::NONINIT)
        initialize_threadpool();
    // TODO: put a limit on how many tasks can be added at most
    m_main_tasks.push_back(task);
    // wake up one thread that is waiting for a task to be available
    m_task_cond.signal();
    m_task_lock.unlock();
    return 0;
}
//============================================================================//
void thread_pool::run(vtask*& task)
{
    (*task)();
    if(task->force_delete())
    {
        delete task;
        task = 0;
    }
    else
    {
        if(task->get() && !task->is_stored_elsewhere())
            save_task(task);
        else if(!task->is_stored_elsewhere())
        {
            delete task;
            task = 0;
        }
    }
}
In the above, each created thread runs execute_thread() until m_pool_state is set to state::STOPPED. A thread locks m_task_lock, and if the state is not STOPPED and the task list is empty, it passes m_task_lock to the condition variable, which puts the thread to sleep and releases the lock. You create the tasks (not shown) and add them via add_task() (m_task_count is an atomic, by the way, which is why updating it is thread-safe). During add_task(), the condition is signaled to wake up a thread, which then resumes after the m_task_cond.wait(m_task_lock.base_mutex_ptr()) call in execute_thread() once m_task_lock has been acquired and locked.
NOTE: this is a highly customized implementation that wraps most of the pthread functions/objects into C++ classes, so copy-and-pasting will not work whatsoever... Sorry. With respect to thread_pool::run(): unless you are worrying about return values, the (*task)() line is all you need.
I hope this helps.
EDIT: the m_join_* members are for checking whether all the tasks have been completed. The main thread sits in a similar condition wait that checks whether all tasks are done before proceeding, as this is necessary for the applications in which I use this implementation.

Related

xcb_poll_for_event causes 100% usage of one cpu core

I'm learning C and messing around with the xcb library (instead of X11) on a Raspberry Pi 4.
The problem is that when I implement the event loop with xcb_poll_for_event instead of xcb_wait_for_event, one of the four cores is at 100%. What am I doing wrong? And is there any benefit to using xcb_wait_for_event (blocking) instead of xcb_poll_for_event (non-blocking)?
The goal is to create a window where the user interacts with keyboard/mouse/gamepad on objects, like a game. Can anyone give a hand?
The relevant code is:
int window_loop_test(xcb_connection_t *connection, Display *display)
{
    /* window loop, non-blocking wait for events */
    int running = 1;
    while (running) {
        xcb_generic_event_t *event = xcb_poll_for_event(connection);
        if (event) {
            switch (event->response_type & ~0x80) {
            case XCB_EXPOSE: {
                // TODO
                break;
            }
            case XCB_KEY_PRESS: {
                /* Quit on 'q' key press */
                /* write key pressed on console */
                const xcb_key_press_event_t *press =
                    (xcb_key_press_event_t *)event;
                XKeyEvent keyev;
                keyev.display = display;
                keyev.keycode = press->detail;
                keyev.state = press->state;
                char key[32];
                XLookupString(&keyev, key, sizeof(key) - 1, NULL, NULL);
                // key[len] = 0;
                printf("Key pressed: %s\n", key);
                printf("Mod state: %d\n", keyev.state);
                if (*key == 'q')
                    running = 0;
                break;
            }
            }
            free(event);
        }
    }
    return 0;
}
Polling and waiting each have their advantages and are good for different situations. Neither is "wrong" per se, but you need to use the correct one for your specific use case.
xcb_wait_for_event(connection) is a blocking call. The call will not return until an event is available, and the return value is that event (unless an error occurs). It is good for situations where you only want the thread to respond to events, but otherwise not do anything. In that case, there is no need to spend CPU resources when no events are coming in.
xcb_poll_for_event(connection) is a non-blocking call. The call always returns immediately, but the result will be NULL if no event is available. It is good for situations where you want the thread to be able to do useful work even if no events are coming in. As you found out, it's not good if the thread only needs to respond to events, as it can consume CPU resources unnecessarily.
You mention that your goal is to create a game or something similar. Given that there are many ways to architect a game, either function can be suitable. But there are a couple of basic things to keep in mind that will determine which function you want to use. There may be other considerations as well, but this will give you an idea of what to look out for.
First of all, is your input system running on the same thread as other systems (simulation, rendering, etc)? If so, it's probably important to keep that thread available for work other than waiting for input events. In this case, xcb_poll_for_event() is almost required, otherwise your thread will be blocked until an event comes in. However, if your input system is on its own thread that doesn't block your other threads, it may be acceptable to use xcb_wait_for_event() and let that thread sleep when no events are coming in.
The second consideration is how quickly you need to respond to input events. There's often a delay in waking up a thread, so if fast response times are important you'll want to avoid letting the thread sleep in the first place. Again, xcb_poll_for_event() will be your friend in this case. If response times are not critical, xcb_wait_for_event() is an option.
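For completeness, a blocking version of the loop from the question might look roughly like this (window_loop_blocking is a hypothetical name; the event handling is elided):
#include <stdlib.h>
#include <xcb/xcb.h>

int window_loop_blocking(xcb_connection_t *connection)
{
    int running = 1;
    while (running) {
        /* Sleeps until an event arrives instead of spinning. */
        xcb_generic_event_t *event = xcb_wait_for_event(connection);
        if (!event)
            break;  /* NULL from xcb_wait_for_event means an I/O error */
        switch (event->response_type & ~0x80) {
        case XCB_EXPOSE:
            /* redraw */
            break;
        case XCB_KEY_PRESS:
            /* handle the key; set running = 0 to quit */
            break;
        }
        free(event);
    }
    return 0;
}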

Run while loop in parallel

I have a large collection (90,000+ objects) and I would like to run a while loop in parallel on it. The source of my function is below:
val context = newSingleThreadAsyncContext()
return KtxAsync.async(context) {
    val fields = regularMazeService.generateFields(colsNo, rowsNo)
    val time = measureTimeMillis {
        withContext(newAsyncContext(10)) {
            while (availableFieldsWrappers.isNotEmpty()) {
                val wrapper = getFirstShuffled(availableFieldsWrappers.lastIndex)
                    .let { availableFieldsWrappers[it] }
                if (wrapper.neighborsIndexes.isEmpty()) {
                    availableFieldsWrappers.remove(wrapper)
                    continue
                }
                val nextFieldIndex = getFirstShuffled(wrapper.neighborsIndexes.lastIndex)
                    .let {
                        val fieldIndex = wrapper.neighborsIndexes[it]
                        wrapper.neighborsIndexes.removeAt(it)
                        fieldIndex
                    }
                if (visitedFieldsIndexes.contains(nextFieldIndex)) {
                    wrapper.neighborsIndexes.remove(nextFieldIndex)
                    fields[nextFieldIndex].neighborFieldsIndexes.remove(wrapper.index)
                    continue
                }
                val nextField = fields[nextFieldIndex]
                availableFieldsWrappers.add(FieldWrapper(nextField, nextFieldIndex))
                visitedFieldsIndexes.add(nextFieldIndex)
                wrapper.field.removeNeighborWall(nextFieldIndex)
                nextField.removeNeighborWall(wrapper.index)
            }
        }
    }
    Gdx.app.log("maze-time", "$time")
}
At the top of the class:
private val availableFieldsWrappers = Collections.synchronizedList(mutableListOf<FieldWrapper>())
private val visitedFieldsIndexes = Collections.synchronizedList(mutableListOf<Int>())
I tested it a few times; the results are below:
1 thread - 21213 ms
5 threads - 27894 ms
10 threads - 21494 ms
15 threads - 20986 ms
What am I doing wrong?
1. You are using Collections.synchronizedList from the Java standard library, which returns a list wrapper that leverages the blocking synchronized mechanism to ensure thread safety. This mechanism is not compatible with coroutines, as it blocks the other threads from accessing the collection until the operation is finished. You should generally use non-blocking concurrent collections when accessing data from multiple coroutines, or protect the shared data with a non-blocking mutex.
2. List.contains will become slower and slower (O(n)) as more and more elements are added. Instead of a list, you should use a set for visitedFieldsIndexes. Just make sure to either protect it with a mutex or use a concurrent variant. Similarly, removal of values at random indices from availableFieldsWrappers is pretty costly; instead, you can shuffle the list once and use simple iteration.
3. You are not reusing the coroutine contexts. In general, you can create an asynchronous context once and reuse its instance instead of creating a new thread pool each time you need coroutines. You should invoke and assign the result of newAsyncContext(10) just once and reuse it throughout your application.
4. The code you have currently written does not leverage coroutines very well. Instead of thinking of a coroutine dispatcher as a thread pool where you can launch N big tasks in parallel (i.e. your while (availableFieldsWrappers.isNotEmpty()) loop), you should think of it as an executor of hundreds or thousands of small tasks, and adjust your code accordingly. I think you could avoid the available/visited collections altogether by rewriting your code with the introduction of e.g. Kotlin flows, or just multiple KtxAsync.async/KtxAsync.launch calls that each handle a smaller portion of the logic.
5. Unless some of the functions are suspending or use coroutines underneath, you're not really leveraging the multiple threads of an asynchronous context at all. withContext(newAsyncContext(10)) launches a single coroutine that handles the whole logic sequentially, leveraging only a single thread. See 4. for some ideas on how you can rewrite the code. Try collecting (or just printing) the thread hashes and names to see whether you are using all of the threads well.

Slowness while calling tpinit and tpterm functions in a multithreaded program

The Tuxedo functions tpinit and tpterm are taking time. They are basically used in every request by the client to join and leave the application. We observed heavy slowness when the number of requests from a multi-threaded client process is high.
We tried increasing the number of virtual cores on the machine but still face the same problem.
TPINIT *tpinitbuf;
if ((tpinitbuf = (TPINIT *)tpalloc("TPINIT", (char *)NULL, TPINITNEED(16))) == (TPINIT *)NULL)
{
    printf("ERROR IS:: %s\n", tpstrerror(tperrno));
    return NULL;
}
tpinitbuf->flags = TPMULTICONTEXTS;
tpinit(tpinitbuf); // this function is taking time.
tpgetctxt(&ctxt, 0);
tpfree((char *)tpinitbuf);
retVal = tpcall("MY_SERVICE", (char *)buf1, 0, (char **)&buf2, &size, 0L);
tpterm(); // this function is taking time.
Ideally tpinit and tpterm should take around 50 milliseconds each, but when the number of requests is high they take around 1.3 seconds.
Why do you do that? Do tpinit() once per thread, and do tpterm() only when the thread terminates. If you create new short-lived threads all the time, then switch to using a thread pool.
Think of "joining the Tuxedo application" as "connecting to a database" - suddenly connecting and disconnecting on every request does not seem like such a great idea anymore.
There is a number of things tpinit() has to do: register itself in the shared memory (taking semaphores to prevent concurrent updates), create a reply queue and register it in the shared memory (so the BBL can clean up after crashed processes), look up service-to-queue mappings, load plug-ins, etc. Tuxedo could be faster at all of that, but if you do it too often, it's your own fault, not Tuxedo's.
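As a sketch of that once-per-thread structure, using only the calls already shown in the question (worker_thread and the request loop are hypothetical; error handling is abbreviated):
#include <stdio.h>
#include <atmi.h> /* Tuxedo ATMI declarations */

void *worker_thread(void *arg)
{
    /* Join the application once, when the thread starts. */
    TPINIT *tpinitbuf = (TPINIT *)tpalloc("TPINIT", (char *)NULL, TPINITNEED(16));
    if (tpinitbuf == (TPINIT *)NULL) {
        printf("ERROR IS:: %s\n", tpstrerror(tperrno));
        return NULL;
    }
    tpinitbuf->flags = TPMULTICONTEXTS;
    if (tpinit(tpinitbuf) == -1) { /* pay this cost once per thread */
        printf("ERROR IS:: %s\n", tpstrerror(tperrno));
        tpfree((char *)tpinitbuf);
        return NULL;
    }
    tpfree((char *)tpinitbuf);

    int running = 1; /* hypothetical shutdown flag */
    while (running) {
        /* take the next client request from the process's own work queue and
           serve it with tpcall("MY_SERVICE", ...); no tpinit/tpterm in here */
        running = 0; /* placeholder so the sketch terminates */
    }

    tpterm(); /* leave the application only when the thread exits */
    return NULL;
}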

C Multithreading - Sqlite3 database access by 2 threads crash

Here is a description of my problem:
I have 2 threads in my program: the main thread, and another one that I create using pthread_create.
The main thread performs various functions on an sqlite3 database. Each function opens the database to perform the required actions and closes it when done.
The other thread simply reads from the database after a set interval of time and uploads the data onto a server. This thread also opens and closes the database to perform its operation.
The problem occurs when both threads happen to have the database open. If one finishes first, it closes the database, causing the other to crash and making the application unusable.
The main thread requires the database for every operation.
Is there a way I can prevent this from happening? A mutex is one way, but if I use a mutex it will make my main thread useless. The main thread must remain functional at all times, while the other thread runs in the background.
Any advice to make this work would be great.
I did not provide full snippets, as this problem is a bit too vast for that, but if anything about the problem is unclear, please let me know.
EDIT:
static sqlite3 *db = NULL;
Code snippet for opening database
int open_database(char *DB_dir) // argument is the db path
{
    int rc = sqlite3_open(DB_dir, &db);
    if (rc)
    {
        // failed to open message
        sqlite3_close(db);
        db = NULL;
        return SDK_SQL_ERR;
    }
    else
    {
        // success message
    }
    return SDK_OK;
}
And to close db
int close_database()
{
    if (db != NULL)
    {
        sqlite3_close(db);
        db = NULL;
        // success message
    }
    return 1;
}
EDIT: I forgot to add that the background thread performs a single write operation, updating one field of the table for each row it uploads onto the server.
Have your threads each use their own database connection. There's no reason for the background thread to affect the main thread's connection.
Generally, I would want to be using connection pooling, so that I don't open and close database connections very frequently; opening a connection is an expensive operation.
In application servers, where we very often have many threads, we find that a connection pool of a few tens of connections is sufficient to service requests on behalf of many hundreds of users.
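A minimal sketch of the per-thread-connection idea, assuming the background thread receives the database path as its argument (uploader_thread is a hypothetical name):
#include <stdio.h>
#include <sqlite3.h>

/* The background thread opens its own connection; it never touches the
   main thread's handle, so the main thread's open/close cannot crash it. */
void *uploader_thread(void *arg)
{
    const char *db_path = (const char *)arg;
    sqlite3 *conn = NULL;

    if (sqlite3_open(db_path, &conn) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(conn));
        sqlite3_close(conn);
        return NULL;
    }

    /* Retry for up to 2 seconds when the other connection holds a
       write lock, instead of failing immediately with SQLITE_BUSY. */
    sqlite3_busy_timeout(conn, 2000);

    /* ... periodic SELECT plus the single UPDATE per uploaded row,
       all through conn only ... */

    sqlite3_close(conn);
    return NULL;
}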
Basically, there are mechanisms built into sqlite3 to provide locking: BEGIN EXCLUSIVE, for example. You can also register a busy callback so that the other thread can back off and do other things while waiting;
see sqlite3_busy_handler()
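For illustration, registering a busy handler on a connection could look roughly like this (busy_cb and the retry limits are made up for the example):
#include <unistd.h>
#include <sqlite3.h>

/* Called by SQLite whenever a lock is contended; 'attempts' counts how many
   times it has already been invoked for this operation. Returning nonzero
   means "retry"; returning 0 makes the call fail with SQLITE_BUSY. */
static int busy_cb(void *unused, int attempts)
{
    (void)unused;
    if (attempts >= 50)
        return 0;      /* give up after ~50 tries */
    usleep(10000);     /* back off 10 ms so the other thread can finish */
    return 1;
}

/* After opening a connection: sqlite3_busy_handler(conn, busy_cb, NULL); */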

Making a singleton application across all users

I'm trying to create an application which allows only a single instance across all Windows users.
I'm currently doing it by opening a file for writing and keeping it open. Is this method safe? Do you know of an alternative method using C?
The standard solution is to create a global mutex during application startup. The first time that the app is started, this will succeed. On subsequent attempts, it will fail, and that is your clue to halt and fail to load the second instance.
You create mutexes in Windows by calling the CreateMutex function. As the linked documentation indicates, prefixing the name of the mutex with Global\ ensures that it will be visible for all terminal server sessions, which is what you want. By contrast, the Local\ prefix would make it visible only for the user session in which it was created.
int WINAPI _tWinMain(...)
{
    const TCHAR szMutexName[] = TEXT("Global\\UNIQUE_NAME_FOR_YOUR_APP");

    HANDLE hMutex = CreateMutex(NULL,        /* use default security attributes */
                                TRUE,        /* create an owned mutex */
                                szMutexName  /* name of the mutex */);
    if (GetLastError() == ERROR_ALREADY_EXISTS)
    {
        // The mutex already exists, meaning an instance of the app is already running,
        // either in this user session or another session on the same machine.
        //
        // Here is where you show an instructive error message to the user,
        // and then bow out gracefully.
        MessageBox(NULL, /* owner window handle (HWND), not an HINSTANCE */
                   TEXT("Another instance of this application is already running."),
                   TEXT("Fatal Error"),
                   MB_OK | MB_ICONERROR);
        CloseHandle(hMutex);
        return 1;
    }
    else
    {
        assert(hMutex != NULL);
        // Otherwise, you're the first instance, so you're good to go.
        // Continue loading the application here.
    }
}
Although some may argue it is optional, since the OS will handle it for you, I always advocate explicitly cleaning up after yourself by calling ReleaseMutex and CloseHandle when your application exits. This doesn't handle the case where you crash and don't get a chance to run your cleanup code, but as mentioned, the OS will clean up any dangling mutexes after the owning process terminates.
