GetOverlappedResult(bWait=TRUE) vs WaitForSingleObject() for overlapped I/O - c

When I open and read a file in OVERLAPPED mode with the Win32 API, I have several ways to complete the I/O request, including waiting on the file handle (or on the event in the OVERLAPPED structure) using
WaitForSingleObject
GetOverlappedResult with bWait=TRUE
Both functions seem to have the same effect: the thread blocks until the handle or event is signaled, which means the data has been placed in the buffer provided to ReadFile.
So, what is the difference? Why do I need GetOverlappedResult?

I fully agree with Remus Rusanu's answer. Also, instead of creating your own IOCP and a thread pool listening on it, you can use BindIoCompletionCallback or CreateThreadpoolIo (available beginning with Vista). In that case the system itself creates the IOCP and the thread pool that listens on it, and when an operation completes it calls your callback. This greatly simplifies the code compared to rolling your own IOCP/thread pool. (Implementing your own IOCP/thread pool really only makes sense, I think, when you have a very large amount of I/O, say socket I/O on the server side, and need special performance optimization.)
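As a rough illustration, a minimal sketch of reading a file with CreateThreadpoolIo (not production code; the file name and buffer size are placeholders):

#include <windows.h>
#include <stdio.h>

// Runs on a system thread-pool thread when the read completes.
VOID CALLBACK OnIoComplete(PTP_CALLBACK_INSTANCE Instance, PVOID Context,
                           PVOID Overlapped, ULONG IoResult,
                           ULONG_PTR BytesTransferred, PTP_IO Io)
{
    printf("I/O finished: error=%lu, bytes=%Iu\n", IoResult, BytesTransferred);
}

int main(void)
{
    HANDLE hFile = CreateFile(TEXT("test.dat"), GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    PTP_IO io = CreateThreadpoolIo(hFile, OnIoComplete, NULL, NULL);
    if (!io) { CloseHandle(hFile); return 1; }

    static BYTE buf[4096];
    OVERLAPPED ov = { 0 };

    StartThreadpoolIo(io);  // must precede every overlapped operation on hFile
    if (!ReadFile(hFile, buf, sizeof(buf), NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING)
    {
        CancelThreadpoolIo(io);  // failed synchronously: no completion will arrive
    }

    WaitForThreadpoolIoCallbacks(io, FALSE);  // let the callback finish
    CloseThreadpoolIo(io);
    CloseHandle(hFile);
    return 0;
}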
However, to the question itself:
So, what is the difference? Why do I need GetOverlappedResult?
As you can see, GetOverlappedResult[Ex] does not merely wait for the result. It also:
returns NumberOfBytesTransferred to you if the operation is completed;
if the operation completed with an error NTSTATUS, converts it to a Win32 error and sets the last error;
if the operation is still pending and you want to wait, selects whether to wait on hFile or hEvent.
So GetOverlappedResult[Ex] does much more than simply call WaitForSingleObject.
However, it is not very hard to implement this API yourself. For example:
#include <windows.h>
#include <winternl.h>   // NTSTATUS, RtlNtStatusToDosError (link with ntdll.lib)

#ifndef STATUS_PENDING
#define STATUS_PENDING ((NTSTATUS)0x00000103L)
#endif
#ifndef NT_SUCCESS
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)
#endif

BOOL
WINAPI
MyGetOverlappedResult(
    _In_  HANDLE       hFile,
    _In_  LPOVERLAPPED lpOverlapped,
    _Out_ LPDWORD      lpNumberOfBytesTransferred,
    _In_  BOOL         bWait
    )
{
    if ((NTSTATUS)lpOverlapped->Internal == STATUS_PENDING)
    {
        if (!bWait)
        {
            SetLastError(ERROR_IO_INCOMPLETE);
            return FALSE;
        }
        // Wait on the event if one was supplied, otherwise on the file handle.
        if (lpOverlapped->hEvent)
        {
            hFile = lpOverlapped->hEvent;
        }
        if (WaitForSingleObject(hFile, INFINITE) != WAIT_OBJECT_0)
        {
            return FALSE;
        }
    }
    else
    {
        // Already completed: make sure we read up-to-date values below.
        MemoryBarrier();
    }

    *lpNumberOfBytesTransferred = (ULONG)lpOverlapped->InternalHigh;

    NTSTATUS status = (NTSTATUS)lpOverlapped->Internal;
    if (status)
    {
        // Convert the NTSTATUS to a Win32 error and set the last error.
        SetLastError(RtlNtStatusToDosError(status));
    }
    return NT_SUCCESS(status);
}
So which is better: use GetOverlappedResult[Ex], or implement its functionality yourself?

You could use either, but truly that's not the 'right' way of doing it. You should attach the handle to an I/O completion port and then wait on the completion port. This way you have one pool of threads servicing many I/O events, since you can attach multiple handles to a single completion port. I recommend reading Designing Applications for High Performance.
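A minimal sketch of that pattern, assuming one port, a small pool of worker threads, and a NULL OVERLAPPED packet as a shutdown convention (these names and conventions are mine, not a fixed API):

#include <windows.h>
#include <stdio.h>

// Worker thread: dequeues completion packets for any associated handle.
DWORD WINAPI Worker(LPVOID param)
{
    HANDLE port = (HANDLE)param;
    DWORD bytes;
    ULONG_PTR key;
    LPOVERLAPPED ov;

    for (;;)
    {
        BOOL ok = GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE);
        if (ov == NULL)
            break;   // shutdown sentinel (posted below) or the port was closed
        printf("completion: key=%Iu bytes=%lu ok=%d\n", key, bytes, ok);
        // process the buffer associated with ov, issue the next ReadFile, ...
    }
    return 0;
}

// Setup (sketch): one port, many handles, a few workers.
// HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
// CreateIoCompletionPort(hFile, port, (ULONG_PTR)1 /* your key */, 0);
// CreateThread(NULL, 0, Worker, port, 0, NULL);
// ...to shut down, post one sentinel per worker:
// PostQueuedCompletionStatus(port, 0, 0, NULL);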

Related

Is WSASend with IOCP ordered?

I would like to create an IOCP application to receive TCP data. Once received, I will have to process the data and write out the bytes partially. I understand that WSASend with IOCP can do the job, but I am wondering whether WSASend queues the operations and GetQueuedCompletionStatus dequeues them synchronously, in order. For example:
void processing_from_other_thread(...) {
    ...
    DWORD state = still_doing1;
    WSASend( ..,..,.., &state, .. );
    state = still_doing2;
    WSASend( ..,..,.., &state, .. );
    state = still_doing3;
    WSASend( ..,..,.., &state, .. );
    state = Done;
    PostCompletionQueue(....);
}
Given the code above, will GetQueuedCompletionStatus return them in order?
GetQueuedCompletionStatus();   // returns still_doing1
GetQueuedCompletionStatus();   // returns still_doing2
GetQueuedCompletionStatus();   // returns still_doing3
GetQueuedCompletionStatus();   // returns Done
Continue
I just want to make sure the future design is correct. I am afraid they are not ordered; for example, still_doing2 might complete before still_doing1, and the data sent might affect the client side.
Under the IOCP I/O model, the application is responsible for organizing incoming and outgoing data. In other words, the process posts different I/O-related operations to the IOCP, but there is no guarantee that the IOCP completes sends and/or receives in that exact order.
This is a key part of asynchronous I/O.
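If ordering matters to you, a common pattern (sketched below with hypothetical names) is to extend the OVERLAPPED structure with your own state, so each completion packet tells you exactly which operation it belongs to, whatever order the packets are dequeued in:

#include <winsock2.h>

// OVERLAPPED must be the first member so the LPOVERLAPPED returned by
// GetQueuedCompletionStatus can be cast back to the containing struct.
typedef struct PER_IO_CONTEXT {
    OVERLAPPED ov;
    DWORD      sequence;   // your own ordering/state information
    WSABUF     wsabuf;
    char       buffer[4096];
} PER_IO_CONTEXT;

// In the worker thread:
// GetQueuedCompletionStatus(port, &bytes, &key, &pov, INFINITE);
// PER_IO_CONTEXT *ctx = (PER_IO_CONTEXT *)pov;
// ctx->sequence identifies the operation, regardless of dequeue order.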

what is the correct way to implement select() + threadpool?

Server:
socket()
bind()
listen()
for(;;) {
    select()
    if it is listenfd {
        accept()
        add to fd_set
    } else {
        add task to thread_pool work queue
        threadpool_add(thread_routine)
    }
}

thread_routine() {
    get connection fd
    read()
    write()
    close(connection fd)
}
This design has a problem: while select waits for data from socket_fd, another thread may close(socket_fd); this will cause select to return and read(socket_fd) to fail with EBADF. What is the right design?
It's basically okay. The mistake is in having thread_routine call close on the socket. It is never okay to destroy a resource while another thread is, or might be, using it. If this is TCP, a better option would be to call shutdown on the socket.
Perhaps this would be a better way to design your application:
One listen-thread accepts connections (i.e. only selects the listen fd)
For each accepted connection, immediately create a new thread or hand the fd to an existing idle thread
Select the client fds in their own threads for input (not in the listener thread)
This is a more intuitive design and should eliminate your problem. You might even be able to use blocking I/O (without select) in the client threads, depending on the protocol used.
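A minimal sketch of that design (an echo protocol is assumed for illustration, and error handling is trimmed):

#include <pthread.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/socket.h>

static void *client_routine(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char buf[4096];
    ssize_t n;

    // Blocking I/O is fine here: only this thread ever touches fd.
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        write(fd, buf, (size_t)n);   // echo back

    close(fd);                       // sole owner, so closing is safe
    return NULL;
}

void accept_loop(int listenfd)
{
    for (;;) {
        int fd = accept(listenfd, NULL, NULL);
        if (fd < 0)
            continue;
        pthread_t tid;
        if (pthread_create(&tid, NULL, client_routine,
                           (void *)(intptr_t)fd) == 0)
            pthread_detach(tid);     // no join needed; thread cleans itself up
        else
            close(fd);
    }
}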

Design of multi-threaded server in c

I am trying to implement a simple echo server with concurrency support on Linux.
The following approach is used:
Use pthread functions to create a pool of threads, maintained in a linked list. It is created at process start and destroyed at process termination.
The main thread accepts requests and uses a POSIX message queue to store the accepted socket file descriptors.
Threads in the pool loop to read from the message queue and handle the requests they get; when there is no request, they block.
The program seems to be working now.
The questions are:
Is it suitable to use a message queue in the middle, and is it efficient enough?
What is the general approach to building a thread pool that handles concurrent requests from multiple clients?
If it is not proper to have the pool threads loop and block on the message queue, how should requests be delivered to the threads?
This seems unnecessarily complicated to me. The usual approach for a multithreaded server is:
Create a listen socket in the main thread
Accept the client connections in that thread
For each accepted client connection, create a new thread, which receives the corresponding file descriptor and does the work
The worker thread closes the client connection when it is fully handled
I do not see much benefit in prepopulating a thread pool here.
If you really want a threadpool:
I would just use a linked list for accepted connections and a pthread_mutex to synchronize access to it:
The listener thread enqueues client fds at the tail of the list.
The workers dequeue them at the head.
If the queue is empty, the threads can wait on a condition variable (pthread_cond_wait) and are notified by the listener thread (pthread_cond_signal) when connections are available, as in the sketch below.
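A minimal sketch of that queue, assuming plain POSIX threads (the helper names are mine):

#include <pthread.h>
#include <stdlib.h>

typedef struct node { int fd; struct node *next; } node_t;

static node_t         *head = NULL, *tail = NULL;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

// Listener thread: enqueue an accepted fd at the tail, wake one worker.
void enqueue_fd(int fd)
{
    node_t *n = malloc(sizeof *n);
    n->fd = fd;
    n->next = NULL;

    pthread_mutex_lock(&qlock);
    if (tail) tail->next = n; else head = n;
    tail = n;
    pthread_cond_signal(&qcond);
    pthread_mutex_unlock(&qlock);
}

// Worker thread: block until a connection is available, dequeue at the head.
int dequeue_fd(void)
{
    pthread_mutex_lock(&qlock);
    while (head == NULL)               // loop guards against spurious wakeups
        pthread_cond_wait(&qcond, &qlock);

    node_t *n = head;
    head = n->next;
    if (head == NULL) tail = NULL;
    pthread_mutex_unlock(&qlock);

    int fd = n->fd;
    free(n);
    return fd;
}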
Another alternative
Depending on the complexity of handling requests, it might be an option to make the server single-threaded, i.e. handle all connections in one thread. This eliminates context-switches altogether and can thus be very performant.
One drawback is that only one CPU core is used. To improve on that, a hybrid model can be used:
Create one worker-thread per core.
Each thread handles simultaneously n connections.
You would however have to implement mechanisms to distribute the work fairly amongst the workers.
In addition to using pthread_mutex, you will want to use pthread_cond_t (a pthread condition variable); this allows the threads in the thread pool to sleep while they are not actually doing work. Otherwise, you waste compute cycles as they sit in a loop polling the work queue.
I would definitely consider using C++ instead of pure C. The reason I suggest it is that in C++ you are able to use templates. Using a pure virtual base class (let's call it "vtask"), you can create templated derived classes that accept arguments and insert the arguments when the overloaded operator() is called, allowing for much, much more functionality in your tasks:
//============================================================================//
void* thread_pool::execute_thread()
{
    vtask* task = NULL;
    while(true)
    {
        //--------------------------------------------------------------------//
        // Try to pick a task
        m_task_lock.lock();
        //--------------------------------------------------------------------//
        // We need to put condition.wait() in a loop for two reasons:
        // 1. There can be spurious wake-ups (due to signal/EINTR)
        // 2. When the mutex is released for waiting, another thread can be
        //    woken up from a signal/broadcast and that thread can change the
        //    condition, so when the current thread wakes up the condition may
        //    no longer actually be true!
        while ((m_pool_state != state::STOPPED) && (m_main_tasks.empty()))
        {
            // Wait until there is a task in the queue
            // Unlock mutex while waiting, then lock it back when signaled
            m_task_cond.wait(m_task_lock.base_mutex_ptr());
        }
        // If the thread was woken to notify process shutdown, return from here
        if (m_pool_state == state::STOPPED)
        {
            m_task_lock.unlock();
            //----------------------------------------------------------------//
            if(mad::details::allocator_list_tl::get_allocator_list_if_exists() &&
               tids.find(CORETHREADSELF()) != tids.end())
                mad::details::allocator_list_tl::get_allocator_list()
                    ->Destroy(tids.find(CORETHREADSELF())->second, 1);
            //----------------------------------------------------------------//
            CORETHREADEXIT(NULL);
        }
        task = m_main_tasks.front();
        m_main_tasks.pop_front();
        //--------------------------------------------------------------------//
        // Unlock before running so other workers can pick up tasks
        m_task_lock.unlock();
        //--------------------------------------------------------------------//
        // execute the task
        run(task);
        m_task_count -= 1;
        m_join_lock.lock();
        m_join_cond.signal();
        m_join_lock.unlock();
        //--------------------------------------------------------------------//
    }
    return NULL;
}
//============================================================================//
int thread_pool::add_task(vtask* task)
{
#ifndef ENABLE_THREADING
    run(task);
    return 0;
#endif
    if(!is_alive_flag)
    {
        run(task);
        return 0;
    }
    // Done outside the lock because m_task_count is atomic (thread-safe) and
    // needs to be updated as soon as possible
    m_task_count += 1;
    m_task_lock.lock();
    // if the thread pool hasn't been initialized, initialize it
    if(m_pool_state == state::NONINIT)
        initialize_threadpool();
    // TODO: put a limit on how many tasks can be added at most
    m_main_tasks.push_back(task);
    // wake up one thread that is waiting for a task to be available
    m_task_cond.signal();
    m_task_lock.unlock();
    return 0;
}
//============================================================================//
void thread_pool::run(vtask*& task)
{
    (*task)();
    if(task->force_delete())
    {
        delete task;
        task = 0;
    }
    else
    {
        if(task->get() && !task->is_stored_elsewhere())
            save_task(task);
        else if(!task->is_stored_elsewhere())
        {
            delete task;
            task = 0;
        }
    }
}
In the above, each created thread runs execute_thread() until m_pool_state is set to state::STOPPED. You lock m_task_lock, and if the state is not STOPPED and the list is empty, you pass m_task_lock to your condition, which puts the thread to sleep and frees the lock. You create the tasks (not shown) and add the task (m_task_count is atomic, by the way, which is why it is thread-safe). During add_task, the condition is signaled to wake up a thread, which then proceeds from the m_task_cond.wait(m_task_lock.base_mutex_ptr()) line of execute_thread() once m_task_lock has been reacquired and locked.
NOTE: this is a highly customized implementation that wraps most of the pthread functions/objects into C++ classes, so copy-and-pasting will not work whatsoever... Sorry. And w.r.t. thread_pool::run(): unless you are worried about return values, the (*task)() line is all you need.
I hope this helps.
EDIT: the m_join_* references are for checking whether all the tasks have been completed. The main thread sits in a similar condition wait that checks whether all the tasks have been completed, as this is necessary in the applications I use this implementation in before proceeding.

Making a singleton application across all users

I'm trying to create an application which only allows a single instance across all Windows users.
I'm currently doing it by opening a file for writing and leaving it open. Is this method safe? Do you know of an alternative method using C?
The standard solution is to create a global mutex during application startup. The first time that the app is started, this will succeed. On subsequent attempts, it will fail, and that is your clue to halt and fail to load the second instance.
You create mutexes in Windows by calling the CreateMutex function. As the linked documentation indicates, prefixing the name of the mutex with Global\ ensures that it will be visible for all terminal server sessions, which is what you want. By contrast, the Local\ prefix would make it visible only for the user session in which it was created.
int WINAPI _tWinMain(...)
{
    const TCHAR szMutexName[] = TEXT("Global\\UNIQUE_NAME_FOR_YOUR_APP");

    HANDLE hMutex = CreateMutex(NULL,        /* use default security attributes */
                                TRUE,        /* create an owned mutex */
                                szMutexName  /* name of the mutex */);
    if (GetLastError() == ERROR_ALREADY_EXISTS)
    {
        // The mutex already exists, meaning an instance of the app is already
        // running, either in this user session or another session on the same
        // machine.
        //
        // Here is where you show an instructive error message to the user,
        // and then bow out gracefully.
        MessageBox(NULL,  /* no owner window yet */
                   TEXT("Another instance of this application is already running."),
                   TEXT("Fatal Error"),
                   MB_OK | MB_ICONERROR);
        CloseHandle(hMutex);
        return 1;
    }
    else
    {
        assert(hMutex != NULL);
        // Otherwise, you're the first instance, so you're good to go.
        // Continue loading the application here.
    }
}
Although some may argue it is optional, since the OS will do it for you, I always advocate explicitly cleaning up after yourself by calling ReleaseMutex and CloseHandle when your application exits. This doesn't handle the case where you crash and never reach your cleanup code, but as mentioned, the OS will clean up any dangling mutexes after the owning process terminates.
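That cleanup is just two calls, assuming hMutex is the handle created above:

ReleaseMutex(hMutex);   // give up the ownership taken via bInitialOwner = TRUE
CloseHandle(hMutex);    // drop our reference; the last close destroys the mutex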

Multi-clients on a server

For an application in C, I need to respond to more than one client.
I set up the connection with code like:
bind(...);
listen(...);
while(1){
    accept(...);  // accept a client
    recv(...);    // receive something
    send(...);    // send something to the client
    bzero(buf);   // clear buffer
}
This works great when I have only one client. Other clients can also connect to the server, but even though they send commands, the server does not respond to clients that connected after the first one. How can I solve this problem?
Write a server using asynchronous, nonblocking connections.
Instead of a single set of data about a client, you need to create a struct. Each instance of the struct holds the data for each client.
The code looks vaguely like:
socket(...)
fcntl(...) to mark O_NONBLOCK
bind(...)
listen(...)
create poll entry for server socket.

while(1) {
    poll(...)
    if( fds[server_slot].revents & POLLIN ) {
        accept(...)
        fcntl(...) mark O_NONBLOCK
        create poll and data array entries.
    }
    if( fds[i].revents & POLLIN ) {
        recv(...) into data[i]
        if connection i closed then clean up.
    }
    if( fds[i].revents & POLLOUT ) {
        send(...) pending info for data[i]
    }
}
If any of your calls return the error EAGAIN instead of success, don't panic; you just try again later. Be prepared for EAGAIN even if poll claims the socket is ready: it's good practice and more robust.
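To make the pseudocode concrete, here is a compact runnable version of that loop as a nonblocking echo server. This is a sketch: the port number is arbitrary, error handling is trimmed, and a real server would also queue partial writes and use POLLOUT as sketched above:

#include <poll.h>
#include <fcntl.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define MAX_FDS 64

int main(void)
{
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(7777),
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(listenfd, (struct sockaddr *)&addr, sizeof addr);
    listen(listenfd, SOMAXCONN);
    fcntl(listenfd, F_SETFL, O_NONBLOCK);

    struct pollfd fds[MAX_FDS] = { { .fd = listenfd, .events = POLLIN } };
    int nfds = 1;

    for (;;) {
        poll(fds, nfds, -1);

        if (fds[0].revents & POLLIN) {            // new connection
            int c = accept(listenfd, NULL, NULL);
            if (c >= 0 && nfds < MAX_FDS) {
                fcntl(c, F_SETFL, O_NONBLOCK);
                fds[nfds].fd = c;
                fds[nfds].events = POLLIN;
                nfds++;
            } else if (c >= 0) {
                close(c);                         // table full
            }
        }
        for (int i = 1; i < nfds; i++) {
            if (fds[i].revents & POLLIN) {
                char buf[4096];
                ssize_t n = read(fds[i].fd, buf, sizeof buf);
                if (n <= 0) {                     // closed or error
                    close(fds[i].fd);
                    fds[i] = fds[--nfds];         // swap in the last entry
                    i--;
                } else {
                    write(fds[i].fd, buf, (size_t)n);  // echo
                }
            }
        }
    }
}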
I need to respond to more than one client.
Use threading.
Basically you want your main thread to do only the accept part, and then hand the rest to another thread of execution (which can be either a thread or a process).
Whenever your main thread returns from accept, give the socket descriptor to another thread and call accept again (this can be done with fork, with pthread_create, or by maintaining a thread pool and using synchronization, for instance condition variables, to indicate that a new client has been accepted).
While the main thread handles new incoming clients, the other threads deal with the recv/send.
