I'm working on a project that involves hooking WSARecv. I know how to hook the function; it's the same as hooking any other function. The hard part is when WSARecv is used to perform overlapped operations. The idea is to intercept the data an application receives and make it possible to modify it. I'm using pipes for this: the native DLL tunnels all data to a managed 'server', which processes the input and returns it to the native DLL. This works great for WSASend, send and recv. However, it gets difficult when an application uses overlapped sockets.
I need the received data before I can process it, and that is the hard part. How would I do something like this? I thought of the following two approaches, but both seem like a mess:
When WSARecv is called using the WSAOverlapped:
Create a new thread, call WaitForSingleObject and pass it the hEvent of the WSAOverlapped structure. When the event is signaled, send the data to the managed server for processing and pass the result to the program.
When WSARecv is called using the completion routine:
Create a new thread and change the call to the original function so that lpCompletionRoutine points to a new function. Use SleepEx to put the thread in an alertable state. When the completion routine is called, process the data and pass it back to the program.
I could post my code, but I haven't written it because this seems like a bad solution, so there is not really a point.
I cannot think of a better solution, and this one seems horrible: when an application calls WSARecv a lot (for example, a large server using overlapped sockets to handle lots of clients), it creates a new thread for every call, which seems like a bad idea.
So how can I do this?
There's no need to create a thread for each overlapped IO call.
When overlapped operations are used, they either have an associated event (which you can safely ignore), a completion routine, or are associated with an I/O Completion port.
To handle the first two cases you should hook both WSARecv() and WSAGetOverlappedResult().
If you need to handle the last, you'll also need to hook GetQueuedCompletionStatus()
Now, when you get a call to WSARecv(), for the event case you do nothing special there (except possibly save some information keyed by the lpOverlapped, e.g. the buffer), and process the data in WSAGetOverlappedResult() (which the application must call to get the success/error and bytes transferred).
If a completion routine is present, save the lpOverlapped and lpCompletionRoutine, and pass your own completion routine to the real WSARecv().
Your routine should process the data and call the original completion routine.
To handle the I/O completion port case, have WSARecv() save lpOverlapped, the buffers, etc. Then, in GetQueuedCompletionStatus(), call the original, and if the returned overlapped structure matches one you saved, handle the data.
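You should also note that overlapped operations may complete immediately, in which case WSARecv() returns zero and the data is already in the buffer. The completion is normally still reported: the completion routine is scheduled to run once the thread is alertable, and a completion packet is still queued on the IOCP unless the application has disabled that with SetFileCompletionNotificationModes(). Make sure your hook processes such data only once.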
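All three cases above rely on remembering something about each outstanding receive and finding it again when the operation completes. A minimal sketch of such a lookup table follows. The names PendingRecv and PendingTable are hypothetical, and plain pointers stand in for the Winsock types so that the sketch compiles on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical record of what the hook remembers about one outstanding
// overlapped receive. In a real hook the key would be the lpOverlapped
// pointer and `completion` the application's original lpCompletionRoutine.
struct PendingRecv {
    char*       buffer     = nullptr;  // where the received bytes will land
    std::size_t capacity   = 0;        // size of that buffer
    void*       completion = nullptr;  // application's original routine, if any
};

class PendingTable {
public:
    // Called from the WSARecv() hook before forwarding to the real function.
    void remember(const void* overlapped, const PendingRecv& info) {
        assert(overlapped != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_[overlapped] = info;
    }

    // Called when a completion is observed. Copies the saved record into
    // `out`, forgets it, and returns true. Returns false if the pointer
    // does not belong to a receive that was intercepted.
    bool take(const void* overlapped, PendingRecv& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = pending_.find(overlapped);
        if (it == pending_.end()) return false;
        out = it->second;
        pending_.erase(it);
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<const void*, PendingRecv> pending_;
};
```

Because take() removes the entry, a completion that is seen through more than one hooked function is only processed the first time.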
I use the libcurl easy interface and I create lots of threads in my C++ app to handle these HTTP requests. I would like to convert the code to use libcurl multi instead. Conceptually, the idea is clear: instead of calling the blocking curl_easy_perform on each curl easy handle from multiple threads, I'll call a blocking curl_multi_perform from a single thread, and this call will internally handle all attached curl easy handles.
Things that aren't clear to me:
How do I cancel any of the outstanding HTTP requests that are being handled by the blocking curl_multi_perform call (from another thread)? Similarly, would the same work with the easy interface: can I end/abort an HTTP request from another thread while a different thread is running curl_easy_perform on that handle?
Is it OK to add new easy handles to a multi handle while another thread is calling curl_multi_perform on that multi handle?
If I use curl_multi_remove_handle to abort one of outgoing http requests while it was loading data (let's say it was doing 1GB file download) then I can reuse the same handle right after that. Does curl close that tcp connection that was aborted in the middle? Otherwise, I don't see how that connection could possibly be reused without completely downloading entire 1GB body.
Is there a simple example that performs multiple easy requests from different threads, together with the same example converted to the multi interface?
(This is really several questions disguised as one, which is not a good fit for stackoverflow.)
curl_multi_perform() doesn't block. It does as much as it can do for now, then it returns and expects the program to call it again when it's time or when there's activity on one of its sockets.
Ideally you can mark which transfers to stop in the other threads and as soon as curl_multi_perform() returns you can remove said easy handles from the multi handle and they're no longer in the game. Alternatively, you can use the individual transfer's callbacks (write/read/progress) to return error when you want that transfer to end.
It is not OK to use the same libcurl handle in more than one thread at any given moment. If you really need to use the same handle from more than one thread, then you need to do careful mutexing. See the libcurl threading man page. It is usually better to put things into queues from the other threads and let the single libcurl-using thread read handles or actions from that queue when it can, which then ensures single-threaded access to the handles.
If you abort a transfer by removing the handle with curl_multi_remove_handle(), that transfer is aborted. Stopped. You can indeed reuse that handle immediately and if you just put it back in, it will be treated as a brand new transfer and unless you change any options in the easy handle it will simply start off from the beginning again with the same URL. Prematurely aborted transfers will of course be treated correctly, which might include closing the TCP connection if necessary.
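The queue-based approach suggested above could look roughly like the following sketch. It contains no libcurl calls; Command, Action and transfer_id are hypothetical names, and the comments indicate where the libcurl-using thread would act on each command.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// What another thread may ask the libcurl thread to do.
enum class Action { Add, Cancel };

// Hypothetical command record. `transfer_id` is an application-chosen key
// that the libcurl thread maps to its own easy handle; it is not a libcurl
// concept.
struct Command {
    Action action;
    int    transfer_id;
};

class CommandQueue {
public:
    // Safe to call from any thread.
    void post(Action action, int transfer_id) {
        assert(transfer_id >= 0);  // ids are non-negative in this sketch
        std::lock_guard<std::mutex> lock(mutex_);
        commands_.push_back(Command{action, transfer_id});
    }

    // Called only by the libcurl thread, between calls to
    // curl_multi_perform(). Returns all queued commands in posting order
    // and leaves the queue empty.
    std::vector<Command> drain() {
        std::vector<Command> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(commands_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<Command> commands_;
};
```

In the libcurl thread's loop, each drained Add command would lead to a curl_multi_add_handle() call and each Cancel command to a curl_multi_remove_handle() call, so the handles themselves are only ever touched by that one thread.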
From the MSDN Documentation:
The transport providers allow an application to invoke send and receive operations from within the context of the socket I/O completion routine, and guarantee that, for a given socket, I/O completion routines will not be nested. This permits time-sensitive data transmissions to occur entirely within a preemptive context.
In our system we have one thread calling WSARecvFrom() for multiple sockets. There is one completion routine for that thread, handling all callbacks from WSARecvFrom() overlapped I/O.
Our tests showed that this completion routine is invoked as if triggered by an interrupt: it is called for one socket while it is still processing the completion for another socket.
How can we prevent the completion routine from being called while it is still processing input from another socket?
What serialization of data processing can we use?
Note that there are hundreds of sockets receiving and sending real-time data. Synchronization by waiting on multiple objects is not applicable, as the Win32 API defines a maximum of 64.
We cannot use a semaphore because, when the routine is called anew, the ongoing processing is interrupted, so the semaphore would not be released and the new processing would block forever.
Critical sections or mutexes are not an option because the completion routine callback is made from within the same thread, so the CS or mutex would be acquired anyway and would not wait until the earlier processing has finished.
Does anyone have an idea, or even a better approach, to serialize (synchronize) the data processing?
If you read the WSARecvFrom() documentation again more carefully, it also says:
The completion routine follows the same rules as stipulated for Windows file I/O completion routines. The completion routine will not be invoked until the thread is in an alertable wait state such as can occur when the function WSAWaitForMultipleEvents with the fAlertable parameter set to TRUE is invoked.
The Alertable I/O documentation then states:
When the thread enters an alertable state, the following events occur:
1. The kernel checks the thread's APC queue. If the queue contains callback function pointers, the kernel removes the pointer from the queue and sends it to the thread.
2. The thread executes the callback function.
3. Steps 1 and 2 are repeated for each pointer remaining in the queue.
4. When the queue is empty, the thread returns from the function that placed it in an alertable state.
So it should be practically impossible for a given thread to overlap multiple pending completion routines on top of each other, because the thread receives and processes the routines in a serialized manner. The only way I could see that being different is if a completion routine is doing something to put the thread into a second alertable state while a previous alertable state is still in effect. I'm not sure what Windows does in that situation, but you should avoid doing it anyway.
Note that there are hundreds of sockets receiving and sending real-time data. Synchronization by waiting on multiple objects is not applicable, as the Win32 API defines a maximum of 64.
The WaitForMultipleObjects() documentation tells you how to work around that limitation:
To wait on more than MAXIMUM_WAIT_OBJECTS handles, use one of the following methods:
• Create a thread to wait on MAXIMUM_WAIT_OBJECTS handles, then wait on that thread plus the other handles. Use this technique to break the handles into groups of MAXIMUM_WAIT_OBJECTS.
• Call RegisterWaitForSingleObject to wait on each handle. A wait thread from the thread pool waits on MAXIMUM_WAIT_OBJECTS registered objects and assigns a worker thread after the object is signaled or the time-out interval expires.
I wouldn't wait on the sockets anyway, that is not very efficient. Using completion routines is fine as long as they are doing safe things.
Otherwise, I would suggest you stop using completion routines and switch to using an I/O Completion Port for the socket I/O instead. Then you are in more control of when the completion results are reported to you, because you have to call GetQueuedCompletionStatus() yourself to get the results of each I/O operation. You can have multiple sockets associated with a single IOCP, and then have a small pool of threads (typically one thread per CPU core works best) all calling GetQueuedCompletionStatus() on that IOCP. This way, you can process multiple I/O results in parallel, as they will be in different thread contexts and cannot overlap each other in the same thread. This does mean, however, that you can perform an I/O operation in one thread and the result may show up in a different thread. Just make sure your completion processing is thread-safe.
First of all, let me thank you for all the helpful hints and comments on my question.
We have now stopped using completion routines and changed the application to use completion ports.
The biggest problem we had with completion routines is that every time the thread goes into an alertable state, the completion routines can (and will) be called again by the OS. As seen in the debugger, calling WSASendTo() from inside the completion routine also puts the thread into an alertable state, so the completion routine is executed again before its previous execution has finished.
This makes it nearly impossible to synchronize data processing from multiple different sockets.
The approach using completion ports seems to be the perfect one. You control what you are doing once GetQueuedCompletionStatus() releases you to process a data buffer. You can, and have to, synchronize the data processing yourself, in a linear fashion, without being interrupted and re-entered while trying to process the data.
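For readers who have to stay with completion routines, one possible way to cope with the nested invocations described above is to let a nested call only queue its work and have the outermost call run everything in order. The following is a minimal sketch of that idea, not something taken from the thread. SerialDispatcher is a hypothetical name, and it assumes that each task owns a copy of the data it needs, because the nested routine returns before its work runs.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Serializes work that may be submitted re-entrantly on ONE thread, e.g.
// from a completion routine that fires again while an earlier invocation
// is still running. Not thread-safe by design: a lock would not help here
// because the nested call arrives on the same thread.
class SerialDispatcher {
public:
    using Task = std::function<void()>;

    void submit(Task task) {
        assert(task != nullptr);
        queue_.push_back(std::move(task));
        if (running_) return;          // nested call: the outer loop runs it
        running_ = true;
        while (!queue_.empty()) {
            Task next = std::move(queue_.front());
            queue_.pop_front();
            next();                    // may call submit() again
        }
        running_ = false;
    }

private:
    bool running_ = false;
    std::deque<Task> queue_;
};
```

A completion routine would wrap its processing in a task and call submit(). If it is entered again while a task is running, the new task is appended and runs only after the current one has finished.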
I am having some difficulty understanding callbacks and program flow, synchronization issues.
Let's say I have a global variable g_peers, and I register a callback with a system app which will notify me of peer events such as join/leave/change. In the callback, I modify g_peers based on the event and its associated information. In other parts of the code (i.e. the regular code flow) I have functions which read from g_peers.
Now, will this result in synchronization issues? Let's say I am in the middle of reading from g_peers when a peer leaves and the callback is invoked, which modifies g_peers.
How does callback work? Is the normal flow interrupted till the callback finishes?
Global variables in a multithreaded environment always need to be synchronized against concurrent access from multiple threads.
If your environment is multithreaded then the callback will be called in a separate thread and hence must be synchronized.
If your environment is single threaded then no synchronization is needed.
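For the multithreaded case, a minimal sketch of guarding g_peers with a mutex might look like this. The callback signature and the PeerEvent type are assumptions, since the question does not show the real notification API.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

enum class PeerEvent { Join, Leave };

std::mutex               g_peers_mutex;   // guards g_peers
std::vector<std::string> g_peers;

// Callback registered with the notifying component (hypothetical signature).
// It may run on a thread other than the one reading g_peers.
void on_peer_event(PeerEvent event, const std::string& peer) {
    assert(!peer.empty());
    std::lock_guard<std::mutex> lock(g_peers_mutex);
    if (event == PeerEvent::Join) {
        g_peers.push_back(peer);
    } else {
        g_peers.erase(std::remove(g_peers.begin(), g_peers.end(), peer),
                      g_peers.end());
    }
}

// Readers take a copy under the lock and then work on the copy, so the
// lock is never held while doing slow work.
std::vector<std::string> snapshot_peers() {
    std::lock_guard<std::mutex> lock(g_peers_mutex);
    return g_peers;
}
```

Regular code would call snapshot_peers() and iterate over the returned copy, so a peer leaving in the middle of the iteration cannot invalidate what is being read.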
What is a Callback?
In simple terms, a Callback function is one that is not called explicitly by the programmer. Instead, there is some mechanism that continually waits for events to occur, and it will call selected functions in response to particular events.
This mechanism is typically used when an operation (function) can take a long time to execute and the caller does not want to wait until it completes, but does wish to be notified of the outcome. Callback functions help implement such an asynchronous mechanism: the caller registers to be notified of the result of the time-consuming processing and continues with other operations, and at a later point in time it is informed of the result.
The CreateIoCompletionPort function allows the creation of a new I/O completion port and the registration of file handles to an existing I/O completion port.
Then, I can use any function, like a recv on a socket or a ReadFile on a file, with an OVERLAPPED structure to start an asynchronous operation.
I have to check whether the function call returned synchronously although it was called with an OVERLAPPED structure and in this case handle it directly. In the other case, when ERROR_IO_PENDING is returned, I can use the GetQueuedCompletionStatus function to be notified when the operation completes.
The questions that arise are:
How can I remove a handle from the I/O completion port? For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
Also, is there a way to make the calls ALWAYS go over the I/O completion port and don't return synchronously?
And finally, is it possible for example to recv asynchronously but to send synchronously? For example when a simple echo service is implemented: Can I wait with an asynchronous recv for new data but send the response in a synchronous way so that code complexity is reduced? In my case, I wouldn't recv a second time anyways before the first request was processed.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed. Will the ReadFile be cancelled with an error message and I have to restart the read process as soon as the write is complete? Or do I have to cancel the ReadFile manually before writing? This question arises in connection with a communication device, so writing and reading should not cause problems when they happen concurrently.
How can I remove a handle from the I/O completion port?
In my experience you can't disassociate a handle from a completion port. However, you may disable completion port notification by setting the low-order bit of your OVERLAPPED structure's hEvent field: See the documentation for GetQueuedCompletionStatus.
For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
It is not necessary to explicitly disassociate a handle from an I/O completion port; closing the handle is sufficient. You may associate multiple handles with the same completion key; the best way to figure out which request is associated with the I/O completion is by using the OVERLAPPED structure. In fact, you may even extend OVERLAPPED to store additional data.
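Extending OVERLAPPED to carry per-request data is usually done by making it the base (or first member) of a larger structure and casting the pointer back on completion. A minimal sketch follows. To keep it compilable on any platform it uses a stand-in Overlapped type instead of the real Win32 structure, and IoContext and context_from are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the Win32 OVERLAPPED structure so that the sketch compiles
// on any platform; on Windows, use the real type from <windows.h>.
struct Overlapped {
    void* internal_state = nullptr;
};

enum class Operation { Read, Write };

// Per-request context. Because it derives from Overlapped, a pointer to it
// can be handed to any API that expects an Overlapped*.
struct IoContext : Overlapped {
    Operation   op;
    char*       buffer;
    std::size_t length;

    IoContext(Operation o, char* b, std::size_t n)
        : op(o), buffer(b), length(n) {}
};

// What the completion handler would do with the pointer it gets back
// (on Windows, the lpOverlapped returned by GetQueuedCompletionStatus()).
// Valid only if every Overlapped issued by the program is an IoContext.
IoContext* context_from(Overlapped* overlapped) {
    assert(overlapped != nullptr);
    return static_cast<IoContext*>(overlapped);
}
```

With this layout the completion key can identify the handle (for example, the connection), while the recovered context identifies the individual request.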
Also, is there a way to make the calls ALWAYS go over the I/O completion port and don't return synchronously?
That is the default behavior, even when ReadFile/WriteFile returns TRUE. You must explicitly call SetFileCompletionNotificationModes to tell Windows to not enqueue a completion packet when TRUE and ERROR_SUCCESS are returned.
is it possible for example to recv asynchronously but to send synchronously?
Not by using recv and send; you need to use functions that accept OVERLAPPED structures, such as WSARecv and WSASend, or alternatively ReadFile and WriteFile. The latter may be handier if your code is meant to work with multiple types of I/O handles, such as both sockets and named pipes. Those functions also provide a synchronous mode, so if you use them you can mix asynchronous and synchronous calls.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed?
There is no implicit cancellation. As long as you're using separate OVERLAPPED structures for each read/write to a full-duplex device, I see no reason why you can't do concurrent I/O operations.
As I’ve already pointed out there, the commonly held belief that it is impossible to remove handles from completion ports is wrong, probably caused by the absence of any hint whatsoever on how to do this from nearly all documentation I could find. Actually, it’s pretty easy:
Call NtSetInformationFile with the FileReplaceCompletionInformation enumerator value for FileInformationClass and a pointer to a FILE_COMPLETION_INFORMATION structure for the FileInformation parameter. In this structure, set the Port member to NULL (or nullptr, in C++) to disassociate the file from the port it’s currently attached to (I guess if it isn’t attached to any port, nothing would happen), or set Port to a valid HANDLE to another completion port to associate the file with that one instead.
First some important corrections.
In case the overlapped I/O operation completes immediately (ReadFile or similar I/O function returns success) - the I/O completion is already scheduled to the IOCP.
Also, according to your questions I think you confuse between the file/socket handles, and the specific I/O operations issued on them.
Now, regarding your questions:
AFAIK there is no conventional way to remove a file/socket handle from the IOCP (usually you just don't have to do this). You talk about removing closed handles from the IOCP, which is absolutely incorrect. You can't remove a closed handle, because it does not reference a valid kernel object anymore!
A more correct question is how the file/socket should be properly closed. The answer is: just close your handle. All the outstanding I/O operations issued on this handle will soon return with an error code (aborted). Then your completion-handling code (the one that calls GetQueuedCompletionStatus in a loop) should perform the necessary per-I/O cleanup.
As I've already said, all the I/O completion arrives at IOCP in both synchronous and asynchronous cases. The only situation where it does not arrive at IOCP is when an I/O completes synchronously with an error. Anyway, if you want a unified processing - in such a case you may post an artificial completion data to IOCP (use PostQueuedCompletionStatus).
You should use WSASend and WSARecv (not recv and send) for overlapped I/O. Nevertheless, even if the socket was opened with the WSA_FLAG_OVERLAPPED flag, you are allowed to call the I/O functions without specifying the OVERLAPPED structure, in which case those functions work synchronously.
So you can decide between synchronous and asynchronous mode for every function call.
There is no problem in mixing overlapped read/write requests. The only delicate point is what happens if you try to read data from the file position you're currently writing to. The result may depend on subtle things, such as the order in which the hardware completes the I/Os, some PC timing parameters, etc. Such a situation should be avoided.
How can I remove a handle from the I/O completion port? For example, when I add sockets to the IOCP, how can I remove closed ones? Should I just re-register another socket with the same completion key?
You've got it the wrong way around. You set the I/O completion port to be used by a file object - when the file object is deleted, you have nothing to worry about. The reason you're getting confused is because of the way Win32 exposes the underlying native API functionality (CreateIoCompletionPort does two very different things in one function).
Also, is there a way to make the calls ALWAYS go over the I/O completion port and don't return synchronously?
This is how it's always been. Only starting with Windows Vista can you customize how the completion notifications are handled.
What happens if an asynchronous ReadFile has been requested, but before it completes, a WriteFile to the same file should be processed. Will the ReadFile be cancelled with an error message and I have to restart the read process as soon as the write is complete?
I/O operations in Windows are asynchronous inherently, and requests are always queued. You may not think this is so because you have to specify FILE_FLAG_OVERLAPPED in CreateFile to turn on asynchronous I/O. However, at the native layer, synchronous I/O is really an add-on, convenience thing where the kernel keeps track of the file position for you and waits for the I/O to complete before returning.
I have a worker thread that is listening to a TCP socket for incoming traffic, and buffering the received data for the main thread to access (let's call this socket A). However, the worker thread also has to do some regular operations (say, once per second), even if there is no data coming in. Therefore, I use select() with a timeout, so that I don't need to keep polling. (Note that calling receive() on a non-blocking socket and then sleeping for a second is not good: the incoming data should be immediately available for the main thread, even though the main thread might not always be able to process it right away, hence the need for buffering.)
Now, I also need to be able to signal the worker thread to do some other stuff immediately; from the main thread, I need to make the worker thread's select() return right away. For now, I have solved this as follows (approach basically adopted from here and here):
At program startup, the worker thread creates for this purpose an additional socket of the datagram (UDP) type, and binds it to some random port (let's call this socket B). Likewise, the main thread creates a datagram socket for sending. In its call to select(), the worker thread now lists both A and B in the fd_set. When the main thread needs to signal, it sendto()'s a couple of bytes to the corresponding port on localhost. Back in the worker thread, if B remains in the fd_set after select() returns, then recvfrom() is called and the bytes received are simply ignored.
This seems to work very well, but I can't say I like the solution, mainly as it requires binding an extra port for B, and also because it adds several additional socket API calls which may fail I guess – and I don't really feel like figuring out the appropriate action for each of the cases.
I think ideally, I would like to call some function which takes A as input, and does nothing except makes select() return right away. However, I don't know such a function. (I guess I could for example shutdown() the socket, but the side effects are not really acceptable :)
If this is not possible, the second best option would be creating a B which is much dummier than a real UDP socket, and doesn't really require allocating any limited resources (beyond a reasonable amount of memory). I guess Unix domain sockets would do exactly this, but: the solution should not be much less cross-platform than what I currently have, though some moderate amount of #ifdef stuff is fine. (I am targeting mainly for Windows and Linux – and writing C++ by the way.)
Please don't suggest refactoring to get rid of the two separate threads. This design is necessary because the main thread may be blocked for extended periods (e.g., doing some intensive computation – and I can't start periodically calling receive() from the innermost loop of calculation), and in the meanwhile, someone needs to buffer the incoming data (and due to reasons beyond what I can control, it cannot be the sender).
Now that I was writing this, I realized that someone is definitely going to reply simply "Boost.Asio", so I just had my first look at it... Couldn't find an obvious solution, though. Do note that I also cannot (easily) affect how socket A is created, but I should be able to let other objects wrap it, if necessary.
You are almost there. Use the "self-pipe" trick: open a pipe, add its read end to the read fd_set you pass to select(), and write to its write end from the main thread to unblock the worker thread. It is portable across POSIX systems.
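A minimal POSIX-only sketch of the self-pipe trick is shown below. WakeupPipe, WaitResult and wait_once are hypothetical names, data_fd stands for socket A from the question, and error handling is reduced to the bare minimum.

```cpp
#include <cassert>
#include <stdexcept>
#include <sys/select.h>
#include <unistd.h>

// Wakeup channel built on an anonymous pipe (POSIX only).
class WakeupPipe {
public:
    WakeupPipe() {
        int fds[2];
        if (pipe(fds) != 0) throw std::runtime_error("pipe() failed");
        read_fd_ = fds[0];
        write_fd_ = fds[1];
    }
    ~WakeupPipe() { close(read_fd_); close(write_fd_); }
    WakeupPipe(const WakeupPipe&) = delete;
    WakeupPipe& operator=(const WakeupPipe&) = delete;

    int read_fd() const { return read_fd_; }

    // Main thread: make the worker's select() return.
    bool signal() {
        const char byte = 1;
        return write(write_fd_, &byte, 1) == 1;
    }

    // Worker thread: consume one pending wakeup byte.
    bool drain() {
        char byte;
        return read(read_fd_, &byte, 1) == 1;
    }

private:
    int read_fd_;
    int write_fd_;
};

struct WaitResult {
    bool woken;       // the wakeup pipe is readable
    bool data_ready;  // data_fd (socket A) is readable
};

// One iteration of the worker's wait. Pass data_fd = -1 to wait on the
// wakeup pipe only.
WaitResult wait_once(WakeupPipe& wakeup, int data_fd, int timeout_ms) {
    assert(timeout_ms >= 0);
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(wakeup.read_fd(), &readable);
    int max_fd = wakeup.read_fd();
    if (data_fd >= 0) {
        FD_SET(data_fd, &readable);
        if (data_fd > max_fd) max_fd = data_fd;
    }
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    int ready = select(max_fd + 1, &readable, nullptr, nullptr, &tv);
    WaitResult result = {false, false};
    if (ready > 0) {
        result.woken = FD_ISSET(wakeup.read_fd(), &readable) != 0;
        result.data_ready = data_fd >= 0 && FD_ISSET(data_fd, &readable) != 0;
    }
    return result;
}
```

The worker calls wait_once() in its loop and calls drain() whenever woken is set. The main thread only ever calls signal().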
I have seen a variant of similar technique for Windows in one system (in fact used together with the method above, separated by #ifdef WIN32). Unblocking can be achieved by adding a dummy (unbound) datagram socket to fd_set and then closing it. The downside is that, of course, you have to re-open it every time.
However, in the aforementioned system, both of these methods are used rather sparingly, and for unexpected events (e.g., signals, termination requests). Preferred method is still a variable timeout to select(), depending on how soon something is scheduled for a worker thread.
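The variable-timeout approach mentioned above boils down to computing how long select() may sleep before the next scheduled task is due. A minimal sketch, assuming a POSIX system for the timeval definition (on Windows it comes from the Winsock headers) and using the hypothetical helper name timeout_until:

```cpp
#include <cassert>
#include <chrono>
#include <sys/time.h>

using Clock = std::chrono::steady_clock;

// Converts "time until the next scheduled task" into the timeval that
// select() expects. A deadline in the past yields a zero timeout, which
// makes select() poll and return immediately.
timeval timeout_until(Clock::time_point now, Clock::time_point next_due) {
    using namespace std::chrono;
    microseconds remaining = duration_cast<microseconds>(next_due - now);
    if (remaining < microseconds::zero()) remaining = microseconds::zero();
    timeval tv;
    tv.tv_sec = static_cast<decltype(tv.tv_sec)>(remaining.count() / 1000000);
    tv.tv_usec = static_cast<decltype(tv.tv_usec)>(remaining.count() % 1000000);
    assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
    return tv;
}
```

The worker would recompute the timeout before every select() call, since on some platforms select() modifies the timeval it is given.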
Using a pipe rather than socket is a bit cleaner, as there is no possibility for another process to get hold of it and mess things up.
Using a UDP socket definitely creates the potential for stray packets to come in and interfere.
An anonymous pipe will never be available to any other process (unless you give it to it).
You could also use signals, but in a multithreaded program you'll want to make sure that all threads except for the one you want have that signal masked.
On Unix this is straightforward using a pipe. If you are on Windows and want to keep using select() so that your code stays compatible with Unix, the trick of creating an unbound UDP socket and closing it works well and is easy. But you have to make it thread-safe.
The only way I found to make this thread-safe is to close and recreate the socket in the same thread that is running select(). Of course, this is difficult if that thread is blocked in select(). This is where the Windows call QueueUserAPC comes in. While Windows is blocking in select(), the thread can handle Asynchronous Procedure Calls, and you can schedule one from a different thread using QueueUserAPC. Windows interrupts the select(), executes your function in the same thread, and continues with the select(). In your APC function you can now close the socket and recreate it. This is guaranteed to be thread-safe, and you will never lose a signal.
Put simply:
Keep the socket handle in a global variable. Closing that socket makes select() return immediately: closesocket(g_socket);