Transferring data to/from a callback from/to a worker thread - C

My current application is a toy web service written in C designed to replicate the behaviour of http://sprunge.us/ (it takes data in via HTTP POST, stores it to disk, and returns the client a URL to the data; it also serves previously stored data upon request).
The application is structured such that a thread pool is instantiated with worker threads (just a function pointer that takes a void * parameter) and a socket is opened to listen for incoming connections. The main loop of the program consists of a sock = accept(...) call followed by pool_add_task(worker_function_parse_http, sock) so that requests can be handled quickly.
The parse_http worker parses the incoming request and adds another task to the work queue, either for storing the data (POST) or for serving previously stored data (GET).
My problem with this approach stems from the use of the http-parser library, which uses a callback design to return parsed data (every HTTP parser I looked at used this style). The problem I encounter is as follows:
My parse_http worker:
Buffers data from the accepted socket (the function's only parameter, at this stage)
Sets up an http-parser object as per its API, complete with setting callback functions for it to call when it finishes parsing the URL or BODY or whatever. (These functions have a fixed type signature defined by the http-parser lib, taking a pointer to a buffer containing the parsed data relevant to the call, so I can't pass in my own variables and solve the problem that way. These functions also return a status code to the http parser, so I can't use the return values either. The suggested way to get data out of the parser for later use is to copy it out to a global variable during the callback - fun with multiple threads.)
Executes the parser on the buffered socket data. At this stage, the parser is expected to call the callbacks set up earlier as it parses different sections of the buffer. Each callback is supplied with the parsed data relevant to it (e.g. the BODY segment is supplied to the body_parsed callback function).
Well, this is where the problem shows up. The parser has executed, but I don't have any access to the parsed data. Here is where I would add a new task to the queue, with a worker function to store the received body data (POST) or another to handle the GET request for previously stored data. These functions would need to be supplied with both the parsed information (POST data or GET URL) and the accepted socket, so that the now delegated work can respond to the request and close the connection.
Of course, the obvious solution to the problem is simply to not use this thread-pool model with asynchronous practices, but I would like to know, for now and for later, how best to tackle this problem.
How can I get the parsed data from these callbacks back to the worker thread function? I've considered simply making my on_url_parsed and on_body_parsed callbacks do the rest of the application's job (storing and retrieving data), but of course I no longer have the client's socket to respond to in these contexts.
If needed, I can post up the source code to the project when I get the chance.
Edit: It turns out that it is possible to access a user-defined void * from within the callbacks of this particular http-parser library, because the callbacks are passed a reference to the caller (the parser object), which has a user-definable data field.

A well-designed callback interface would provide for you to give the parser a void * which it would pass on to each of the callback functions when it calls them. The callback functions you provide know what type of object it points to (since you provide both the data pointer and the function pointers), so they can cast and properly dereference it. Among other advantages, this way you can provide for the callbacks to access a local variable of the function that initiates the parse, instead of having to rely on global variables.
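With the http-parser library mentioned in the edit above, this is precisely the parser's data field. A minimal sketch, assuming the joyent/nodejs http-parser API and a hypothetical per-connection struct (error handling omitted):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include "http_parser.h"   /* joyent/nodejs http-parser */

struct connection {
    int sock;          /* accepted client socket */
    char *body;        /* body data copied out of the parser */
    size_t body_len;
};

/* signature fixed by the library: int (*)(http_parser *, const char *, size_t) */
static int on_body(http_parser *parser, const char *at, size_t length)
{
    struct connection *conn = parser->data;   /* the void * attached below */
    conn->body = realloc(conn->body, conn->body_len + length);
    memcpy(conn->body + conn->body_len, at, length);
    conn->body_len += length;
    return 0;   /* 0 tells the parser to keep going */
}

void worker_function_parse_http(void *arg)
{
    struct connection conn = { .sock = (int)(intptr_t)arg, .body = NULL, .body_len = 0 };
    http_parser parser;
    http_parser_settings settings = {0};

    settings.on_body = on_body;
    http_parser_init(&parser, HTTP_REQUEST);
    parser.data = &conn;   /* attach our per-connection context */

    /* ... recv() into buf, then:
       http_parser_execute(&parser, &settings, buf, nread); */

    /* conn.body / conn.body_len are now available here, so the next task
       can be queued with both the parsed data and conn.sock */
}

Because conn lives on the stack of the worker that runs the parse, each thread gets its own copy and no globals are needed.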
If the parser library you are using does not have such a feature (and you don't want to switch to a better-designed one), then you can probably use thread-local storage instead of global variables. How exactly you would do that depends on your thread library and compiler, or you could roll your own by using thread identifiers as keys to thread-specific slots in some global data structure (a hash table for instance).
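If you are stuck with the global-variable style, here is a rough sketch of the thread-local fallback, assuming a C11 compiler and its _Thread_local keyword (with POSIX threads you would use pthread_key_create()/pthread_setspecific() instead; includes as in the previous sketch):

/* one private slot per worker thread, instead of a shared global */
static _Thread_local struct parsed_request {
    char url[1024];
    size_t url_len;
} current_request;

static int on_url(http_parser *parser, const char *at, size_t length)
{
    /* append into this thread's slot; other threads cannot clobber it */
    size_t room = sizeof current_request.url - current_request.url_len;
    size_t n = length < room ? length : room;
    memcpy(current_request.url + current_request.url_len, at, n);
    current_request.url_len += n;
    return 0;
}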

Related

What is the difference between the MQTTAsync_onSuccess and MQTTAsync_deliveryComplete callbacks?

I'm learning about MQTT (specifically the paho C library) by reading and experimenting with variations on the async pub/sub examples.
What's the difference between the MQTTAsync_deliveryComplete callback that you set with MQTTAsync_setCallbacks() vs. the MQTTAsync_onSuccess or MQTTAsync_onSuccess5 callbacks that you set in the MQTTAsync_responseOptions struct that you pass to MQTTAsync_sendMessage() ?
All seem to deal with "successful delivery" of published messages, but from reading the example code and doxygen, I can't tell how they relate to or conflict with or supplement each other. Grateful for any guidance.
Basically, MQTTAsync_deliveryComplete and MQTTAsync_onSuccess do the same thing: they notify you via a callback about the delivery of a message. Both callbacks are executed asynchronously, on a separate thread from the thread on which the client application is running.
(Both callbacks even use the same thread in the current version of the Paho client, but this is an undocumented implementation detail. The thread used by MQTTAsync_deliveryComplete and MQTTAsync_onSuccess is of course not the application thread, otherwise it would not be an asynchronous callback.)
The difference is that the MQTTAsync_deliveryComplete callback is set once via MQTTAsync_setCallbacks(), and you are then informed about every delivery of a message.
In contrast, MQTTAsync_onSuccess informs you exactly once, for the specific message that you sent out via MQTTAsync_sendMessage().
You can even define both callbacks, which will both be called when a message is delivered.
This gives you the flexibility to choose the approach that best suits your needs.
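A minimal sketch of registering both against the Paho C async API (the client handle is assumed to be created and connected elsewhere; topic and payload are made up for illustration):

#include <stdio.h>
#include "MQTTAsync.h"

/* set once via MQTTAsync_setCallbacks(): fires for every completed delivery */
static void delivery_complete(void *context, MQTTAsync_token token)
{
    printf("delivery complete, token %d\n", token);
}

/* attached to one send: fires only for that particular message */
static void on_send_success(void *context, MQTTAsync_successData *response)
{
    printf("this particular message was delivered\n");
}

/* incoming-message callback, registered here only for completeness */
static int message_arrived(void *context, char *topicName, int topicLen, MQTTAsync_message *message)
{
    MQTTAsync_freeMessage(&message);
    MQTTAsync_free(topicName);
    return 1;
}

static void publish_example(MQTTAsync client)
{
    MQTTAsync_setCallbacks(client, NULL, NULL, message_arrived, delivery_complete);

    MQTTAsync_responseOptions opts = MQTTAsync_responseOptions_initializer;
    opts.onSuccess = on_send_success;

    MQTTAsync_message msg = MQTTAsync_message_initializer;
    msg.payload = "21.5";
    msg.payloadlen = 4;
    msg.qos = 1;

    MQTTAsync_sendMessage(client, "sensors/temperature", &msg, &opts);
}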
Artificial example
Suppose you have three different functions, each sending a specific type of message (e.g. sendTemperature(), sendHumidity(), sendAirPressure()), and in each function you call MQTTAsync_sendMessage(). If after each delivery you want to call a matching callback function, you would choose MQTTAsync_onSuccess; then you do not need to keep track of MQTTAsync_token values and associate them with your callbacks.
If, on the other hand, you want to implement something like a logging function, it would be more useful to use MQTTAsync_deliveryComplete, because it is called for every delivery.
And of course you might want both: the specific callback with its own actions and the generic one for logging, in which case both variants can be used at the same time.
Documentation
Note that the MQTTAsync_deliveryComplete documentation explicitly states that it takes into account the quality of service set for the message. The MQTTAsync_onSuccess documentation does not say this, which of course does not mean it is not done in the implementation. But if this is important to you, you should check the source code.

Can I reuse request objects?

When using multiple MPI_Isend and MPI_Irecv calls, should I re-declare MPI_Request request each time, or declare it once and reuse the request object? If I have to re-declare it, can you answer with an example?
Yes, you can reuse MPI_Request variables. They are only handles and do not need to be initialized before being passed to MPI_Isend or MPI_Irecv (they are marked as OUT parameters for those functions). Of course they have to be valid when passed to any function that completes them, such as MPI_Wait. Those functions will also set the variable to MPI_REQUEST_NULL upon completion.
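A minimal sketch of the reuse pattern (counts, tags and buffers are made up for illustration):

#include <mpi.h>

void receive_loop(void)
{
    MPI_Request req;      /* declared once, reused every iteration */
    MPI_Status status;
    int buf[1024];

    for (int i = 0; i < 100; ++i) {
        MPI_Irecv(buf, 1024, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
        /* ... overlap communication with computation here ... */
        MPI_Wait(&req, &status);   /* completes the request and resets req to MPI_REQUEST_NULL */
    }
}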
You can even go further and use persistent communication requests. If you have communication calls in a loop that keep the same argument list across iterations, you can use MPI_Send_init etc. together with MPI_Start, which can give better performance. Note that with persistent requests, the completion functions (e.g. MPI_Wait) only mark the request as inactive rather than setting the variable to MPI_REQUEST_NULL.
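And a sketch of the persistent-request variant, under the same made-up values:

#include <mpi.h>

void persistent_receive_loop(void)
{
    MPI_Request req;
    int buf[1024];

    /* bind the argument list once */
    MPI_Recv_init(buf, 1024, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);

    for (int i = 0; i < 100; ++i) {
        MPI_Start(&req);                     /* activate the persistent request */
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* marks it inactive, not MPI_REQUEST_NULL */
        /* ... use buf ... */
    }

    MPI_Request_free(&req);                  /* finally release the persistent request */
}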

All sources' readiness before data flows in across the whole Flink job/data flow

If we have several sources in our data flow/job, and some of them implement RichSourceFunction, can we assume that RichSourceFunction.open of these sources will be called and will complete before any data enters this entire data flow (through any of the many sources), even if the sources are distributed across different task managers?
Flink guarantees to call the open() method of a function instance before it passes the first record to that instance. The guarantee is scoped only to the individual function instance, i.e., it can happen that the open() method of one function instance has not been called yet while another function instance (of the same or another function) has already started processing records.
Flink does not globally coordinate open() calls across function instances.

Dbus structure and method calls in C

I am starting to create a dbus application in C to interface with bluez. I am new to dbus and I am a little confused as to how to correctly structure my application around it.
The first question is related to the Service, Interface, and Object path in dbus. The Bluez Adapter API has the org.bluez service, an org.bluez.Adapter1 interface, and a number of methods and properties. If I wanted to call the void StopDiscovery() method, would the following be the correct call?
DBusMessage *msg;
DBusPendingCall *pending;

// create a new method call and check for errors
msg = dbus_message_new_method_call("org.bluez",
                                   "/",                  // object to call on
                                   "org.bluez.Adapter1", // interface to call on
                                   "StopDiscovery");     // method name

// send message and get a handle for a reply
if (!dbus_connection_send_with_reply(m_dbus_conn, msg, &pending, -1))
{
    // err
}
If this is the case, when does the object path come into play?
The follow-on to this is how to go about receiving information back from dbus. I've seen a few examples with a DBusPendingCall *, but the function calls dbus_pending_call_block(), so it blocks until the data is returned. If I wanted to make multiple calls and not block, would I need to keep a list of DBusPendingCall pointers and check each one? Are there any callbacks?
Thanks
I did create an example showing a non-blocking call based on the dbus watch and timeout mechanism, in response to the SO question dbus watch and timeout examples. Basically you run a Unix select() loop and everything is dispatched around it.
I did not touch the multiple-outstanding-pending-calls part. I assume one way is to check each pending call to see whether it has completed when a watched event is received. Checking for completion is non-blocking. If you keep a small number of outstanding pending calls it should be fine, though that is not an efficient solution if the number becomes big.
It looks like, according to the API documentation, a better solution is to use dbus_pending_call_set_notify() to register a callback on a pending call.
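A rough sketch of that approach, assuming an established DBusConnection and a ready-built method-call message (only the notification plumbing is shown):

#include <dbus/dbus.h>

/* called from the dispatch loop when the reply for this pending call arrives */
static void reply_ready(DBusPendingCall *pending, void *user_data)
{
    DBusMessage *reply = dbus_pending_call_steal_reply(pending);
    if (reply) {
        /* ... read arguments with dbus_message_get_args() ... */
        dbus_message_unref(reply);
    }
    dbus_pending_call_unref(pending);
}

void send_async(DBusConnection *conn, DBusMessage *msg)
{
    DBusPendingCall *pending;

    if (!dbus_connection_send_with_reply(conn, msg, &pending, -1) || !pending)
        return;   /* error */

    /* register the callback instead of blocking in dbus_pending_call_block() */
    dbus_pending_call_set_notify(pending, reply_ready, NULL, NULL);
}

The notify callback fires from the dispatch machinery, so it still relies on the watch/timeout loop (or dbus_connection_read_write_dispatch()) described above.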
So it appears that both the object path and the interface are required when talking to bluez over dbus.
// create a new method call for the adapter
msg = dbus_message_new_method_call("org.bluez",
                                   "/org/bluez/hci0",    // object to call on
                                   "org.bluez.Adapter1", // interface to call on
                                   "StopDiscovery");     // method name

// create a new method call for a characteristic on
// a given service
msg = dbus_message_new_method_call("org.bluez",
                                   "/org/bluez/hci0/dev_12_34_56_78_9A_BC/service0010/char0011",
                                   "org.bluez.GattCharacteristic1",
                                   "StartNotify");
The select() on Unix sockets for pending calls looks like a solid, scalable way to go; I will consider this architecture as the application grows.

When to Use a Callback function?

When do you use a callback function? I know how they work, I have seen them in use and I have used them myself many times.
An example from the C world would be libcurl which relies on callbacks for its data retrieval.
An opposing example would be OpenSSL: Where I have used it, I use out parameters:
ret = somefunc(&target_value);
if (ret != 0)
    // error case
I am wondering when to use which. Is a callback only useful for async stuff? I am currently in the process of designing my application's API and I am wondering whether to use a callback or just an out parameter. Under the hood it will use libcurl and OpenSSL as the main libraries it builds on, and the parameter "returned" is an OpenSSL data type.
I don't see any benefit of a callback over just returning the value. Is it only useful if I want to process the data in some way instead of just handing it back? But then I could just as well process the returned data. What is the difference?
In the simplest case, the two approaches are equivalent. But if the callback can be called multiple times to process data as it arrives, then the callback approach provides greater flexibility, and this flexibility is not limited to async use cases.
libcurl is a good example: it provides an API that allows specifying a callback for all newly arrived data. The alternative, as you present it, would be to just return the data. But return it — how? If the data is collected into a memory buffer, the buffer might end up very large, and the caller might have only wanted to save it to a file, like a downloader. If the data is saved to a file whose name is returned to the caller, it might incur unnecessary IO if the caller in fact only wanted to store it in memory, like a web browser showing an image. Either approach is suboptimal if the caller wanted to process data as it streams, say to calculate a checksum, and didn't need to store it at all.
The callback approach allows the caller to decide how the individual chunks of data will be processed or assembled into a larger whole.
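libcurl makes this concrete: the caller picks the policy purely by what its write callback does. A minimal sketch that only measures the download instead of storing it (URL made up for illustration):

#include <curl/curl.h>
#include <stddef.h>

/* called by libcurl for each chunk as it arrives */
static size_t count_bytes(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    size_t *total = userdata;
    *total += size * nmemb;        /* process the chunk; here we only measure it */
    return size * nmemb;           /* tell libcurl we consumed everything */
}

int main(void)
{
    size_t total = 0;
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;

    curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/");
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, count_bytes);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &total);

    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return 0;
}

Swapping count_bytes for a callback that writes to a file, appends to a buffer, or updates a checksum requires no change on libcurl's side.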
Callbacks are useful for asynchronous notification. When you register a callback with some API, you are expecting that callback to be run when some event occurs. In the same vein, you can use them as an intermediate step in a data-processing pipeline (similar to an 'insert', if you're familiar with the audio/recording industry).
So, to summarise, these are the two main paradigms that I have encountered and/or implemented callback schemes for:
I will tell you when data arrives or some event occurs - you use it as you see fit.
I will give you the chance to modify some data before I deal with it.
If the value can be returned immediately then yes, there is no need for a callback. As you surmised, callbacks are useful in situations where a value cannot be returned immediately for whatever reason (perhaps it is just a long-running operation that is better performed asynchronously).
My take on this: the question is which module has to know about which one. Let's call them Data-User and IO.
Assume you have some IO module where data comes in. The IO module might not even know who is interested in the data. The Data-User, however, knows exactly which data it needs. So the IO module should provide a function like subscribe_to_incoming_data(func), and the Data-User module subscribes to the specific data the IO module has. The alternative would be to change code in the IO module to call the Data-User, but with existing libraries you definitely don't want to touch code that someone else has provided to you.
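An illustrative sketch of that subscription idea, with every name (including subscribe_to_incoming_data) hypothetical:

#include <stddef.h>

/* IO module: knows nothing about who consumes the data */
typedef void (*data_handler)(const char *data, size_t len);

static data_handler subscriber;

void subscribe_to_incoming_data(data_handler func)
{
    subscriber = func;
}

static void io_on_data(const char *data, size_t len)
{
    if (subscriber)
        subscriber(data, len);   /* hand the data to whoever registered */
}

/* Data-User module: knows exactly which data it wants */
static void handle_data(const char *data, size_t len)
{
    /* ... consume the data ... */
}

void data_user_init(void)
{
    subscribe_to_incoming_data(handle_data);
}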
