When do you use a callback function? I know how they work, I have seen them in use and I have used them myself many times.
An example from the C world would be libcurl which relies on callbacks for its data retrieval.
An opposing example would be OpenSSL: Where I have used it, I use out parameters:
ret = somefunc(&target_value);
if (ret != 0) {
    /* error case */
}
I am wondering when to use which. Is a callback only useful for async stuff? I am currently in the process of designing my application's API and I am wondering whether to use a callback or just an out parameter. Under the hood it will use libcurl and OpenSSL as the main libraries it builds on, and the parameter "returned" is an OpenSSL data type.
I don't see any benefit of a callback over just returning. Is this only useful, if I want to process the data in any way instead of just giving it back? But then I could process the returned data. Where is the difference?
In the simplest case, the two approaches are equivalent. But if the callback can be called multiple times to process data as it arrives, then the callback approach provides greater flexibility, and this flexibility is not limited to async use cases.
libcurl is a good example: it provides an API that allows specifying a callback for all newly arrived data. The alternative, as you present it, would be to just return the data. But return it — how? If the data is collected into a memory buffer, the buffer might end up very large, and the caller might have only wanted to save it to a file, like a downloader. If the data is saved to a file whose name is returned to the caller, it might incur unnecessary IO if the caller in fact only wanted to store it in memory, like a web browser showing an image. Either approach is suboptimal if the caller wanted to process data as it streams, say to calculate a checksum, and didn't need to store it at all.
The callback approach allows the caller to decide how the individual chunks of data will be processed or assembled into a larger whole.
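To make that concrete, here is a minimal sketch in plain C with no libcurl involved. The names produce_chunks, chunk_cb, sum_bytes and append_to_buffer are made up for illustration; produce_chunks simply stands in for a library that receives data piece by piece. The same producer serves a caller that only wants a checksum and a caller that wants the bytes in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk callback: gets one chunk plus the caller's own context.
   Returns 0 to continue, nonzero to abort the transfer. */
typedef int (*chunk_cb)(const unsigned char *data, size_t len, void *user);

/* Hypothetical producer: hands `src` to `cb` in pieces of at most `chunk`
   bytes, standing in for data arriving from the network.
   Returns 0 on success, -1 if the callback aborted. */
int produce_chunks(const unsigned char *src, size_t total, size_t chunk,
                   chunk_cb cb, void *user)
{
    assert(cb != NULL && chunk > 0);
    for (size_t off = 0; off < total; off += chunk) {
        size_t n = total - off < chunk ? total - off : chunk;
        if (cb(src + off, n, user) != 0)
            return -1;
    }
    return 0;
}

/* Consumer 1: a simple additive checksum; the data is never stored. */
int sum_bytes(const unsigned char *data, size_t len, void *user)
{
    uint32_t *sum = user;
    for (size_t i = 0; i < len; i++)
        *sum += data[i];
    return 0;
}

/* Consumer 2: assemble the chunks in a small fixed-size memory buffer. */
struct buffer {
    unsigned char bytes[64];
    size_t used;
};

int append_to_buffer(const unsigned char *data, size_t len, void *user)
{
    struct buffer *buf = user;
    if (len > sizeof buf->bytes - buf->used)
        return 1; /* no room left: ask the producer to stop */
    memcpy(buf->bytes + buf->used, data, len);
    buf->used += len;
    return 0;
}
```

Calling produce_chunks(data, n, 2, sum_bytes, &sum) with a uint32_t sum computes the checksum without any buffering, while passing append_to_buffer and a struct buffer collects the same data in memory. The producer does not change in either case.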
Callbacks are useful for asynchronous notification. When you register a callback with some API, you are expecting that callback to be run when some event occurs. In the same vein, you can use them as an intermediate step in a data processing pipeline (similar to an 'insert' if you're familiar with the audio/recording industry).
So, to summarise, these are the two main paradigms that I have encountered and/or implemented callback schemes for:
I will tell you when data arrives or some event occurs - you use it as you see fit.
I will give you the chance to modify some data before I deal with it.
If the value can be returned immediately then yes, there is no need for a callback. As you surmised, callbacks are useful in situations wherein a value cannot be returned immediately for whatever reason (perhaps it is just a long running operation which is better performed asynchronously).
My take on this: I see it as which module has to know about which one? Let's call them Data-User and IO.
Assume you have some IO, where data comes in. The IO-Module might not even know who is interested in the data. The Data-User however knows exactly which data it needs. So the IO should provide a function like subscribe_to_incoming_data(func) and the Data-User module will subscribe to the specific data the IO-Module has. The alternative would be to change code in the IO-Module to call the Data-User. But with existing libs you definitely don't want to touch existing code that someone else has provided to you.
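A rough sketch of that dependency direction, assuming a small fixed-size subscriber table. Only the name subscribe_to_incoming_data comes from the text above; the extra user pointer, io_deliver and count_bytes are my own additions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBSCRIBERS 8

/* Handler signature the IO module offers to whoever is interested. */
typedef void (*data_handler)(const char *data, size_t len, void *user);

struct subscriber {
    data_handler fn;
    void *user;
};

static struct subscriber subscribers[MAX_SUBSCRIBERS];
static size_t subscriber_count;

/* IO module: register interest in incoming data.
   Returns 0 on success, -1 if the table is full. */
int subscribe_to_incoming_data(data_handler fn, void *user)
{
    assert(fn != NULL);
    if (subscriber_count == MAX_SUBSCRIBERS)
        return -1;
    subscribers[subscriber_count].fn = fn;
    subscribers[subscriber_count].user = user;
    subscriber_count++;
    return 0;
}

/* IO module: called when data arrives. It notifies every subscriber
   without knowing anything about who they are. */
void io_deliver(const char *data, size_t len)
{
    for (size_t i = 0; i < subscriber_count; i++)
        subscribers[i].fn(data, len, subscribers[i].user);
}

/* Data-User module: knows what it wants to do with the data. */
void count_bytes(const char *data, size_t len, void *user)
{
    (void)data;
    size_t *total = user;
    *total += len;
}
```

The Data-User calls subscribe_to_incoming_data(count_bytes, &total) once. After that the IO module can stay untouched no matter how many users are added.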
I have a use case where I need to apply multiple functions to every incoming message, each producing 0 or more results.
Having a loop won't scale for me, and ideally I would like to be able to emit results as soon as they are ready instead of waiting for all the functions to be applied.
I thought about using AsyncIO for this, maintaining a ThreadPool, but if I am not mistaken I can only emit one record using this API. That is not a deal-breaker, but I'd like to know if there are other options, like using a ThreadPool in a Map/Process function so that I can send the results as they are ready.
Would this be an anti-pattern, or cause any problems with regard to checkpointing or at-least-once guarantees?
Depending on the number of different functions involved, one solution would be to fan each incoming message out to n operators, each applying one of the functions.
I fear you'll get into trouble if you try this with a multi-threaded map/process function.
How about this instead:
You could have something like a RichCoFlatMap (or KeyedCoProcessFunction, or BroadcastProcessFunction) that is aware of all of the currently active functions, and for each incoming event, emits n copies of it, each being enriched with info about a specific function to be performed. Following that can be an async i/o operator that has a ThreadPool, and it takes care of executing the functions and emitting results if and when they become available.
I have a Flink job with the classic shape of datasource-operator1-operatorN-sink.
From what I can observe, the open() method of operator1 is invoked before the open() method of the datasource.
In the open() method of operator1 I need to run some business logic that depends on state which is only resolved in datasource.open().
1- Is there any way to ensure that operator1.open() is not invoked until datasource.open() has been?
2- Is there any way to communicate/signal from the datasource.open() method, to the operator1.open() method?
Trying to establish some sort of out-of-band communication between operators often gets folks into trouble. At best it can screw up performance, and at worst it can lead to deadlocks.
What you might try instead is to rely on the signaling pathway that already exists between the data source and the async function -- in other words, emit a specially encoded event from the data source that tells the async function it can start now, and have the async function wait for that special record before doing other processing.
I'm learning about MQTT (specifically the paho C library) by reading and experimenting with variations on the async pub/sub examples.
What's the difference between the MQTTAsync_deliveryComplete callback that you set with MQTTAsync_setCallbacks() vs. the MQTTAsync_onSuccess or MQTTAsync_onSuccess5 callbacks that you set in the MQTTAsync_responseOptions struct that you pass to MQTTAsync_sendMessage() ?
All seem to deal with "successful delivery" of published messages, but from reading the example code and doxygen, I can't tell how they relate to or conflict with or supplement each other. Grateful for any guidance.
Basically, MQTTAsync_deliveryComplete and MQTTAsync_onSuccess do the same thing: they notify you via callback about the delivery of a message. Both callbacks are executed asynchronously on a thread separate from the one on which the client application is running.
(Both callbacks are even using the same thread in the case of the current version of the Paho client, but this is a non-documented implementation detail. This thread used by MQTTAsync_deliveryComplete and MQTTAsync_onSuccess is of course not the application thread otherwise it would not be an asynchronous callback).
The difference is that MQTTAsync_deliveryComplete callback is set once via MQTTAsync_setCallbacks and then you are informed about every delivery of a message.
In contrast to this, the MQTTAsync_onSuccess informs you once for exactly the message that you send out via MQTTAsync_sendMessage().
You can even define both callbacks, which will both be called when a message is delivered.
This gives you the flexibility to choose the approach that best suits your needs.
Artificial example
Suppose you have three different functions, each sending a specific type of message (e.g. sendTemperature(), sendHumidity(), sendAirPressure()) and in each function you call MQTTAsync_sendMessage, and after each delivery you want to call a matching callback function, then you would choose MQTTAsync_onSuccess. Then you do not need to keep track of MQTTAsync_token and associate that with your callbacks.
For example, if you want to implement a logging function instead, it would be more useful to use MQTTAsync_deliveryComplete because it is called for every delivery.
And of course one can imagine that one would want to have both the specific one with some actions and the generic one for logging, so in this case both variants could be used at the same time.
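The shape of the two styles can be sketched without the Paho library at all. Everything below (mock_client, mock_set_delivery_callback, mock_send and the counters) is a made-up stand-in, not the Paho API, and delivery is reported synchronously here, whereas the real client reports it from its own thread.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*delivered_fn)(void *context, int token);

/* Made-up client: one callback registered once, plus per-send callbacks. */
struct mock_client {
    delivered_fn on_any_delivery; /* fires for every delivered message */
    void *context;
    int next_token;
};

void mock_set_delivery_callback(struct mock_client *c, delivered_fn fn,
                                void *context)
{
    c->on_any_delivery = fn;
    c->context = context;
}

/* "Sends" a message and reports delivery right away. The order in which
   the two callbacks fire is arbitrary in this sketch. */
int mock_send(struct mock_client *c, const char *payload,
              delivered_fn on_success, void *success_context)
{
    assert(c != NULL && payload != NULL);
    int token = c->next_token++;
    if (on_success != NULL)
        on_success(success_context, token); /* only for this message */
    if (c->on_any_delivery != NULL)
        c->on_any_delivery(c->context, token); /* for every message */
    return token;
}

struct counters {
    int temperature;
    int humidity;
    int logged;
};

/* Specific callbacks: no token bookkeeping needed to tell them apart. */
void temperature_delivered(void *context, int token)
{
    (void)token;
    ((struct counters *)context)->temperature++;
}

void humidity_delivered(void *context, int token)
{
    (void)token;
    ((struct counters *)context)->humidity++;
}

/* Generic callback: sees every delivery, which suits logging. */
void log_delivery(void *context, int token)
{
    (void)token;
    ((struct counters *)context)->logged++;
}
```

With log_delivery registered once and each send passing its own callback, one temperature message and one humidity message bump their own counters once each, while the logging counter goes up for every message sent.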
Documentation
You should note that the documentation for MQTTAsync_deliveryComplete explicitly states that it takes the requested quality of service (QoS) into account. The MQTTAsync_onSuccess documentation does not say this, but that does not mean the implementation doesn't do it. If this matters to you, check the source code.
My current application is a toy web service written in C designed to replicate the behaviour of http://sprunge.us/ (takes data in via http POST, stores it to disk, returns the client a url to the data - also serves data that has been previously stored upon request).
The application is structured such that a thread pool is instantiated with worker threads (just a function pointer that takes a void* parameter) and a socket is opened to listen to incoming connections. The main loop of the program comprises a sock = accept(...) call and then a pool_add_task(worker_function_parse_http, sock) to enable requests to be handled quickly.
The parse_http worker parses the incoming request and either adds another task to the work queue for storing the data (POST) or serving previously stored data (GET).
My problem with this approach stems from the use of the http-parser library which uses a callback design to return parsed data (all http parsers that I looked at used this style). The problem I encounter is as such:
My parse_http worker:
Buffers data from the accepted socket (the function's only parameter, at this stage)
Sets up a http-parser object as per its API, complete with setting callback functions for it to call when it finishes parsing the URL or BODY or whatever. (These functions are of a fixed type signature defined by the http-parser lib, with a pointer to a buffer containing the parsed data relevant to the call, so I can't pass in my own variables and solve the problem that way. These functions also return a status code to the http parser, so I can't use the return values either. The suggested way to get data out of the parser for later use is to copy it out to a global variable during the callback - fun with multiple threads.)
Execute the parser on the buffered socket data. At this stage, the parser is expected to call its set up callbacks when it parses different sections of the buffer. The callback is supplied with parsed data relevant to each callback (e.g. BODY segment supplied to body_parsed callback function).
Well, this is where the problem shows. The parser has executed, but I don't have any access to the parsed data. Here is where I would add a new task to the queue with a worker function to store the received body data or another to handle the GET request for previously stored data. These functions would need to be supplied with both the parsed information (POST data or GET url) as well as the accepted socket so that the now delegated work can respond to the request and close the connection.
Of course, the obvious solution to the problem is simply to not use this thread-pool model with asynchronous practices, but I would like to know, for now and for later, how best to tackle this problem.
How can I get the parsed data from these callbacks back to the worker thread function? I've considered simply making my on_url_parsed and on_body_parsed do the rest of the application's job (storing and retrieving data), but of course I no longer have the client's socket to respond back to in these contexts.
If needed, I can post up the source code to the project when I get the chance.
Edit: It turns out that it is possible to access a user defined void * from within the callbacks of this particular http-parser library as the callbacks are passed a reference to the caller (the parser object) which has a user-definable data field.
A well-designed callback interface would provide for you to give the parser a void * which it would pass on to each of the callback functions when it calls them. The callback functions you provide know what type of object it points to (since you provide both the data pointer and the function pointers), so they can cast and properly dereference it. Among other advantages, this way you can provide for the callbacks to access a local variable of the function that initiates the parse, instead of having to rely on global variables.
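Here is a small sketch of that pattern. toy_parse is a made-up stand-in for a real parser (it only pulls the URL out of a "METHOD URL ..." line), and request_ctx, save_url and handle_request are hypothetical names. The point is that the context lives in the worker's caller, not in a global.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback type: parsed span plus the caller's opaque pointer. */
typedef int (*url_cb)(const char *at, size_t len, void *user);

/* Made-up toy parser: finds the URL in "METHOD URL ..." and passes it to
   on_url together with `user`, which it never inspects.
   Returns 0 on success, -1 on malformed input or callback failure. */
int toy_parse(const char *request, url_cb on_url, void *user)
{
    assert(request != NULL && on_url != NULL);
    const char *start = strchr(request, ' ');
    if (start == NULL)
        return -1;
    start++;
    size_t len = strcspn(start, " \r\n");
    if (len == 0)
        return -1;
    return on_url(start, len, user) == 0 ? 0 : -1;
}

/* Everything the later work needs: the parsed URL and the client socket. */
struct request_ctx {
    int client_socket;
    char url[128];
};

/* The callback knows what `user` points to, because the same code that
   supplied the pointer also supplied this function. */
int save_url(const char *at, size_t len, void *user)
{
    struct request_ctx *ctx = user;
    if (len >= sizeof ctx->url)
        return 1; /* URL too long for this sketch */
    memcpy(ctx->url, at, len);
    ctx->url[len] = '\0';
    return 0;
}

/* Worker: fills in `out`, which the caller can then queue as a new task. */
int handle_request(int client_socket, const char *request,
                   struct request_ctx *out)
{
    out->client_socket = client_socket;
    out->url[0] = '\0';
    return toy_parse(request, save_url, out);
}
```

After handle_request returns 0, the context holds both the socket and the URL, so it can be handed to the next task in the pool as a single pointer.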
If the parser library you are using does not have such a feature (and you don't want to switch to a better-designed one), then you can probably use thread-local storage instead of global variables. How exactly you would do that depends on your thread library and compiler, or you could roll your own by using thread identifiers as keys to thread-specific slots in some global data structure (a hash table for instance).
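A minimal sketch of the thread-local variant, assuming a C11 compiler that supports _Thread_local. legacy_parse is a made-up parser whose callback has no user-data parameter; parse_result, store_body and parse_into are hypothetical names as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback type with no room for a user pointer. */
typedef int (*body_cb)(const char *at, size_t len);

/* Made-up parser: treats the whole input as the body. */
int legacy_parse(const char *input, body_cb on_body)
{
    assert(input != NULL && on_body != NULL);
    return on_body(input, strlen(input));
}

struct parse_result {
    char body[128];
    size_t len;
};

/* One slot per thread, so concurrent workers do not overwrite each other. */
static _Thread_local struct parse_result *current_result;

int store_body(const char *at, size_t len)
{
    struct parse_result *r = current_result;
    if (r == NULL || len >= sizeof r->body)
        return 1;
    memcpy(r->body, at, len);
    r->body[len] = '\0';
    r->len = len;
    return 0;
}

/* Worker-side wrapper: publish the destination for this thread, run the
   parser, then clear the slot again. */
int parse_into(const char *input, struct parse_result *out)
{
    current_result = out;
    int rc = legacy_parse(input, store_body);
    current_result = NULL;
    return rc;
}
```

This only works if the parser invokes the callback on the same thread that called it, which is the case for a synchronous parser like the one assumed here.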
I am using the libcurl multi interface and I need to know how much data is being sent for each request. I would rather not use the CURLOPT_XFERINFOFUNCTION because it gets called a lot and I only need to know the dltotal while I am in the CURLOPT_WRITEFUNCTION callback. I want to clean up the existing easy handle and malloc'd data while I am still in the write callback once all the data has been received. Is there a function I can call that will return the total amount of data that is being sent for a particular easy handle?
I tried using curl_easy_getinfo() with CURLINFO_SIZE_DOWNLOAD and it always returned 0. I also tried CURLINFO_CONTENT_LENGTH_DOWNLOAD which also always returned 0. I was calling this from within the CURLOPT_WRITEFUNCTION callback.
The reason you get zeroes back from those calls is probably that the size simply isn't known beforehand.
But let me also alert you that "I want to clean up the existing easy handle and malloc'd data while I am still in the write callback" sounds like a disaster waiting to happen. You really should not clean up the handle from within the callback.