Can I reuse request objects? (C)

When using multiple MPI_Isend and MPI_Irecv calls, should I re-declare MPI_Request request each time, or just declare it once and reuse the request object? If I have to re-declare, could you answer with an example?

Yes, you can reuse MPI_Request variables. These variables are only handles and do not need to be initialized before being passed to MPI_Isend or MPI_Irecv (they are marked as OUT parameters for those functions). Of course, they have to be valid when passed to any function that completes them, such as MPI_Wait. Those completion functions also set the variable to MPI_REQUEST_NULL upon completion.
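A minimal sketch of reusing one request variable (destination rank, tag, and loop count are illustrative assumptions, and a matching receiver on rank 1 is assumed):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Request request;  /* declared once, reused every iteration */
    int buf, i;

    MPI_Init(&argc, &argv);
    for (i = 0; i < 10; i++) {
        buf = i;
        MPI_Isend(&buf, 1, MPI_INT, 1 /*dest*/, 0 /*tag*/, MPI_COMM_WORLD, &request);
        MPI_Wait(&request, MPI_STATUS_IGNORE);  /* resets request to MPI_REQUEST_NULL */
    }
    MPI_Finalize();
    return 0;
}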
You can even go further and use persistent communication requests. If you have requests in a loop that retain the same argument list across multiple calls, you can use MPI_Send_init etc. together with MPI_Start, which can perform better. Note that with persistent requests, the completion functions (e.g. MPI_Wait) only mark the request as inactive rather than setting the variable to MPI_REQUEST_NULL.
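A sketch of the persistent variant, replacing the loop of the previous sketch (same assumptions; the request must eventually be released with MPI_Request_free):

MPI_Send_init(&buf, 1, MPI_INT, 1 /*dest*/, 0 /*tag*/, MPI_COMM_WORLD, &request);
for (i = 0; i < 10; i++) {
    buf = i;
    MPI_Start(&request);                    /* reactivates the same request */
    MPI_Wait(&request, MPI_STATUS_IGNORE);  /* marks it inactive, not MPI_REQUEST_NULL */
}
MPI_Request_free(&request);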

Related

Multithreading inside Flink's Map/Process function

I have a use case where I need to apply multiple functions to every incoming message, each producing zero or more results.
Having a loop won't scale for me, and ideally I would like to be able to emit results as soon as they are ready instead of waiting for all the functions to be applied.
I thought about using AsyncIO for this, maintaining a ThreadPool, but if I am not mistaken I can only emit one record using this API. That is not a deal-breaker, but I'd like to know if there are other options, like using a ThreadPool inside a Map/Process function so that I can send the results as they become ready.
Would this be an anti-pattern, or cause any problems with regard to checkpointing or at-least-once guarantees?
Depending on the number of different functions involved, one solution would be to fan each incoming message out to n operators, each applying one of the functions.
I fear you'll get into trouble if you try this with a multi-threaded map/process function.
How about this instead:
You could have something like a RichCoFlatMap (or KeyedCoProcessFunction, or BroadcastProcessFunction) that is aware of all of the currently active functions and, for each incoming event, emits n copies of it, each enriched with info about a specific function to be performed. Following that, an async I/O operator with a ThreadPool can take care of executing the functions and emitting results if and when they become available.

How to send multiple diagRequest messages in Vector CAPL?

I'm currently writing some CAPL code that is executed when clicking a button. It shall send multiple diagnostic requests, but CANoe keeps telling me that it can only send one request at a time, so I need to delay the requests. The diagSetRequestInterval function did not work, and since this is NOT a test case, testWaitForDiagResponse doesn't work either.
You have to wait until the request has been handled (either by a response from the target or by a timeout).
Since you are not in a test node, you have to give control back to the system, i.e. the function that called diagSendRequest must end, and you wait for some event on the bus to occur before you continue (otherwise the simulation would stall).
Once the request has been handled, on diagResponse ... is called. Inside this event procedure you can send the next request, and so on.
Example:
Instead of:
void myFunction()
{
  diagRequest ECU.ProgrammingSession req1;
  diagRequest ECU.SecuritySeed req2;
  diagSendRequest(req1);
  diagSendRequest(req2); // fails: CANoe handles only one request at a time
}
You would do something like this:
void myFunction()
{
  diagRequest ECU.ProgrammingSession req1;
  diagSendRequest(req1);
}

on diagResponse ECU.ProgrammingSession
{
  diagRequest ECU.SecuritySeed req2;
  diagSendRequest(req2); // send the next request only after the first completed
}
Timeout handling is a different topic, and left as an exercise :-)
You practically want to implement multiple TP connections simultaneously in CANoe. I presume you have only one Diagnostic Description in the Diagnostics/ISO TP configuration, which lets you use only one TP connection at a time.
You can implement multiple diag layers in Diagnostics/ISO TP on the same communication channel, as many as you want, but with different names.
In the simulation node, you then only have to declare the request you want with a different namespace, corresponding to one of the diag layer names you created earlier.
This way you can virtualize multiple TP connections in UDS for the CANoe environment.
Or, you do not use the diagnostic layer support of CANoe at all, and instead construct the whole message with the UDS payload on your data link layer (CAN, FlexRay).
It depends on what kind of data link layer (CAN, FlexRay) you have and how many communication channels with a diag layer you have set up.
In FlexRay, for example, you can send multiple diag requests in the same FlexRay cycle, if your schedule provides multiple slots in the dynamic segment which the diag layer (or you) can use.

Transferring data to/from a callback from/to a worker thread

My current application is a toy web service written in C designed to replicate the behaviour of http://sprunge.us/ (takes data in via http POST, stores it to disk, returns the client a url to the data - also serves data that has been previously stored upon request).
The application is structured such that a thread pool is instantiated with worker threads (just a function pointer that takes a void* parameter) and a socket is opened to listen to incoming connections. The main loop of the program comprises a sock = accept(...) call and then a pool_add_task(worker_function_parse_http, sock) to enable requests to be handled quickly.
The parse_http worker parses the incoming request and either adds another task to the work queue for storing the data (POST) or serving previously stored data (GET).
My problem with this approach stems from the use of the http-parser library, which uses a callback design to return parsed data (all HTTP parsers that I looked at used this style). The problem I encounter is as follows:
My parse_http worker:
Buffers data from the accepted socket (the function's only parameter, at this stage)
Sets up a http-parser object as per its API, complete with setting callback functions for it to call when it finishes parsing the URL or BODY or whatever. (These functions are of a fixed type signature defined by the http-parser lib, with a pointer to a buffer containing the parsed data relevant to the call, so I can't pass in my own variables and solve the problem that way. These functions also return a status code to the http parser, so I can't use the return values either. The suggested way to get data out of the parser for later use is to copy it out to a global variable during the callback - fun with multiple threads.)
Executes the parser on the buffered socket data. At this stage, the parser is expected to call the configured callbacks as it parses different sections of the buffer. Each callback is supplied with the parsed data relevant to it (e.g. the BODY segment is supplied to the body_parsed callback function).
This is where the problem shows up. The parser has executed, but I don't have any access to the parsed data. Here is where I would add a new task to the queue, with a worker function to store the received body data (POST) or another to handle the GET request for previously stored data. These functions would need to be supplied with both the parsed information (POST data or GET url) and the accepted socket, so that the now-delegated work can respond to the request and close the connection.
Of course, the obvious solution to the problem is simply to not use this thread-pool model with asynchronous practices, but I would like to know, for now and for later, how best to tackle this problem.
How can I get the parsed data from these callbacks back to the worker thread function? I've considered simply making my on_url_parsed and on_body_parsed callbacks do the rest of the application's job (storing and retrieving data), but of course I no longer have the client's socket to respond to in those contexts.
If needed, I can post up the source code to the project when I get the chance.
Edit: It turns out that it is possible to access a user defined void * from within the callbacks of this particular http-parser library as the callbacks are passed a reference to the caller (the parser object) which has a user-definable data field.
A well-designed callback interface would provide for you to give the parser a void * which it would pass on to each of the callback functions when it calls them. The callback functions you provide know what type of object it points to (since you provide both the data pointer and the function pointers), so they can cast and properly dereference it. Among other advantages, this way you can provide for the callbacks to access a local variable of the function that initiates the parse, instead of having to rely on global variables.
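The Edit above notes that this particular http-parser library already works this way: each callback receives the parser object, whose data field is a user-definable void *. A sketch of the pattern (the context struct, buffer size, and function names are illustrative assumptions):

#include <string.h>
#include "http_parser.h"

/* Hypothetical per-request context, owned by the worker. */
struct request_ctx {
    int sock;           /* the accepted socket to respond on */
    char body[65536];
    size_t body_len;
};

/* Callbacks may fire several times per message, so append. */
static int on_body(http_parser *p, const char *at, size_t len)
{
    struct request_ctx *ctx = p->data;  /* recover our context */
    if (ctx->body_len + len <= sizeof ctx->body) {
        memcpy(ctx->body + ctx->body_len, at, len);
        ctx->body_len += len;
    }
    return 0;  /* 0 tells the parser to keep going */
}

void parse_http(int sock, const char *buf, size_t buflen)
{
    struct request_ctx ctx = { .sock = sock };
    http_parser parser;
    http_parser_settings settings;

    memset(&settings, 0, sizeof settings);
    settings.on_body = on_body;

    http_parser_init(&parser, HTTP_REQUEST);
    parser.data = &ctx;  /* travels into every callback */

    http_parser_execute(&parser, &settings, buf, buflen);
    /* ctx.body and ctx.sock are both in scope here: enqueue the
       storage/serving task with them. */
}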
If the parser library you are using does not have such a feature (and you don't want to switch to a better-designed one), then you can probably use thread-local storage instead of global variables. How exactly you would do that depends on your thread library and compiler, or you could roll your own by using thread identifiers as keys to thread-specific slots in some global data structure (a hash table for instance).

When to Use a Callback function?

When do you use a callback function? I know how they work, I have seen them in use and I have used them myself many times.
An example from the C world would be libcurl which relies on callbacks for its data retrieval.
An opposing example would be OpenSSL: where I have used it, I use out parameters:
ret = somefunc(&target_value);
if (ret != 0)
    // error case
I am wondering when to use which. Is a callback only useful for async stuff? I am currently in the process of designing my application's API and wondering whether to use a callback or just an out parameter. Under the hood it will use libcurl and OpenSSL as the main libraries it builds on, and the parameter "returned" is an OpenSSL data type.
I don't see any benefit of a callback over just returning the value. Is a callback only useful if I want to process the data in some way instead of just handing it back? But then I could process the returned data just as well. What is the difference?
In the simplest case, the two approaches are equivalent. But if the callback can be called multiple times to process data as it arrives, then the callback approach provides greater flexibility, and this flexibility is not limited to async use cases.
libcurl is a good example: it provides an API that allows specifying a callback for all newly arrived data. The alternative, as you present it, would be to just return the data. But return it — how? If the data is collected into a memory buffer, the buffer might end up very large, and the caller might have only wanted to save it to a file, like a downloader. If the data is saved to a file whose name is returned to the caller, it might incur unnecessary IO if the caller in fact only wanted to store it in memory, like a web browser showing an image. Either approach is suboptimal if the caller wanted to process data as it streams, say to calculate a checksum, and didn't need to store it at all.
The callback approach allows the caller to decide how the individual chunks of data will be processed or assembled into a larger whole.
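For illustration, a sketch against the real libcurl easy API (the URL and output file name are placeholders): the write callback receives each chunk as it arrives and the caller decides what happens to it; here it is streamed straight to a file.

#include <stdio.h>
#include <curl/curl.h>

/* Called by libcurl once per received chunk; the return value is the
   number of bytes we consumed. */
static size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    FILE *out = userdata;
    return fwrite(ptr, size, nmemb, out) * size;
}

int main(void)
{
    FILE *out = fopen("download.bin", "wb");  /* placeholder file name */
    CURL *curl = curl_easy_init();

    if (out && curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/data");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);  /* becomes userdata */
        curl_easy_perform(curl);
    }

    if (curl) curl_easy_cleanup(curl);
    if (out) fclose(out);
    return 0;
}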
Callbacks are useful for asynchronous notification. When you register a callback with some API, you are expecting that callback to be run when some event occurs. In the same vein, you can use them as an intermediate step in a data-processing pipeline (similar to an 'insert', if you're familiar with the audio/recording industry).
So, to summarise, these are the two main paradigms that I have encountered and/or implemented callback schemes for:
I will tell you when data arrives or some event occurs - you use it as you see fit.
I will give you the chance to modify some data before I deal with it.
If the value can be returned immediately then yes, there is no need for a callback. As you surmised, callbacks are useful in situations wherein a value cannot be returned immediately for whatever reason (perhaps it is just a long running operation which is better performed asynchronously).
My take on this: it comes down to which module has to know about which. Let's call them Data-User and IO.
Assume you have some IO module where data comes in. The IO module might not even know who is interested in the data. The Data-User, however, knows exactly which data it needs. So the IO module should provide a function like subscribe_to_incoming_data(func), and the Data-User module subscribes to the specific data the IO module provides. The alternative would be to change code in the IO module to call the Data-User directly, but with existing libs you definitely don't want to touch code that someone else has provided to you.
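A minimal sketch of such a subscription hook in C (the function names and the single-subscriber slot are illustrative assumptions):

#include <stddef.h>

typedef void (*data_handler)(const char *data, size_t len);

static data_handler g_handler;  /* hypothetical single-subscriber slot */

/* The IO module exposes this; it never needs to know the subscriber. */
void subscribe_to_incoming_data(data_handler func)
{
    g_handler = func;
}

/* Called inside the IO module whenever data arrives. */
static void io_on_data(const char *data, size_t len)
{
    if (g_handler)
        g_handler(data, len);
}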

Why should I use thread-specific data?

Since each thread has its own stack, its private data can be put there. For example, each thread can allocate some heap memory to hold some data structure, and use the same interface to manipulate it. Why, then, is thread-specific data helpful?
The only case that I can think of is that each thread may have many kinds of private data. If we need to access the private data in any function called within that thread, we need to pass the data as arguments to all those functions, which is tedious and error-prone.
Thread-local storage is a solution for avoiding global state. If data isn't shared across threads but is accessed by several functions, you can make it thread-local. No need to worry about breaking reentrancy. Makes debugging that much easier.
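A minimal C11 sketch of that idea (the error-slot use case is an illustrative assumption):

/* A per-thread "global": each thread sees its own copy, so several
   functions can share it without parameter passing and without races. */
static _Thread_local int last_error;

void set_last_error(int err) { last_error = err; }
int  get_last_error(void)    { return last_error; }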
From a performance point of view, using thread-local data is a way of avoiding false sharing. Let's say you have two threads, one responsible for writing to a variable x and the other for reading a variable y. If you were to define these as global variables, they could end up on the same cache line. Then, whenever one thread writes to x, that cache line is invalidated in the other core's cache, and y has to be re-fetched even though it never changed, so cache performance degrades for no good reason.
If you used thread-local data, one thread would only store the variable x and the other would only store the variable y, thus avoiding false sharing. Bear in mind, though, that there are other ways to go about this, e.g. cache line padding.
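For comparison, a sketch of the cache-line padding alternative mentioned above (64 bytes is a common cache-line size, but treat it as an assumption):

struct padded_counters {
    int  x;                      /* written by one thread */
    char pad[64 - sizeof(int)];  /* pushes y onto the next cache line */
    int  y;                      /* read by the other thread */
};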
Unlike the stack (which, like thread-local data, is dedicated to each thread), thread-local data is useful because it persists across function calls (unlike stack data, which may already be overwritten when used outside its function).
The alternative would be to use adjacent pieces of global data dedicated to each thread, but that has performance implications where the CPU caches are concerned. Since different threads are likely to run on different cores, such "sharing" of a global piece of data may bring undesirable performance degradation, because an access from one core may invalidate the cache line of another, with the latter contributing to more inter-core traffic to ensure cache consistency.
In contrast, working with thread-local data should conceptually not involve interfering with the caches of other cores.
Think of thread-local storage as another kind of global variable. It's global in the sense that you don't have to pass it around; different code can access it as it pleases (given the declaration, of course). However, each thread has its own separate variable. Normally, globals are extra bad in multithreaded programming because other threads can change the value. If you make the variable thread-local, only your thread can see it, so it is impossible for another thread to change it unexpectedly.
Another use case is when you are forced to use a (badly designed) API that expects you to use global variables to carry information to callback functions. This is a simple instance of being forced into a global variable, but using thread-local storage makes it thread-safe.
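A sketch of that situation (legacy_callback, handle_request, and the request-id payload are hypothetical):

#include <stdio.h>

/* Hypothetical badly designed API: the callback takes no user-data
   parameter, so context must travel through a "global". Making it
   thread-local keeps concurrent threads from trampling each other. */
static _Thread_local int current_request_id;

static void legacy_callback(void)
{
    printf("handling request %d\n", current_request_id);  /* per-thread value */
}

static void handle_request(int id)
{
    current_request_id = id;  /* visible only to this thread */
    legacy_callback();        /* stands in for the library invoking it */
}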
Well, I've been writing multithreaded apps for 30-odd years and have never, ever found any need to use TLS. If a thread needs a DB connection that the DB binds to the thread, the thread can open one of its own and keep it on the stack. Since threads cannot be called, only signaled, there is no problem. Every time I've looked at this magic 'TLS', I've realized it's not a solution to my problem.
With my typical message-passing design, where objects are queued in to threads that never terminate, there is just no need for TLS.
With thread-pools it's even more useless.
I can only say that using TLS=bad design. Someone put me right, if you can :)
I've used thread local storage for database connections and sometimes for request / response objects. To give two examples, both from a Java webapp environment, but the principles hold.
A web app might consist of a large spread of code that calls various subsystems, many of which need to access the database. In my case, I had written each subsystem that required the db to fetch a connection from a db pool, use it, and return it to the pool. Thread-local storage provided a simpler alternative: when the request is created, fetch a db connection from the pool and store it in thread-local storage. Each subsystem then just uses the db connection from thread-local storage, and when the request is completing, it returns the connection to the db pool. This solution had performance advantages, while also not requiring me to pass the db connection through every level, i.e. my parameter lists remained shorter.
In the same web app, I decided in one remote subsystem that I actually wanted to see the web Request object. So I either had to refactor to pass this object all the way down, which would have involved a lot of parameter passing and refactoring, or I could simply place the object into thread-local storage and retrieve it when I wanted it.
In both cases, you could argue that I had messed up the design in the first place and was just using thread-local storage to save my bacon. You might have a point. But I could also argue that thread-local storage made for cleaner code while remaining thread-safe.
Of course, I had to be very sure that the things I was putting into thread-local storage were indeed one-and-only-one per thread. In the case of a web app, the Request object or a database connection fits this description nicely.
I would like to add to the above answers that, as far as I know, performance-wise, allocation on the stack is faster than allocation on the heap.
Regarding passing local data across calls: if you allocate on the heap, you will need to pass the pointer/reference (I'm a Java guy :) ) to the calls; otherwise, how will you access the memory?
TLS is also good for storing a context to pass data across calls within a thread (we use it to hold information on a logged-on user across the thread: some sort of session management).
Thread-specific data is used when all the functions of a particular thread need to access one common variable. The variable is local to that thread but acts as a global variable for all the functions of that thread.
Let's say we have two threads, t1 and t2, of some process. Variable 'a' is thread-specific data for t1. Then t2 has no knowledge of 'a', but all the functions of t1 can access 'a' as a global variable, and any change in 'a' will be seen by all the functions of t1.
With the OOP techniques available now, I find thread-specific data largely irrelevant. Instead of passing a plain function to the thread, you can pass a functor. The functor object that you pass can hold the function that runs as part of the thread as well as any thread-specific data you need.
E.g. sample code with C++11 or Boost would look like the following:
#include <list>
#include <boost/thread.hpp>

class MyClassFunctor
{
private:
    std::list<int> mylist; // thread-specific data (element type chosen for illustration)

public:
    void operator() ()
    {
        // This function is called when the thread runs.
        // It can access the thread-specific data mylist.
    }
};

// The functor object holds both the code that runs as the thread and its
// thread-specific data. boost::thread copies the functor, so the running
// thread works on its own private copy of mylist.
MyClassFunctor functorobj;
boost::thread mythread(functorobj);
