What is a re entrant procedure and can you give an example scenario of when it is used?
Edit: Also, can multiple processes access a re entrant procedure in parallel?
Please provide a different way of explaining than wikipedia as I don't totally understand their description hence my question here
The idea behind re-entrancy is that the routine may be called while it is in the middle of executing already and it will still work right.
Generally this is achieved by it using only parameters and local variables declared on the stack (in C terms, no static locals). It would also be important that it not lock any global resources during execution.
Now, you may ask, "How would such a weird thing as a routine being run multiple times at once happen?" Well, some ways this could happen are:
The routine is recursive (or mutually-recursive with some other set of routines).
It gets called by another thread.
It gets called by an interrupt.
If any of these happen, and the routine is modifying a global (or C static local), then the new execution could potentially wipe out the changes the first execution made. As an example, if that global was used as a loop control variable, it might cause the first execution, when it finally gets to resume, to loop the wrong number of times.
It is a subroutine which can be called when it is already active. For instance recursive functions are often reentrant. Functions which are called from signal handlers must be reentrant as well. A reentrant function is thread-safe but not all thread-safe one are reentrant.
A reentrant procedure is one in which a single copy of the program code can be shared by multiple users during the same period of time. Re entrance has two key aspects: The program code cannot modify itself and the local data for each user must be stored separately.
In a shared system, reentrancy allows more efficient use of main memory: One copy of the program code is kept in main memory, but more than one application can call the procedure. Thus, a reentrant procedure must have a permanent part( the instructions that make up the procedure) and a temporary part(a pointer back to the calling program as well as memory for local variables used by the program).
Each execution instance, called activation, of a procedure will execute the code in the permanent part but must have its own copy of local variables and parameters. The temporary part associated with a particular activation is referred to as an activation record.
The most convenient way to support reentrant procedures is by means of a stack. When a reentrant procedure is called, the activation record becomes part of the stack frame that is created on procedure call
Related
I'm in the midst of wrapping a C library with cgo to be usable by normal Go code.
My problem is that I'd like to propagate error strings up to the Go API, but the C library in question makes error strings available via thread-local storage; there's a global get_error() call that returns a pointer to thread local character data.
My original plan was to call into C via cgo, check if the call returned an error, and if so, wrap the error string using C.GoString to convert it from a raw character pointer into a Go string. It'd look something like C.GoString(C.get_error()).
The problem that I foresee here is that TLS in C works on the level of native OS threads, but in my understanding, the calling Go code will be coming from one of potentially N goroutines that are multiplexed across some number of underlying native threads in a thread pool managed by the Go scheduler.
What I'm afraid of is running into a situation where I call into the C routine, then after the C routine returns, but before I copy the error string, the Go scheduler decides to swap the current goroutine out for another one. When the original goroutine gets swapped back in, it could potentially be on a different native thread for all I know, but even if it gets swapped back onto the same thread, any goroutines that ran there in the intervening time could've changed the state of the TLS, causing me to load an error string for an unrelated call.
My questions are these:
Is this a reasonable concern? Am I misunderstanding something about the go scheduler, or the way it interacts with cgo, that would cause this to not be an issue?
If this is a reasonable concern, how can I work around it?
cgo somehow manages to propagate errno values back to the calling Go code, which are also stored in TLS, which makes me think there must be a safe way to do this.
I can't think of a way that the C code itself could get preempted by the go scheduler, so should I introduce a wrapper C function and have IT make the necessary call and then conditionally copy the error string before returning back up to goland?
I'm interested in any solution that would allow me to propagate the error strings out to the rest of Go, but I'm hoping to avoid any solution that would require me to serialize accesses around the TLS, as adding a lock just to grab an error string seems greatly unfortunate to me.
Thanks in advance!
What I'm afraid of is running into a situation where I call into the C routine, then after the C routine returns, but before I copy the error string, the Go scheduler decides to swap the current goroutine out for another one. ...
Is this a reasonable concern?
Yes. The cgo "call C code" wrappers lock on to one POSIX / OS thread for the duration of each call, but the thread they lock is not fixed for all time; it does in fact bop around, as it were, to multiple different threads over time, as long as your goroutines are operating normally. (Since Go is cooperatively scheduled in the current implementations, you can, in some circumstances, be careful not to do anything that might let you switch underlying OS threads, but this is probably not a good plan.)
You can use runtime.LockOSThread here, but I think the best plan is otherwise:
how can I work around it?
Grab the error before Go resumes its normal scheduling algorithm (i.e., before unlocking the goroutine from the C / POSIX thread).
cgo somehow manages to propagate errno values ...
It grabs the errno value before unlocking the goroutine from the POSIX thread.
My original plan was to call into C via cgo, check if the call returned an error, and if so, wrap the error string using C.GoString to convert it from a raw character pointer into a Go string. It'd look something like C.GoString(C.get_error()).
If there is a variant of this that takes the error number (rather than fishing it out of a TLS variable), that plan should still work: just make sure that your C routines provide both the return value and the error number.
If not, write your own C wrapper, just as you suggested:
ftype wrapper_for_realfunc(char **errp, arg1type arg1, arg2type arg2) {
ftype ret = realfunc(arg1, arg2);
if IS_ERROR(ret) {
*errp = get_error();
} else {
*errp = NULL;
}
return ret;
}
Now your Go wrapper simply calls the wrapper, which fills in a pointer to C memory with an extra *C.char argument, setting it to nil if there is no error, and setting it to something on which you can use C.GoString if there is an error.
If that's not feasible for some reason, consider using runtime.LockOSThread and its counterpart, runtime.UnlockOSThread.
is there a way of manipulating the stack from a timer ISR? So i can just throw away the highest frame of the stack by forcing a long-running function to exit? (I am aware of loosing the heap-allocated memory in this case)
The target would probably be an ARM CPU.
Best Regards
Looks like you want something like setjmp/longjmp with longjmp called after ISR termination.
It is possible alter ISR return address such a way, that instead of returning to long-running function longjmp will be called with right parameters and long-running function will be aborted to the place where setjmp was called.
Just another solution came in mind. May be it would be easier to restore all the registers (Stack pointer, PC, LR and others) to values they have before long-running functions was called in the ISR stack frame (using assembly). In order to do that you need to save all required values (using assembly) before long-running functions.
I would recommend avoiding a long-running function. While it may work in the short term, as your code grows it could become problematic.
Instead, consider using a state machine, or system of state machines in your master loop, and using your ISR for a flag. This will reduce timing issues and allow you to manage more tasks at once.
That's possible in theory, but probably impossible to do reliably.
You could use GCC builtins __builtin_frame_address and __builtin_return_address to correctly restore the stack and return from the previous function, but it will corrupt the program behavior. The function you forcibly return probably saved some registers on the stack, and needed to restore them before returning. The problem is, there is no way I know of to locate or mimic the restore code. It is certainly located just before the function returns (and you can't even know where this is), but it could be 1, 2, or even 0 instructions. And even if you locate it or mimic it, you can't really hardcode it, because it is likely to change when you change the function.
In conclusion, you may be able if you use some builtins and 2-3 inline assembly instructions, but you need to tailor-hardcode it for the function you want to return, and you have to change it whenever you change the function.
Why cant you just set a flag in your isr that your function will periodically check to see if it needs to exit? The reason I disapprove of the way you are trying to do it is because it is extremely dangerous to "kill" a function when it is in the middle of some operation. Unless you have a way to clean up absolutely everything after it (like when killing a process) there is no other way you can do it safely. It is always better to signal the function through a flag or semaphore of some kind from isr and then let that function cleanup after itself and exit normally.
I have one program that forks other programs. When the forked programs receive SIGUSR2, a variable in them is supposed to change. I'm not sure how to do that because that variable isn't in the scope of the function that SIGUSR2 calls.
In C, a function can not see/manipulate the value of a variable local to another function (ignoring the possibility of a visible pointer pointing to a local variable which is either static or in an active call frame).
Your question's setup is not very clear, but to answer generically (and perhaps a bit pedantically), code does not change variables, code changes memory.
That is... a variable is simply a convenient way to refer to a memory location. "Changing a variable" is really just changing the value at its position in memory. This is relevant because while it's very convenient to execute x = 5;, that's not the only way to change x. Any code that knows where x lives in memory, and has permission to write to that location, may therefore change x.
In your specific case you're starting a second process. Initially this second process has a copy of the memory of the first one, letting it read the same data, but it's typical that any post-fork changes to memory are only visible in the process that made the change.
Your wording suggests that you're not only calling fork(), but that you may also then be exec'ing to another program altogether... making even the copy of the parent's memory go away.
In short, what you're trying to do is probably not possible without going through some rather ugly hacks, and it would definitely be worth finding a different solution (such as shared memory).
I know that declaring a static variable within a function in C means that this variable retains its state between function invocations. In the context of threads, will this result in the variable retaining its state over multiple threads, or having a separate state between each thread?
Here is a past paper exam question I am struggling to answer:
The following C function is intended to be used to allocate unique identifiers (UIDs) to its callers:
get_uid()
{
static int i = 0;
return i++;
}
Explain in what way get_uid() might work incorrectly in an environment where it is being called by multiple threads. Using a
specific example scenario, give specific detail on why and how such
incorrect behaviour might occur.
At the moment I am assuming that each thread has a separate state for the variable, but I am not sure if that is correct or if the answer is more to do with the lack of mutual exclusion. If that is the case then how could semaphores be implemented in this example?
Your assumption (threads have their own copy) is not correct. The main problem with code is when multiple threads call that function get_uid(), there's a possible race condition as to which threads increments i and gets the ID which may not be unique.
All the threads of a process share the same address space. Since i is a static variable, it has a fixed address. Its "state" is just the content of the memory at that address, which is shared by all the threads.
The postfix ++ operator increments its argument and yields the value of the argument before the increment. The order in which these are done is not defined. One possible implementation is
copy i to R1
copy R1 to R2
increment R2
copy R2 to i
return R1
If more than one thread is running, they can both be executing these instructions simultaneously or interspersed. Work out for yourself sequences where various results obtain. (Note that each thread does have its own register state, even for threads running on the same CPU, because registers are saved and restored when threads are switched.)
A situation like this where there are different results depending on the indeterministic ordering of operations in different threads is called a race condition, because there's a "race" among the different threads as to which one does which operation first.
No, if you want a variable which value depends upon the thread in which it is used, you should have a look at Thread Local Storage.
A static variable, you can imagine it really like a completely global variable. It's really much the same. So it's shared by the whole system that knows its address.
EDIT: also as a comment reminds it, if you keep this implementation as a static variable, race conditions could make that the value i is incremented at the same time by several threads, meaning that you don't have any idea of the value which will be returned by the function calls. In such cases, you should protect access by so called synchronization objects like mutexes or critical sections.
Since this looks like homework, I'll answer only part of this and that is each thread will share the same copy of i. IOW, threads do not get their own copies. I'll leave the mutual exclusion bit to you.
Each thread will share the same static variable which is mostly likely a global variable. The scenario where some threads can have wrong value is the race condition (increment isn't done in one single execution rather it is done in 3 assembly instructions, load, increment, store). Read here and the diagram at the link explains it well.
Race Condition
If you are using gcc you can use the atomic builtin functions. I'm not sure what is available for other compilers.
int get_uid()
{
static int i = 0;
return __atomic_fetch_add(&i, 1, __ATOMIC_SEQ_CST);
}
This will ensure that the variable cannot be acted on by more than one thread at a time.
all the threads share memory location. For example a global variable changes in one thread will reflect in another thread. Since each thread has its own stack, the local
variables that are created inside the thread is unique. In this case, why do we need
to go for thread specific data mechanism?. Can't it be achieved by auto storage varibles
inside the thread function ?
Kindly clarify!!!.
BR
Rj
Normal globals are shared between threads. Local variables are specific to a particular invocation of a function. If you want something that (for example) is visible to a number of functions running in the same thread, but unique to that thread, then thread specific data is what you're looking for.
It's not required but it's rather handy. Some functions like rand and strtok use static storage duration information which is likely to be problematic when shared among threads.
Say you have a random number function where you want to maintain a different sequence (hence seed) for each thread. You have two approaches.
You can use something like the kludgy:
int seed;
srand (&seed, time (NULL));
int r = rand_r (void *seed);
where the seed has to be created by the caller and passed in each time.
Or you can use the rather nicer, ISO-compliant:
srand (time (NULL));
int r = rand();
that uses thread-local storage to maintain a thread-specific seed. Similarly with the information used by strtok regarding the locations within the string it's processing.
That way, you don't have to muck about with changing your code between threaded and non-threaded versions.
Now you could create that information in the thread function but how is the rand function going to know about it's address without it being passed down. And what if rand is called 87 stack levels down? That's an awful lot of levels to be transferring a pointer through.
And, even if you do something like:
void pthread_fn (void *unused) {
int seed;
rand_set_seed_location (&seed);
:
}
and rand subsequently uses that value regardless of how deep it is in the stack, that's still a code change from the standard. It may work but so may writing an operating system in COBOL. That doesn't make it a good idea :-)
Yes, the stack is one way of allocating thread-local storage (including handles to heap allocations local to the particular thread).
The best example for thread specific data is the "errno". When a call to some function in c library failed, the errno is set, and you can check it out to find the reason of the failure. If there's no thread specific data, it's impossible to port these functions to multi-thread environment because the errno could be set by other threads before you check it.
As a general rule, most uses of TSD should be avoided in new APIs. If a function needs some information, it should be passed to it.
However, sometimes you need TSD to 'paper over' an API defect. A good example is 'gmtime'. The 'gmtime' function returns a pointer to a structure that is valid until the next call to 'gmtime'. But that would make 'gmtime' awfully hard to use in a multi-threaded program. What if some library called 'gmtime' when you didn't expect it, trashing your structure? One simple workaround is make the structure returned thread-specific. (The long-term solution, of course, is to create a more suitable API such as 'gmtime_r'.)
One case where it's perfectly reasonable to use TSD in new designs is for information that won't be accessed frequently that would clutter the API. For example, if a critical error is discovered, it might be nice to log certain context information from higher-level code (Which client were you serving? What command did they send?). Your choices are basically to pass this context information from function to function to function (which isn't even always possible if some of the functions are outside your control) or to store it in TSD.