Can failing to call get() on a pathos apipe() result lead to excessive memory usage?

If you use a pathos ThreadPool and call apipe() a large number of times, but never call get() on the result, does this possibly lead to pathos internally accumulating state (and thus using up memory)?

Without details on the version of Python, pathos, etc. you are interested in, I can only give you a general answer.

Calling apipe on a ThreadPool instance generates an ApplyResult object, and calling get() on that ApplyResult returns the result of evaluating the inputs you passed to apipe. So, what happens before you call get? The exact details depend on the version of Python and pathos, but generally, when you call apipe, an ApplyResult instance is created and the evaluation of the inputs is dispatched to a new thread. The ApplyResult instance has a pointer either back to the pool, to a queue, or to some other object that will 'receive the result' when get is called. The result produced in the new thread is stored in memory until get is called, or until the object holding the result is deallocated (however that happens). This storage most likely happens in an object in the threading module, but it may be elsewhere, depending on the version of Python, pathos, etc. Were you to delete the appropriate object and deallocate it from memory, you would never need to call get to release the memory.

The short answer, however, is "likely yes": not calling get will fill up your memory, unless the result, the return object, and/or other objects holding references to the result are deleted.
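For illustration, here is a minimal sketch of the pattern in question (assuming a pathos version where ThreadPool is imported from pathos.pools and apipe returns an ApplyResult-like object, as described above):

from pathos.pools import ThreadPool

pool = ThreadPool(4)

# Each apipe() call returns an ApplyResult-like object; the computed
# value stays reachable from it until get() is called.
results = [pool.apipe(pow, i, 2) for i in range(100000)]

# Draining the results releases the stored values. Skipping this loop
# while keeping `results` alive is what lets memory accumulate.
values = [r.get() for r in results]

pool.close()
pool.join()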

Related

cgo Interacting with C Library that uses Thread Local Storage

I'm in the midst of wrapping a C library with cgo to be usable by normal Go code.
My problem is that I'd like to propagate error strings up to the Go API, but the C library in question makes error strings available via thread-local storage; there's a global get_error() call that returns a pointer to thread local character data.
My original plan was to call into C via cgo, check if the call returned an error, and if so, wrap the error string using C.GoString to convert it from a raw character pointer into a Go string. It'd look something like C.GoString(C.get_error()).
The problem that I foresee here is that TLS in C works on the level of native OS threads, but in my understanding, the calling Go code will be coming from one of potentially N goroutines that are multiplexed across some number of underlying native threads in a thread pool managed by the Go scheduler.
What I'm afraid of is running into a situation where I call into the C routine, then after the C routine returns, but before I copy the error string, the Go scheduler decides to swap the current goroutine out for another one. When the original goroutine gets swapped back in, it could potentially be on a different native thread for all I know, but even if it gets swapped back onto the same thread, any goroutines that ran there in the intervening time could've changed the state of the TLS, causing me to load an error string for an unrelated call.
My questions are these:
Is this a reasonable concern? Am I misunderstanding something about the Go scheduler, or the way it interacts with cgo, that would cause this not to be an issue?
If this is a reasonable concern, how can I work around it?
cgo somehow manages to propagate errno values back to the calling Go code, which are also stored in TLS, which makes me think there must be a safe way to do this.
I can't think of a way that the C code itself could get preempted by the Go scheduler, so should I introduce a wrapper C function and have IT make the necessary call and then conditionally copy the error string before returning back up to Go land?
I'm interested in any solution that would allow me to propagate the error strings out to the rest of Go, but I'm hoping to avoid any solution that would require me to serialize accesses around the TLS, as adding a lock just to grab an error string seems greatly unfortunate to me.
Thanks in advance!
What I'm afraid of is running into a situation where I call into the C routine, then after the C routine returns, but before I copy the error string, the Go scheduler decides to swap the current goroutine out for another one. ...
Is this a reasonable concern?
Yes. The cgo "call C code" wrappers lock on to one POSIX / OS thread for the duration of each call, but the thread they lock is not fixed for all time; it does in fact bop around, as it were, to multiple different threads over time, as long as your goroutines are operating normally. (Since Go is cooperatively scheduled in the current implementations, you can, in some circumstances, be careful not to do anything that might let you switch underlying OS threads, but this is probably not a good plan.)
You can use runtime.LockOSThread here, but I think the best plan is otherwise:
how can I work around it?
Grab the error before Go resumes its normal scheduling algorithm (i.e., before unlocking the goroutine from the C / POSIX thread).
cgo somehow manages to propagate errno values ...
It grabs the errno value before unlocking the goroutine from the POSIX thread.
My original plan was to call into C via cgo, check if the call returned an error, and if so, wrap the error string using C.GoString to convert it from a raw character pointer into a Go string. It'd look something like C.GoString(C.get_error()).
If there is a variant of this that takes the error number (rather than fishing it out of a TLS variable), that plan should still work: just make sure that your C routines provide both the return value and the error number.
If not, write your own C wrapper, just as you suggested:
/* ftype, arg1type, arg2type and IS_ERROR are placeholders for the
   real library's types and error test. */
ftype wrapper_for_realfunc(char **errp, arg1type arg1, arg2type arg2) {
    ftype ret = realfunc(arg1, arg2);
    if (IS_ERROR(ret)) {
        *errp = get_error();  /* read the TLS error on the same thread */
    } else {
        *errp = NULL;
    }
    return ret;
}
Now your Go wrapper simply calls the wrapper, which fills in a pointer to C memory with an extra *C.char argument, setting it to nil if there is no error, and setting it to something on which you can use C.GoString if there is an error.
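For illustration, here is a minimal sketch of that Go side, reusing the hypothetical names from the C wrapper above (wrapper.h, realFunc and the argument types are assumptions, not a real API):

/*
#include "wrapper.h"   // declares wrapper_for_realfunc from the sketch above
*/
import "C"

import "errors"

func realFunc(arg1 C.arg1type, arg2 C.arg2type) (C.ftype, error) {
    var cerr *C.char
    ret := C.wrapper_for_realfunc(&cerr, arg1, arg2)
    if cerr != nil {
        // Convert immediately: cerr points at C-owned memory.
        return ret, errors.New(C.GoString(cerr))
    }
    return ret, nil
}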
If that's not feasible for some reason, consider using runtime.LockOSThread and its counterpart, runtime.UnlockOSThread.
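A minimal sketch of that fallback, with the same hypothetical names (the error test on ret is also an assumption):

func realFuncLocked(arg1 C.arg1type, arg2 C.arg2type) (C.ftype, error) {
    // Pin this goroutine to its OS thread so that no other goroutine
    // can run C code on this thread between realfunc and get_error.
    runtime.LockOSThread()
    defer runtime.UnlockOSThread()

    ret := C.realfunc(arg1, arg2)
    if ret == C.ftype(-1) { // hypothetical error test
        return ret, errors.New(C.GoString(C.get_error()))
    }
    return ret, nil
}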

In C, how do I make sure that a memory load is performed only once?

I am programming two processes that communicate by posting messages to each other in a segment of shared memory. Although the messages are not accessed atomically, synchronization is achieved by protecting the messages with shared atomic objects accessed with store-releases and load-acquires.
My problem is about security. The processes do not trust each other. Upon receiving a message, a process makes no assumption about the message being well formed; it first copies the message from shared memory to private memory, then performs some validation on this private copy and, if valid, proceeds to handle this same private copy. Making this private copy is crucial, as it prevents a TOC/TOU attack in which the other process would modify the message between validation and use.
My question is the following: does the standard guarantee that a clever C compiler will never decide that it can read the original instead of the copy? Imagine the following scenario, in which the message is a simple integer:
int private = *pshared; // pshared points to the message in shared memory
...
if (is_valid(private)) {
    ...
    handle(private);
}
If the compiler runs out of registers and temporarily needs to spill private, could it decide, instead of spilling it to the stack, that it can simply discard its value and reload it from *pshared later, provided that an alias analysis ensures that this thread has not changed *pshared?
My guess is that such a compiler optimization would not preserve the semantics of the source program, and would therefore be illegal: pshared does not point to an object that is provably reachable from this thread only (such as an object allocated on the stack whose address has not leaked), therefore the compiler cannot rule out that some other thread might concurrently modify *pshared. By contrast, the compiler may eliminate redundant loads, because one of the possible behaviors is that no other thread runs between the redundant loads, therefore the current thread must be ready to deal with this particular behavior.
Could anyone confirm or refute that guess, and possibly provide references to the relevant parts of the standard?
(By the way: I assume that the message type has no trap representations, so that loads are always defined.)
UPDATE
Several posters have commented on the need for synchronization, which I did not intend to get into, since I believe that I already have this covered. But since people are pointing that out, it is only fair that I provide more details.
I am implementing a low-level asynchronous communication system between two entities that do not trust each other. I run tests with processes, but will eventually move to virtual machines on top of a hypervisor. I have two basic ingredients at my disposal: shared memory and a notification mechanism (typically, injecting an IRQ into the other virtual machine).
I have implemented a generic circular buffer structure with which the communicating entities can produce messages, then send the aforementioned notifications to let each other know when there is something to consume. Each entity maintains its own private state that tracks what it has produced/consumed, and there is a shared state in shared memory composed of message slots and atomic integers tracking the bounds of the regions holding pending messages. The protocol unambiguously identifies which message slots are to be exclusively accessed by which entity at any time. When it needs to produce a message, an entity writes a message (non atomically) to the appropriate slot, then performs an atomic store-release to the appropriate atomic integer to transfer the ownership of the slot to the other entity, then waits until memory writes have completed, then sends a notification to wake up the other entity. Upon receiving a notification, the other entity is expected to perform an atomic load-acquire on the appropriate atomic integer, determine how many pending messages there are, then consume them.
The load of *pshared in my code snippet is just an example of what consuming a trivial (int) message looks like. In a realistic setting, the message would be a structure. Consuming a message does not need any particular atomicity or synchronization, since, as specified by the protocol, it only happens when the consuming entity has synchronized with the other one and knows that it owns the message slot. As long as both entites follow the protocol, everything works flawlessly.
Now, I do not want the entities to have to trust each other. Their implementation must be robust against a malicious entity that would disregard the protocol and write all over the shared memory segment at any time. If this were to happen, the only thing the malicious entity should be able to achieve would be to disrupt the communication. Think of a typical server, which must be prepared to handle ill-formed requests from a malicious client without letting such misbehavior cause buffer overflows or out-of-bounds accesses.
So, while the protocol relies on synchronization for normal operation, the entities must be prepared for the contents of shared memory to change at any time. All I need is a way to make sure that, after an entity makes a private copy of a message, it validates and uses that same copy, and never accesses the original any more.
I have an implementation that copies the message using a volatile read, thus making it clear to the compiler that the shared memory does not have ordinary memory semantics. I believe that this is sufficient; I wonder whether it is necessary.
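For concreteness, here is a minimal sketch of such a volatile copy (the message layout is hypothetical): each byte is fetched from shared memory exactly once into the private copy, and later code touches only the copy.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout. */
typedef struct {
    uint32_t type;
    uint32_t len;
    uint8_t  payload[56];
} msg_t;

/* Copy a message out of shared memory with volatile reads; validate
   and use only the private copy afterwards. */
static void copy_message(msg_t *priv, const volatile msg_t *shared)
{
    const volatile uint8_t *src = (const volatile uint8_t *)shared;
    uint8_t *dst = (uint8_t *)priv;

    /* The volatile qualifier obliges the compiler to perform each of
       these loads exactly once; it may not re-read *shared later. */
    for (size_t i = 0; i < sizeof *priv; i++)
        dst[i] = src[i];
}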
You should inform the compiler that the shared memory can change at any moment by using the volatile modifier.
volatile int *pshared;
...
int private = *pshared; // pshared points to the message in shared memory
...
if (is_valid(private)) {
    ...
    handle(private);
}
As *pshared is declared to be volatile, the compiler can no longer assume that *pshared and private keep the same value.
Per your edit, it is now clear that we all agree that a volatile modifier on the shared memory is sufficient to guarantee that the compiler will honour each access to that shared memory.
Anyway, draft N1256 for C99 is explicit about it in 5.1.2.3 Program execution (emphasis mine):
2 Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects, which are changes in the state of
the execution environment. Evaluation of an expression may produce side effects. At
certain specified points in the execution sequence called sequence points, all side effects
of previous evaluations shall be complete and no side effects of subsequent evaluations
shall have taken place.
5 The least requirements on a conforming implementation are:
— At sequence points, volatile objects are stable in the sense that previous accesses are
complete and subsequent accesses have not yet occurred
— At program termination, all data written into files shall be identical to the result that
execution of the program according to the abstract semantics would have produced.
That suggests that even if pshared is not qualified as volatile, the value of private must be loaded from *pshared before the evaluation of is_valid, and as the abstract machine has no reason to change it before the evaluation of handle, a conforming implementation should not change it. At most, it could remove the call to handle altogether if it contained no side effects, which is unlikely to happen.
Anyway, this is only an academic discussion, because I cannot imagine a real use case where shared memory would not need the volatile modifier. If you do not use it, the compiler is free to believe that the previous value is still valid, so on a second access you would still get the first value. So even if the answer to this question is that it is not necessary, you still have to use volatile int *pshared;.
It's hard to answer your question as posted. Note that you must use a synchronization object to prevent concurrent accesses, unless you are only reading units which are atomic on the platform.
I am assuming that you intend to ask about (pseudocode):
lock_shared_area();
int private = *pshared;
unlock_shared_area();
if (is_valid(private))
and that the other process also uses the same lock. (If not, it would be good to update your question to be a bit more specific about your synchronization).
This code guarantees to read *pshared at most once. Using the name private means to read the variable private, not the object *pshared. The compiler "knows" that the call to unlock the area acts as a memory fence and it won't reorder operations past the fence.
Since C doesn't have any concept of interprocess communication, there is nothing you can do to inform the compiler that there is another process that might be modifying the memory.
Thus, I believe there is no way to prevent a sufficiently clever, malicious, but conforming build system from invoking the "as if" rule to allow it to do the Wrong Thing.
To get something that is 'guaranteed' to work, you need to work with whatever guarantees are given by your specific compiler and/or the shared memory library you're using.

Can you set local variables through system calls?

I have one program that forks other programs. When the forked programs receive SIGUSR2, a variable in them is supposed to change. I'm not sure how to do that because that variable isn't in the scope of the function that SIGUSR2 calls.
In C, a function cannot see or manipulate the value of a variable local to another function (ignoring the possibility of a visible pointer that points to a local variable which is either static or in an active call frame).
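The usual pattern is therefore to use a variable the handler can see: a file-scope flag of type volatile sig_atomic_t, which the handler sets and the rest of the program polls. A minimal sketch:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* File scope, so both the handler and main() can see it; sig_atomic_t
   is the only object type the standard guarantees is safe to write
   from an asynchronous handler. */
static volatile sig_atomic_t got_usr2 = 0;

static void on_usr2(int sig)
{
    (void)sig;
    got_usr2 = 1;               /* just set the flag; do the work in main */
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR2, &sa, NULL);

    for (;;) {
        pause();                /* sleep until a signal arrives */
        if (got_usr2) {
            got_usr2 = 0;
            puts("SIGUSR2 received; variable updated");
        }
    }
}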
Your question's setup is not very clear, but to answer generically (and perhaps a bit pedantically), code does not change variables, code changes memory.
That is... a variable is simply a convenient way to refer to a memory location. "Changing a variable" is really just changing the value at its position in memory. This is relevant because while it's very convenient to execute x = 5;, that's not the only way to change x. Any code that knows where x lives in memory, and has permission to write to that location, may therefore change x.
In your specific case you're starting a second process. Initially this second process has a copy of the memory of the first one, letting it read the same data, but it's typical that any post-fork changes to memory are only visible in the process that made the change.
Your wording suggests that you're not only calling fork(), but that you may also then be exec'ing to another program altogether... making even the copy of the parent's memory go away.
In short, what you're trying to do is probably not possible without going through some rather ugly hacks, and it would definitely be worth finding a different solution (such as shared memory).

Detect when memory address is being written to

I have an array of bytes that is used as an emulated system's RAM. I want to make a bullet-proof patch for a given cell that detects when the cell is written to and instantly overwrites it. Using a loop like
for (;;) {
    address = x;
    sleep(y);
}
has a flaw that there's a minimum possible value for sleep, which appears to be nearly identical to the emulated frame length, so it'd only patch the address once per frame. So, if it's written to 100 times per frame by a game, such a patch will make little sense.
I have some hooks on writing, but those only catch writes by reading the game's code being executed, while I want to make such patches work for any memory region, not just RAM, hence I can't rely on interpreting the emulated code too much (it simply doesn't match for all regions I want to patch).
So I need some programmatic watchpoint, having a pointer to the array and a byte whose changes I want to watch.
Although C is not an object-oriented language, I would use an object-oriented approach here:
Wrap the emulated memory up in an opaque pointer that can only be read and written to with a specific set of functions (e.g. memory_write_byte and memory_read_byte).
Make the memory object maintain a list of function pointers that point to callback functions for handling write events. Whenever a write happens, make it call all those callbacks.
The part of the code that wants to monitor that spot in memory can register a callback with the memory object, and whenever the callback gets called it can modify the memory if needed.
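A minimal sketch of that design, using the function names suggested above (sizes, the watched address, and the patch value are hypothetical):

#include <stddef.h>
#include <stdint.h>

#define MAX_WATCHERS 8

typedef struct memory memory_t;
typedef void (*write_cb)(memory_t *m, uint16_t addr, uint8_t value);

struct memory {
    uint8_t  ram[0x10000];
    write_cb watchers[MAX_WATCHERS];
    size_t   nwatchers;
};

void memory_add_watcher(memory_t *m, write_cb cb)
{
    if (m->nwatchers < MAX_WATCHERS)
        m->watchers[m->nwatchers++] = cb;
}

/* Every write in the emulator must go through here. */
void memory_write_byte(memory_t *m, uint16_t addr, uint8_t value)
{
    m->ram[addr] = value;
    for (size_t i = 0; i < m->nwatchers; i++)
        m->watchers[i](m, addr, value);      /* notify each watcher */
}

uint8_t memory_read_byte(const memory_t *m, uint16_t addr)
{
    return m->ram[addr];
}

/* Example watcher: pin cell 0x1234 to 0x42 no matter what the game
   writes. Writing ram[] directly avoids re-triggering the callbacks. */
static void patch_cell(memory_t *m, uint16_t addr, uint8_t value)
{
    (void)value;
    if (addr == 0x1234)
        m->ram[addr] = 0x42;
}

Registering the patch is then just memory_add_watcher(&mem, patch_cell), and it takes effect on every single write rather than once per frame.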
I'd look into shared memory à la mmap. Using mmap you can have the same page shared by two processes, with one of the mappings read-only.
When a write to this memory region occurs, a SIGSEGV is generated, which you can catch and then take some sort of action. This is UNIX terminology, but you can do the same thing on Windows; it is just slightly more involved.
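For illustration, a minimal single-process sketch of the same trick using mprotect on POSIX (cleanly re-arming the protection after the faulting write is omitted; doing that robustly needs single-stepping or a second trap):

#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t *ram;                     /* page-aligned emulated RAM */
static size_t ram_size = 0x10000;

static void on_segv(int sig, siginfo_t *si, void *uc)
{
    (void)sig; (void)uc;
    uint8_t *addr = si->si_addr;
    if (addr >= ram && addr < ram + ram_size) {
        /* One of our watched pages: this is where a patch would be
           applied. Unprotect so the faulting write is retried and
           succeeds on return from the handler. */
        mprotect(ram, ram_size, PROT_READ | PROT_WRITE);
    } else {
        _exit(1);                        /* a genuine crash elsewhere */
    }
}

int main(void)
{
    ram = mmap(NULL, ram_size, PROT_READ,          /* writes will fault */
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct sigaction sa;
    sa.sa_sigaction = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    ram[0x1234] = 0x42;       /* faults, handler unprotects, write retried */
    return 0;
}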

Why is thread-specific data required in pthread?

All of a process's threads share its memory; for example, a change to a global variable in one thread will be reflected in another thread. Since each thread has its own stack, the local variables created inside a thread function are unique to that thread. In that case, why do we need a thread-specific data mechanism? Can't the same thing be achieved with automatic storage variables inside the thread function?
Normal globals are shared between threads. Local variables are specific to a particular invocation of a function. If you want something that (for example) is visible to a number of functions running in the same thread, but unique to that thread, then thread specific data is what you're looking for.
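A minimal sketch of the pthread API that implements this mechanism (compile with -pthread): any function running in a thread can retrieve that thread's value through the shared key, without the value being passed down the call chain.

#include <pthread.h>
#include <stdio.h>

static pthread_key_t key;

static void deep_in_the_call_stack(void)
{
    /* No parameter was passed down, yet we get this thread's value. */
    int *val = pthread_getspecific(key);
    printf("thread sees %d\n", *val);
}

static void *worker(void *arg)
{
    pthread_setspecific(key, arg);   /* associate a per-thread value */
    deep_in_the_call_stack();
    return NULL;
}

int main(void)
{
    pthread_key_create(&key, NULL);  /* NULL: no destructor needed here */

    int a = 1, b = 2;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    pthread_key_delete(key);
    return 0;
}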
It's not required but it's rather handy. Some functions like rand and strtok use static storage duration information which is likely to be problematic when shared among threads.
Say you have a random number function where you want to maintain a different sequence (hence seed) for each thread. You have two approaches.
You can use something like the kludgy:
unsigned int seed = time (NULL);
int r = rand_r (&seed);
where the seed has to be created by the caller and passed in each time.
Or you can use the rather nicer, ISO-compliant:
srand (time (NULL));
int r = rand();
that uses thread-local storage to maintain a thread-specific seed. Similarly with the information used by strtok regarding the locations within the string it's processing.
That way, you don't have to muck about with changing your code between threaded and non-threaded versions.
Now, you could create that information in the thread function, but how is the rand function going to know its address without it being passed down? And what if rand is called 87 stack levels down? That's an awful lot of levels to be passing a pointer through.
And, even if you do something like:
void *pthread_fn (void *unused) {
    int seed;
    rand_set_seed_location (&seed);   /* hypothetical, nonstandard call */
    ...
}
and rand subsequently uses that value regardless of how deep it is in the stack, that's still a code change from the standard. It may work but so may writing an operating system in COBOL. That doesn't make it a good idea :-)
Yes, the stack is one way of allocating thread-local storage (including handles to heap allocations local to the particular thread).
The best example of thread-specific data is errno. When a call to some function in the C library fails, errno is set, and you can check it to find the reason for the failure. If there were no thread-specific data, it would be impossible to port these functions to a multi-threaded environment, because errno could be set by other threads before you check it.
As a general rule, most uses of TSD should be avoided in new APIs. If a function needs some information, it should be passed to it.
However, sometimes you need TSD to 'paper over' an API defect. A good example is 'gmtime'. The 'gmtime' function returns a pointer to a structure that is valid until the next call to 'gmtime'. But that would make 'gmtime' awfully hard to use in a multi-threaded program. What if some library called 'gmtime' when you didn't expect it, trashing your structure? One simple workaround is make the structure returned thread-specific. (The long-term solution, of course, is to create a more suitable API such as 'gmtime_r'.)
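To make the defect and its workaround concrete, here is a minimal sketch contrasting the two interfaces (gmtime_r is the POSIX spelling of the more suitable API mentioned above):

#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t now = time(NULL);

    /* gmtime returns a pointer to storage it owns (static, or
       thread-specific as a workaround); any later gmtime call,
       even from inside a library, may trash it. */
    struct tm *shared = gmtime(&now);

    /* gmtime_r fills in a caller-supplied struct: no hidden state. */
    struct tm mine;
    gmtime_r(&now, &mine);

    printf("%d == %d\n", shared->tm_year + 1900, mine.tm_year + 1900);
    return 0;
}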
One case where it's perfectly reasonable to use TSD in new designs is for information that won't be accessed frequently that would clutter the API. For example, if a critical error is discovered, it might be nice to log certain context information from higher-level code (Which client were you serving? What command did they send?). Your choices are basically to pass this context information from function to function to function (which isn't even always possible if some of the functions are outside your control) or to store it in TSD.
