I'm writing an Erlang C NIF that will only be used by one Erlang process. I want to create a struct that will hold an array of pointers. I need this to persist between the process's calls to the NIF.
What I need insight into is the proper way to do this from the Erlang NIF side of things. I'm thinking of declaring a struct outside of all the functions so it's accessible to all of them. When I create it in one call to the NIF and then come back and use it in another call, it seems to work just fine.
I'm worried that this only works because the process stays on the same scheduler thread, so the struct and its underlying array never have to move in memory.
Should I be using erlang:memalloc from within a function and avoiding globals altogether, or staying as is with global structs?
Possibly return a pointer to a single array containing all my data?
You could certainly return a pointer to a single array containing your data; to do that, look at ErlNifResourceType. You would pass this back to the calling erlang process, and it in turn would pass it back to you on subsequent NIF calls. This would ensure that only one thread was operating on your data at a time (assuming only one process had a copy of the resource; it's not something you want to share, especially if it contains pointers).
You could also encode it as an erlang list, but that would probably be very inefficient.
That being said, you can use shared memory from a NIF. For example, here's an ets-like database implemented as a NIF using shared data.
You just have to keep in mind that you're accessing shared resources. The NIF API provides thread creation, thread specific data, mutexes, conditions, and read/write locks. You can even send a message to an erlang process from a NIF-created thread (in the event of a long-running NIF call, this is actually how you'd want to implement it to prevent scheduling problems).
Given your requirements, you're probably better off using ErlNifResourceType rather than messing with multithreading and shared resource controls. Technically, if you're only using one Erlang process, you could leave it as a global variable (read: shared resource) without any harmful side effects. That being said, things change, and you don't want to be the cause of someone's headache down the road when they try to use your code from multiple processes. Whichever method you wind up using, make sure it's thread safe.
I'm writing an application running on FreeRTOS where I have different threads that all have to access (read and some write) the same data structures.
So I thought I could implement a global data store holding all the data in some grouped structs. I also thought about using something like SQLite but I think that's an overkill for my application.
Nevertheless there are some open questions
To guarantee thread safety I assume I need to add a semaphore for my read and write access but...
If a thread only has to update certain elements of a struct I'd need to hand it a pointer to that struct, but as soon as I start using pointer references I can no longer protect my access with a semaphore. So how can I allow a thread to modify single struct elements without violating thread safety?
Is there a better way to build something like a global store than the one I planned to use? Google didn't give many hints.
Let me share the most common C way of handling this type of scenario.
To avoid exposing a lot of global data, use opaque pointers and provide a set of API functions that take the opaque pointer as a parameter. This gives you a clean set of APIs for accessing all of your structures while keeping the implementation specifics in the .c file. The good thing about opaque pointers is that nothing outside those API functions knows the layout of the pointed-to data. This provides protection and encapsulation for your structures, so that they are only ever accessed or mutated through the APIs that you provide.
In each structure definition behind an opaque pointer, include a mutex (FreeRTOS implements mutexes in terms of semaphores) so the structure members can be accessed in a thread-safe manner. Create the mutex with xSemaphoreCreateMutex. In all your APIs, call xSemaphoreTake before and xSemaphoreGive after touching member data that may be accessed by multiple tasks.
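As a rough illustration of the opaque-pointer-plus-mutex layout described above, here is a minimal sketch. It swaps in a POSIX pthread mutex for the FreeRTOS one so that it can be compiled on a desktop; the comments mark where xSemaphoreTake and xSemaphoreGive would go. The names sensor_store_t, store_create, store_set_temperature and so on are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Public side (would live in store.h): callers only see an opaque type.
 * All names here are hypothetical. */
typedef struct sensor_store sensor_store_t;

/* Private side (would live in store.c): the layout plus its guarding mutex. */
struct sensor_store {
    pthread_mutex_t lock;   /* stands in for a FreeRTOS mutex handle */
    int temperature;
    int humidity;
};

sensor_store_t *store_create(void)
{
    sensor_store_t *s = calloc(1, sizeof *s);
    if (s != NULL && pthread_mutex_init(&s->lock, NULL) != 0) {
        free(s);
        s = NULL;
    }
    return s;
}

void store_destroy(sensor_store_t *s)
{
    assert(s != NULL);
    pthread_mutex_destroy(&s->lock);
    free(s);
}

/* Update a single member. The caller never holds a pointer into the struct,
 * so every access is forced through the lock. */
void store_set_temperature(sensor_store_t *s, int value)
{
    pthread_mutex_lock(&s->lock);     /* xSemaphoreTake on FreeRTOS */
    s->temperature = value;
    pthread_mutex_unlock(&s->lock);   /* xSemaphoreGive on FreeRTOS */
}

/* Read a single member by value, under the same lock. */
int store_get_temperature(sensor_store_t *s)
{
    pthread_mutex_lock(&s->lock);
    int value = s->temperature;
    pthread_mutex_unlock(&s->lock);
    return value;
}
```

Because accessors copy values in and out rather than handing out pointers to members, a task can modify one element of the struct without ever bypassing the mutex.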
Is it possible to launch two completely independent programs into one scope of memory area?
For example, I have skype.exe and opera.exe and I want to launch them on a way that will allows them to share common memory. Sounds like threading to me.
These are quite some questions at the same time, let me try to dissect:
It is the definition of a process on a modern OS to have its own virtual address space. So running two processes in the same address space can't happen without a modification to the OS to allow exactly that.
Even if such a modification were available, it would be a less than perfect idea: access to memory shared between threads is governed by synchronisation primitives explicitly built into the program. There is no such mechanism to manage memory access between two processes that have not been explicitly designed for it.
Sharing memory if so designed between processes does not at all need them to run in the same virtual address space in their totality: Shared memory segments exist in virtually all modern OS to facilitate exactly that. Again, those processes have to be explicitly designed to use this feature.
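To make the shared-segment point concrete, here is a small sketch on a POSIX system. It assumes a parent and a forked child rather than two unrelated programs, and uses an anonymous shared mapping; the function name shared_page_roundtrip is invented for the example. Two independent programs would instead need a named segment (for instance via shm_open) that both are written to open.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes `value` into a shared page and the
 * parent reads it back. Returns the value seen by the parent, or -1 on error. */
int shared_page_roundtrip(int value)
{
    assert(value >= 0);   /* -1 is reserved for signalling failure */

    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        /* Child: its own address space, but this one page is shared. */
        *shared = value;
        _exit(0);
    }

    /* Parent: waiting for the child is the only synchronisation used here. */
    int status;
    waitpid(pid, &status, 0);
    int seen = *shared;
    munmap(shared, sizeof *shared);
    return seen;
}
```

Everything else in the two address spaces stays private; only the mapped page is visible to both, which is why both sides have to be designed around it.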
If they are two independent programs, then you have to make sure the data is passed between them in an independent way. Let's say the two programs are running and the first program computes some data that the second program needs. The simplest thing to do is have the first program write the data to a file, with a status marker at the end of the file to indicate that it is safe for the other program to start reading. The other program then runs a while loop that periodically checks the last line of that file for the status marker.
The other option is to use some library like MPI which has protocols for message passing implemented.
I have two types of threads: one is a student, the other a librarian. I also have a list of structs holding basic information about each book, such as the book name, ISBN and publishing year (this list is a shared resource between the threads). I want to pass a pointer to a certain book from a student thread/routine to a librarian thread using condition variables, so that the librarian can reserve the book for the student by means of signaling. How can I accomplish this, or is this even the right way to go about it?
The easiest way is to use pipes (see man 2 pipe).
Faster performance-wise, but far more complicated, is to use a virtual ring buffer (see man 3 vrb, a userland pipe) or any other message-passing middleware.
If these are threads (using pthread library) in the same process, you can share data since the address space is common to them. However, be aware of synchronization issues.
A common way to do that is to use a mutex for every (read or write) access to that common data. Perhaps also use condition variables for synchronization (i.e. thread A needing to tell thread B that something significant changed).
Read a good pthread tutorial (and this perhaps also).
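A minimal sketch of the mutex-plus-condition-variable handoff mentioned above, applied to the student/librarian question. The struct layout, the one-slot "mailbox" and the function names are all assumptions made for illustration; a real program would likely keep the librarian thread running and queue many requests.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct book { const char *title; int reserved; };

/* One-slot mailbox: the student posts a request, the librarian picks it up. */
struct request_slot {
    pthread_mutex_t lock;
    pthread_cond_t  posted;
    struct book    *wanted;   /* NULL while the slot is empty */
};

static void *librarian(void *arg)
{
    struct request_slot *slot = arg;
    pthread_mutex_lock(&slot->lock);
    /* The loop re-checks the predicate, which guards against spurious wakeups
     * and against the signal arriving before this thread starts waiting. */
    while (slot->wanted == NULL)
        pthread_cond_wait(&slot->posted, &slot->lock);
    slot->wanted->reserved = 1;   /* shared data is modified under the lock */
    slot->wanted = NULL;
    pthread_mutex_unlock(&slot->lock);
    return NULL;
}

/* Student side: hand the book pointer to a librarian thread and wait for it
 * to finish. Returns 1 if the book ended up reserved, 0 otherwise. */
int reserve_book(struct book *b)
{
    assert(b != NULL);
    struct request_slot slot;
    slot.wanted = NULL;
    pthread_mutex_init(&slot.lock, NULL);
    pthread_cond_init(&slot.posted, NULL);

    pthread_t tid;
    int ok = 0;
    if (pthread_create(&tid, NULL, librarian, &slot) == 0) {
        pthread_mutex_lock(&slot.lock);
        slot.wanted = b;                      /* publish the pointer */
        pthread_cond_signal(&slot.posted);    /* wake the librarian */
        pthread_mutex_unlock(&slot.lock);
        pthread_join(tid, NULL);
        ok = b->reserved;
    }
    pthread_cond_destroy(&slot.posted);
    pthread_mutex_destroy(&slot.lock);
    return ok;
}
```

The condition variable only carries the "something changed" notification; the pointer itself travels through the mutex-protected slot.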
is this even the right way to go about it?
Your example is very artificial... the only reason why you would use threads and some strange local variable list for this, is because some teacher tells you to do so. So no, this is not the right way to implement a program to be used in the real world.
In the real world, things like these would almost certainly be implemented through a database, where the DBMS handles the accessing of individual posts. Most likely in some kind of client/server system, where there is a client used by the librarian. I don't see why the student would even be part of the system, except as a data post over who borrowed the book.
I have a C program that currently uses multiple threads to process data. I use a glib GAsyncQueue for the producer threads to send their data to consumer threads. Now I need to move the threads into independent processes and I'm not sure how to proceed with pushing data between them. Using pipes does not seem very suitable for my task since the amount of data being pushed is rather large. Another option is to obtain a piece of shared memory but, since calculating an upper bound on the amount of shared data is a little difficult, this option is less than attractive.
Do you know of something like GAsyncQueue that can be used with multiple processes? Since I'm already using glib, I prefer to use its facilities, but I'm open to using other libraries if they provide what I need.
POSIX specifies the System V style msgget(2), msgsnd(2) and msgrcv(2) message-queue interface, though the message and queue sizes may be smaller than you wish. (Linux allows you to modify the sizes with the /proc/sys/kernel/msgmax and /proc/sys/kernel/msgmnb tunable files; the defaults are 8k and 16k.)
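For orientation, here is a small sketch of that interface. To stay self-contained it sends and receives within a single process on a private queue; the struct text_msg layout and the helper name queue_roundtrip are invented for the example. Separate producer and consumer processes would instead agree on a key (for example one derived with ftok) and pass it to msgget.

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

/* Message layout: the kernel requires a leading long type field (> 0). */
struct text_msg {
    long mtype;
    char text[64];
};

/* Hypothetical helper: push `text` through a private queue and read it back
 * into `out`. Returns 0 on success, -1 on failure. */
int queue_roundtrip(const char *text, char *out, size_t outlen)
{
    assert(outlen > 0);

    int qid = msgget(IPC_PRIVATE, 0600);
    if (qid == -1)
        return -1;

    struct text_msg m = { .mtype = 1 };           /* text[] starts zeroed */
    strncpy(m.text, text, sizeof m.text - 1);     /* keeps the final '\0' */

    int rc = -1;
    /* The size argument counts only the payload, not the mtype field. */
    if (msgsnd(qid, &m, sizeof m.text, 0) == 0) {
        struct text_msg r;
        if (msgrcv(qid, &r, sizeof r.text, 1, 0) >= 0) {
            strncpy(out, r.text, outlen - 1);
            out[outlen - 1] = '\0';
            rc = 0;
        }
    }
    msgctl(qid, IPC_RMID, NULL);   /* queues outlive processes unless removed */
    return rc;
}
```

Large payloads would have to be split into chunks that fit under the msgmax limit mentioned above.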
Since message buses are a fairly common need you may wish to pick something like RabbitMQ, which provides prewritten bindings to many languages and may make future development easier.
All the threads share the same memory. For example, a change to a global variable in one thread is reflected in every other thread. Since each thread has its own stack, the local variables created inside a thread are unique to it. In that case, why do we need a thread-specific data mechanism? Can't the same be achieved with auto storage variables inside the thread function?
Kindly clarify!!!.
BR
Rj
Normal globals are shared between threads. Local variables are specific to a particular invocation of a function. If you want something that (for example) is visible to a number of functions running in the same thread, but unique to that thread, then thread specific data is what you're looking for.
It's not required but it's rather handy. Some functions like rand and strtok use static storage duration information which is likely to be problematic when shared among threads.
Say you have a random number function where you want to maintain a different sequence (hence seed) for each thread. You have two approaches.
You can use something like the kludgy:
unsigned int seed = (unsigned int) time (NULL);
int r = rand_r (&seed);
where the seed has to be created by the caller and passed in each time.
Or you can use the rather nicer, ISO-compliant:
srand (time (NULL));
int r = rand();
that uses thread-local storage to maintain a thread-specific seed. Similarly with the information used by strtok regarding the locations within the string it's processing.
That way, you don't have to muck about with changing your code between threaded and non-threaded versions.
Now you could create that information in the thread function, but how is the rand function going to know about its address without it being passed down? And what if rand is called 87 stack levels down? That's an awful lot of levels to be transferring a pointer through.
And, even if you do something like:
void *pthread_fn (void *unused) {
    int seed;
    rand_set_seed_location (&seed);
    :
    return NULL;
}
and rand subsequently uses that value regardless of how deep it is in the stack, that's still a code change from the standard. It may work but so may writing an operating system in COBOL. That doesn't make it a good idea :-)
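The thread-local seed idea above can be sketched with C11's _Thread_local. This is an illustration only, not how any particular C library implements rand; my_srand, my_rand and sequences_are_independent are hypothetical names, and the generator is a simple linear congruential step using the constants from the C standard's example implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Each thread gets its own copy of this variable. */
static _Thread_local unsigned int tls_state = 1;

void my_srand(unsigned int seed) { tls_state = seed; }

/* Same call signature as rand(): no seed pointer has to be passed down. */
int my_rand(void)
{
    tls_state = tls_state * 1103515245u + 12345u;
    return (int)((tls_state / 65536u) % 32768u);
}

static void *worker(void *arg)
{
    (void)arg;
    my_srand(999);   /* reseeds only this thread's copy of the state */
    my_rand();
    return NULL;
}

/* Returns 1 if the calling thread's sequence is unaffected by another thread
 * seeding and drawing in the middle of it, 0 otherwise. */
int sequences_are_independent(void)
{
    my_srand(42);
    int expected_first = my_rand();
    int expected_second = my_rand();

    my_srand(42);
    int first = my_rand();

    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return 0;
    int rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;

    int second = my_rand();
    return first == expected_first && second == expected_second;
}
```

Callers use my_srand and my_rand exactly as they would the single-threaded versions, however deep in the call stack they are.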
Yes, the stack is one way of allocating thread-local storage (including handles to heap allocations local to the particular thread).
The best example of thread-specific data is errno. When a call to some function in the C library fails, errno is set, and you can check it to find the reason for the failure. Without thread-specific data, it would be impossible to port these functions to a multi-threaded environment, because errno could be overwritten by another thread before you check it.
As a general rule, most uses of TSD should be avoided in new APIs. If a function needs some information, it should be passed to it.
However, sometimes you need TSD to 'paper over' an API defect. A good example is 'gmtime'. The 'gmtime' function returns a pointer to a structure that is valid until the next call to 'gmtime'. But that would make 'gmtime' awfully hard to use in a multi-threaded program. What if some library called 'gmtime' when you didn't expect it, trashing your structure? One simple workaround is make the structure returned thread-specific. (The long-term solution, of course, is to create a more suitable API such as 'gmtime_r'.)
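For comparison, this is roughly what the gmtime_r style looks like on a POSIX system: the caller supplies the result buffer, so no hidden shared structure is involved. The wrapper name utc_breakdown is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical wrapper: break `t` down into UTC calendar fields stored in the
 * caller's own struct. Returns 0 on success, -1 on failure. */
int utc_breakdown(time_t t, struct tm *out)
{
    assert(out != NULL);
    /* gmtime_r writes into `out` instead of a static buffer, so a later call
     * from another thread or library cannot overwrite this result. */
    return gmtime_r(&t, out) != NULL ? 0 : -1;
}
```

Each caller keeps its own struct tm, so an unexpected call elsewhere cannot trash it.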
One case where it's perfectly reasonable to use TSD in new designs is for information that won't be accessed frequently that would clutter the API. For example, if a critical error is discovered, it might be nice to log certain context information from higher-level code (Which client were you serving? What command did they send?). Your choices are basically to pass this context information from function to function to function (which isn't even always possible if some of the functions are outside your control) or to store it in TSD.
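A small sketch of that logging-context case using the pthread TSD calls. The names log_context_set and log_context_get are hypothetical, and the stored string is assumed to outlive its use (the key has no destructor here).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t  ctx_key;
static pthread_once_t ctx_once = PTHREAD_ONCE_INIT;

/* Runs exactly once per process, no matter which thread gets here first. */
static void make_key(void)
{
    int rc = pthread_key_create(&ctx_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Called by high-level code when it starts serving a client. */
void log_context_set(const char *client)
{
    pthread_once(&ctx_once, make_key);
    pthread_setspecific(ctx_key, client);
}

/* Called from deep in the call stack when a critical error is logged.
 * Returns this thread's context, or NULL if none was set. */
const char *log_context_get(void)
{
    pthread_once(&ctx_once, make_key);
    return pthread_getspecific(ctx_key);
}
```

None of the functions between the top-level handler and the logging call need an extra parameter, which is the trade-off described above.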