Creating a queue in POSIX shared memory - C

For my implementation I am using mmap to allocate shared memory for interprocess communication. In this shared memory I initialize a queue (I set the first and last pointers to NULL).
The problem is how to push a new item onto the queue. Normally I would malloc my 'queue item struct' and then point to it, but I can't do that here, can I? I need to allocate the item somewhere in the shared memory itself. I could probably call mmap again, put the item there, and point to it, but that doesn't seem right, because I would have to do it for every item.
Can this be done simply, or should I think about a different solution?
Thanks for any ideas.

General rules to create a queue in shared memory:
1) Never use pointers as shared elements, because the OS may choose different virtual addresses in different processes. Always use offsets from the shared memory view's base address, or array indices, or in any case something that is position-independent.
2) You have to partition your shared memory manually. E.g. you must know how many items your queue may contain, and size the shared area so it can hold the "header" (insertion index, extraction index, ...) and the item array. It's often enough to define a structure that contains both the "header" and an "item array" of the correct size: the memory size is sizeof(your_structure), and its address is the one returned by mmap.
3) Carefully consider multithreading and multiprocessing issues. Protect access to the shared memory with a mutex if it's acceptable for the accessing threads to block. But if you want to create a "non-blocking" queue, you must at least use atomic operations to change the relevant fields, and consider every possible timing issue.
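Rules 1) and 2) together might look like the following sketch (names and the capacity are illustrative, and synchronization per rule 3 is left out for brevity): the whole queue is one contiguous struct, and only indices are stored, never pointers, so the block is valid at whatever base address each process happens to map it at.

```c
#include <stddef.h>

#define QUEUE_CAPACITY 64   /* example capacity; size it for your workload */

typedef struct {
    int value;              /* example payload */
} item_t;

/* Header and item array in one block: sizeof(shm_queue_t) is the size
 * to pass to mmap, and the mmap return value is the queue's address. */
typedef struct {
    size_t head;            /* extraction index */
    size_t tail;            /* insertion index */
    size_t count;
    item_t items[QUEUE_CAPACITY];
} shm_queue_t;

void queue_init(shm_queue_t *q) {
    q->head = q->tail = q->count = 0;
}

int queue_push(shm_queue_t *q, const item_t *it) {
    if (q->count == QUEUE_CAPACITY) return -1;   /* full */
    q->items[q->tail] = *it;                     /* copy, don't link */
    q->tail = (q->tail + 1) % QUEUE_CAPACITY;
    q->count++;
    return 0;
}

int queue_pop(shm_queue_t *q, item_t *out) {
    if (q->count == 0) return -1;                /* empty */
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return 0;
}
```

Because items are copied into the array rather than linked through pointers, there is no per-item allocation at all, which answers the original question: no malloc and no repeated mmap calls are needed.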
Regards

Related

How to store nodes of a list into shared memory

I am trying to make many clients communicate with each other via many terminals. I have forks inside my program and I create pipes so the clients can read/write from/to other clients. Because I create many processes, I need shared memory to store some information; in particular I want to store the nodes that are created by each child. How can I do this?
This is my struct:
typedef struct client{
char *numofclient;
struct client *nextclient;
}client;
Before forking anything, create a shared memory area using mmap. Read the man page and use the shared flags. On Windows it's different, so look up VirtualAlloc, and of course you can't fork.
You'll need a memory allocator for your shared memory. That can be super easy: just increment a char pointer for an allocation and never free anything. Or it can be as complex as you want. You may be able to find a library online.
If you need a mutex create one in the shared memory area and be sure to use the flags for a shared mutex.
Since you are forking you can use pointers because the shared memory will remain mapped in place in each process copy. Otherwise you'd need to use offsets from the map start.
I think you can use System V shared memory via shmget; read the man pages.
You can decide on an upper limit for how many processes are going to be created and pass a correspondingly large size to shmget.
Then, whenever a child process wants to store its list, it can just attach to the shared memory and append its data there.
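A sketch of that approach, sized up front as suggested (the table layout, names, and limit are illustrative, not part of the question): fixed-size records in a System V segment instead of pointer-linked nodes, so each child writes into a slot and nothing in the segment is address-dependent.

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <string.h>

#define MAX_CLIENTS 16  /* upper limit decided in advance, as suggested */

/* Fixed-size record instead of the pointer-linked struct client:
 * the char * and next pointer are replaced by an inline array and
 * an array slot, so the segment works at any attach address. */
typedef struct {
    char numofclient[32];
    int  in_use;
} client_rec;

/* Create (or open) a segment big enough for the whole table and
 * attach it; each forked child can call this with the same key. */
client_rec *attach_client_table(key_t key) {
    int id = shmget(key, MAX_CLIENTS * sizeof(client_rec),
                    IPC_CREAT | 0600);
    if (id == -1) return NULL;
    void *p = shmat(id, NULL, 0);
    return (p == (void *)-1) ? NULL : (client_rec *)p;
}
```

With IPC_CREAT and a fixed key (or a key from ftok), unrelated processes can attach the same table; with fork() alone, the mmap approach above is simpler.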

How to share a structure with pointers between two unrelated processes with shared memory in C?

I have a structure which looks like:
typedef struct shared_data_t
{
char *key;
char *message;
}shared_data;
I need to share this structure with another unrelated process. I am using POSIX shared memory with shm_open()/mmap() to achieve this. However, my target process is not getting the shared data and is dying with SIGSEGV, which is obvious. It would be great if someone could help me with this, especially with what happens when sharing pointers between two processes through shared memory (with shm_open and mmap).
For a structure like,
typedef struct shared_data_t
{
char key[8];
char message[32];
}shared_data;
it works all fine!
There is a note about this in the Linux man page for shmat:
Using shmat() with shmaddr equal to NULL is the preferred, portable way of attaching a shared memory segment. Be aware that the shared memory segment attached in this way may be attached at different addresses in different processes. Therefore, any pointers maintained within the shared memory must be made relative (typically to the starting address of the segment), rather than absolute.
The struct contains two pointers, for key and message. They have values such as 0x1000 and 0x1040. Your first process maps shared memory, say at address 0x7000. It copies the struct into shared memory. The second process maps the same shared memory, say at address 0x9000. It reads the struct. Then it uses the pointers, which causes it to look for key and message at addresses 0x1000 and 0x1040. But they are not at those addresses in the memory of the second process. So the second process fails.
To fix this, you must arrange for key and message to be in shared memory, and you must either arrange for them to be at the same address in both processes (by telling mmap exactly where you want to map memory, not letting the system pick the address) or you must include information in the shared memory about how to locate key and message. This is often done by using offsets instead of pointers. That is, instead of having pointers to char in the struct, have offsets (possibly with type ptrdiff_t) that give the number of bytes from a base location to the key and the message. The beginning of the shared memory segment is a typical base to use.
If you have only one key and one message to share, then a common way this is done is simply to use a single data structure for the shared memory, as you showed with your second definition of shared_data: the key and the message are part of the struct, so their offsets are known, simply as offsets from the beginning of the struct. If you are sharing more complicated data, such as trees or linked lists, then you may need to use explicit offsets.
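The explicit-offset variant could look like this sketch (the layout and helper names are illustrative): the struct stores ptrdiff_t offsets from the start of the shared segment instead of char pointers, and each process resolves them against its own view's base address.

```c
#include <stddef.h>
#include <string.h>

/* Instead of char *key / char *message, store byte offsets from the
 * start of the shared segment; offsets are valid at any attach address. */
typedef struct {
    ptrdiff_t key_off;
    ptrdiff_t message_off;
} shared_data;

/* Resolve an offset against this process's base address for the segment. */
char *shm_ptr(void *base, ptrdiff_t off) {
    return (char *)base + off;
}

/* Example layout: header first, then the two strings, all inside the
 * one shared block, so only offsets ever cross the process boundary. */
void pack(void *base, const char *key, const char *msg) {
    shared_data *h = base;
    h->key_off = (ptrdiff_t)sizeof *h;
    h->message_off = h->key_off + (ptrdiff_t)(strlen(key) + 1);
    strcpy(shm_ptr(base, h->key_off), key);
    strcpy(shm_ptr(base, h->message_off), msg);
}
```

The writer calls pack() on its mapping of the segment; the reader calls shm_ptr() on its own mapping, and both reach the same bytes even though the base addresses differ.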

CUDA shared memory not faster than global?

Hi, I have a kernel function where I need to compare bytes. The area I want to search is divided into blocks, so an array of 4k bytes is divided into 4k/256 = 16 blocks. Each thread in a block reads the array at idx and compares it with another array that holds what I want to search for. I've done this in two ways:
1. Compare the data in global memory, though often threads in a block need to read the same address.
2. Copy the data from global memory to shared memory, and compare the bytes in shared memory the same way as above. The same-address reads remain.
Copy to shared memory looks like this:
myArray[idx] = global[someIndex-idx];
whatToSearch[idx] = global[someIndex+idx];
Rest of the code is the same. Only operations on data in example 2 are performed in shared arrays.
But the first option is about 10% faster than the one with shared memory. Why? Thank you for any explanations.
If you are only using the data once and there is no data reuse between different threads in a block, then using shared memory will actually be slower. The reason is that when you copy data from global memory to shared, it still counts as a global transaction. Reads are faster when you read from shared memory, but it doesn't matter because you already had to read the memory once from global, and the second step of reading from shared memory is just an extra step that doesn't provide anything of value.
So, the key point is that using shared memory is only useful when you need to access the same data more than once (whether from the same thread, or from different threads in the same block).
You are using shared memory to save on accesses to global memory, but each thread is still making two accesses to global memory, so it won't be faster. The speed drop is probably because the threads that access the same location in global memory within a block try to read it into the same location in shared memory, and this needs to be serialized.
I'm not sure of exactly what you are doing from the code you posted, but you should ensure that the number of times global is read from and written to, aggregated across all the threads in a block, is significantly lower when you use shared memory. Otherwise you won't see a performance improvement.

C language: How to share a struct (or, if not possible, an array) between parent and child (forked) processes through IPC?

I googled this for the past two weeks and I didn't get any answer. This is what I have:
A parent process, which creates a struct myStruct that is basically a linked list using pointers (if this is a major issue, I can accept to use a fixed size array instead).
A fixed number of child processes created with fork() that need a read/write access to the struct (or array) created by the parent.
I don't know how to do in order to make the variable myStruct become shared between processes.
I tried to solve the problem using SysV IPC functions like shmget(), shmat(), etc... in order to allocate my variable in shared memory, but I don't know how to work with void memory pointers to read/write the values into myStruct.
Ideally, I would like to be able to use the dot notation (myStruct.node)->attribute = value in every process without having to deal with pointers, since I don't know how my struct is organized into memory.
Is that possible? Could some of you please help? Any help is REALLY appreciated.
Further note: I know using threads, pipes, sockets or things like that would be much easier, but this work is for academic purposes for which I have to simulate the presence of multiple independent processes.
If you create a shared anonymous mapping with:
p = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
Then these pages will not be copied-on-write, but rather will be shared by all forked processes.
Note that you have to be careful with locking here. You can use standard pthread mutexes and condvars on the shared memory segment between processes if (and only if!) you use pthread_mutexattr_setpshared and pthread_condattr_setpshared to mark them as shared between processes.
Note also that this technique maps a fixed size arena, and must be done before forking. How you manage the contents of the memory at p is up to you. It's nontrivial to create a resizable shared memory arena; if you want to go that route, I'd recommend posting a second question, as different approaches may be necessary.
The easiest way to share memory across a fork is to use a mmap()ed region of memory. If you use an anonymous mmap() (with MAP_ANON and MAP_SHARED) to store your struct (array, whatever) instead of malloc()ed memory, it will be shared by the child process.

Do threads have a distinct heap?

As far as I know each thread gets a distinct stack when the thread is created by the operating system. I wonder if each thread has a heap distinct to itself also?
No. All threads share a common heap.
Each thread has a private stack, which it can quickly add and remove items from. This makes stack based memory fast, but if you use too much stack memory, as occurs in infinite recursion, you will get a stack overflow.
Since all threads share the same heap, access to the allocator/deallocator must be synchronized. There are various methods and libraries for avoiding allocator contention.
Some languages allow you to create private pools of memory, or individual heaps, which you can assign to a single thread.
By default, C has only a single heap.
That said, some allocators that are thread-aware will partition the heap so that each thread has its own area to allocate from. The idea is that this should make the heap scale better.
One example of such a heap is Hoard.
Depends on the OS. The standard C runtime on Windows and Unix uses a heap shared across threads. This means locking every malloc/free.
On Symbian, for example, each thread comes with its own heap, although threads can share pointers to data allocated in any heap. Symbian's design is better in my opinion since it not only eliminates the need for locking during alloc/free, but also encourages clean specification of data ownership among threads. Also in that case when a thread dies, it takes all the objects it allocated along with it - i.e. it cannot leak objects that it has allocated, which is an important property to have in mobile devices with constrained memory.
Erlang also follows a similar design where a "process" acts as a unit of garbage collection. All data is communicated between processes by copying, except for binary blobs which are reference counted (I think).
Each thread has its own stack and call stack.
Each thread shares the same heap.
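This sharing is easy to demonstrate (a minimal sketch; the function names are illustrative): one thread malloc()s and fills a buffer, and the spawning thread reads and frees it. That only works because both threads share one heap and one address space, so the pointer means the same thing in both.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Worker thread: allocate on the (shared) heap and hand the pointer back. */
static void *producer(void *arg) {
    (void)arg;
    char *msg = malloc(32);
    strcpy(msg, "hello from thread");
    return msg;
}

/* Spawning thread: receive the pointer via pthread_join; the caller
 * can read and free it because it is the same heap. */
char *run_producer(void) {
    pthread_t t;
    void *result = NULL;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(t, &result);
    return result;
}
```

Contrast this with fork(): a pointer malloc()ed in a child process would be meaningless in the parent, which is exactly why the shared-memory techniques in the earlier questions are needed between processes but not between threads.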
It depends on what exactly you mean when saying "heap".
All threads share the address space, so heap-allocated objects are accessible from all threads. Technically, stacks are shared as well in this sense, i.e. nothing prevents you from accessing another thread's stack (though it would almost never make any sense to do so).
On the other hand, there are heap structures used to allocate memory. That is where all the bookkeeping for heap memory allocation is done. These structures are sophisticatedly organized to minimize contention between the threads - so some threads might share a heap structure (an arena), and some might use distinct arenas.
See the following thread for an excellent explanation of the details: How does malloc work in a multithreaded environment?
Typically, threads share the heap and other resources, however there are thread-like constructions that don't. Among these thread-like constructions are Erlang's lightweight processes, and UNIX's full-on processes (created with a call to fork()). You might also be working on multi-machine concurrency, in which case your inter-thread communication options are considerably more limited.
Generally speaking, all threads use the same address space and therefore usually have just one heap.
However, it can be a bit more complicated. You might be looking for Thread Local Storage (TLS), but it stores single values only.
Windows-Specific:
TLS-space can be allocated using TlsAlloc and freed using TlsFree (Overview here). Again, it's not a heap, just DWORDs.
Strangely, Windows supports multiple heaps per process. One can store a heap's handle in TLS. Then you would have something like a "thread-local heap". However, it is only the handle that is unknown to the other threads; they can still access its memory using pointers, since it's all the same address space.
EDIT: Some memory allocators (specifically jemalloc on FreeBSD) use TLS to assign "arenas" to threads. This is done to optimize allocation for multiple cores by reducing synchronization overhead.
On the FreeRTOS operating system, tasks (threads) share the same heap, but each of them has its own stack. This comes in very handy when dealing with low-power, low-RAM architectures, because the same pool of memory can be accessed/shared by several threads. But this comes with a small catch: the developer needs to keep in mind that a mechanism for synchronizing malloc and free is required. That is why some type of process synchronization/lock is necessary when allocating or freeing memory on the heap, for example a semaphore or a mutex.
