pthread POSIX C library detachstate - c

I was asked how we know that passing NULL as the second argument to pthread_create() makes the thread joinable.
I know that the man pages say so, but I was asked for a justification in the code.
I know that when NULL is passed in, default attributes are used:
const struct pthread_attr *iattr = (struct pthread_attr *) attr;
if (iattr == NULL)
/* Is this the best idea? On NUMA machines this could mean accessing far-away memory. */
iattr = &default_attr;
I know that it should be somewhere in the code of pthread library, but I don't know where exactly.
I know that the definition of default_attr is in pthread_create.c:
static const struct pthread_attr default_attr = { /* Just some value > 0 which gets rounded to the nearest page size. */ .guardsize = 1, };
http://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_create.c;h=4fe0755079e5491ad360c3b4f26c182543a0bd6e;hb=HEAD#l457
but I do not know where exactly the code states that this results in a joinable thread.
Thanks in advance.

First off, from the code you pasted you can see that default_attr contains zeroes in almost all fields (there's no such thing as a half-initialized variable in C: if you only initialize some fields, the others are set to 0).
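To illustrate the initialization rule, here is a small sketch. The struct demo_attr and its members are hypothetical stand-ins, not glibc's actual struct pthread_attr; the example only shows that members left out of a designated initializer end up as zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's internal attribute struct. */
struct demo_attr {
    int flags;
    size_t guardsize;
    void *stackaddr;
};

/* Only guardsize is named; the remaining members are implicitly zeroed. */
static const struct demo_attr demo_default = { .guardsize = 1 };

/* Aborts if any member that was not named in the initializer is non-zero. */
void check_demo_default(void)
{
    assert(demo_default.guardsize == 1);
    assert(demo_default.flags == 0);
    assert(demo_default.stackaddr == NULL);
}
```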
Second, pthread_create contains this code:
/* Initialize the field for the ID of the thread which is waiting
for us. This is a self-reference in case the thread is created
detached. */
pd->joinid = iattr->flags & ATTR_FLAG_DETACHSTATE ? pd : NULL;
This line checks whether iattr->flags has the ATTR_FLAG_DETACHSTATE bit set, which (for default_attr) it doesn't because default_attr.flags is 0. Thus it sets pd->joinid to NULL and not to pd as for detached threads.
(Note that this answer only applies to GNU glibc and not to POSIX pthreads in general.)
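The same conclusion can be checked at run time. The following is a minimal sketch, assuming glibc (pthread_getattr_np is a GNU extension); the function names report_detachstate and default_detachstate are made up for this example.

```c
#define _GNU_SOURCE          /* for pthread_getattr_np (GNU extension) */
#include <assert.h>
#include <pthread.h>

/* Thread body: query our own attributes and report the detach state. */
static void *report_detachstate(void *arg)
{
    int *state = arg;
    pthread_attr_t attr;

    int rc = pthread_getattr_np(pthread_self(), &attr);
    assert(rc == 0);
    rc = pthread_attr_getdetachstate(&attr, state);
    assert(rc == 0);
    pthread_attr_destroy(&attr);
    return NULL;
}

/* Creates a thread with attr == NULL and returns its detach state. */
int default_detachstate(void)
{
    pthread_t tid;
    int state = -1;

    int rc = pthread_create(&tid, NULL, report_detachstate, &state);
    assert(rc == 0);
    rc = pthread_join(tid, NULL);   /* joining is valid only for joinable threads */
    assert(rc == 0);
    return state;
}
```

On glibc the returned value should compare equal to PTHREAD_CREATE_JOINABLE.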

Related

Synchronize with sigev_notify_function()

I would like to read (asynchronously) BLOCK_SIZE bytes of one file, and the BLOCK_SIZE bytes of the second file, printing what has been read to the buffer as soon as the respective buffer has been filled. Let me illustrate what I mean:
// in main()
int infile_fd = open(infile_name, O_RDONLY); // add error checking
int maskfile_fd = open(maskfile_name, O_RDONLY); // add error checking
char* buffer_infile = malloc(BLOCK_SIZE); // add error checking
char* buffer_maskfile = malloc(BLOCK_SIZE); // add error checking
struct aiocb cb_infile;
struct aiocb cb_maskfile;
// set AIO control blocks
memset(&cb_infile, 0, sizeof(struct aiocb));
cb_infile.aio_fildes = infile_fd;
cb_infile.aio_buf = buffer_infile;
cb_infile.aio_nbytes = BLOCK_SIZE;
cb_infile.aio_sigevent.sigev_notify = SIGEV_THREAD;
cb_infile.aio_sigevent.sigev_notify_function = print_buffer;
cb_infile.aio_sigevent.sigev_value.sival_ptr = buffer_infile;
memset(&cb_maskfile, 0, sizeof(struct aiocb));
cb_maskfile.aio_fildes = maskfile_fd;
cb_maskfile.aio_buf = buffer_maskfile;
cb_maskfile.aio_nbytes = BLOCK_SIZE;
cb_maskfile.aio_sigevent.sigev_notify = SIGEV_THREAD;
cb_maskfile.aio_sigevent.sigev_notify_function = print_buffer;
cb_maskfile.aio_sigevent.sigev_value.sival_ptr = buffer_maskfile;
and the print_buffer() function is defined as follows:
void print_buffer(union sigval sv)
{
printf("%s\n", __func__);
printf("buffer address: %p\n", sv.sival_ptr);
printf("buffer: %.128s\n", (char*)sv.sival_ptr);
}
By the end of the program I do the usual clean up, i.e.
// clean up
close(infile_fd); // add error checking
close(maskfile_fd); // add error checking
free(buffer_infile);
printf("buffer_infile freed\n");
free(buffer_maskfile);
printf("buffer_maskfile freed\n");
The problem is that every once in a while buffer_infile gets freed before print_buffer manages to print its contents to the console. Normally I would use some kind of pthread_join(), but as far as I know this is impossible, since POSIX does not specify that sigev_notify_function must be implemented using threads. Besides, how would I get the TID of such a thread to call pthread_join() on?
Don't do it this way, if you can avoid it. If you can, just let process termination take care of it all.
Otherwise, the answer indicated in Andrew Henle's comment above is right on. You need to be sure that no more sigev_notify_functions will improperly reference the buffers.
The easiest way to do this is simply to count down the number of expected notifications before freeing the buffers.
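One possible shape for such a countdown is sketched below, using a mutex and a condition variable. The struct pending and the pending_* helpers are hypothetical names introduced for this example, and fake_notification merely stands in for a real sigev_notify_function.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: counts notifications that are still outstanding. */
struct pending {
    pthread_mutex_t lock;
    pthread_cond_t  zero;
    int             count;
};

void pending_init(struct pending *p, int expected)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->zero, NULL);
    p->count = expected;
}

/* Call as the last step of each notification function. */
void pending_done(struct pending *p)
{
    pthread_mutex_lock(&p->lock);
    assert(p->count > 0);
    if (--p->count == 0)
        pthread_cond_broadcast(&p->zero);
    pthread_mutex_unlock(&p->lock);
}

/* Call before free(): blocks until every expected notification has run. */
void pending_wait(struct pending *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->count > 0)
        pthread_cond_wait(&p->zero, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Stand-in for a notification callback, used only for this demonstration. */
static void *fake_notification(void *arg)
{
    pending_done(arg);
    return NULL;
}

/* Runs two simulated notifications and returns the remaining count. */
int demo_countdown(void)
{
    struct pending p;
    pthread_t t[2];

    pending_init(&p, 2);
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, fake_notification, &p);
    pending_wait(&p);               /* the buffers could be freed after this */
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    pthread_cond_destroy(&p.zero);
    pthread_mutex_destroy(&p.lock);
    return p.count;
}
```

In the question's program, sival_ptr could point to a small struct that holds both the buffer and a pointer to the shared counter, so that print_buffer can call pending_done() when it finishes.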
Note: your SIGEV_THREAD function is executed in a separate thread, though not necessarily a new thread each time. (POSIX.1-2017 System Interfaces §2.4.2) Importantly, you are not meant to manage this thread's lifecycle: it is detached by default, with PTHREAD_CREATE_JOINABLE explicitly noted as undefined behavior.
As an aside, I'd suggest never using SIGEV_THREAD in robust code. Per spec, the signal mask of the sigev_notify_function thread is implementation-defined. Yikes. For me, that makes it per se unreliable. In my view, SIGEV_SIGNAL and a dedicated signal-handling thread are much safer.
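A rough sketch of the dedicated signal-handling thread idea follows. It is not tied to AIO: sigqueue() stands in for the completion notification, and the names signal_waiter and demo_signal_thread, as well as the choice of SIGRTMIN, are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

/* Dedicated thread: synchronously waits for one queued SIGRTMIN. */
static void *signal_waiter(void *arg)
{
    int *received = arg;
    sigset_t set;
    siginfo_t info;
    int sig;

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    do {
        sig = sigwaitinfo(&set, &info);
    } while (sig == -1 && errno == EINTR);   /* retry only if interrupted */
    assert(sig == SIGRTMIN);

    *received = info.si_value.sival_int;     /* payload attached by the sender */
    return NULL;
}

/* Queues one signal carrying payload and returns what the waiter received. */
int demo_signal_thread(int payload)
{
    sigset_t set;
    pthread_t tid;
    int received = -1;
    union sigval value;

    /* Block the signal first so that every thread created later inherits the mask. */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    int rc = pthread_sigmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);

    rc = pthread_create(&tid, NULL, signal_waiter, &received);
    assert(rc == 0);

    /* Stand-in for an I/O completion: queue the signal with a payload. */
    value.sival_int = payload;
    rc = sigqueue(getpid(), SIGRTMIN, value);
    assert(rc == 0);

    pthread_join(tid, NULL);
    return received;
}
```

With AIO, the control block would set sigev_notify to SIGEV_SIGNAL and sigev_signo to the chosen signal, and the waiter thread is then an ordinary joinable thread whose lifetime the program controls.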

Copying char* values through Queues and threads on MBED OS

I am trying to implement some RTOS threads on Arm MBED OS on a K64F board. I am starting from the RTOS examples and I have successfully run different threads and made them communicate using Queues. I am having problems when copying char* values from one struct to another in order to pass a message from one queue to another. I believe I am misunderstanding something and that my problem is related to pointers and memory handling, but I am not able to get through it.
I have defined different queues to send data to various threads. I have also created a basic data structure containing everything I need to pass among these threads. In this struct I have a char* variable (rHostAddr) containing the address of the remote host that requested a service.
MemoryPool<cMsg, 16> AMPool;
Queue<cMsg, 16> AMQueue;
MemoryPool<cMsg, 16> ioLedPool;
Queue<cMsg, 16> ioLedQueue;
typedef struct{
...
char* rHostAddr;
...
} cMsg;
In the Main Thread I am creating this data structure and putting it in the first queue (AMQueue).
--- Main Thread ---
cMsg *message = AMPool.alloc();
char* rcvaddrs = "111.111.111.111";
message->rHostAddr = "111.111.111.111";
rcvaddrs = (char*)addr.get_ip_address();
message->rHostAddr = rcvaddrs;
AMQueue.put(message);
On the Thread 1 I wait for a message to arrive and on certain conditions I copy the whole structure to a new one created from the corresponding pool and insert it on a new queue (ioLedQueue).
--- Thread 1 ---
cMsg *msg;
cMsg *ledm = ioLedPool.alloc();
osEvent evt = AMQueue.get();
msg = (cMsg*)evt.value.p;
ledm->rHostAddr = msg->rHostAddr;
printf("\t -- Host 1 -- %s\n\r", ledm->rHostAddr);
ioLedQueue.put(ledm);
On Thread 2 I get the message structure and the data.
--- Thread 2 ---
cMsg *msg;
osEvent evt = ioLedQueue.get();
msg = (cMsg*)evt.value.p;
printf("\t -- Host 2 -- %s\n\r", msg->rHostAddr);
At this stage rHostAddr is empty. I can see the value in the "Host 1" printf but not in the "Host 2" one.
I believe (if I am not wrong) that the problem comes from assigning with the = operator: I am copying the address, not the value, and it is lost when the first pool's memory is freed. I have tried copying the value with memcpy, strcpy, and even my own char-by-char loop, but the system hangs when calling these functions.
How can I copy the value through this queues?
I am posting this here because the correct answer was written as a comment. Making the field an array of chars was the way to go, so that the string data is part of the struct.
char rHostAddr[40];
Now the assignment can be done with strcpy, and the value is passed correctly through the whole process:
char* rcvaddrs = (char*)addr.get_ip_address();
strcpy(message->rHostAddr,rcvaddrs);
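As a plain C illustration of why the embedded array helps, here is a small sketch. The cMsg below is a simplified, hypothetical version of the question's struct, and snprintf is used instead of strcpy as a bounded copy; that substitution is my choice, not part of the original answer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical version of the question's message struct. */
typedef struct {
    char rHostAddr[40];   /* the string bytes live inside the struct */
} cMsg;

/* Copies src into the message, truncating rather than overflowing. */
void set_host(cMsg *msg, const char *src)
{
    assert(msg != NULL && src != NULL);
    snprintf(msg->rHostAddr, sizeof msg->rHostAddr, "%s", src);
}

/* Returns 1 if a struct copy keeps its text after the originals are wiped. */
int copy_survives_source_change(void)
{
    char source[] = "111.111.111.111";
    cMsg first, second;

    set_host(&first, source);
    second = first;                          /* struct assignment copies the array */
    memset(source, 'x', sizeof source - 1);  /* overwrite the original string */
    memset(first.rHostAddr, 0, sizeof first.rHostAddr);

    return strcmp(second.rHostAddr, "111.111.111.111") == 0;
}
```

Because the bytes travel with the struct, the message stays valid even after the buffer it was copied from is reused or released.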
Take a look at this solution from ARM mbed:
https://github.com/ARMmbed/mbed-events

Workqueue implementation in Linux Kernel

Can anyone help me understand the difference between the following Linux kernel APIs:
struct workqueue_struct *create_workqueue(const char *name);
struct workqueue_struct *create_singlethread_workqueue(const char *name);
I wrote sample modules, and when I look at them using ps -aef, both have created a workqueue, but I was not able to see any difference.
I have referred to http://www.makelinux.net/ldd3/chp-7-sect-6, and according to LDD3:
If you use create_workqueue, you get a workqueue that has a dedicated thread for each processor on the system. In many cases, all those threads are simply overkill; if a single worker thread will suffice, create the workqueue with create_singlethread_workqueue instead.
But I was not able to see multiple worker threads (each for a processor).
Workqueues have changed since LDD3 was written.
These two functions are actually macros:
#define create_workqueue(name) \
alloc_workqueue("%s", WQ_MEM_RECLAIM, 1, (name))
#define create_singlethread_workqueue(name) \
alloc_workqueue("%s", WQ_UNBOUND | WQ_MEM_RECLAIM, 1, (name))
The alloc_workqueue documentation says:
Allocate a workqueue with the specified parameters. For detailed
information on WQ_* flags, please refer to Documentation/workqueue.txt.
That file is too big to quote entirely, but it says:
alloc_workqueue() allocates a wq. The original create_*workqueue()
functions are deprecated and scheduled for removal.
[...]
A wq no longer manages execution resources but serves as a domain for
forward progress guarantee, flush and work item attributes.
if(singlethread){
cwq = init_cpu_workqueue(wq, singlethread_cpu);
err = create_workqueue_thread(cwq, singlethread_cpu);
start_workqueue_thread(cwq, -1);
}else{
list_add(&wq->list, &workqueues);
for_each_possible_cpu(cpu) {
cwq = init_cpu_workqueue(wq, cpu);
err = create_workqueue_thread(cwq, cpu);
start_workqueue_thread(cwq, cpu);
}
}

Move memory pages per-thread in NUMA architecture

I have two questions in one:
(i) Suppose thread X is running on CPU Y. Is it possible to use the syscalls migrate_pages, or even better move_pages (or their libnuma wrappers), to move the pages associated with X to the node to which Y is connected?
This question arises because the first argument of both syscalls is a PID (and I need a per-thread approach for some research I'm doing).
(ii) If the answer to (i) is yes, how can I get all the pages used by a given thread? My aim is to move the page(s) that contain array M[], for example. How do I "link" data structures with their memory pages in order to use the syscalls above?
Additional information: I'm using C with pthreads. Thanks in advance!
You want to use the higher level libnuma interfaces instead of the low level system calls.
The libnuma library offers a simple programming interface to the NUMA (Non Uniform Memory Access) policy supported by the Linux kernel. On a NUMA architecture some memory areas have different latency or bandwidth than others.
Available policies are page interleaving (i.e., allocate in a round-robin fashion from all, or a subset, of the nodes on the system), preferred node allocation (i.e., preferably allocate on a particular node), local allocation (i.e., allocate on the node on which the task is currently executing), or allocation only on specific nodes (i.e., allocate on some subset of the available nodes). It is also possible to bind tasks to specific nodes.
The man pages for the low level numa_* system calls warn you away from using them:
Link with -lnuma to get the system call definitions. libnuma and the required <numaif.h> header are available in the numactl package.
However, applications should not use these system calls directly. Instead, the higher level interface provided by the numa(3) functions in the numactl package is recommended. The numactl package is available at <ftp://oss.sgi.com/www/projects/libnuma/download/>. The package is also included in some Linux distributions. Some distributions include the development library and header in the separate numactl-devel package.
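Regarding question (ii), the page-moving interfaces take an array of page addresses, so an array such as M[] first has to be turned into the list of pages that cover it. The sketch below shows only that address arithmetic; pages_covering and demo_page_count are hypothetical helpers, and no NUMA call is actually made.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Fills pages[] with the page-aligned addresses covering [buf, buf + len).
 * Returns the number of pages, or 0 if len is 0 or max_pages is too small.
 */
size_t pages_covering(const void *buf, size_t len, void **pages, size_t max_pages)
{
    const uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t first, last;
    size_t count;

    if (len == 0)
        return 0;
    first = (uintptr_t)buf & ~(page_size - 1);              /* round down */
    last  = ((uintptr_t)buf + len - 1) & ~(page_size - 1);  /* page of last byte */
    count = (size_t)((last - first) / page_size) + 1;
    if (count > max_pages)
        return 0;
    for (size_t i = 0; i < count; i++)
        pages[i] = (void *)(first + i * page_size);
    return count;
}

/* Demonstration: a page-aligned buffer spanning three pages plus one byte. */
size_t demo_page_count(void)
{
    const size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *pages[8];
    void *buf = NULL;

    if (posix_memalign(&buf, page_size, 4 * page_size) != 0)
        return 0;
    size_t n = pages_covering(buf, 3 * page_size + 1, pages, 8);
    assert(n == 0 || pages[0] == buf);
    free(buf);
    return n;
}
```

The resulting array has the form of the pages argument of move_pages(2). Since threads share one address space, the pages are identified by address rather than by thread, which is how a single array can be targeted even though the first argument is a PID.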
Here's the code I use for pinning a thread to a single CPU and moving the stack to the corresponding NUMA node (slightly adapted to remove some constants defined elsewhere). Note that I first create the thread normally, and then call the SetAffinityAndRelocateStack() below from within the thread. I think this is much better than trying to create your own stack, since stacks have special support for growing in case the bottom is reached.
The code can also be adapted to operate on the newly created thread from outside, but this could give rise to race conditions (e.g. if the thread performs I/O into its stack), so I wouldn't recommend it.
void* PreFaultStack()
{
const size_t NUM_PAGES_TO_PRE_FAULT = 50;
const size_t size = NUM_PAGES_TO_PRE_FAULT * numa_pagesize();
void *allocaBase = alloca(size);
memset(allocaBase, 0, size);
return allocaBase;
}
void SetAffinityAndRelocateStack(int cpuNum)
{
assert(-1 != cpuNum);
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(cpuNum, &cpuset);
const int rc = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
assert(0 == rc);
pthread_attr_t attr;
void *stackAddr = nullptr;
size_t stackSize = 0;
if ((0 != pthread_getattr_np(pthread_self(), &attr)) || (0 != pthread_attr_getstack(&attr, &stackAddr, &stackSize))) {
assert(false);
}
const unsigned long nodeMask = 1UL << numa_node_of_cpu(cpuNum);
const auto bindRc = mbind(stackAddr, stackSize, MPOL_BIND, &nodeMask, sizeof(nodeMask), MPOL_MF_MOVE | MPOL_MF_STRICT);
assert(0 == bindRc);
PreFaultStack();
// TODO: Also lock the stack with mlock() to guarantee it stays resident in RAM
return;
}

Can I get a thread's stack address from pthread_self()

I want to get the stack address of a thread through some function to which we can pass pthread_self(). Is it possible? The reason is that I want to write my own assigned thread identifier for a thread somewhere in its stack. I can write near the end of the stack (the end of the stack memory, not the current stack address; we can of course expect the application not to reach the bottom of the stack, and therefore use space from there).
In other words, I want to use the thread stack for putting a kind of thread local variable there. So, do we have some function like the following provided by pthread?
stack_address = stack_address_for_thread( pthread_self() );
I can use the syntax for thread local variables by gcc for this purpose, but I'm in a situation where I can't use them.
It's probably better to use pthread_key_create and pthread_getspecific and let the implementation worry about those details.
A good example of usage is here:
pthread_key_create
Edit: I should clarify -- I'm suggesting you use the libpthread provided method of creating thread-local information, instead of rolling your own by pushing something onto the end of the stack where it's possible your information could be lost.
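A minimal sketch of that approach, storing an application-assigned identifier per thread with the POSIX key functions, might look like this. The helper names set_my_id, get_my_id and run_worker_with_id are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_key_t id_key;
static pthread_once_t id_key_once = PTHREAD_ONCE_INIT;

/* Creates the key exactly once, no matter which thread gets here first. */
static void make_id_key(void)
{
    int rc = pthread_key_create(&id_key, NULL);   /* no destructor needed */
    assert(rc == 0);
}

/* Stores an application-assigned identifier for the calling thread. */
void set_my_id(intptr_t id)
{
    pthread_once(&id_key_once, make_id_key);
    int rc = pthread_setspecific(id_key, (void *)id);
    assert(rc == 0);
}

/* Returns the identifier stored by set_my_id(), or 0 if none was set. */
intptr_t get_my_id(void)
{
    pthread_once(&id_key_once, make_id_key);
    return (intptr_t)pthread_getspecific(id_key);
}

static void *worker(void *arg)
{
    set_my_id((intptr_t)arg);
    return (void *)get_my_id();   /* each thread reads back its own value */
}

/* Runs one thread with the given id and returns what that thread read back. */
intptr_t run_worker_with_id(intptr_t id)
{
    pthread_t tid;
    void *result = NULL;

    int rc = pthread_create(&tid, NULL, worker, (void *)id);
    assert(rc == 0);
    pthread_join(tid, &result);
    return (intptr_t)result;
}
```

A thread that never called set_my_id() sees a NULL value for the key, which is why get_my_id() reports 0 in that case.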
With GCC, it is simpler to declare your thread local variables with __thread keyword, like
__thread int i;
extern __thread struct state s;
static __thread char *p;
That is GCC-specific (though I'd guess clang has it too, and the newest C++ and future C standards have something similar), but it is less brittle than pointer hacks based upon pthread_self() (and should be a bit faster, but less portable, than pthread_getspecific, as suggested by Denniston).
But I would really like you to give more context and motivation in your questions.
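For illustration, a small sketch of the __thread approach, assuming GCC or a compatible compiler. The function caller_id_after_worker is a hypothetical helper showing that each thread has its own copy of the variable.

```c
#include <assert.h>
#include <pthread.h>

static __thread int my_id;   /* one independent copy per thread, starts at 0 */

static void *worker(void *arg)
{
    my_id = *(int *)arg;     /* changes only this thread's copy */
    return NULL;
}

/*
 * Sets my_id in the caller, lets another thread assign its own copy,
 * and returns the caller's value afterwards.
 */
int caller_id_after_worker(int caller_id, int worker_id)
{
    pthread_t tid;

    my_id = caller_id;
    int rc = pthread_create(&tid, NULL, worker, &worker_id);
    assert(rc == 0);
    pthread_join(tid, NULL);
    return my_id;            /* unaffected by the worker's assignment */
}
```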
I want to write my own assigned thread identifier for a thread
There are multiple ways to achieve that. The most obvious one:
__thread int my_id;
I can use the syntax for thread local variables by gcc for this purpose, but I'm in a situation where I can't use them.
You need to explain why you can't use thread-locals. Chances are high that other solutions, such as pthread_getattr_np, wouldn't work either.
First get the bottom of the stack and give read/write permission to it with the following code.
pthread_attr_t attr;
void * stackaddr;
int * plocal_var;
size_t stacksize;
pthread_getattr_np(pthread_self(), &attr);
pthread_attr_getstack( &attr, &stackaddr, &stacksize );
printf( "stackaddr = %p, stacksize = %zu\n", stackaddr, stacksize );
plocal_var = (int*)mmap( stackaddr, 4096, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0 );
// Now try to write something
*plocal_var = 4;
and then you can get the thread ID, with the function get_thread_id() shown below. Note that calling mmap with size 4096 has the effect of pushing the boundary of the stack by 4096, that is why we subtract 4096 when getting the local variable address.
int get_thread_id()
{
pthread_attr_t attr;
char * stackaddr;
int * plocal_var;
size_t stacksize;
pthread_getattr_np(pthread_self(), &attr);
pthread_attr_getstack( &attr, (void**)&stackaddr, &stacksize );
//printf( "stackaddr = %p, stacksize = %zu\n", stackaddr, stacksize );
plocal_var = (int*)(stackaddr - 4096);
return *plocal_var;
}
