To get perf statistics of parallel running threads:
To get the list of threads I use the thread list in /proc/self/task.
Now I want to get the ID of a thread's CPU-time clock, but clock_getcpuclockid only works with PIDs.
pthread_getcpuclockid requires the pthread_t of the thread, and I did not find any way to get a pthread_t from a thread's TID, so I am looking for an alternative solution to this problem. CLOCK_THREAD_CPUTIME_ID returns information for the current thread only, and I need information for all parallel threads. Any suggestions are welcome.
Is there any alternative to pthread_getcpuclockid? I wonder what the pthread implementation does inside?
Read time(7) and clock_gettime(2). You probably want to use CLOCK_THREAD_CPUTIME_ID.
See also proc(5).
Is there any alternative to pthread_getcpuclockid? I wonder what the pthread implementation does inside?
This is simple but implementation-specific (it probably varies between GNU libc and musl libc). Most (AFAIK, all) C standard libraries on Linux are free software, so you can study their source code.
For musl libc, src/threads/pthread_getcpuclockid.c fetches the clock id from the data associated with the pthread_t.
For GNU libc, I leave the diving into its source code to you.
Where can I find documentation for "adaptive" pthread mutexes? The symbol PTHREAD_MUTEX_ADAPTIVE_NP is defined on my system, but the only documentation I can find online says nothing about what an adaptive mutex is, or when it's appropriate to use it.
So... what is it, and when should I use it?
For reference, my version of libc is:
GNU C Library (Ubuntu EGLIBC 2.15-0ubuntu10.5) stable release version 2.15, by Roland McGrath et al.
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.6.3.
Compiled on a Linux 3.2.50 system on 2013-09-30.
Available extensions:
crypt add-on version 2.1 by Michael Glad and others
GNU Libidn by Simon Josefsson
Native POSIX Threads Library by Ulrich Drepper et al
BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.
and "uname -a" gives
Linux desktop 3.2.0-55-generic #85-Ubuntu SMP Wed Oct 2 12:29:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
PTHREAD_MUTEX_ADAPTIVE_NP is something that I invented while working in the role of a glibc contributor on making LinuxThreads more reliable and perform better. LinuxThreads was the predecessor to glibc's NPTL library, originally developed as a stand-alone library by Xavier Leroy, who is also well-known as one of the creators of OCaml.
The adaptive mutex survived into NPTL in essentially unmodified form: the code is nearly identical, including the magic constants for the estimator smoothing and the maximum spin relative to the estimator.
Under SMP, when you go to acquire a mutex and see that it is locked, it can be sub-optimal to simply give up and call into the kernel to block. If the owner of the lock only holds the lock for a few instructions, it is cheaper to just wait for the execution of those instructions, and then acquire the lock with an atomic operation, instead of spending hundreds of extra cycles by making a system call.
The kernel developers know this very well, which is one reason why we have spinlocks in the Linux kernel for fast critical sections. (Among the other reasons is, of course, that code which cannot sleep, because it is in an interrupt context, can acquire spinlocks.)
The question is, how long should you wait? If you spin forever until the lock is acquired, that can be sub-optimal. User space programs are not well-written like kernel code (cough). They could have long critical sections. They also cannot disable pre-emption; sometimes critical sections blow up due to a context switch. (POSIX threads now provide real time tools to deal with this: you can put threads into a real-time priority and FIFO scheduling and such, plus configure processor affinity.)
I think we experimented with fixed iteration counts, but then I had this idea: why should we guess when we can measure? Why don't we implement a smoothed estimator of the lock duration, similar to what we do for the TCP retransmission timeout (RTO) estimator. Each time we spin on a lock, we measure how many spins it actually took to acquire it. Moreover, we should not spin forever: we should perhaps spin only at most twice the current estimator value. When we take a measurement, we can smooth it exponentially, in just a few instructions: take a fraction of the previous value and a fraction of the new value and add them together, which is the same as adding a fraction of their difference back to the estimator: say, estimator += (new_val - estimator)/8 for a 1/8 to 7/8 blend between the new and old values.
You can think of this as a watchdog. Suppose that the estimator tells you that the lock, on average, takes 80 spins to acquire. You can be quite confident, then, that if you have executed 160 spins, then something is wrong: the owner of the lock is executing some exceptionally long case, or maybe has hit a page fault or was otherwise preempted. At this point the waiting thread cuts its losses and calls into the kernel to block.
Without measurement, you cannot do this accurately: there is no "one size fits all" value. Say, a fixed limit of 200 spins would be sub-optimal in a program whose critical sections are so short that a lock can almost always be fetched after waiting only 10 spins. The mutex locking function would burn through 200 iterations every time there is an anomalous wait time, instead of nicely giving up at, say, 20 and saving cycles.
This adaptive approach is specialized, in the sense that it will not work for all locks in all programs, so it is packaged as a special mutex type. For instance, it will not work very well for programs that lock mutexes for long periods: periods so long that more CPU time is wasted spinning on the large estimator values than would have been spent by going into the kernel. The approach is also not suitable for uniprocessors: all threads besides the one which is trying to get the lock are suspended in the kernel. The approach is also not suitable in situations in which fairness is important: it is an opportunistic lock. No matter how many other threads have been waiting, for no matter how long, or what their priority is, a new thread can come along and snatch the lock.
If you have very well-behaved code with short critical sections that are highly contended, and you're looking for better performance on SMP, the adaptive mutex may be worth a try.
The symbol is mentioned here:
http://elias.rhi.hi.is/libc/Mutexes.html
"LinuxThreads supports only one mutex attribute: the mutex type, which is either PTHREAD_MUTEX_ADAPTIVE_NP for "fast" mutexes, PTHREAD_MUTEX_RECURSIVE_NP for "recursive" mutexes, PTHREAD_MUTEX_TIMED_NP for "timed" mutexes, or PTHREAD_MUTEX_ERRORCHECK_NP for "error checking" mutexes. As the NP suffix indicates, this is a non-portable extension to the POSIX standard and should not be employed in portable programs.
The mutex type determines what happens if a thread attempts to lock a mutex it already owns with pthread_mutex_lock. If the mutex is of the "fast" type, pthread_mutex_lock simply suspends the calling thread forever. If the mutex is of the "error checking" type, pthread_mutex_lock returns immediately with the error code EDEADLK. If the mutex is of the "recursive" type, the call to pthread_mutex_lock returns immediately with a success return code. The number of times the thread owning the mutex has locked it is recorded in the mutex. The owning thread must call pthread_mutex_unlock the same number of times before the mutex returns to the unlocked state.
The default mutex type is "timed", that is, PTHREAD_MUTEX_TIMED_NP."
EDIT: updated with info found by jthill (thanks!)
A little more info on the mutex flags and the PTHREAD_MUTEX_ADAPTIVE_NP can be found here:
"The PTHREAD_MUTEX_ADAPTIVE_NP is a new mutex that is intended for high
throughput at the sacrifice of fairness and even CPU cycles. This
mutex does not transfer ownership to a waiting thread, but rather
allows for competition. Also, over an SMP kernel, the lock operation
uses spinning to retry the lock to avoid the cost of immediate
descheduling."
Which basically suggests the following: in cases where high throughput is desirable, such a mutex can be used, but it requires extra consideration in the thread logic due to its very nature. You will have to design an algorithm that exploits these properties, resulting in high throughput; something that load-balances itself from within (as opposed to "from the kernel"), where order of execution is unimportant.
There was a very good book on Linux/UNIX multithreaded programming whose name escapes me. If I find it I'll update.
Here you go. As I read it, it's a brutally simple mutex that doesn't care about anything except making the no-contention case run fast.
As most C programmers know, libc provides non-portable functions for thread CPU affinity tuning (pthread_attr_setaffinity_np()). However, what I do not know is how this can be done when implementing a kernel module. Any answer that mentions or redirects to real examples would be very helpful.
You should use kthreads, which stands for kernel threads. To create one on a specified CPU, invoke kthread_create_on_cpu(), which is defined in include/linux/kthread.h. The thread will be created in the sleeping state, so you should call wake_up_process() on it. That's all.
You can get one example of using kthreads in my answer in this question.
You can use kthread_bind() function.
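Since the asker requested a real example, here is a hedged sketch of a module tying the two answers together: create a kthread, pin it with kthread_bind(), then wake it. It is a minimal skeleton (CPU 0 and the thread name are arbitrary choices), buildable only against a kernel source tree, and error handling is abbreviated.

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/delay.h>

static struct task_struct *worker;

static int worker_fn(void *data)
{
    while (!kthread_should_stop()) {
        /* per-CPU work would go here */
        msleep(100);
    }
    return 0;
}

static int __init pinned_init(void)
{
    /* Create the thread sleeping, bind it to CPU 0, then wake it. */
    worker = kthread_create(worker_fn, NULL, "pinned_worker");
    if (IS_ERR(worker))
        return PTR_ERR(worker);
    kthread_bind(worker, 0);
    wake_up_process(worker);
    return 0;
}

static void __exit pinned_exit(void)
{
    kthread_stop(worker);
}

module_init(pinned_init);
module_exit(pinned_exit);
MODULE_LICENSE("GPL");
```

kthread_create_on_cpu() collapses the create-and-bind steps into one call if your kernel provides it.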
I want to implement multiple threading in C without using any of the POSIX library.
Any help would be appreciated.
Note: don't use fork() or vfork().
A thread in Linux is essentially a process that shares memory and resources with its parent. The Linux kernel does not distinguish between a process and a thread; in other words, there is no concept of a lightweight process in Linux as in some other operating systems. Threads in Linux are implemented as standard processes, so it's possible to create a thread using just clone(), which is normally invoked by fork(), in the following way:
clone(SIGCHLD, 0);
This call behaves much like a plain fork(). With the appropriate flags, however, you can create a thread:
clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);
This is identical to the previous call, except that the address space, filesystem resources, file descriptors and signal handlers are shared between the two processes.
A different approach is to use user-level threads (also called fibres): threads of execution implemented at the user level, which means the OS is unaware of them, and the scheduling and context switching have to be done at the user level. Most user-level schedulers are implemented as cooperative schedulers, but it is also possible to implement a preemptive scheduler with simple round-robin scheduling.
Check the clone(2) man page for details, and if you want more information I recommend Linux Kernel Development, 3rd edition, by Robert Love (I am not affiliated with the author in any way); there is a "look inside" link where you can read some of it online. As for user-level threads, there is a minimal package written by me, called libutask, that implements both a cooperative and a preemptive scheduler; you can check the source code if you like.
Note 1: I have not mentioned UNIX; as far as I know, this is a Linux-specific implementation.
Note 2: Creating your own threads with clone is not a real-world solution. Read the comments for some of the problems you may have to deal with; this is only an answer to the question of whether it is possible to create threads without using pthreads, and in that case the answer is yes.
See:
setjmp,
longjmp,
sigaltstack
for UNIX-like systems.
Also see:
getcontext and setcontext,
and more generally ucontext
for BSDs and modern UNIXes.
This page gives many examples of barebone implementations using these primitives and more.
You can use atomic instructions to implement the locking primitives (mutexes, semaphores).
I also suggest looking at actual implementations of userland thread libraries to get some hints. See this page which gives a list of implementations for Linux.
Finally, you might want to get some information on coroutines and perhaps trampolines, although the latter aren't as closely related.
You can also check out the new <threads.h> header from the C standard library (C11).
It's got what you need, like int thrd_create(thrd_t *thr, thrd_start_t func, void *arg);, as well as mutex functions and condition variables.
One can most certainly build at least a cooperative microkernel in plain C on top of pretty much any operating system. Fundamentally it only requires cloning the stack frame (and adjusting a few pointers accordingly, especially the return address from a function to the other thread's current return address), plus a few utility functions, such as a "context switch" of the stack to the heap and back.
If a timer interrupt with a callback is available, one can build a preemptive microkernel.
At least Dr. Dobb's and the IOCCC have presented options along these lines.
I wanted to know how to implement my own threading library.
What I have is a CPU (PowerPC architecture) and the C Standard Library.
Is there an open source light-weight implementation I can look at?
At its very simplest a thread will need:
Some memory for stack space
Somewhere to store its context (i.e. register contents, program counter, stack pointer, etc.)
On top of that you will need to implement a simple "kernel" that will be responsible for the thread switching. And if you're trying to implement pre-emptive threading then you'll also need a periodic source of interrupts, e.g. a timer. In this case you can execute your thread switching code in the timer interrupt.
Take a look at the setjmp()/longjmp() routines, and the corresponding jmp_buf structure. This will give you easy access to the stack pointer so that you can assign your own stack space, and will give you a simple way of capturing all of the register contents to provide your thread's context.
Typically the longjmp() function is a wrapper for a return from interrupt instruction, which fits very nicely with having thread scheduling functionality in the timer interrupt. You will need to check the implementation of longjmp() and jmp_buf for your platform though.
Try looking for thread implementations on smaller microprocessors, which typically don't have OSes, e.g. the Atmel AVR or Microchip PIC.
For example : discussion on AVRFreaks
For a decent thread library you need:
atomic operations to avoid races (to implement e.g. a mutex)
some OS support to do the scheduling and to avoid busy waiting
some OS support to implement context switching
All three are beyond the scope of what C99 offers you. Atomic operations are introduced in C11; up to now C11 implementations don't seem to be ready, so these are usually implemented in assembler. For the latter two, you'd have to rely on your OS.
Maybe you could look at C++, which has threading support. I'd start by picking some of its most useful primitives (for example, futures), seeing how they work, and doing a simple implementation.
Is the Linux kernel aware of pthreads in the user address space? (I don't think it is, but I did not find any info about that.) How does the instruction pointer change when thread switching takes place?
The native NPTL (Native POSIX Thread Library) used in Linux maps pthreads to "processes that share resources and therefore look like threads" in the kernel. In this way, the kernel's scheduler directly controls the scheduling of pthreads.
A "pthread switch" is done by the exact same code (in the kernel) that handles process switches. Simplified, this would be something like "store previous process state; if the next process uses a different virtual address space then switch virtual address spaces; load next process state;" (where "process state" includes the instruction pointer for the process/thread).
Well, the Linux kernel doesn't know about user threads (pthread handles those in userspace; the kernel doesn't really care about them beyond knowing what to schedule).
The instruction pointer is changed in the kernel during what's called a context switch. During this switch the kernel essentially asks the scheduler "what's next?" The scheduler hands it a task_struct, which contains all the information about the thread, and the context-switch code sets the values on the CPU accordingly (page tables, instruction pointer, etc.); when that code is done, the CPU simply starts executing from there.
1) The kernel doesn't know about user-level threads. However, NPTL isn't user-level.
2) This is a really broad question. You should look at an OS book; it will go into depth on that issue and everything else involved in a context switch.