I am a total beginner in Linux kernel development. I am trying to learn about processes and scheduling (I know that what I want to do is not "useful"; it is just to learn).
I wrote a syscall which returns the ID of the thread/logical core on which my process is running.
But now what I would like to do is write a syscall which returns the ID of the physical core on which my process is running.
I tried to read the task_struct, but I did not find any clue.
I am lost in all this code and have no idea where to start my research.
I am interested in your methodology. I'm on x86_64 and I'm using Linux 5.6.2.
What you want to do is basically the same thing that /proc/cpuinfo already does:
$ cat /proc/cpuinfo
processor : 0 <== you have this
...
core id : 0 <== you want to obtain this
...
We can therefore take a look at how /proc/cpuinfo does this, in arch/x86/kernel/cpu/proc.c. By analyzing the code a little bit, we see that the CPU information is obtained by calling cpu_data():
static void *c_start(struct seq_file *m, loff_t *pos)
{
    *pos = cpumask_next(*pos - 1, cpu_online_mask);
    if ((*pos) < nr_cpu_ids)
        return &cpu_data(*pos); // <=== HERE
    return NULL;
}
The cpu_data() macro returns a struct cpuinfo_x86, which contains all the relevant information, and is then used here to print the core ID:
seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
Therefore, in a kernel module, you can do the following:
#include <linux/smp.h>       // get_cpu(), put_cpu()
#include <asm/processor.h>   // cpu_data(), struct cpuinfo_x86
// ...

static int __init modinit(void)
{
    unsigned int cpu;
    struct cpuinfo_x86 *info;

    cpu = get_cpu();
    info = &cpu_data(cpu);
    pr_info("CPU: %u, core: %d\n", cpu, info->cpu_core_id);
    put_cpu(); // Don't forget this!

    return 0;
}
Dmesg output on my machine after inserting/removing the module three times:
[14038.937774] cpuinfo: CPU: 1, core: 1
[14041.084951] cpuinfo: CPU: 5, core: 1
[14087.329053] cpuinfo: CPU: 6, core: 2
Which is consistent with the content of my /proc/cpuinfo:
$ cat /proc/cpuinfo | grep -e 'processor' -e 'core id'
processor : 0
core id : 0
processor : 1
core id : 1
processor : 2
core id : 2
processor : 3
core id : 3
processor : 4
core id : 0
processor : 5
core id : 1
processor : 6
core id : 2
processor : 7
core id : 3
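Since your end goal is a syscall rather than a module, the same logic can be wrapped in SYSCALL_DEFINE0. A minimal sketch, assuming you have already wired a new entry into arch/x86/entry/syscalls/syscall_64.tbl (the name get_phys_core_id is made up for illustration):

#include <linux/syscalls.h>  // SYSCALL_DEFINE0
#include <linux/smp.h>       // get_cpu(), put_cpu()
#include <asm/processor.h>   // cpu_data()

SYSCALL_DEFINE0(get_phys_core_id)
{
    unsigned int cpu;
    long core;

    cpu = get_cpu();                   // disables preemption
    core = cpu_data(cpu).cpu_core_id;  // the field /proc/cpuinfo prints as "core id"
    put_cpu();                         // re-enables preemption

    return core;
}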
Note that, as Martin James rightfully pointed out, this information is not very useful, since your process could be preempted and moved to a different core by the time your syscall finishes execution. If you want to avoid this, you can use the sched_setaffinity() syscall in your userspace program to pin the process to a single CPU.
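For example, a minimal userspace sketch (the syscall number 436 is a placeholder for whatever number you actually registered):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t set;

    // Pin the process to logical CPU 0 so it cannot migrate.
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }

    long core = syscall(436); // placeholder number for the custom syscall
    printf("Running on physical core %ld\n", core);
    return 0;
}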
Related
I have a question regarding PAPI (Performance Application Programming Interface). I downloaded and installed the PAPI library, but I am still not sure how to use it correctly or what else I need to make it work. I am trying to use it in C. I have this simple program:
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void)
{
    int retval;

    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT && retval > 0) {
        printf("PAPI error: 1\n");
        exit(1);
    }
    if (retval < 0)
        printf("PAPI error: 2\n");

    retval = PAPI_is_initialized();
    if (retval != PAPI_LOW_LEVEL_INITED)
        printf("PAPI error: 2\n");

    int num_hwcntrs = 0;
    if ((num_hwcntrs = PAPI_num_counters()) <= PAPI_OK)
        printf("This system has %d available counters.\n", num_hwcntrs);

    return 0;
}
I have included the papi.h header and I am compiling with gcc and the -lpapi flag. I added the library to the path, so it compiles and runs, but as a result I get this:
This system has 0 available counters.
Though initialization seems to work, as it doesn't give an error code.
Any advice or suggestion on what I have done wrong or missed would be helpful. I should have available counters on my system; more precisely, I need cache miss and cache hit counters.
I also tried to start counters with this other simple program, and it gave error code -25:
int numEvents = 2;
long long values[2];
int events[2] = {PAPI_L3_TCA, PAPI_L3_TCM};

printf("PAPI error: %d\n", PAPI_start_counters(events, numEvents));
UPDATE: I just tried to check the hardware information from the terminal with the command papi_avail | more, and I got this:
Available PAPI preset and user defined events plus hardware information.
PAPI version : 5.7.0.0
Operating system : Linux 4.15.0-45-generic
Vendor string and code : GenuineIntel (1, 0x1)
Model string and code : Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz (78, 0x4e)
CPU revision : 3.000000
CPUID : Family/Model/Stepping 6/78/3, 0x06/0x4e/0x03
CPU Max MHz : 2800
CPU Min MHz : 400
Total cores : 4
SMT threads per core : 2
Cores per socket : 2
Sockets : 1
Cores per NUMA region : 4
NUMA regions : 1
Running in a VM : no
Number Hardware Counters : 0
Max Multiplex Counters : 384
Fast counter read (rdpmc): no
PAPI Preset Events
Name Code Avail Deriv Description (Note)
PAPI_L1_DCM 0x80000000 No No Level 1 data cache misses
PAPI_L1_ICM 0x80000001 No No Level 1 instruction cache misses
PAPI_L2_DCM 0x80000002 No No Level 2 data cache misses
PAPI_L2_ICM 0x80000003 No No Level 2 instruction cache misses
.......
So, because Number Hardware Counters is 0, I can't use this tool to count cache misses with PAPI's preset events? Is there any configuration that could help, or should I forget about it until I change my laptop?
I want to generate x86 CPU-to-memory traffic on Linux (Ubuntu 18) using the gcc toolchain: it should first fill up the TLB (translation lookaside buffer) and then cause TLB invalidations once it is full. I created the simple code below, but I am not sure whether it can achieve the goal of filling up the TLB and then invalidating it.
#include <stdio.h>

int main(void)
{
    int array[1000];
    int i;
    long sum = 0;

    for (i = 0; i < 1000; i++)
    {
        array[i] = i;
    }

    for (i = 0; i < 1000; i++)
    {
        sum += array[i];
    }

    printf("%ld\n", sum); /* use sum so the loops are not optimized away */
    return 0;
}
Here is the processor specific info in case it is useful
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD EPYC 7281 16-Core Processor
stepping : 2
microcode : 0x8001227
cpu MHz : 2694.732
cache size : 512 KB
physical id : 0
siblings : 32
core id : 0
cpu cores : 16
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
The crux is to have answers to the following:
What triggers TLB invalidations?
How can one verify that TLB invalidations actually happened?
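For scale: a 1000-int array spans at most two 4 KiB pages, so the loop above barely touches the TLB. A sketch that instead touches one byte on each of many distinct pages (the 4 KiB page size and the page count are assumptions; pick NPAGES well above your TLB entry count):

#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096        /* assumption: 4 KiB pages */
#define NPAGES (64 * 1024)    /* assumption: far more pages than TLB entries */

int main(void)
{
    char *buf = malloc((size_t)NPAGES * PAGE_SIZE);
    long sum = 0;
    size_t p;
    int rep;

    if (!buf)
        return 1;

    /* One access per page: each iteration needs a distinct TLB entry,
       so once NPAGES exceeds the TLB size, entries keep getting
       evicted and refilled. */
    for (rep = 0; rep < 10; rep++)
        for (p = 0; p < (size_t)NPAGES; p++)
            sum += buf[p * PAGE_SIZE];

    printf("%ld\n", sum); /* keep the loops from being optimized away */
    free(buf);
    return 0;
}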
I know how to get the number of logical cores in C.
sysconf(_SC_NPROCESSORS_CONF);
This will return 4 on my i3 processor. But actually there are only 2 cores in an i3.
How can I get physical core count?
This is a C solution using libcpuid.
cores.c:
#include <stdio.h>
#include <libcpuid.h>

// compile with: gcc -o cores cores.c -lcpuid
int main(void)
{
    struct cpu_raw_data_t raw;
    struct cpu_id_t data;

    cpuid_get_raw_data(&raw);
    cpu_identify(&raw, &data);
    printf("No. of Physical Core(s) : %d\n", data.num_cores);

    return 0;
}
This is a C++ solution using Boost.
cores.cpp:
// use boost to get number of cores on the processor
// compile with : g++ -o cores cores.cpp -lboost_system -lboost_thread
#include <iostream>
#include <boost/thread.hpp>

int main()
{
    std::cout << "No. of Physical Core(s) : " << boost::thread::physical_concurrency() << std::endl;
    std::cout << "No. of Logical Core(s) : " << boost::thread::hardware_concurrency() << std::endl;
    return 0;
}
On my desktop (i5 2310) it returns:
No. of Physical Core(s) : 4
No. of Logical Core(s) : 4
While on my laptop (i5 480M):
No. of Physical Core(s) : 2
No. of Logical Core(s) : 4
Meaning that my laptop processor has Hyper-Threading technology.
Without any lib:
#include <stdio.h>

int main(void)
{
    unsigned int eax = 11, ebx = 0, ecx = 1, edx = 0;

    // CPUID leaf 0xB (extended topology), sub-leaf 1 (core level)
    asm volatile("cpuid"
                 : "=a" (eax),
                   "=b" (ebx),
                   "=c" (ecx),
                   "=d" (edx)
                 : "0" (eax), "2" (ecx)
                 : );

    printf("Cores: %d\nThreads: %d\nActual thread: %d\n", eax, ebx, edx);
    return 0;
}
Output:
Cores: 4
Threads: 8
Actual thread: 1
You might simply read and parse the /proc/cpuinfo pseudo-file (see proc(5) for details): open it as a text file and read it sequentially, line by line; try cat /proc/cpuinfo in a terminal.
The advantage is that you are just parsing a (Linux-specific) text [pseudo-]file (without needing any external libraries, as in Gengisdave's answer); the disadvantage is that you need to parse it (not a big deal: read 80-byte lines with fgets in a loop, then use sscanf and test the scanned item count, as sketched below).
The presence of ht in the flags: line means that your CPU has hyper-threading. The number of CPU threads is given by the number of processor: lines. The actual number of physical cores is given by cpu cores: (all this using a 4.1 kernel on my machine).
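A minimal sketch of that parsing (the field names are as they appear on x86; on multi-socket machines you would also multiply by the number of sockets):

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[128];
    unsigned int threads = 0, cores = 0;

    if (!f) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "processor", 9) == 0)
            threads++;                              /* one line per logical CPU */
        else
            sscanf(line, "cpu cores : %u", &cores); /* physical cores per package */
    }
    fclose(f);

    printf("logical: %u, physical: %u\n", threads, cores);
    return 0;
}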
I am not sure you are right in wanting to understand how many physical cores you have. Hyper-threading may actually be useful. You need to benchmark.
And you probably should make the number of working threads (e.g. the size of your thread pool) in your application be user-configurable. Even on a 4 core hyper-threaded processor, I might want to have no more than 3 running threads (because I want to use the other threads for something else).
#include <stdio.h>

int main(int argc, char **argv)
{
    unsigned int lcores = 0, tsibs = 0;
    char buff[32];
    char path[64];

    for (lcores = 0;; lcores++) {
        FILE *cpu;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%u/topology/thread_siblings_list",
                 lcores);
        cpu = fopen(path, "r");
        if (!cpu)
            break;

        /* Count one entry per comma-separated CPU number. */
        while (fscanf(cpu, "%31[0-9]", buff) == 1) {
            tsibs++;
            if (fgetc(cpu) != ',')
                break;
        }
        fclose(cpu);
    }

    if (!lcores || !tsibs)
        return 1;

    printf("physical cores %u\n", lcores / (tsibs / lcores));
    return 0;
}
thread_siblings_list contains a comma-delimited list of the logical CPUs that are "thread siblings" of the current one, i.e. that share its physical core.
Dividing the total number of sibling entries by the number of logical cores gives the number of siblings per core; dividing the number of logical cores by the siblings per core then gives the number of physical cores. For example, 8 logical CPUs with 2 entries each yield 16 entries in total: 16 / 8 = 2 siblings per core, and 8 / 2 = 4 physical cores.
On linux, I'd like to know what "C" API to call to get the per-cpu stats.
I know about and could read /proc/loadavg from within my app, but these are the system-wide load averages, not the per-CPU information. I want to tell the individual CPUs or cores apart.
As an example of an application that does this: when I run top and press "1", I can see the 4 or 8 processors/cores like this:
Cpu0 : 4.5%us, 0.0%sy, 0.0%ni, 95.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 42.2%us, 6.2%sy, 0.5%ni, 51.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 3.0%us, 1.5%sy, 0.0%ni, 94.5%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Cpu3 : 7.0%us, 4.7%sy, 0.0%ni, 88.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
I've tried to strace top but this led to a rat's nest.
The file you want is /proc/stat. (You might want to refer to fs/proc/stat.c in the Linux kernel source.)
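A minimal sketch that reads the per-CPU lines (the first four numeric columns are cumulative user, nice, system, and idle jiffies; take two snapshots and diff them to get per-interval utilization):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/stat", "r");
    char line[256];

    if (!f) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        unsigned int cpu;
        unsigned long long user, nice, sys, idle;

        /* Skip the aggregate "cpu " line; keep "cpu0", "cpu1", ... */
        if (strncmp(line, "cpu", 3) != 0 || !isdigit((unsigned char)line[3]))
            continue;
        if (sscanf(line + 3, "%u %llu %llu %llu %llu",
                   &cpu, &user, &nice, &sys, &idle) == 5)
            printf("cpu%u: user=%llu nice=%llu sys=%llu idle=%llu\n",
                   cpu, user, nice, sys, idle);
    }
    fclose(f);
    return 0;
}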
This is not a real answer but I would take a look at the source code of top.
I guess the kernel file timer.c may be of some importance here, since it computes the load averages. From the calc_load() function in kernel/timer.c:
unsigned long avenrun[3];

static inline void calc_load(unsigned long ticks)
{
    unsigned long active_tasks; /* fixed-point */
    static int count = LOAD_FREQ;

    count -= ticks;
    if (count < 0) {
        count += LOAD_FREQ;
        active_tasks = count_active_tasks();
        CALC_LOAD(avenrun[0], EXP_1, active_tasks);
        CALC_LOAD(avenrun[1], EXP_5, active_tasks);
        CALC_LOAD(avenrun[2], EXP_15, active_tasks);
    }
}
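For reference, CALC_LOAD is a fixed-point exponential moving average; in kernels of that era it and its constants are defined in include/linux/sched.h roughly as follows:

#define FSHIFT    11              /* nr of bits of precision */
#define FIXED_1   (1 << FSHIFT)   /* 1.0 as fixed-point */
#define LOAD_FREQ (5*HZ)          /* 5 sec intervals */
#define EXP_1     1884            /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5     2014            /* 1/exp(5sec/5min) */
#define EXP_15    2037            /* 1/exp(5sec/15min) */

#define CALC_LOAD(load, exp, n) \
    load *= exp;                \
    load += n*(FIXED_1 - exp);  \
    load >>= FSHIFT;

In other words, each 5-second tick computes load = load*e + n*(1 - e) in fixed point, where n is the number of active tasks and e = 1/exp(5s/T) for the 1-, 5-, and 15-minute windows.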
I have some threaded code using PThreads on Linux that, I suspect, is suffering from excessive lock contention. What tools are available for me to measure this?
Solaris has DTrace and plockstat. Is there something similar on Linux? (I know about a recent DTrace port for Linux but it doesn't seem to be ready for prime time yet.)
mutrace is the tool:
http://0pointer.de/blog/projects/mutrace.html
It's easy to build, install, and use.
After not having much luck with SystemTap, I decided to try the DTrace Linux port, with some success despite the lack of a plockstat provider. The following DTrace script is not quite a plockstat replacement, but it managed to show me some of the information I was after.
#!/usr/sbin/dtrace -s

/* Usage: ./futex.d '"execname"' */

long total;

END
{
    printf("total time spent on futex(): %ldms\n", total);
}

/* arg1 == 0 means FUTEX_WAIT */
syscall::futex:entry
/execname == $1 && arg1 == 0/
{
    self->start = timestamp;
}

syscall::futex:return
/self->start/
{
    this->elapsed = (timestamp - self->start) / 1000000;
    @[execname] = quantize(this->elapsed);
    total += this->elapsed;
    self->start = 0;
}
Here's an example using the above DTrace script to measure time spent in FUTEX_WAIT for a simple test program from this DTrace article.
$ ./futex.d '"mutex-test"'
dtrace: script './futex.d' matched 3 probes
^C
CPU ID FUNCTION:NAME
1 2 :END total time spent on futex(): 11200ms
mutex-test
value ------------- Distribution ------------- count
128 | 0
256 |#################### 1
512 | 0
1024 | 0
2048 | 0
4096 | 0
8192 |#################### 1
16384 | 0
Definitely not great, but at least it's a starting point.
The latest versions of Valgrind include a lock contention and lock validation tool (DRD):
http://valgrind.org/docs/manual/drd-manual.html
This is great if you can reproduce the issue under Valgrind (it affects runtime speed) and have enough memory to run it.
For other uses, the more hardcore Linux Trace Toolkit NG (LTTng) is recommended:
http://ltt.polymtl.ca/
Cheers,
Gilad
The latest version of SystemTap comes with lots of example scripts. One in particular seems like it would serve as a good starting point for helping you accomplish your task:
#! /usr/bin/env stap

global thread_thislock
global thread_blocktime
global FUTEX_WAIT = 0
global lock_waits
global process_names

probe syscall.futex {
    if (op != FUTEX_WAIT) next
    t = tid()
    process_names[pid()] = execname()
    thread_thislock[t] = $uaddr
    thread_blocktime[t] = gettimeofday_us()
}

probe syscall.futex.return {
    t = tid()
    ts = thread_blocktime[t]
    if (ts) {
        elapsed = gettimeofday_us() - ts
        lock_waits[pid(), thread_thislock[t]] <<< elapsed
        delete thread_blocktime[t]
        delete thread_thislock[t]
    }
}

probe end {
    foreach ([pid+, lock] in lock_waits)
        printf ("%s[%d] lock %p contended %d times, %d avg us\n",
                process_names[pid], pid, lock, @count(lock_waits[pid, lock]),
                @avg(lock_waits[pid, lock]))
}
I was attempting to diagnose something similar with a MySQL process previously and observed output similar to the following using the above script:
mysqld[3991] lock 0x000000000a1589e0 contended 45 times, 3 avg us
mysqld[3991] lock 0x000000004ad289d0 contended 1 times, 3 avg us
While the above script collects information on all processes running on the system, it would be quite easy to modify it to only work on a certain process or executable. For example, we could change the script to take a process ID argument and modify the probe on entering the futex call to look like:
probe begin {
    process_id = strtol(@1, 10)
}

probe syscall.futex {
    if (pid() == process_id && op == FUTEX_WAIT) {
        t = tid()
        process_names[process_id] = execname()
        thread_thislock[t] = $uaddr
        thread_blocktime[t] = gettimeofday_us()
    }
}
Obviously, you could modify the script lots of ways to suit what you want to do. I'd encourage you to have a look at the various example scripts for SystemTap. They are probably the best starting point.
In the absence of DTrace, your best bet is probably SystemTap. Here's a positive write-up:
http://davidcarterca.wordpress.com/2009/05/27/systemtap/