How is CPU allocation done in Linux? Thread level or process level? [closed] - c

I am trying to understand how CPU time is distributed among processes with different numbers of threads. I have two programs, Program1 and Program2.
Program1 has 5 threads, whereas Program2 has ONLY the main thread.
SCENARIO-1:
terminal-1 : ./Program1
terminal-2 : ./Program2
When I run Program1 in one terminal and Program2 in another, Program1 gets 50% of the CPU and Program2 gets 50%. Each thread of Program1 gets 10% (so 50% cumulatively for Program1).
This suggests that, no matter how many threads a process has, every process gets an equal share of CPU, i.e. CPU allocation is done at the process level.
pstree shows
├─bash───P1───5*[{P1}]
├─bash───P2───{P2}
SCENARIO-2:
terminal-1 : ./Program1 & ./Program2
When I run both Program1 and Program2 in the SAME terminal, CPU is allocated equally across Program2 and all threads of Program1. Each thread of Program1 gets about 17% (cumulatively Program1 gets 83%) and Program2 also gets 17%. This suggests CPU allocation is done at the thread level.
pstree shows
├─bash─┬─P1───5*[{P1}]
│      └─P2
I am using Ubuntu 12.04.4 LTS (kernel 3.11.0-15-generic). I have also tried Ubuntu 14.04.4 (kernel 3.16.x) and got similar results.
Can anyone explain how the CPU scheduler of the LINUX KERNEL distinguishes SCENARIO-1 from SCENARIO-2?
I think the scheduler distinguishes the two scenarios somewhere before allocating CPU.
To understand how it does so, I downloaded the Linux kernel source code.
However, I haven't found the place in the source code where the two scenarios are distinguished.
It would be great if anyone could point me to the source code or function where the CPU scheduler distinguishes SCENARIO-1 and SCENARIO-2.
Thanks in advance.
NOTE: Although Ubuntu is based on Debian, surprisingly, on Debian 8 (kernel 3.16.0-4-686-pae) CPU allocation is done at the thread level in BOTH scenarios: each thread of Program1 gets about 17% (cumulatively Program1 gets 83%) and Program2 also gets 17%.
Here is the code :
Program1(with 5 threads)
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <pthread.h>

// A global variable that all threads modify
int g = 0;

// The function executed by all threads
void *myThreadFun(void *vargp)
{
    // Retrieve the id argument passed to this thread
    int myid = (int)(intptr_t)vargp;
    // A static variable, to observe its changes across threads
    static int s = 0;
    // Change the static and global variables
    ++s; ++g;
    // Print the argument, static and global variables
    printf("Thread ID: %d, Static: %d, Global: %d\n", myid, s, g);
    while (1); // Representing CPU-bound work
}

int main()
{
    int i;
    pthread_t tid[5];
    // Create five threads
    for (i = 0; i < 5; i++)
        pthread_create(&tid[i], NULL, myThreadFun, (void *)(intptr_t)i);
    for (i = 0; i < 5; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
Program2 (with only the main thread)
#include <stdio.h>
#include <stdlib.h>

int main()
{
    while (1); // Representing CPU-bound work
}
To disable all gcc optimizations, I compiled both programs with the -O0 option:
gcc -O0 program1.c -o p1 -lpthread
gcc -O0 program2.c -o p2
UPDATE: As per ninjalj's explanation, in Scenario-1 CPU allocation is done at the control-group level: because I am using two different terminals (i.e. two different sessions), there are two different control groups, and each control group gets a 50% CPU allocation. This is because autogrouping is enabled by default.
As Program2 has ONLY one thread and Program1 has more threads, I want to run both programs in separate terminals (different sessions) and still get more CPU allocation for Program1 (as in Scenario-2, where Program1 gets 83% of the CPU compared to 17% for Program2). Is there any way to make the CPU allocation of Scenario-1 the same as Scenario-2 on Ubuntu?
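If autogrouping is indeed the cause, then disabling it should make Scenario-1 behave like Scenario-2, since all tasks then compete in a single group again. Assuming a kernel built with CONFIG_SCHED_AUTOGROUP (see sched(7)), it can be toggled at runtime with:
echo 0 > /proc/sys/kernel/sched_autogroup_enabled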
It is also surprising to me that, although Ubuntu is based on Debian, Debian and Ubuntu behave differently: on Debian, Program1 gets more CPU in both scenarios.

The Linux kernel does not distinguish between processes and threads when scheduling.
Threads are processes that just happen to share most of their memory. Beyond that, they are treated equally by the scheduler.
You can have 50 processes and 30 threads. That's 80 "things", and the kernel will schedule them without regard to whether they are processes or threads.
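You can see this directly: each thread has its own kernel task id (tid), and the tid, not the pid, is what the scheduler deals in. A minimal sketch (compile with gcc -pthread; gettid is invoked via syscall() since older glibc versions provide no wrapper):
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/syscall.h>

// Each thread prints its own kernel task id (tid): the scheduler's
// unit of work. The process id (pid) is shared; the tid is not.
void *show_tid(void *unused)
{
    (void)unused;
    printf("pid=%ld tid=%ld\n", (long)getpid(), (long)syscall(SYS_gettid));
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, show_tid, NULL);
    pthread_create(&t2, NULL, show_tid, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}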

Related

C OpenMP : libgomp: Thread creation failed: Resource temporarily unavailable

I was trying to do a basic project, and it seems like I have run out of threads. Do you know how I can fix the problem?
Here is the code:
#include <stdio.h>
#include <omp.h>

int main()
{
    omp_set_num_threads(2150);
    #pragma omp parallel
    {
        printf("%d\n", omp_get_thread_num());
    }
    return 0;
}
and here is the global compiler setting I have written on "other compiler options" on CodeBlocks:
-fopenmp
I am getting the error of:
libgomp: Thread creation failed: Resource temporarily unavailable
I have seen similar threads on the site, but I have not found an answer or solution yet.
Specs:
Intel i5 6400
2x8GB ram
Windows 10 64 bit
The problem is
omp_set_num_threads(2150);
The OS imposes a limit on the number of threads a process can create. The limit may be indirect, for example via the per-thread stack size (which you can shrink with the OMP_STACKSIZE environment variable). Creating 2150 threads exceeds those limits.
You mention that you've got the Intel i5 6400, which is a quad-core chip. Try setting the number of threads to something more reasonable. In your case:
omp_set_num_threads(4);
For numerical processing, performance will likely suffer when using more than 4 threads on a 4-core system.
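If the goal is simply one thread per available processor, a safer sketch is to ask the OpenMP runtime for the logical processor count instead of hard-coding it (compiled with -fopenmp as before):
#include <stdio.h>
#include <omp.h>

int main(void)
{
    // Use one thread per logical processor instead of a hard-coded count
    omp_set_num_threads(omp_get_num_procs());
    #pragma omp parallel
    {
        printf("%d\n", omp_get_thread_num());
    }
    return 0;
}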

How to control/restrict other process to run for very small amount of time in linux

The solution in the link above (blocking-and-resuming-execution-of-an-independent-process-in-linux) says I can use ptrace to achieve my goal.
I have tried to run the code from
how ptrace work between 2 processes
but I am not getting any output.
I am specifically asking how to use ptrace to make a program execute only a few instructions, as a debugger does.
How can I use ptrace in the following situation to restrict the other process to a few instructions?
I have two independent C programs under Linux. Program-1 is running on CPU core 1 and Program-2 is running on CPU core 2.
Program-2 executes a shared library function, func-2, consisting of 200 lines of instructions that perform an operation (add+shift) on data.
shared library function
-------
func-2()
{
    // code to perform operation (add+shift) on data
}
--------
Program2:
main()
{
    while(1)
        func-2();
}
Program1:
main()
{
    while(1)
    {
        // ptrace
        // OR
        // kill -STOP <pid of program2>
        // kill -CONT <pid of program2>
    }
}
I want to restrict Program-2, from inside Program-1 or from bash, so that Program-2 can execute only a few instructions, or run for only 1-2 microseconds. I can't add any code inside Program-2.
Program-1 knows the PID of Program-2 and the base address of func-2.
I have heard that ptrace can be used to control another process, and that with ptrace it is possible to restrict a process to executing one instruction at a time. For me, even restricting the process to 1-2 microseconds (5-10 instructions) would be sufficient.
How can I control Program-2, which is running on another CPU core? Any link to relevant documents is highly appreciated. Thanks in advance.
I am using gcc under Linux.
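For reference, here is a rough sketch of such a ptrace runner, assuming Program-1 is permitted to trace Program-2 (see the kernel's ptrace_scope setting) and that Program-2's PID is passed on the command line. Attaching works regardless of which core the target runs on, because the kernel stops the task for us:
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
    if (argc < 2) return 1;
    pid_t pid = (pid_t)atoi(argv[1]);   // PID of Program-2

    // Attach: the kernel stops the target and makes us its tracer
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
        perror("attach");
        return 1;
    }
    waitpid(pid, NULL, 0);

    // Let the target execute a handful of instructions, one at a time
    for (int i = 0; i < 10; i++) {
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == -1)
            break;
        waitpid(pid, NULL, 0);          // wait for the step to complete
    }

    // Detach: the target resumes running at full speed
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return 0;
}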

hyperthreading code example

Is there some sample code that exemplifies Intel's Hyperthreading performance? Is it at all accessible from user space, or does that CPU do all the work transparently for the programmer? This is for C, Linux.
Hyperthreading performance depends on many factors and is difficult to estimate.
To explain Hyperthreading briefly:
Each core has more than one register set, but no additional execution units.
The hyperthreads are scheduled more or less evenly.
So you only really get additional performance out of hyperthreads if the two threads running on the same core use different execution units, and neither thread on its own could keep the core busy because of data dependencies. For example, one thread does only integer ops, the other only floating point. Then you can see extra performance because you are using more execution units per cycle.
But this in turn depends on how your OS schedules threads onto hyperthreads. From the point of view of the OS, each hyperthread is a logical CPU. So it's entirely up to the scheduler what to put there and when.
In practice hyperthreads will give you at most 10-20% extra performance. On our HPC cluster we have turned them off (mainly for licensing reasons, though).
To answer your actual question: you cannot deploy code onto hyperthreads directly yourself. The OS will do that for you. You can set scheduling affinities for your userland threads, but it is still up to the scheduler to actually place them. This is done transparently to the programmer. A good scheduler will first spread your code evenly across cores and only resort to hyperthreads when all cores are busy.
The userland control syscalls you are looking for are sched_setaffinity and pthread_setaffinity_np.
The following example code will deploy two threads on logical CPUs 0 and 1, which correspond to the two hyperthreads of the first physical core of the first socket if hyperthreading is enabled (the exact numbering depends on the topology). It is still up to the scheduler to actually put them there; if those hyperthreads are busy, your code will wait:
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

void *my_thread(void *arg) {
    intptr_t cpu_to_run_on = (intptr_t)arg;
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(cpu_to_run_on, &cpuset);
    pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
    // force a rescheduling
    sched_yield();
    // do something useful
    return NULL;
}

int main() {
    pthread_t thread;
    pthread_create(&thread, NULL, my_thread, (void *)(intptr_t)0);
    pthread_create(&thread, NULL, my_thread, (void *)(intptr_t)1);
    for (;;);
    return 0;
}

how to force a c program to run on a particular core

Say I have the following c program:
#include <stdio.h>
int main()
{
printf("Hello world \n");
getchar();
return 0;
}
gcc 1.c -o helloworld
and, say I have a dual core machine:
cat /proc/cpuinfo | grep processor | wc -l
Now my question is: when we execute the program, how do we force it to run on core 0 (or any other particular core)?
How can this be done programmatically? Examples, APIs, and code references would be helpful.
If there is no API available, is there any compile-time, link-time, or load-time way of doing this?
OTOH, how can I check whether a program is running on core 0 or core 1 (or any other core)?
Since you are talking about /proc/cpuinfo, I assume you are using Linux. On Linux you would use the sched_setaffinity function. In your example you would call:
#define _GNU_SOURCE
#include <sched.h>

cpu_set_t set;
CPU_ZERO(&set);                                // clear cpu mask
CPU_SET(0, &set);                              // set cpu 0
sched_setaffinity(0, sizeof(cpu_set_t), &set); // 0 is the calling process
Look up man sched_setaffinity for more details.
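To answer the last part of the question (checking which core the program is on), glibc also provides sched_getcpu. A minimal sketch combining the two calls:
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);   // clear cpu mask
    CPU_SET(0, &set); // allow only core 0
    if (sched_setaffinity(0, sizeof(cpu_set_t), &set) == -1)
        perror("sched_setaffinity");
    // sched_getcpu() returns the core this thread is currently running on
    printf("running on core %d\n", sched_getcpu());
    return 0;
}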
This is OS-specific. As Felice points out, you can do it on Linux by calling sched_setaffinity in your program. If you end up running on multiple platforms, though, you'll have to code something different for each.
Alternatively, you can specify the affinity when you launch your executable, from the command line or a run script or whatever.
See taskset for a Linux command-line tool that does this, e.g. taskset -c 0 ./helloworld to pin the program to core 0.

how to slow down a process?

Suppose I have a program that runs in a given amount of time (say, three seconds). I want to run this program so that it runs n times slower (n specified on the command line). How would you achieve that with (or, better, without) changes to the program?
Please note that adding a sleep at the end is not a solution: the program has to run slower, not run at full speed for the first three seconds and then do nothing for the remaining time. Using "nice" under Unix is not a good solution either: it will run slower if other processes demand the processor, but at full speed if nothing else is processor-demanding at the same time.
This is a curiosity question, nothing serious to do related to it. The fact is that I remember, 15-20 years ago, games that were simply too fast to play on new processors because they were timed by the processor clock: you had to turn off the turbo.
Let's assume the program is a C compiled program.
One idea is to write a 'ptrace runner'. ptrace is the call that allows you to implement a debugger on platforms such as Linux and Mac.
The idea is to attach to the program and then repeatedly tell it to run one instruction with ptrace(PTRACE_SINGLESTEP). If that's not slow enough, you can add a sleep between each ptrace call in the runner program.
I wrote a simple example on my Linux box of how to slow down a child process with SIGSTOP and SIGCONT signals:
#include <unistd.h>
#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

void dosomething(void){
    static volatile unsigned char buffer[1000000];
    for(unsigned i=0;i<1000;i++)
        for(unsigned j=0;j<sizeof(buffer);buffer[j++]=i){;}
}

#define RUN  1
#define WAIT 1

int main(void){
    int delay=0, status, pid = fork();
    // The child stops itself, then burns CPU; the parent alternately
    // resumes and stops it.
    if( !pid ){ kill(getpid(),SIGSTOP); dosomething(); return 0; }
    do{
        waitpid( pid, &status, WUNTRACED | WCONTINUED );
        if( WIFSTOPPED (status) ){ sleep(delay); kill(pid,SIGCONT); }
        if( WIFCONTINUED(status) && WAIT ){ sleep(RUN); kill(pid,SIGSTOP); }
        delay=WAIT;
    }while( !WIFEXITED(status) && !WIFSIGNALED(status) );
}
There is no slowdown when WAIT is zero; otherwise, after every RUN seconds the parent stops the child for WAIT seconds.
Runtime results:
RUN=1 WAIT=0:  real 3.905s   user 3.704s   sys 0.012s
RUN=1 WAIT=1:  real 9.061s   user 3.640s   sys 0.016s
RUN=1 WAIT=2:  real 13.027s  user 3.372s   sys 0.032s
cpulimit is a tool that does something like this. It works by periodically sending kill -STOP and kill -CONT to the process, which has the effect of making it run slower when averaged over time (for example, cpulimit -l 50 -p <pid> caps the process at roughly 50% of one CPU).
If you have DTrace, you may be able to use its chill() function. You can insert this chill at almost any place in a userland application, and in multiple places. It has been used before to replicate race conditions seen on slower systems.
I ran some applications in a virtual machine under Ubuntu, and they were really slow. You could configure the virtual machine's share of the host system's resources.
You might obfuscate the situation a little further by running a virtual machine under a virtual machine under a virtual machine, ...
