fork()ing and running on a specific set of CPUs - C

I have a parent process, which I use to spawn a series of child processes, each of which runs its own program sequentially. Each of these programs changes a file over time; I want to read the data from this file and see how it changes as each program runs.
I need two sets of data for this to work: the value of the file at some set interval (I haven't decided on the interval yet), and the time each program takes to run. There are other variables which can influence the execution times of these programs, and I want to see those as well.
So I figured that, to get more accurate timing of the child processes while still reading from the file, I could run them on different cores. I have 8 cores; I would like to run the parent process on cores 0-3 and fork the children to run on cores 4-7. I'm not sure whether this is possible from within C, and a search around hasn't yielded any answers, which makes me think it isn't.
Within Linux, outside of a program, I can use taskset to do this.
I plan on setting aside 4 of the cores using the isolcpus kernel boot parameter. I want as little noise as possible while running the child programs.

Asking the kernel to associate CPU cores with threads or processes is also known as setting the "affinity" between the core and the process/thread.
Under Linux, there exists a set of functions that provide this capability. Take a look at the manual page for one of the functions...
man pthread_setaffinity_np
This family of API calls might be able to give you what you need.
That man page has a "see also" section that links to the other functions in this family.
Typically with features such as these that deal with kernel process and thread scheduling, it is entirely dependent on what mood the kernel is in at the time as to whether your requests are met or ignored. Your mileage may vary due to system load or the number of available cores. Even if a system has 16 cores, these features may be disabled in the kernel compilation settings (think virtual machines). Equally, you may find that there are some additional options that you may be able to add to your kernel to get better results than the defaults.
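For the fork() scenario in the question, the process-level call from the same family, sched_setaffinity(2), is probably the closer fit. Below is a minimal, Linux-only sketch under the assumptions in the question (8 cores, parent on 0-3, child on 4-7); ./child_program is a placeholder for one of the programs being measured, and error handling is kept to a minimum.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pin the calling process to cores first..last (inclusive). */
static void pin_to_cores(int first, int last)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = first; cpu <= last; cpu++)
        CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof set, &set) == -1) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    pin_to_cores(0, 3);                     /* parent stays on cores 0-3 */

    pid_t pid = fork();
    if (pid == 0) {
        pin_to_cores(4, 7);                 /* child moves to cores 4-7 */
        execlp("./child_program", "child_program", (char *)NULL);
        perror("execlp");                   /* only reached if exec fails */
        _exit(EXIT_FAILURE);
    }

    /* ... parent samples the file at its chosen interval here ... */
    waitpid(pid, NULL, 0);
    return 0;
}

The affinity mask is inherited across fork() and execve(), which is why the child narrows it to cores 4-7 itself before exec'ing; taskset and isolcpus work on top of the same mechanism.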

Related

Multi-threading on ARM cortex A9 dual core (Linux or VxWorks)

I am working on how a dual core (especially in embedded systems) could be beneficial. I would like to compare two targets: one with an ARM Cortex-A9 (925 MHz) dual core, and the other with an ARM Cortex-A8 single core.
I have some ideas (please see below), but I am not sure they will make use of the dual-core features.
My questions are:
1. How can I execute several threads on different cores (without OpenMP, because it didn't work on my target and isn't compatible with VxWorks)?
2. How does the kernel execute code on a dual core with shared memory: how does it allocate stack, heap, and memory for global and static variables?
3. Is it possible to add C flags in order to indicate the number of CPU cores, so that we can use the dual-core features?
4. How does the kernel handle program execution (with a lot of threads) on a dual core?
Some tests to compare the two architectures regarding the OS and dual core vs. single core:
Dual core vs. single core:
Create three threads that execute some routines and depend on each other's results (like matrix multiplication). Afterwards, measure the time taken on the dual core and then on the single core (how is this possible without OpenMP?).
Ping pong:
One process sends a message to the other; the two processes repeatedly pass a message back and forth. We should investigate how the time taken varies with the size of the message for each architecture.
One to all:
A process with rank 0 sends the same message to all other processes in the program and then receives a message of the same length from all other processes. How does the time taken vary with the size of the messages and with the number of processes?
Short answers wrt. Linux only:
How can I execute several threads on different cores?
Use multiple processes, or pthreads within a single process.
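For example, a minimal pthread sketch (compile with -pthread; the kernel is then free to schedule each thread on a different core):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    long id = (long)arg;
    printf("thread %ld running\n", id);
    return NULL;
}

int main(void)
{
    pthread_t tid[3];
    for (long i = 0; i < 3; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < 3; i++)
        pthread_join(tid[i], NULL);
    return 0;
}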
How does the kernel execute code on a dual core with shared memory: how does it allocate stack, heap, and memory for global and static variables?
In Linux, they all belong to the process. (You can declare thread-local variables, though, with for example the __thread keyword.)
In other words, if a thread allocates some memory, that memory is immediately visible to all other threads in the same process. Even if that thread exits, nothing happens to the memory. It is perfectly normal for some other thread to free the memory later on.
Each thread does get their own stack, which by default is quite large. (With pthreads, this is easy to control using a pthread_attr_t that specifies a smaller stack size.)
In general, there is no difference in memory handling or program execution, between a single-threaded and a multi-threaded process, on the kernel side. (Userspace code needs to use memory and threads properly, of course; the kernel does not try to stop stupid userspace code from shooting itself in the head, at all.)
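As a small sketch of the two points above (per-thread stacks and thread-local variables); the stack size value here is arbitrary, and the unsynchronized increment is shown only for illustration:

#include <pthread.h>
#include <stdio.h>

static int shared_counter;          /* one copy, visible to every thread  */
static __thread int local_counter;  /* one private copy per thread        */

static void *worker(void *arg)
{
    (void)arg;
    shared_counter++;   /* racy without a mutex; illustration only        */
    local_counter++;    /* always 1 here: each thread has its own copy    */
    printf("shared=%d local=%d\n", shared_counter, local_counter);
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 256 * 1024);  /* smaller per-thread stack */

    pthread_t a, b;
    pthread_create(&a, &attr, worker, NULL);
    pthread_create(&b, &attr, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}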
Is it possible to add C flags in order to indicate the number of CPU cores, so that we can use the dual-core features?
Yes, for example by examining the /proc/cpuinfo pseudofile.
However, it is much more common to leave such details to the system administrator. Services like Apache let the administrator configure the number of worker processes or threads, either in a configuration file or on the command line. Even make supports a -j JOBS parameter, allowing multiple compilation tasks to run in parallel.
I very warmly recommend you forget about any detection magic, and instead let the user specify the number of threads used. (If you insist, you could use detection magic for the default, but then only if the user/admin has not given you any hints about the number of threads to be used.)
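A sketch of that recommendation: use a user-supplied value when given, and fall back to detection (here via sysconf on Linux/glibc rather than parsing /proc/cpuinfo by hand) only for the default. The command-line convention is just an example:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    long nthreads = 0;

    if (argc > 1)                       /* user/admin hint wins, e.g. ./prog 4 */
        nthreads = atol(argv[1]);

    if (nthreads < 1)                   /* otherwise fall back to detection */
        nthreads = sysconf(_SC_NPROCESSORS_ONLN);

    if (nthreads < 1)
        nthreads = 1;

    printf("using %ld worker threads\n", nthreads);
    return 0;
}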
It is also possible to set the CPU mask for each thread, specifying the set of cores that thread may run on. Other than in benchmarks, where this is done more for repeatability than anything else, or in dedicated processes designed to hog all the resources on a machine, it is very rare.
How does the kernel handle program execution (with a lot of threads) on a dual core?
The exact same way as it handles a lot of simultaneous processes.
There are some controls (control group stuff, cpu masks) and maybe resource accounting that are handled differently between separate processes and threads within the same process, but in general terms, separate threads in a single process are executed the same way as separate processes are.

In Unix-ish environments, is PID wraparound guaranteed to change process start time?

Context:
I'm academically interested in tracking/identifying UNIX processes in a way that is proof against PID wraparound. To start tracking a process by PID, I need to be able to conclusively identify it on the system.
Thus, I need a function, get_identity, that takes a PID, and only returns once it has determined a system-wide unique identity for that PID. The function should work on all or most POSIX-compliant systems.
The only immutable values in the process table that I know of are PID and start time. However, the following scenario poses a problem:
User calls get_identity(pid)
get_identity reads the start time in seconds-since-the-epoch of pid, if it exists, and returns the hopefully-unique tuple [pid, starttime] (this is what the excellent psutil Python library considers "unique enough", so it should be pretty robust).
Within a second of that call, PID wraparound occurs on the system, and pid is recycled.
The [pid, starttime] tuple now refers to a different process than was present at the call to get_identity.
While it is extremely improbable for PID wraparound to occur and re-use the selected PID within a second of its being identified, it is not impossible . . . right?
Questions:
Is there a guarantee on UNIX/POSIX-compliant systems that the start time of a PID will be different between wraparound-caused re-uses of that same PID value?
If not, how can I uniquely identify a process on a wraparound-prone system?
What I've Tried:
I can simply sleep for a second after examining the target process. If the start-time-in-seconds is the same after the sleep, then it's either the same process that I started watching, or the PID has wrapped around to a different one but the system cannot tell the difference. If the start time has changed, I can return an error, or start over. However, this requires my identification function to wait for up to 1 second before returning, which is not ideal.
times() returns values in clock ticks, which I can convert to seconds. Assuming that the starttime-in-seconds of a process is based on the same clock that times uses, and assuming that all UNIXes use the same rounding logic to convert from clock ticks -> fractional seconds -> whole seconds, I could theoretically use this information to reduce the duration of the sleep in the above workaround to the time until the next "full second boundary according to the process table". However, the worst-case sleep time would still be nearly 1 second, so this is not ideal.
On Linux, I can get the starttime in jiffies (or CPU ticks, for old Linuxes) from the /proc/$pid/stat file. With that information, my program could wait one jiffy(ie?), check the starttime again, and, if it was the same, determine identity. This correctly solves my problem (1 jiffy + overhead is a fast enough runtime), but only on Linux; other UNIX platforms may not have /proc. On BSD, that information is available via the kvm subsystem or via sysctls. On other Unixes . . . who knows? I'd need to develop multiple platform-specific implementations to gather this data--something I'd prefer to avoid.
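For reference, a sketch of that Linux-specific /proc approach (field numbering per proc(5); the comm field can contain spaces and parentheses, hence parsing from the last ')'):

#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Start time of pid, in clock ticks since boot (field 22 of /proc/[pid]/stat,
 * see proc(5)), or -1 on error. */
static long long starttime_ticks(pid_t pid)
{
    char path[64], buf[2048];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The comm field (2) may contain spaces and ')', so parse from the last ')';
     * the fields after it start at field 3. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;

    unsigned long long start;
    if (sscanf(p + 2, "%*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u "
                      "%*u %*u %*d %*d %*d %*d %*d %*d %llu", &start) != 1)
        return -1;
    return (long long)start;
}

Reading this twice a short interval apart and comparing the values gives the re-check described above; sysconf(_SC_CLK_TCK) converts the ticks to seconds if needed.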
Since the assignment of PIDs and proc table management in general is not defined by any standard it's literally impossible to do what you want in a portable way.
You will need to do as you say and develop multiple platform-specific implementations to gather enough information about a process to determine a unique identity for every process.
On the other hand if you don't need this information in real time as the processes are started and while they are still running you can, on most unix-y systems, simply turn on process accounting and have a guaranteed unique and complete record of every process that has been run by the system. Process accounting files are not standardized either, but there will be header files defining their record format, and there should be tools on each type of system which can process and summarize accounting files in various ways.
PID wraparound is guaranteed to happen eventually, but you'll never have two live processes with the same PID at the same time.

CPU load for C process with respect to core(s)

I am trying to find out how a specific process, written in C, loads the CPU over a certain time frame. The process may switch processor cores during runtime, so I need to handle that too. The CPU is an ARM processor.
I have looked at different ways to get the load: standard top, perf, and calculating the load from the statistics given in the /proc/[pid]/stat file.
My idea is to have a program that reads the /proc/[pid]/stat file, as suggested in the thread "How to calculate the CPU usage of a process by PID in Linux from C?", and calculates the load accordingly.
But how would I treat core switching? I need to notice it and adjust the load calculation.
How would you recommend I achieve this?
Update: How can I see which core the process runs on, and thereby determine whether it has switched cores since the last check, assuming I poll the process figures/statistics at least twice?
The perf tool can tell you the number of cpu-migrations your process has made, that is, how many times the process has switched CPU. It won't tell you which CPU cores, though.
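If you are polling /proc/[pid]/stat anyway, the "processor" field (field 39 per proc(5), the CPU the task last ran on) can be sampled alongside utime/stime. A rough sketch, parsing from the last ')' for the usual comm-field reason:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Sample utime, stime (clock ticks) and the CPU the task last ran on
 * (fields 14, 15 and 39 of /proc/[pid]/stat). Returns 0 on success, -1 on error. */
static int sample_stat(pid_t pid, unsigned long *utime,
                       unsigned long *stime, int *cpu)
{
    char path[64], buf[4096];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* comm (field 2) may contain spaces, so count fields from the last ')'. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;

    int field = 2;
    for (char *tok = strtok(p + 1, " "); tok; tok = strtok(NULL, " ")) {
        field++;
        if (field == 14) *utime = strtoul(tok, NULL, 10);
        if (field == 15) *stime = strtoul(tok, NULL, 10);
        if (field == 39) { *cpu = atoi(tok); return 0; }
    }
    return -1;
}

Comparing the cpu value from two consecutive samples shows whether the process migrated in between, and the utime/stime deltas over the elapsed ticks give the load for that window; perf, as noted above, reports the total migration count.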

Which is the better way to edit the RLIMIT_NPROC value?

My application creates a thread per connection. The application is running under a non-zero user ID, and sometimes the number of threads surpasses the default value of 1024. I want to change this number, so I have a few options:
run as root [very bad idea, and I'd also have to compromise on security, so dropping it]
run under an unprivileged user, use setcap to grant the CAP_SYS_RESOURCE capability, and then add code to my program:
#include <stdio.h>
#include <sys/resource.h>

struct rlimit rlp;
rlp.rlim_cur = rlp.rlim_max = nprocs; /* nprocs: maximum number of desired threads */
if (setrlimit(RLIMIT_NPROC, &rlp) == -1)
    perror("setrlimit");
/* RLIMIT_NPROC:
 * The maximum number of processes (or, more precisely on Linux, threads) that can
 * be created for the real user ID of the calling process. Upon encountering this
 * limit, fork(2) fails with the error EAGAIN. */
The other option is editing /etc/security/limits.conf, where I can simply make an entry for the development user, e.g.
devuser hard nproc 20000
devuser soft nproc 10000
where 10k is enough. Being a little reluctant to change the source code, should I proceed with the last option? I am also curious to know which is the more robust and standard approach.
Seeking your opinions, and thank you in advance :)
PS: What will happen if a single process is served with more than 1k threads? Of course, I have 32 GB of RAM as well.
First, I believe you are wrong in having nearly a thousand threads. Threads are quite costly, and it is usually not reasonable to have so many of them. I would suggest having a few dozen threads at most (unless you run on a very costly supercomputer).
You could have some event loop around a multiplexing syscall like poll(2). Then a single thread can deal with many thousands of connections. Read about the C10K problem and epoll. Consider using some event libraries like libevent or libev etc...
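A bare-bones sketch of such an epoll-based event loop (Linux-only; the listening socket is assumed to be already bound, listening and non-blocking, and partial writes and error paths are glossed over):

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Single-threaded echo loop: one thread, many connections. */
void event_loop(int listen_fd)
{
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {                 /* new connection */
                int conn = accept(listen_fd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = conn };
                epoll_ctl(ep, EPOLL_CTL_ADD, conn, &cev);
            } else {                               /* data from a client */
                char buf[4096];
                ssize_t len = read(fd, buf, sizeof buf);
                if (len <= 0)
                    close(fd);                     /* closing removes it from the epoll set */
                else
                    write(fd, buf, len);           /* echo back */
            }
        }
    }
}

Libraries like libevent or libev wrap essentially this pattern in a portable API.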
You could start your application as root (perhaps by using setuid techniques), set up the required resources (in particular, opening privileged TCP/IP ports), and change the user with setreuid(2).
Read Advanced Linux Programming...
You could also wrap your application in a tiny setuid C program which increases the limits using setrlimit(2), changes the user with setreuid, and finally execve(2)s your real program.
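A sketch of that wrapper idea; it assumes the binary is installed setuid root (or granted CAP_SYS_RESOURCE), and the limit values and the /usr/local/bin/myapp path are placeholders:

#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    (void)argc;

    /* Raise the thread/process limit while still privileged. */
    struct rlimit rl = { .rlim_cur = 10000, .rlim_max = 20000 };
    if (setrlimit(RLIMIT_NPROC, &rl) == -1) {
        perror("setrlimit");
        return EXIT_FAILURE;
    }

    /* Drop back to the real (unprivileged) user before exec'ing. */
    if (setreuid(getuid(), getuid()) == -1) {
        perror("setreuid");
        return EXIT_FAILURE;
    }

    /* Run the real application; /usr/local/bin/myapp is a placeholder. */
    execv("/usr/local/bin/myapp", argv);
    perror("execv");
    return EXIT_FAILURE;
}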

How to know which CPU the current thread is running on, in C [duplicate]

Presumably there is a library or simple asm blob that can get me the number of the current CPU that I am executing on.
Use sched_getcpu to determine the CPU on which the calling thread is running. See man getcpu (the system call) and man sched_getcpu (a library wrapper). However, note what it says:
The information placed in cpu is only guaranteed to be current at the time of the call: unless the CPU affinity has been fixed using sched_setaffinity(2), the kernel might change the CPU at any time. (Normally this does not happen because the scheduler tries to minimize movements between CPUs to keep caches hot, but it is possible.) The caller must be prepared to handle the situation when cpu and node are no longer the current CPU and node.
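A minimal example of the call (glibc on Linux; _GNU_SOURCE is needed for the declaration):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    int cpu = sched_getcpu();   /* CPU the calling thread is running on right now */
    if (cpu == -1)
        perror("sched_getcpu");
    else
        printf("running on CPU %d\n", cpu);
    return 0;
}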
You need to do something like:
Call sched_getaffinity and identify the CPU bits
Iterate over the CPUs, doing sched_setaffinity to each one (I'm not sure if, after sched_setaffinity, you're guaranteed to be on that CPU, or whether you need to yield explicitly?)
Execute CPUID (asm instruction)... there is a way of getting a unique per-core ID out of one of its outputs (see Intel docs). I vaguely recall it's the "APIC ID".
Build a table (a std::map ?) from APIC IDs to a CPU number or affinity mask or something.
If you did this on your main thread, don't forget to set sched_setaffinity back to all CPUs!
Now you can CPUID again whenever you need to and look up which core you're on.
But I'd query why you need to do this; normally you want to take control via sched_setaffinity rather than finding out which core you're on (and even that's a pretty rare thing to want/need). (That's why I don't know the crucial detail of what to pull out of CPUID exactly, sorry!)
Update: Just learned about sched_getcpu from litb's response here. Much better! (my Debian/etch libc is too old to have it though).
I don't know of anything to get your current core id. With kernel level task/process migration, you wouldn't be guaranteed that it would remain constant for any length of time, unless you were running in some form of real-time mode.
If you want to be on a specific core, you can use the sched_setaffinity() function or the taskset command to launch your program. I believe that these need elevated permissions to work, though. In your program, you could then run sched_getaffinity() to see the mask that was set earlier and use that as a best guess at the core on which you are executing.
sysconf(_SC_NPROCESSORS_ONLN);
