How to open /dev/console in C

I was reading the Wayland/Weston code, specifically the part that sets up the tty. It tries to acquire an available tty for doing KMS and starting windows.
This is how it does it:
if (!wl->new_user) {
wl->tty = STDIN_FILENO;
} else if (tty) {
t = ttyname(STDIN_FILENO);
if (t && strcmp(t, tty) == 0)
wl->tty = STDIN_FILENO;
else
wl->tty = open(tty, O_RDWR | O_NOCTTY);
} else {
int tty0 = open("/dev/tty0", O_WRONLY | O_CLOEXEC);
char filename[16];
if (tty0 < 0)
error(1, errno, "could not open tty0");
if (ioctl(tty0, VT_OPENQRY, &wl->ttynr) < 0 || wl->ttynr == -1)
error(1, errno, "failed to find non-opened console");
snprintf(filename, sizeof filename, "/dev/tty%d", wl->ttynr);
wl->tty = open(filename, O_RDWR | O_NOCTTY);
close(tty0);
}
(from src/weston-launch.c).
If no tty is specified, it tries to open /dev/tty0 and find a tty that is available.
But you can't do that: neither /dev/tty0 nor the "available tty" belongs to you. I tested this with my own simpler version and, of course, I couldn't open /dev/tty0.
Does anyone know how this magic is done?

The actual available devices for a tty depend on the system. On most interactive Unix/Unix-like systems you will have a "tty" whose name can be found from the command-line program tty. For example:
$ tty
/dev/pts/2
Likely, you also have a device named "tty", e.g.,
$ ls -l /dev/tty
lrwxrwxrwx 1 root other 26 Feb 9 2014 /dev/tty -> ../devices/pseudo/sy#0:tty
$ ls -lL /dev/tty
crw-rw-rw- 1 root tty 22, 0 Feb 9 2014 /dev/tty
You cannot open just any tty device, because most of them are owned by root (or other users to which they have been assigned).
For further discussion about the differences between /dev/console, /dev/tty and other tty-devices, see Cannot open /dev/console.
According to the console_ioctl(4) manual page:
VT_OPENQRY
Returns the first available (non-opened) console. argp points to an int which is set to the number of the vt (1 <= *argp <= MAX_NR_CONSOLES).
and for example on a Linux system I see this in /dev:
crw-rw-rw- 1 root 5, 0 Mon 04:20:13 tty
crw------- 1 root 4, 0 Mon 03:58:52 tty0
crw------- 1 root 4, 1 Mon 04:00:41 tty1
crw------- 1 tom 4, 2 Mon 04:30:31 tty2
crw------- 1 root 4, 3 Mon 04:00:41 tty3
crw------- 1 root 4, 4 Mon 04:00:41 tty4
crw------- 1 root 4, 5 Mon 04:00:41 tty5
crw------- 1 root 4, 6 Mon 04:00:41 tty6
crw------- 1 root 4, 7 Mon 03:58:52 tty7
crw------- 1 root 4, 8 Mon 03:58:52 tty8
crw------- 1 root 4, 9 Mon 03:58:52 tty9
crw------- 1 root 4, 10 Mon 03:58:52 tty10
crw------- 1 root 4, 11 Mon 03:58:52 tty11
All of those tty devices except one for which I have opened a console session are owned by root. To be able to log into one, a program such as getty acts to temporarily change its ownership. Doing a ps on my machine shows for example
root 2977 1 0 04:00 tty1 00:00:00 /sbin/getty 38400 tty1
root 2978 1 0 04:00 tty2 00:00:00 /bin/login --
root 2979 1 0 04:00 tty3 00:00:00 /sbin/getty 38400 tty3
root 2980 1 0 04:00 tty4 00:00:00 /sbin/getty 38400 tty4
root 2981 1 0 04:00 tty5 00:00:00 /sbin/getty 38400 tty5
root 2982 1 0 04:00 tty6 00:00:00 /sbin/getty 38400 tty6
Note that getty is running as root. That gives it the privilege to change the ownership of the tty device as needed. That is, while the ioctl may identify an unused tty, you need elevated privileges to actually open it (weston-launch is normally installed setuid root for this reason). Linux (like any other Unix-like system) does not have a way to ensure that one process has truly exclusive access to a terminal, so it uses the device ownership and permissions to control this access.

If you're not the superuser then you should only try to access /dev/tty. That is a special device synonym for whichever tty is controlling the current process.

Related

why open file descriptors are not getting reused instead they are increasing in number value

I have a simple C HTTP server. I close the file descriptors for disk files and the new connection fds returned by accept(...), but I noticed that I keep getting file descriptor numbers that are bigger than the previous ones: for example, the descriptor returned by accept starts at 4, then 5, then 4 again, and so on, until it reaches the maximum number of open file descriptors on the system.
I have set that limit to 10,000 on my system, but I am not sure why exactly the file descriptor number climbs to the maximum value. I am fairly sure that my program is closing the file descriptors.
So I would like to know: if there are not thousands of connections, how come the new file descriptor numbers keep increasing? After around 24 hours I get the message accept: too many open files. What does this message mean?
Also, does the ulimit -n value get reset automatically without a system reboot?
As suggested in the answer, the output of ls -la /proc/$$/fd is
dr-x------ 2 fawad fawad 0 Oct 11 11:15 .
dr-xr-xr-x 9 fawad fawad 0 Oct 11 11:15 ..
lrwx------ 1 fawad fawad 64 Oct 11 11:15 0 -> /dev/pts/3
lrwx------ 1 fawad fawad 64 Oct 11 11:15 1 -> /dev/pts/3
lrwx------ 1 fawad fawad 64 Oct 11 11:15 2 -> /dev/pts/3
lrwx------ 1 fawad fawad 64 Oct 11 11:25 255 -> /dev/pts/3
and the output of ps aux | grep lh is
root 49855 0.5 5.4 4930756 322328 ? Sl Oct09 15:58 /usr/share/atom/atom --executed-from=/home/fawad/Desktop/C++-work/lhparse --pid=49844 --no-sandbox
root 80901 0.0 0.0 25360 5952 pts/4 S+ 09:32 0:00 sudo ./lh
root 80902 0.0 0.0 1100852 2812 pts/4 S+ 09:32 0:00 ./lh
fawad 83419 0.0 0.0 19976 916 pts/3 S+ 11:27 0:00 grep --color=auto lh
I would like to know what the pts/4 etc. column is. Is this the file descriptor number?
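It's likely that some of your sockets are stuck in the CLOSE-WAIT state. That state means the peer has closed the connection but your program has not yet called close() on its descriptor, so the descriptor is still open and its number cannot be reused. TIME-WAIT is different: the kernel keeps that state after your program has closed the socket, and it does not hold a file descriptor.
Once your program has called close() on a descriptor, its number is available for reuse immediately, and accept() and open() return the lowest free number. Numbers that keep climbing until accept fails with "too many open files" (EMFILE, the per-process limit set by ulimit -n) therefore usually mean that some code path is not closing its descriptors.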
See: https://en.m.wikipedia.org/wiki/Transmission_Control_Protocol
Protocol Operation and specifically Wait States.
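To see which files a process still has open, you can run the following (replacing $$, which expands to the current shell's PID, with the PID of your server):
ls -la /proc/$$/fd
The output of the following will also be of help: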
ss -tan | head -5
LISTEN 0 511 *:80 *:*
SYN-RECV 0 0 192.0.2.145:80 203.0.113.5:35449
SYN-RECV 0 0 192.0.2.145:80 203.0.113.27:53599
ESTAB 0 0 192.0.2.145:80 203.0.113.27:33605
TIME-WAIT 0 0 192.0.2.145:80 203.0.113.47:50685

Reopen an existing file descriptor with open("/dev/fd/n", mode)

I am studying system programming.
If we call open("/dev/fd/n", mode), we duplicate the n-th file descriptor and assign it to a new file descriptor.
However, the mode we specify needs to be a subset of the mode of the referenced file (/dev/fd/n), and I was wondering how this works.
Does this create a new entry in the open file table?
If it does, why should the mode be a subset of /dev/fd/n's file status flag?
If not, how could I have two different file descriptor pointing to the same entry in file entry table with different file status flag?
When we open /dev/fd/n, we are not "duplicating" a file descriptor. We are opening a brand new file.
You may be confusing this with using dup. Since we know the binary value of n, we could do: int fdn = dup(n);. That would share things.
But, that is not what we're doing.
/dev/fd is a symlink to /proc/self/fd. If we do ls -l /proc/self/fd > /tmp/out, we'll get something like:
total 0
lrwx------. 1 cae cae 64 Nov 7 00:16 0 -> /dev/pts/2
l-wx------. 1 cae cae 64 Nov 7 00:16 1 -> /tmp/out
lrwx------. 1 cae cae 64 Nov 7 00:16 2 -> /dev/pts/2
lr-x------. 1 cae cae 64 Nov 7 00:16 3 -> /proc/35153/fd
If we do:
fd = open("/proc/self/fd/0",O_WRONLY);
this is identical to doing:
fd2 = open("/dev/pts/2",O_WRONLY);
fd and fd2 do not share any flags/modes, etc. They are completely separate. Nor do they have any common flags/modes with fd 0.
Note that I deliberately specified /proc/self/fd/0 [which is open for reading] and, yet, we opened it for writing.
It does not care about [nor use] the original descriptor's flags, etc. Once again, it is just a "double level" symlink to the full path of the final target file: /dev/pts/2
It is the file permissions of the target file that dictate whether a given open is allowed (e.g. if the permissions were 0444, and we tried to open with O_WRONLY, that would fail with EACCES).
This would be no different than if we had a directory that looked like:
total 0
-rw-r--r--. 1 cae cae 0 Nov 7 00:29 a
lrwxrwxrwx. 1 cae cae 1 Nov 7 00:29 b -> a
lrwxrwxrwx. 1 cae cae 1 Nov 7 00:29 c -> a
lrwxrwxrwx. 1 cae cae 1 Nov 7 00:29 d -> c
We could do:
int fd1 = open("a",O_RDONLY);
int fd2 = open("b",O_WRONLY);
int fd3 = open("c",O_WRONLY);
int fd4 = open("d",O_WRONLY);
Those four file descriptors don't share anything, but they are four separate streams to the same file. So, if we write to any of fd2, fd3, or fd4 and then read from fd1, we'll see the effect.

The influence of file mode when file is read and written by a same user in different processes

This is my code
fd=open("a",O_RDWR | O_CREAT, 0600); /* O_CREAT requires a mode argument */
printf("%d\n", fd);
if(fd < 0)
{
perror("error");
exit(1);
}
lseek(fd, 0, SEEK_SET);
read(fd, buf, 10);
write(STDOUT_FILENO, buf, 10);
getchar();//1
lseek(fd, 0, SEEK_SET);
write(fd, "xxxxxxxxxx", 10);
getchar();//2
lseek(fd, 0, SEEK_SET);
read(fd, buf, 10);
write(STDOUT_FILENO, buf, 10);
getchar();//3
The file a looks like this:
//file a, mode 600
//aaaaaaaaaaa
At step 2, the text of file a has been changed to "xxxxx...".
Then I use vim in another terminal to change the text to "bbbbbbb...".
The output at step 3 is "xxxxx...".
However, when file a is
//file a, mode 606 or 660
//aaaaaaaaaaaa
and I do the same thing as above,
the output is "bbbbbbb....".
My system is OS X 10.9.
I can reproduce the problem, to my considerable surprise (Mac OS X 10.9.4).
However, as I hinted might be a possibility in my comment, the problem seems to be that vim is changing the inode number of the file when the file has 600 permission:
$ for mode in 600 606 660 666
> do
> echo "Mode: $mode"
> echo "abcdefghijklmnopqrst" > a
> chmod $mode a
> ls -li a
> vim a
> cat a
> ls -li a
> done
Mode: 600
25542402 -rw------- 1 jleffler staff 21 Sep 2 07:58 a
xxxxxxxxxxklmnopqrst
25542484 -rw------- 1 jleffler staff 21 Sep 2 07:58 a
Mode: 606
25542484 -rw----rw- 1 jleffler staff 21 Sep 2 07:58 a
xxxxxxxxxxklmnopqrst
25542484 -rw----rw- 1 jleffler staff 21 Sep 2 07:58 a
Mode: 660
25542484 -rw-rw---- 1 jleffler staff 21 Sep 2 07:58 a
xxxxxxxxxxklmnopqrst
25542484 -rw-rw---- 1 jleffler staff 21 Sep 2 07:58 a
Mode: 666
25542484 -rw-rw-rw- 1 jleffler staff 21 Sep 2 07:58 a
xxxxxxxxxxklmnopqrst
25542484 -rw-rw-rw- 1 jleffler staff 21 Sep 2 07:58 a
$
In each case, I ran the command 10rx and :x in vim.
I'm not clear why vim needs to change the inode when the file is 600 permission, but it smacks of a bug from where I'm sitting. It is behaviour I would not have expected at all (except that it explained what you saw).
The 'file descriptor' program (the outline code in the question) keeps the same file open, so the inode number of the file it is working with does not change. When the file has 600 permission, however, vim rewrites the file with a new inode number: it creates a new file with a new name and inode number containing the modified contents, then removes the old version of a and replaces it with the new file. The edit made by vim is therefore not seen in the file that the program has open. At the end of the 'file descriptor' program, when the permissions are 600, the file that it had open has no name and its contents are deleted by the system; the file that vim created has taken the place of the original file.

How to map /proc/bus/usb/devices entry to a /dev/sdX device?

I need to know how I can figure out to which entry in /proc/bus/usb/devices a /dev/sdX device maps to. Basically, I need to know the vendor id and product id of a given USB stick (which may not have a serial number).
In my case, I have this entry for my flash drive in /proc/bus/usb/devices:
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 6 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0781 ProdID=5530 Rev= 2.00
S: Manufacturer=SanDisk
S: Product=Cruzer
S: SerialNumber=0765400A1BD05BEE
C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=200mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I happen to know that in my case it is /dev/sda, but I'm not sure how I can figure this out in code. My first approach was to loop through all /dev/sdXX devices and issue a SCSI_IOCTL_GET_BUS_NUMBER and/or SCSI_IOCTL_GET_IDLUN request, but the information returned doesn't help me match it up:
/tmp # ./getscsiinfo /dev/sda
SCSI bus number: 8
ID: 00
LUN: 00
Channel: 00
Host#: 08
four_in_one: 08000000
host_unique_id: 0
I'm not sure how I can use the SCSI bus number or the ID, LUN, Channel, Host to map it to the entry in /proc/bus/usb/devices. Or how I could get the SCSI bus number from the /proc/bus/usb/001/006 device, which is a usbfs device and doesn't appear to like the same ioctl's:
/tmp # ./getscsiinfo /proc/bus/usb/001/006
Could not get bus number: Inappropriate ioctl for device
Here's the test code for my little getscsiinfo test tool:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <scsi/scsi.h>
#include <scsi/sg.h>
#include <sys/ioctl.h>
struct scsi_idlun
{
int four_in_one;
int host_unique_id;
};
int main(int argc, char** argv) {
if (argc != 2)
return 1;
int fd = open(argv[1], O_RDONLY | O_NONBLOCK);
if (fd < 0)
{
printf("Error opening device: %m\n");
return 1;
}
int busNumber = -1;
if (ioctl(fd, SCSI_IOCTL_GET_BUS_NUMBER, &busNumber) < 0)
{
printf("Could not get bus number: %m\n");
close(fd);
return 1;
}
printf("SCSI bus number: %d\n", busNumber);
struct scsi_idlun argid;
if (ioctl(fd, SCSI_IOCTL_GET_IDLUN, &argid) < 0)
{
printf("Could not get id: %m\n");
close(fd);
return 1;
}
printf("ID: %02x\n", argid.four_in_one & 0xFF);
printf("LUN: %02x\n", (argid.four_in_one >> 8) & 0xFF);
printf("Channel: %02x\n", (argid.four_in_one >> 16) & 0xFF);
printf("Host#: %02x\n", (argid.four_in_one >> 24) & 0xFF);
printf("four_in_one: %08x\n", (unsigned int)argid.four_in_one);
printf("host_unique_id: %d\n", argid.host_unique_id);
close(fd);
return 0;
}
Does anyone have any idea?
udevadm is capable of what you are trying to achieve.
udevadm info -a -p $(udevadm info -q path -n /dev/sda)
udevadm's sources will tell you how it is done.
I believe you can collect such information using libudev library.
Here are some details about it: http://www.signal11.us/oss/udev/
I found something like this on above site:
.. Using libudev, we'll be able to inspect the devices, including their Vendor ID (VID), Product ID (PID), serial number, and device strings, without ever opening the device. Further, libudev will tell us exactly where inside /dev the device's node is located, giving the application a robust and distribution-independent way of accessing the device. ...
This isn't all that easy, nor is it very well documented (at least from a high-level perspective). The following should work in kernels from version 3.1 upward (at least).
I have found that the easiest way (probably not the only one) is to navigate from the block device entry and test each block device until you find the one that matches your USB entry.
For example, given a block device in /sys/block, such as sdb, you can find the hardware device descriptor entry like this:
# cd /sys/block
# cd $(readlink sdb); cd ../../../../../..
# ls -l
total 0
drwxr-xr-x 6 root root 0 Aug 14 10:47 1-1:1.0
-rw-r--r-- 1 root root 4096 Aug 14 10:52 authorized
-rw-r--r-- 1 root root 4096 Aug 14 10:52 avoid_reset_quirk
-r--r--r-- 1 root root 4096 Aug 14 10:47 bcdDevice
-rw-r--r-- 1 root root 4096 Aug 14 10:49 bConfigurationValue
-r--r--r-- 1 root root 4096 Aug 14 10:47 bDeviceClass
-r--r--r-- 1 root root 4096 Aug 14 10:49 bDeviceProtocol
-r--r--r-- 1 root root 4096 Aug 14 10:49 bDeviceSubClass
-r--r--r-- 1 root root 4096 Aug 14 10:49 bmAttributes
-r--r--r-- 1 root root 4096 Aug 14 10:49 bMaxPacketSize0
-r--r--r-- 1 root root 4096 Aug 14 10:49 bMaxPower
-r--r--r-- 1 root root 4096 Aug 14 10:49 bNumConfigurations
-r--r--r-- 1 root root 4096 Aug 14 10:49 bNumInterfaces
-r--r--r-- 1 root root 4096 Aug 14 10:49 busnum
-r--r--r-- 1 root root 4096 Aug 14 10:52 configuration
-r--r--r-- 1 root root 65553 Aug 14 10:47 descriptors
-r--r--r-- 1 root root 4096 Aug 14 10:52 dev
-r--r--r-- 1 root root 4096 Aug 14 10:49 devnum
-r--r--r-- 1 root root 4096 Aug 14 10:52 devpath
lrwxrwxrwx 1 root root 0 Aug 14 10:47 driver -> ../../../../../../bus/usb/drivers/usb
drwxr-xr-x 3 root root 0 Aug 14 10:52 ep_00
-r--r--r-- 1 root root 4096 Aug 14 10:47 idProduct
-r--r--r-- 1 root root 4096 Aug 14 10:47 idVendor
-r--r--r-- 1 root root 4096 Aug 14 10:52 ltm_capable
-r--r--r-- 1 root root 4096 Aug 14 10:47 manufacturer
-r--r--r-- 1 root root 4096 Aug 14 10:49 maxchild
lrwxrwxrwx 1 root root 0 Aug 14 10:52 port -> ../1-0:1.0/port1
drwxr-xr-x 2 root root 0 Aug 14 10:52 power
-r--r--r-- 1 root root 4096 Aug 14 10:47 product
-r--r--r-- 1 root root 4096 Aug 14 10:52 quirks
-r--r--r-- 1 root root 4096 Aug 14 10:47 removable
--w------- 1 root root 4096 Aug 14 10:52 remove
-r--r--r-- 1 root root 4096 Aug 14 10:47 serial
-r--r--r-- 1 root root 4096 Aug 14 10:49 speed
lrwxrwxrwx 1 root root 0 Aug 14 10:47 subsystem -> ../../../../../../bus/usb
-rw-r--r-- 1 root root 4096 Aug 14 10:47 uevent
-r--r--r-- 1 root root 4096 Aug 14 10:52 urbnum
-r--r--r-- 1 root root 4096 Aug 14 10:49 version
(You can find excellent documentation for the contents of the USB Descriptor here on the BeyondLogic site.)
Given the above, you should be able to map one or more of the USB device fields to the contents of /proc/bus/usb/devices. I find that the serial number is the easiest to match on, so that if you were to cat serial above, you would get the same serial number as listed:
T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0781 ProdID=5575 Rev=01.26
S: Manufacturer=SanDisk
S: Product=Cruzer Glide
S: SerialNumber=4C530100801115115112
C: #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=200mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
If you go to /sys/block, you can list the full path to the host device entry in the storage driver sysfs entry for each device. Typically, I do this using some programmatic means instead of at the shell prompt, but here you can see the links themselves:
# ls -l sd*
lrwxrwxrwx 1 root root 0 Aug 14 10:45 sda -> ../devices/pci0000:00/0000:00:10.0/host32/target32:0:0/32:0:0:0/block/sda
lrwxrwxrwx 1 root root 0 Aug 14 10:47 sdb -> ../devices/pci0000:00/0000:00:11.0/0000:02:03.0/usb1/1-1/1-1:1.0/host33/target33:0:0/33:0:0:0/block/sdb
Note that you mustn't make any assumptions about the numbers you see in the links. Depending upon the bus subsystem, the mappings could be quite different. For example, on a Raspberry Pi, it looks like this:
# ls -l sd*
lrwxrwxrwx 1 root root 0 Aug 13 23:54 sda -> ../devices/platform/soc/3f980000.usb/usb1/1-1/1-1.4/1-1.4:1.0/host3/target3:0:0/3:0:0:0/block/sda
lrwxrwxrwx 1 root root 0 Aug 13 23:54 sdb -> ../devices/platform/soc/3f980000.usb/usb1/1-1/1-1.3/1-1.3:1.0/host4/target4:0:0/4:0:0:0/block/sdb
So, the best approach is to take the approach listed at the top and navigate relative to the storage driver to find the USB device descriptor.
I'd be curious about more authoritative answers to this. The method above was arrived at by trial-and-error but has been working on several different devices and Kernels with no problem.
Instead of using /proc/bus/usb, which is for usbfs, you can use /proc/scsi/scsi. In there you can find the Vendor and Serial number together with the specific channel, ID and LUN number.

How to set CPU affinity for a process from C or C++ in Linux?

Is there a programmatic method to set CPU affinity for a process in c/c++ for the Linux operating system?
You need to use sched_setaffinity(2).
For example, to run on CPUs 0 and 2 only:
#define _GNU_SOURCE
#include <sched.h>
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(0, &mask);
CPU_SET(2, &mask);
int result = sched_setaffinity(0, sizeof(mask), &mask);
(0 for the first parameter means the current process, supply a PID if it's some other process you want to control).
See also sched_getcpu(3).
Use sched_setaffinity at the process level, or pthread_attr_setaffinity_np for individual threads.
I spent a lot of effort working out what is happening, so I am adding this answer to help people like me (I use the gcc compiler on Linux Mint).
#include <sched.h>
cpu_set_t mask;
/* static: a plain "inline" function has no external definition in C and may fail to link */
static inline void assignToThisCore(int core_id)
{
CPU_ZERO(&mask);
CPU_SET(core_id, &mask);
sched_setaffinity(0, sizeof(mask), &mask);
}
int main(void){
//call this:
assignToThisCore(2);//core_id can be 0,1,2,...
return 0;
}
But don't forget to add this option to the compiler command: -D _GNU_SOURCE
Because the operating system might still schedule other processes on that particular core, you can reserve the cores you want by adding GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=2,3" to the file /etc/default/grub and then running sudo update-grub in a terminal.
UPDATE:
If you want to assign more cores you can follow this piece of code:
static inline void assignToThisCores(int core_id1, int core_id2)
{
cpu_set_t mask1;
CPU_ZERO(&mask1);
CPU_SET(core_id1, &mask1);
CPU_SET(core_id2, &mask1);
sched_setaffinity(0, sizeof(mask1), &mask1);
//__asm__ __volatile__ ( "vzeroupper" : : : ); // It is here because of the bug that dirtied the AVX registers, so if you rely on AVX, uncomment it.
}
sched_setaffinity + sched_getaffinity minimal C runnable example
This example was extracted from my answer at: How to use sched_getaffinity and sched_setaffinity in Linux from C? I believe the questions are not duplicates since that one is a subset of this one, as it asks about sched_getaffinity only, and does not mention C++.
In this example, we get the affinity, modify it, and check if it has taken effect with sched_getcpu().
main.c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
void print_affinity() {
cpu_set_t mask;
long nproc, i;
if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
perror("sched_getaffinity");
assert(false);
}
nproc = sysconf(_SC_NPROCESSORS_ONLN);
printf("sched_getaffinity = ");
for (i = 0; i < nproc; i++) {
printf("%d ", CPU_ISSET(i, &mask));
}
printf("\n");
}
int main(void) {
cpu_set_t mask;
print_affinity();
printf("sched_getcpu = %d\n", sched_getcpu());
CPU_ZERO(&mask);
CPU_SET(0, &mask);
if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
perror("sched_setaffinity");
assert(false);
}
print_affinity();
/* TODO is it guaranteed to have taken effect already? Always worked on my tests. */
printf("sched_getcpu = %d\n", sched_getcpu());
return EXIT_SUCCESS;
}
GitHub upstream.
Compile and run:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
./main.out
Sample output:
sched_getaffinity = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
sched_getcpu = 9
sched_getaffinity = 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sched_getcpu = 0
Which means that:
initially, all of my 16 cores were enabled, and the process was randomly running on core 9 (the 10th one)
after we set the affinity to only the first core, the process was moved necessarily to core 0 (the first one)
It is also fun to run this program through taskset:
taskset -c 1-3 ./main.out
Which gives output of form:
sched_getaffinity = 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
sched_getcpu = 2
sched_getaffinity = 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sched_getcpu = 0
and so we see that it limited the affinity from the start.
This works because taskset sets its own affinity and then executes the program: the affinity mask is inherited by child processes across fork and preserved across execve. See also: How to prevent inheriting CPU affinity by child forked process?
Python: os.sched_getaffinity and os.sched_setaffinity
See: How to find out the number of CPUs using python
Tested in Ubuntu 16.04.
In short
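cpu_set_t mask; /* sched_setaffinity expects a cpu_set_t, not a raw integer */
CPU_ZERO(&mask);
for (int cpu = 0; cpu <= 2; cpu++) /* processors 0, 1, and 2 */
CPU_SET(cpu, &mask);
if (sched_setaffinity(0, sizeof(mask), &mask) < 0) {
perror("sched_setaffinity");
}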
Look in CPU Affinity for more details
It is also possible to do this from the shell, without any modification to the programs, using cgroups and the cpuset subsystem. Cgroups (v1 at least) are typically mounted on /sys/fs/cgroup, under which the cpuset subsystem resides. For example:
$ ls -l /sys/fs/cgroup/
total 0
drwxr-xr-x 15 root root 380 nov. 22 20:00 ./
drwxr-xr-x 8 root root 0 nov. 22 20:00 ../
dr-xr-xr-x 2 root root 0 nov. 22 20:00 blkio/
[...]
lrwxrwxrwx 1 root root 11 nov. 22 20:00 cpuacct -> cpu,cpuacct/
dr-xr-xr-x 2 root root 0 nov. 22 20:00 cpuset/
dr-xr-xr-x 5 root root 0 nov. 22 20:00 devices/
dr-xr-xr-x 3 root root 0 nov. 22 20:00 freezer/
[...]
Under cpuset, the cpuset.cpus defines the range of CPUs on which the processes belonging to this cgroup are allowed to run. Here, at the top level, all the CPUs are configured for all the processes of the system. Here, the system has 8 CPUs:
$ cd /sys/fs/cgroup/cpuset
$ cat cpuset.cpus
0-7
The list of processes belonging to this cgroup is listed in the cgroup.procs file:
$ cat cgroup.procs
1
2
3
[...]
12364
12423
12424
12425
[...]
It is possible to create a child cgroup into which a subset of CPUs are allowed. For example, let's define a sub-cgroup with CPU cores 1 and 3:
$ pwd
/sys/fs/cgroup/cpuset
$ sudo mkdir subset1
$ cd subset1
$ pwd
/sys/fs/cgroup/cpuset/subset1
$ ls -l
total 0
-rw-r--r-- 1 root root 0 nov. 22 23:28 cgroup.clone_children
-rw-r--r-- 1 root root 0 nov. 22 23:28 cgroup.procs
-rw-r--r-- 1 root root 0 nov. 22 23:28 cpuset.cpu_exclusive
-rw-r--r-- 1 root root 0 nov. 22 23:28 cpuset.cpus
-r--r--r-- 1 root root 0 nov. 22 23:28 cpuset.effective_cpus
-r--r--r-- 1 root root 0 nov. 22 23:28 cpuset.effective_mems
-rw-r--r-- 1 root root 0 nov. 22 23:28 cpuset.mem_exclusive
-rw-r--r-- 1 root root 0 nov. 22 23:28 cpuset.mem_hardwall
-rw-r--r-- 1 root root 0 nov. 22 23:28 cpuset.memory_migrate
-r--r--r-- 1 root root 0 nov. 22 23:28 cpuset.memory_pressure
-rw-r--r-- 1 root root 0 nov. 22 23:28 cpuset.memory_spread_page
-rw-r--r-- 1 root root 0 nov. 22 23:28 cpuset.memory_spread_slab
-rw-r--r-- 1 root root 0 nov. 22 23:28 cpuset.mems
-rw-r--r-- 1 root root 0 nov. 22 23:28 cpuset.sched_load_balance
-rw-r--r-- 1 root root 0 nov. 22 23:28 cpuset.sched_relax_domain_level
-rw-r--r-- 1 root root 0 nov. 22 23:28 notify_on_release
-rw-r--r-- 1 root root 0 nov. 22 23:28 tasks
$ cat cpuset.cpus
$ sudo sh -c "echo 1,3 > cpuset.cpus"
$ cat cpuset.cpus
1,3
The cpuset.mems file must be filled in before moving any process into this cgroup. Here we move the current shell into this new cgroup (we merely write the pid of the process to move into the cgroup.procs file):
$ cat cgroup.procs
$ echo $$
4753
$ sudo sh -c "echo 4753 > cgroup.procs"
sh: 1: echo: echo: I/O error
$ cat cpuset.mems
$ sudo sh -c "echo 0 > cpuset.mems"
$ cat cpuset.mems
0
$ sudo sh -c "echo 4753 > cgroup.procs"
$ cat cgroup.procs
4753
12569
The latter shows that the current shell (pid 4753) is now located in the newly created cgroup (the second pid, 12569, is that of the cat command: as a child of the current shell, it inherits its cgroups). With a formatted ps command, it is possible to verify on which CPU the processes are running (PSR column):
$ ps -o pid,ppid,psr,command
PID PPID PSR COMMAND
4753 2372 3 bash
12672 4753 1 ps -o pid,ppid,psr,command
We can see that the current shell is running on CPU#3 and its child (the ps command), which inherits its cgroups, is running on CPU#1.
As a conclusion, instead of using sched_setaffinity() or any pthread service, it is possible to create a cpuset hierarchy in the cgroups tree and move the processes into them by writing their pids in the corresponding cgroup.procs files.

Resources