generating pretty and/or less system-intensive measuring-points - c

I work with C and I make apache modules and I work with strace as my main tool for debugging timings. Here's code I threw together. My apologies if variable names do not meet standards.
#include <stdio.h>
int main(){
long ct2,ct; //counters
int a=0; //dummy value
FILE *f0=fopen("/","r"); //measuring point
ct2=10;
while (--ct2>0){
ct=5000000;
while (--ct>0){
if (!!a){
printf("%d",a);
}
}
}
FILE *f=fopen("/","r"); //measuring point
ct2=10;
while (--ct2>0){
ct=5000000;
while (--ct>0){
if (a){
printf("%d",a);
}
}
}
FILE *f2=fopen("/","r"); //measuring point
return 0;
}
This code does compile. I then run it through strace (by typing in a terminal: strace -r -ttt ./a.out) and I see:
0.000000 execve("./a.out", ["./a.out"], [/* 47 vars */]) = 0
0.000315 brk(0) = 0x804a000
0.000124 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
0.000144 open("/etc/ld.so.cache", O_RDONLY) = 3
0.000116 fstat64(3, {st_mode=S_IFREG|0644, st_size=139721, ...}) = 0
0.000138 mmap2(NULL, 139721, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7ece000
0.000114 close(3) = 0
0.000109 open("/lib/libc.so.6", O_RDONLY) = 3
0.000113 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360d\1"..., 512) = 512
0.000130 fstat64(3, {st_mode=S_IFREG|0755, st_size=1575187, ...}) = 0
0.000131 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ecd000
0.000122 mmap2(NULL, 1357360, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7d81000
0.000119 mmap2(0xb7ec7000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x146) = 0xb7ec7000
0.000146 mmap2(0xb7eca000, 9776, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7eca000
0.000139 close(3) = 0
0.000112 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d80000
0.000119 set_thread_area({entry_number:-1 -> 6, base_addr:0xb7d806c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
0.000217 mprotect(0xb7ec7000, 4096, PROT_READ) = 0
0.000108 munmap(0xb7ece000, 139721) = 0
0.000174 brk(0) = 0x804a000
0.000099 brk(0x806b000) = 0x806b000
0.000110 open("/", O_RDONLY) = 3
0.203487 open("/", O_RDONLY) = 4
0.202225 open("/", O_RDONLY) = 5
0.000133 exit_group(0) = ?
I can tell right off at the end that:
0.000110 open("/", O_RDONLY) = 3
0.203487 open("/", O_RDONLY) = 4
0.202225 open("/", O_RDONLY) = 5
return to the three measuring points I set up.
I want to be able to adjust the measuring point lines in my code so that when I run strace I can find my measuring points like I do now, but where the system makes less intensive operations. I don't see anything else from strace related to my program other than the file calls.
I'm thinking maybe if there was such a thing as a built-in MeasureMe function in C that I would use that in place of the measuring point lines in my code, then strace could output:
0.000110 MeasureMe called in code
0.203487 MeasureMe called in code
0.202225 MeasureMe called in code
Is there any way I can go about this with Strace?
The reason why I'm asking about strace instead of gdb is because I use it to debug requests to my apache server like the person in this video does it, and I'll be able to see apache modules in action:
https://www.youtube.com/watch?v=eF-p--AH37E
Any idea how I can solve this? or will I have to continue to make failed attempts at opening non-existing files?

I gather what you are currently using is open("/",O_RDONLY) [or open("/i_do_not_exist",O_RDONLY)] for a "tracepoint". Unfortunately, because you're using strace, you're constrained to using syscalls. But, there is a way to achieve the effect you want.
What you need/want for a tracepoint that you're manually inserting at various points in your source code is:
Any unique syscall that doesn't harm anything
Is easily distinguishable from real code [even code that may return errors such as opening a file or checking for existence with access]
Minimal overhead / fastest execution
Actually, dup on a bad fildes fills the bill nicely:
dup(-10000);
It will return EBADF. It is easily distinguishable as a tracepoint because most real dup calls that are "bad" will be dup(-1)
You can have as many of these as you want. The actual argument becomes the "tracepoint number":
dup(-10001); // tracepoint 1
...
dup(-10002); // tracepoint 2
...
dup(-10003); // tracepoint 3
The output will look like:
0.000044 dup(-10001) = -1 EBADF (Bad file descriptor)
0.000022 dup(-10002) = -1 EBADF (Bad file descriptor)
0.000019 dup(-10003) = -1 EBADF (Bad file descriptor)
I usually encapsulate this in a macro:
#ifdef DEBUG
#define TRACEPOINT(_tno) tracepoint(_tno)
#else
#define TRACEPOINT(_tno) /**/
#endif
void
tracepoint(int tno)
{
dup(-10000 - tno);
}
Then, I add something like:
TRACEPOINT(1); // initialization phase
...
TRACEPOINT(2); // execution phase
...
TRACEPOINT(3); // cleanup/shutdown
Now, I'll write a perl or python script to read in the source files, extracting the comments for the given tracepoints, and append them to the matching lines in the strace output file:
0.000044 TRACEPOINT(1) initialization phase
0.000022 TRACEPOINT(2) execution phase
0.000019 TRACEPOINT(3) cleanup/shutdown
A more sophisticated version of the post-processing script can do all sorts of things:
keep track of timestamps and append a time difference between one tracepoint and the previous one to the trace line
add file name and line number information to the tracepoint lines
keep track of the number of times a given tracepoint is hit [similar to gdb and breakpoints]
generate summary reports relating to tracepoints

Related

fopen() returns NULL but open() syscall returns proper file descriptor?

I've been trying to run this very simple C code on Ubuntu 20.04LTS
#include<stdio.h>
#include<stdlib.h>
int main()
{
FILE *f;
f=fopen("tree.txt","r");
if(f==NULL){
perror("fopen");
exit(1);
}
//readTree();
return 0;
}
But no matter what I've tried so far fopen still returns this eror:
fopen: No such file or directory
[1] + Done
First thing I assumed was that the program didn't have permission to open the file, but the permissions are set correctly:
ls -la tree.txt
-rw-rw-rw- 1 mor mor 7 mar 26 20:43 tree.txt
Next I tried to change the location of the file to /home or to specify a full path instead of the file name; still the same result
Now comes the part I can't wrap my head around, when running the script under Strace, it appears to work fine, at least for now
execve("./B1", ["./B1"], 0x7ffe586e59b0 /* 26 vars */) = 0
brk(NULL) = 0x55cd150ea000
arch_prctl(0x3001 /* ARCH_??? */, 0x7fff084d4590) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=156877, ...}) = 0
mmap(NULL, 156877, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fcadcd70000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360q\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0#\0\0\0\0\0\0\0#\0\0\0\0\0\0\0#\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32, 848) = 32
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\t\233\222%\274\260\320\31\331\326\10\204\276X>\263"..., 68, 880) = 68
fstat(3, {st_mode=S_IFREG|0755, st_size=2029224, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcadcd6e000
pread64(3, "\6\0\0\0\4\0\0\0#\0\0\0\0\0\0\0#\0\0\0\0\0\0\0#\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32, 848) = 32
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\t\233\222%\274\260\320\31\331\326\10\204\276X>\263"..., 68, 880) = 68
mmap(NULL, 2036952, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fcadcb7c000
mprotect(0x7fcadcba1000, 1847296, PROT_NONE) = 0
mmap(0x7fcadcba1000, 1540096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7fcadcba1000
mmap(0x7fcadcd19000, 303104, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19d000) = 0x7fcadcd19000
mmap(0x7fcadcd64000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7fcadcd64000
mmap(0x7fcadcd6a000, 13528, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fcadcd6a000
close(3) = 0
arch_prctl(ARCH_SET_FS, 0x7fcadcd6f540) = 0
mprotect(0x7fcadcd64000, 12288, PROT_READ) = 0
mprotect(0x55cd14c84000, 4096, PROT_READ) = 0
mprotect(0x7fcadcdc4000, 4096, PROT_READ) = 0
munmap(0x7fcadcd70000, 156877) = 0
brk(NULL) = 0x55cd150ea000
brk(0x55cd1510b000) = 0x55cd1510b000
openat(AT_FDCWD, "tree.txt", O_RDONLY) = 3
exit_group(0) = ?
+++ exited with 0 +++
The openat() call returns a small positive integer, which is normal behavior from what I read so far
Lastly, what really grinds my gears is that the output of Strace above differs from the output I received just a couple minutes before. For some reason I can't seem to recreate that output but the gist of it is this:
-openat() returns 3
-lseek(fd, -9, SEEK_CUR) is called and returns -1 ESPIPE Illegal seek please excuse my syntax here, I'm writing it from memory.
Also why is the offset a negative integer? Is that normal?
And why was lseek() called the first couple of times, but not now?
Reading man on lseek here it says that
On Linux, using lseek() on a terminal device fails with the error
ESPIPE.
also the error descriptor
ESPIPE fd is associated with a pipe, socket, or FIFO.
I believe the first quote to be unrelated, plenty of threads online with people managing to use fopen() on linux. When it comes to the error descriptor, it is beyond my level of understanding.
renderer1.log :
[2021-03-27 23:08:46.895] [renderer1] [error] Unexpected: The specified task is missing an execution: Error: Unexpected: The specified task is missing an execution
at S.getTaskExecution (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:90:48749)
at S.$onDidStartTaskProcess (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:90:46905)
at c._doInvokeHandler (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:90:10509)
at c._invokeHandler (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:90:10201)
at c._receiveRequest (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:90:8871)
at c._receiveOneMessage (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:90:7673)
at /snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:90:5782
at g.fire (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:57:1836)
at p.fire (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:65:15443)
at /snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:106:29119
at g.fire (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:57:1836)
at p.fire (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:65:15443)
at t._receiveMessage (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:65:20693)
at /snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:65:17587
at g.fire (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:57:1836)
at l.acceptChunk (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:65:12808)
at /snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:65:12156
at Socket.E (/snap/code/59/usr/share/code/resources/app/out/vs/workbench/services/extensions/node/extensionHostProcess.js:106:12375)
at Socket.emit (events.js:315:20)
at addChunk (_stream_readable.js:295:12)
at readableAddChunk (_stream_readable.js:271:9)
at Socket.Readable.push (_stream_readable.js:212:10)
at Pipe.onStreamRead (internal/stream_base_commons.js:186:23)
This is all I could come up with, apparently it's not good enough.
Any and all help is appreciated.
To sum it up, my issue was that the working directory of the .c program and the directory in which i had saved the .txt file were different. There are many ways to get around this issue.
Try using an absolute path in the fopen directory. For example:
FILE*f; f=fopen("/home/user/Desktop/file.txt","r");
this only works as a test, as it makes the code too rigid(answer provided by #kaylum)
Changing the working directory using chdir()
simple code that changes directory to the one specified by the user, opens file with name and mode specified by user; printing cwd using getcwd()
(part of the answer provided by #ilkkachu)
//puts cwd in *str and prints cwd if successful, otherwise prints error
void getdir(char *str, unsigned int str_size)
{
if(getcwd(str,str_size)==NULL){
perror("getcwd()");
exit(1);
}else
printf("Current working dir: %s\n",str);
}
//reads directory path and sets it as new working dir
int changeDir()
{
char cwd[256];
printf("New woking dir path= ");
scanf("%s",cwd);
if(chdir(cwd)!=0){
perror("chdir()");
return -1;
}
getdir(cwd,sizeof(cwd));
return 0;
}
int main()
{
FILE *f;
char cwd[256],fmod[5];
getdir(cwd,sizeof(cwd));
//reads new path until an existing one is inputted
while(changeDir()!=0);
printf("File name: ");
scanf("%s",cwd);
printf("File mode: ");
scanf("%s",fmod);
f=fopen(cwd,fmod);
if(f==NULL){
perror("fopen");
exit(1);
}
//reads first line of file and prints it to console
fscanf(f,"%[^\n]",cwd);
printf("%s",cwd);
return 0;
}
Code terminal output:
Current working dir: /home/moro/Documents/TP-Lab
New woking dir path= /home/moro/doesNotExist
chdir(): No such file or directory
New woking dir path= /home/moro/Desktop
Current working dir: /home/moro/Desktop
File name: input.txt
File mode: r
THIS IS A TEST FILE[1] + Done
Changing the current working directory in Visual Studio Code settings
Go to
Settings>Terminal>Integrated: Cwd
You can specify a start path there
Terminal › Integrated: Cwd
An explicit start path where the terminal will be launched, this is used as the current working directory (cwd) for the shell process. This may be particularly useful in workspace settings if the root directory is not a convenient cwd.
There probably are smoother ways to go about this, but for my use case these options are more than good enough. If i find the time i will add more solutions to this answer

pipeline command ./a.out | cat, printf("line\n") in a.out is not excuted and no output

A very strange thing, a.out just printf() a line then go into a dead loop, when a.out is executed single, I can see the line in termial, but if pipeline a.out with cat, then we can't see anything.
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv)
{
printf("----------\n");
while (1) {
sleep(1000);
}
return 0;
}
run result
$ cc test.c
$ ./a.out
----------
^C
$ ./a.out | cat
^C
if I strace a.out | cat, write(1) system call is not called
$ strace ./a.out | cat
execve("./a.out", ["./a.out"], 0x7ffdaa23b200 /* 65 vars */) = 0
brk(NULL) = 0x5567446dd000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=94391, ...}) = 0
mmap(NULL, 94391, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f977ba22000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200l\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2000480, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f977ba20000
mmap(NULL, 2008696, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f977b835000
mmap(0x7f977b85a000, 1519616, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f977b85a000
mmap(0x7f977b9cd000, 299008, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x198000) = 0x7f977b9cd000
mmap(0x7f977ba16000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e0000) = 0x7f977ba16000
mmap(0x7f977ba1c000, 13944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f977ba1c000
close(3) = 0
arch_prctl(ARCH_SET_FS, 0x7f977ba21500) = 0
mprotect(0x7f977ba16000, 12288, PROT_READ) = 0
mprotect(0x556743cd8000, 4096, PROT_READ) = 0
mprotect(0x7f977ba64000, 4096, PROT_READ) = 0
munmap(0x7f977ba22000, 94391) = 0
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
brk(NULL) = 0x5567446dd000
brk(0x5567446fe000) = 0x5567446fe000
nanosleep({tv_sec=1000, tv_nsec=0}, ^C{tv_sec=994, tv_nsec=769383373}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
strace: Process 5050 detached
if I strace a.out single, then has write(1)
$ strace ./a.out
execve("./a.out", ["./a.out"], 0x7ffe09a7c360 /* 65 vars */) = 0
brk(NULL) = 0x564b085a8000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=94391, ...}) = 0
mmap(NULL, 94391, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fac37df5000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200l\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2000480, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fac37df3000
mmap(NULL, 2008696, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac37c08000
mmap(0x7fac37c2d000, 1519616, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7fac37c2d000
mmap(0x7fac37da0000, 299008, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x198000) = 0x7fac37da0000
mmap(0x7fac37de9000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e0000) = 0x7fac37de9000
mmap(0x7fac37def000, 13944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fac37def000
close(3) = 0
arch_prctl(ARCH_SET_FS, 0x7fac37df4500) = 0
mprotect(0x7fac37de9000, 12288, PROT_READ) = 0
mprotect(0x564b076e3000, 4096, PROT_READ) = 0
mprotect(0x7fac37e37000, 4096, PROT_READ) = 0
munmap(0x7fac37df5000, 94391) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
brk(NULL) = 0x564b085a8000
brk(0x564b085c9000) = 0x564b085c9000
write(1, "----------\n", 11----------
) = 11
nanosleep({tv_sec=100
Why ?
An output stream can be in one of three different modes, unbuffered, line buffered, or fully buffered. In unbuffered mode, output is written immediately. In line buffered mode, output is written first to an internal stream buffer until the buffer is full or a newline is written, and then the buffer is flushed to the output. In fully buffered mode, the output is written first to an internal stream buffer until the buffer is full, and then the buffer is flushed to the output. (Some implementations may also flush the output at other times, such as when reading input from an interactive device.)
Input streams can also be in the same three modes, and this determines when read input is made available to the caller of functions that read streams.
The implementation initializes the mode of the standard input (stdin), standard output (stdout), and standard error output (stderr) streams before the main function is called (or at least before any access to the streams). Under certain circumstances, the implementation is allowed to initialize the standard input or standard output to fully buffered mode. The standard input and output streams are initialized to fully buffered mode if and only if the implementation can determine that they are not linked to an interactive device (such as a terminal). (The standard error output stream is never initialized to fully buffered mode.)
Typically, the C runtime library of a POSIX system will call isatty on the underlying file descriptors for the standard input and output streams and set the streams to fully buffered mode if isatty returns 0. This occurs before the main function is called.
When you run "./a.out" with output going to the terminal, the C runtime library determines that output is going to an interactive device and does not set stdout to fully buffered mode. It will be set to one of the other modes, typically line buffered mode. However, when you run "/a.out" with output going to a pipe, the C runtime library determines that output is not going to an interactive device and does set stdout to fully buffered mode. This is why the output is not written to the pipe immediately.
There are two ways to solve your problem. The first is to change the standard output stream to line buffered mode or unbuffered mode before the first call to printf:
setvbuf(stdout, NULL, _IOLBF, 0); // set standard output to line buffered mode
setvbuf(stdout, NULL, _IONBF, 0); // set standard output to unbuffered mode
The other way is to to flush the standard output on demand:
fflush(stdout); // write buffered standard output contents

Why no call fstab64 after read? And can this be a problem?

I want to create a new dynamic library instead of another, the source code of which is lost. I have created a library with exported functions, but the program does not load it. Conclusion Strace is almost the same, the only difference is that in the case of loading my library after the call to read() there is no call to fstat64().
strace original library:
open("/usr/local/lpr/li2/libSA.so", O_RDONLY) = 12
read(12, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\3409\0"..., 1024) = 1024
fstat64(12, {st_mode=S_IFREG|0644, st_size=46166, ...}) = 0
old_mmap(NULL, 40256, PROT_READ|PROT_EXEC, MAP_PRIVATE, 12, 0) = 0x40150000
mprotect(0x40159000, 3392, PROT_NONE) = 0
old_mmap(0x40159000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 12, 0x8000) = 0x40159000
close(12) = 0
my library:
open("/usr/local/lpr/li2/libSA.so", O_RDONLY) = 12
read(12, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\210\0\0"..., 1024) = 1024
close(12) = 0
time(NULL)
You're trying to load a 64-bit shared object into a 32-bit process.
The ELF header read by these two read() calls:
read(12, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\3409\0"..., 1024) = 1024
and
read(12, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\210\0\0"..., 1024) = 1024
differ. Note that the fifth byte in the first read() is 1. That's the successful load of a 32-bit shared object.
That fifth byte is 2 on the unsuccessful attempt - and that 2 means that the shared object is a 64-bit shared object.
You likely need to compile and link with the -m32 option.

How can I attached a BPF program to a kernel function via a kprobe?

The Cilium BPF and XDP Reference Guide describes how you can load a BPF program to a netdevice via the ip and tc commands. How would I attach a BPF program to a kernel function/userspace function in the same way?
TL;DR You can use the traditional kprobe API to trace a function, then perf_event_open + ioctl to attach a BPF program.
This is implemented in the load_and_attach function of file load_bpf.c in the kernel, and in the bpf_attach_kprobe and bpf_attach_tracing_event function of file libbpf.c in bcc.
You can see this in action when tracing the hello_world.py from bcc:
$ strace -s 100 python examples/hello_world.py
[...]
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=15, insns=0x7f35716217d0, license="GPL", log_level=0, log_size=0, log_buf=0, kern_version=265728}, 72) = 3
openat(AT_FDCWD, "/sys/bus/event_source/devices/kprobe/type", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/bus/event_source/devices/kprobe/format/retprobe", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/kernel/debug/tracing/kprobe_events", O_WRONLY|O_APPEND) = 4
getpid() = 8121
write(4, "p:kprobes/p_sys_clone_bcc_8121 sys_clone", 40) = 40
close(4) = 0
openat(AT_FDCWD, "/sys/kernel/debug/tracing/events/kprobes/p_sys_clone_bcc_8121/id", O_RDONLY) = 4
read(4, "1846\n", 4096) = 5
close(4) = 0
perf_event_open({type=PERF_TYPE_TRACEPOINT, size=0 /* PERF_ATTR_SIZE_??? */, config=1846, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = 4
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0x7f356c58b000
ioctl(4, PERF_EVENT_IOC_SET_BPF, 0x3) = 0
ioctl(4, PERF_EVENT_IOC_ENABLE, 0) = 0
openat(AT_FDCWD, "/sys/kernel/debug/tracing/trace_pipe", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
fstat(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(5,
The first syscall (bpf) loads the BPF program in the kernel.
Then bcc follows the kprobe API to trace sys_clone by writing p:kprobes/p_sys_clone_bcc_8121 sys_clone in p:kprobes/p_sys_clone_bcc_8121 sys_clone.
bcc retrieves, in p:kprobes/p_sys_clone_bcc_8121 sys_clone, an ID to use in perf_event_open.
bcc calls perf_event_open with type PERF_TYPE_TRACEPOINT
and attaches the loaded BPF program (referenced by fd 0x3) to that perf_event, with an PERF_EVENT_IOC_SET_BPF ioctl.

seccomp --- how to EXIT_SUCCESS?

Ηow to EXIT_SUCCESS after strict mode seccomp is set. Is it the correct practice, to call syscall(SYS_exit, EXIT_SUCCESS); at the end of main?
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>
int main(int argc, char **argv) {
prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
//return EXIT_SUCCESS; // does not work
//_exit(EXIT_SUCCESS); // does not work
// syscall(__NR_exit, EXIT_SUCCESS); // (EDIT) This works! Is this the ultimate answer and the right way to exit success from seccomp-ed programs?
syscall(SYS_exit, EXIT_SUCCESS); // (EDIT) works; SYS_exit equals __NR_exit
}
// gcc seccomp.c -o seccomp && ./seccomp; echo "${?}" # I want 0
As explained in eigenstate.org and in SECCOMP (2):
The only system calls that the calling thread is permitted to
make are read(2), write(2), _exit(2) (but not exit_group(2)),
and sigreturn(2). Other system calls result in the delivery
of a SIGKILL signal.
As a result, one would expect _exit() to work, but it's a wrapper function that invokes exit_group(2) which is not allowed in strict mode ([1], [2]), thus the process gets killed.
It's even reported in exit(2) - Linux man page:
In glibc up to version 2.3, the _exit() wrapper function invoked the kernel system call of the same name. Since glibc 2.3, the wrapper function invokes exit_group(2), in order to terminate all of the threads in a process.
Same happens with the return statement, which should end up in killing your process, in the very similar manner with _exit().
Stracing the process will provide further confirmation (to allow this to show up, you have to not set PR_SET_SECCOMP; just comment prctl()) and I got similar output for both non-working cases:
linux12:/home/users/grad1459>gcc seccomp.c -o seccomp
linux12:/home/users/grad1459>strace ./seccomp
execve("./seccomp", ["./seccomp"], [/* 24 vars */]) = 0
brk(0) = 0x8784000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb775f000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=97472, ...}) = 0
mmap2(NULL, 97472, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7747000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\220\226\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1730024, ...}) = 0
mmap2(NULL, 1739484, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xdd0000
mmap2(0xf73000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a3) = 0xf73000
mmap2(0xf76000, 10972, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf76000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7746000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7746900, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xf73000, 8192, PROT_READ) = 0
mprotect(0x8049000, 4096, PROT_READ) = 0
mprotect(0x16e000, 4096, PROT_READ) = 0
munmap(0xb7747000, 97472) = 0
exit_group(0) = ?
linux12:/home/users/grad1459>
As you can see, exit_group() is called, explaining everything!
Now as you correctly stated, "SYS_exit equals __NR_exit"; for example it's defined in mit.syscall.h:
#define SYS_exit __NR_exit
so the last two calls are equivalent, i.e. you can use the one you like, and the output should be this:
linux12:/home/users/grad1459>gcc seccomp.c -o seccomp && ./seccomp ; echo "${?}"
0
PS
You could of course define a filter yourself and use:
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, filter);
as explained in the eigenstate link, to allow _exit() (or, strictly speaking, exit_group(2)), but do that only if you really need to and know what you are doing.
The problem occurs, because the GNU C library uses the exit_group syscall, if it is available, in Linux instead of exit, for the _exit() function (see sysdeps/unix/sysv/linux/_exit.c for verification), and as documented in the man 2 prctl, the exit_group syscall is not allowed by the strict seccomp filter.
Because the _exit() function call occurs inside the C library, we cannot interpose it with our own version (that would just do the exit syscall). (The normal process cleanup is done elsewhere; in Linux, the _exit() function only does the final syscall that terminates the process.)
We could ask the GNU C library developers to use the exit_group syscall in Linux only when there are more than one thread in the current process, but unfortunately, it would not be easy, and even if added right now, would take quite some time for the feature to be available on most Linux distributions.
Fortunately, we can ditch the default strict filter, and instead define our own. There is a small difference in behaviour: the apparent signal that kills the process will change from SIGKILL to SIGSYS. (The signal is not actually delivered, as the kernel does kill the process; only the apparent signal number that caused the process to die changes.)
Furthermore, this is not even that difficult. I did waste a bit of time looking into some GCC macro trickery that would make it trivial to manage the allowed syscalls' list, but I decided it would not be a good approach: the list of allowed syscalls should be carefully considered -- we only add exit_group() compared to the strict filter, here! -- so making it a bit difficult is okay.
The following code, say example.c, has been verified to work on a 4.4 kernel (should work on kernels 3.5 or later) on x86-64 (for both x86 and x86-64, i.e. 32-bit and 64-bit binaries). It should work on all Linux architectures, however, and it does not require or use the libseccomp library.
#define _GNU_SOURCE
#include <stdlib.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <stdio.h>
static const struct sock_filter strict_filter[] = {
BPF_STMT(BPF_LD | BPF_W | BPF_ABS, (offsetof (struct seccomp_data, nr))),
BPF_JUMP(BPF_JMP | BPF_JEQ, SYS_rt_sigreturn, 5, 0),
BPF_JUMP(BPF_JMP | BPF_JEQ, SYS_read, 4, 0),
BPF_JUMP(BPF_JMP | BPF_JEQ, SYS_write, 3, 0),
BPF_JUMP(BPF_JMP | BPF_JEQ, SYS_exit, 2, 0),
BPF_JUMP(BPF_JMP | BPF_JEQ, SYS_exit_group, 1, 0),
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)
};
static const struct sock_fprog strict = {
.len = (unsigned short)( sizeof strict_filter / sizeof strict_filter[0] ),
.filter = (struct sock_filter *)strict_filter
};
int main(void)
{
/* To be able to set a custom filter, we need to set the "no new privs" flag.
The Documentation/prctl/no_new_privs.txt file in the Linux kernel
recommends this exact form: */
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
fprintf(stderr, "Cannot set no_new_privs: %m.\n");
return EXIT_FAILURE;
}
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &strict)) {
fprintf(stderr, "Cannot install seccomp filter: %m.\n");
return EXIT_FAILURE;
}
/* The seccomp filter is now active.
It differs from SECCOMP_SET_MODE_STRICT in two ways:
1. exit_group syscall is allowed; it just terminates the
process
2. Parent/reaper sees SIGSYS as the killing signal instead of
SIGKILL, if the process tries to do a syscall not in the
explicitly allowed list
*/
return EXIT_SUCCESS;
}
Compile using e.g.
gcc -Wall -O2 example.c -o example
and run using
./example
or under strace to see the syscalls and library calls done;
strace ./example
The strict_filter BPF program is really trivial. The first opcode loads the syscall number into the accumulator. The next five opcodes compare it to an acceptable syscall number, and if found, jump to the final opcode that allows the syscall. Otherwise the second-to-last opcode kills the process.
Note that although the documentation refers to sigreturn being the allowed syscall, the actual name of the syscall in Linux is rt_sigreturn. (sigreturn was deprecated in favour of rt_sigreturn ages ago.)
Furthermore, when the filter is installed, the opcodes are copied to kernel memory (see kernel/seccomp.c in the Linux kernel sources), so it does not affect the filter in any way if the data is modified later. Having the structures static const has zero security impact, in other words.
I used static since there is no need for the symbols to be visible outside this compilation unit (or in a stripped binary), and const to put the data into the read-only data section of the ELF binary.
The form of a BPF_JUMP(BPF_JMP | BPF_JEQ, nr, equals, differs) is simple: the accumulator (the syscall number) is compared to nr. If they are equal, then the next equals opcodes are skipped. Otherwise, the next differs opcodes are skipped.
Since the equals cases jump to the very final opcode, you can add new opcodes at the top (that is, just after the initial opcode), incrementing the equals skip count for each one.
Note that printf() will not work after the seccomp filter is installed, because internally, the C library wants to do a fstat syscall (on standard output), and a brk syscall to allocate some memory for a buffer.

Resources