I was going through an article here and was trying out the code snippet I have copied below:
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h> /* For printf */
#include <sys/reg.h> /* For constants ORIG_EAX etc. (32-bit x86) */
int main()
{ pid_t child;
long orig_eax;
child = fork();
if(child == 0) {
ptrace(PTRACE_TRACEME, 0, NULL, NULL);
execl("/bin/ls", "ls", NULL);
}
else {
wait(NULL);
orig_eax = ptrace(PTRACE_PEEKUSER,
child, 4 * ORIG_EAX,
NULL);
printf("The child made a "
"system call %ld\n", orig_eax);
ptrace(PTRACE_CONT, child, NULL, NULL);
}
return 0;
}
I have a question about what ORIG_EAX is exactly and why 4*ORIG_EAX is passed to the ptrace call. I initially assumed that ORIG_EAX, EBX, ECX, etc. would be offsets into a particular structure where the register values are stored.
So I printed the value of ORIG_EAX just after the wait, using printf("origeax = %d\n", ORIG_EAX);. The value was 11, so my earlier assumption about the offsets was wrong.
I understand that the wait call returns when the child has a state change (in this case, when it issues a system call) and that ORIG_EAX would contain the system call number.
However, why is ORIG_EAX * 4 passed to the ptrace call?
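The addr argument of PTRACE_PEEKUSER is a byte offset into the tracee's USER area (struct user from <sys/user.h>), which begins with a struct user_regs_struct. ORIG_EAX (11) is not a byte offset but the index of the orig_eax field within that structure. On 32-bit x86 each field is a 4-byte long, so the byte offset of orig_eax is 4 * 11 = 44.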
I am currently re-implementing the strace command.
I understand the goal of this command, and I can catch some syscalls from an executable file.
My question is: why don't I catch the write syscall?
This is my code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <wait.h>
int main(int argc, char* argv[]) {
int status;
pid_t pid;
struct user_regs_struct regs;
int counter = 0;
int in_call =0;
switch(pid = fork()) {
case -1:
perror("fork");
exit(1);
case 0:
ptrace(PTRACE_TRACEME, 0, NULL, NULL);
execvp(argv[1], argv + 1);
break;
default:
wait(&status);
while (status == 1407) {
ptrace(PTRACE_GETREGS, pid, NULL, &regs);
if(!in_call) {
printf("SystemCall %lld called with %lld, %lld, %lld\n",regs.orig_rax,
regs.rbx, regs.rcx, regs.rdx);
in_call=1;
counter ++;
}
else
in_call = 0;
ptrace(PTRACE_SYSEMU, pid, NULL, NULL);
wait(&status);
}
}
printf("Total Number of System Calls = %d\n", counter);
return 0;
}
This is the output of my program:
./strace ./my_program
SystemCall 59 called with 0, 0, 0
SystemCall 60 called with 0, 4198437, 5
Total Number of System Calls = 2
59 represents the execve syscall.
60 represents the exit syscall.
This is the output of the real strace:
strace ./my_program
execve("./my_program", ["./bin_asm_write"], 0x7ffd2929ae70 /* 67 vars */) = 0
write(1, "Toto\n", 5Toto
) = 5
exit(0) = ?
+++ exited with 0 +++
As you can see, my program doesn't catch the write syscall.
I don't understand why. Do you have any idea?
Thank you for your answers.
Your while loop is set up rather strangely -- you have this in_call flag that you toggle back and forth between 0 and 1, and you only print the system call when it is 0. The net result is that while you catch every system call, you only print every other system call. So when you catch the write call, the flag is 1 and you don't print anything.
Another oddity is that you're using PTRACE_SYSEMU rather than PTRACE_SYSCALL. SYSEMU is intended for emulating system calls, so the system call won't actually run at all (it will be skipped). Normally your ptracing program would do whatever the system call is supposed to do itself, then call PTRACE_SETREGS to set the tracee's registers to the appropriate return values, and then call PTRACE_SYSEMU again to run to the next system call.
Your in_call flagging would make more sense if you were actually using PTRACE_SYSCALL, as that stops twice for each syscall -- once on entry to the syscall and a second time when the call returns. However, it will also stop for signals, so you need to decode the status to see whether a signal has occurred.
This is a fairly simple application that creates a lightweight process (thread) with the clone() call.
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>
#define STACK_SIZE 1024*1024
int func(void* param) {
printf("I am func, pid %d\n", getpid());
return 0;
}
int main(int argc, char const *argv[]) {
printf("I am main, pid %d\n", getpid());
void* ptr = malloc(STACK_SIZE);
printf("I am calling clone\n");
int res = clone(func, ptr + STACK_SIZE, CLONE_VM, NULL);
// works fine with sleep() call
// sleep(1);
if (res == -1) {
printf("clone error: %d", errno);
} else {
printf("I created child with pid: %d\n", res);
}
printf("Main done, pid %d\n", getpid());
return 0;
}
Here are results:
Run 1:
➜ LFD401 ./clone
I am main, pid 10974
I am calling clone
I created child with pid: 10975
Main done, pid 10974
I am func, pid 10975
Run 2:
➜ LFD401 ./clone
I am main, pid 10995
I am calling clone
I created child with pid: 10996
I created child with pid: 10996
I am func, pid 10996
Main done, pid 10995
Run 3:
➜ LFD401 ./clone
I am main, pid 11037
I am calling clone
I created child with pid: 11038
I created child with pid: 11038
I am func, pid 11038
I created child with pid: 11038
I am func, pid 11038
Main done, pid 11037
Run 4:
➜ LFD401 ./clone
I am main, pid 11062
I am calling clone
I created child with pid: 11063
Main done, pid 11062
Main done, pid 11062
I am func, pid 11063
What is going on here? Why is the "I created child" message sometimes printed several times?
Also I noticed that adding a delay after clone call "fixes" the problem.
You have a race condition: you don't get the implied thread safety of stdio.
The problem is even more severe: you can also get duplicate "func" messages.
The problem is that calling clone directly does not give you the same guarantees as pthread_create; that is, you do not get the thread-safe behavior of printf.
I don't know for sure, but IMO the documentation's wording about stdio streams and thread safety, in practice, only applies when using pthreads.
So you'll have to handle your own inter-thread locking.
Here is a version of your program recoded to use pthread_create. It seems to work without incident:
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>
#define STACK_SIZE 1024*1024
void *func(void* param) {
printf("I am func, pid %d\n", getpid());
return (void *) 0;
}
int main(int argc, char const *argv[]) {
printf("I am main, pid %d\n", getpid());
void* ptr = malloc(STACK_SIZE);
printf("I am calling clone\n");
pthread_t tid;
pthread_create(&tid,NULL,func,NULL);
//int res = clone(func, ptr + STACK_SIZE, CLONE_VM, NULL);
int res = 0;
// works fine with sleep() call
// sleep(1);
if (res == -1) {
printf("clone error: %d", errno);
} else {
printf("I created child with pid: %d\n", res);
}
pthread_join(tid,NULL);
printf("Main done, pid %d\n", getpid());
return 0;
}
Here is a test script I've been using to check for errors [it's a little rough, but should be okay]. Run it against your version and it will abort quickly. The pthread_create version seems to pass just fine.
#!/usr/bin/perl
# clonetest -- clone test
#
# arguments:
# "-p0" -- suppress check for duplicate parent messages
# "-c0" -- suppress check for duplicate child messages
# 1 -- base name for program to test (e.g. for xyz.c, use xyz)
# 2 -- [optional] number of test iterations (DEFAULT: 100000)
master(@ARGV);
exit(0);
# master -- master control
sub master
{
my(@argv) = @_;
my($arg,$sym);
while (1) {
$arg = $argv[0];
last unless (defined($arg));
last unless ($arg =~ s/^-(.)//);
$sym = $1;
shift(@argv);
$arg = 1
if ($arg eq "");
$arg += 0;
${"opt_$sym"} = $arg;
}
$opt_p //= 1;
$opt_c //= 1;
printf("clonetest: p=%d c=%d\n",$opt_p,$opt_c);
$xfile = shift(@argv);
$xfile //= "clone1";
printf("clonetest: xfile='%s'\n",$xfile);
$itermax = shift(@argv);
$itermax //= 100000;
$itermax += 0;
printf("clonetest: itermax=%d\n",$itermax);
system("cc -o $xfile -O2 $xfile.c -lpthread");
$code = $? >> 8;
die("master: compile error\n")
if ($code);
$logf = "/tmp/log";
for ($iter = 1; $iter <= $itermax; ++$iter) {
printf("iter: %d\n",$iter)
if ($opt_v);
dotest($iter);
}
}
# dotest -- perform single test
sub dotest
{
my($iter) = @_;
my($parcnt,$cldcnt);
my($xfsrc,$bf);
system("./$xfile > $logf");
open($xfsrc,"<$logf") or
die("dotest: unable to open '$logf' -- $!\n");
while ($bf = <$xfsrc>) {
chomp($bf);
if ($opt_p) {
while ($bf =~ /created/g) {
++$parcnt;
}
}
if ($opt_c) {
while ($bf =~ /func/g) {
++$cldcnt;
}
}
}
close($xfsrc);
if (($parcnt > 1) or ($cldcnt > 1)) {
printf("dotest: fail on %d -- parcnt=%d cldcnt=%d\n",
$iter,$parcnt,$cldcnt);
system("cat $logf");
exit(1);
}
}
UPDATE:
Were you able to recreate OP's problem with clone?
Absolutely. Before I created the pthreads version, in addition to testing OP's original version, I also created versions that:
(1) added setlinebuf to the start of main
(2) added fflush just before the clone and __fpurge as the first statement of func
(3) added an fflush in func before the return 0
Version (2) eliminated the duplicate parent messages, but the duplicate child messages remained.
If you'd like to see this for yourself, download OP's version from the question, my version, and the test script. Then, run the test script on OP's version.
I posted enough information and files so that anyone can recreate the problem.
Note that due to differences between my system and OP's, I couldn't reproduce the problem at first in just 3-4 tries, which is why I created the script.
The script does 100,000 test runs, and the problem usually manifests itself within 5000-15000 of them.
I can't recreate OP's issue, but I don't think the printf calls are actually the problem.
glibc docs:
The POSIX standard requires that by default the stream operations are
atomic. I.e., issuing two stream operations for the same stream in two
threads at the same time will cause the operations to be executed as
if they were issued sequentially. The buffer operations performed
while reading or writing are protected from other uses of the same
stream. To do this each stream has an internal lock object which has
to be (implicitly) acquired before any work can be done.
Edit:
Even though the above is true for threads, as rici points out, there is a comment on sourceware:
Basically, there's nothing you can safely do with CLONE_VM unless the
child restricts itself to pure computation and direct syscalls (via
sys/syscall.h). If you use any of the standard library, you risk the
parent and child clobbering each other's internal states. You also
have issues like the fact that glibc caches the pid/tid in userspace,
and the fact that glibc expects to always have a valid thread pointer
which your call to clone is unable to initialize correctly because it
does not know (and should not know) the internal implementation of
threads.
Apparently, glibc isn't designed to work with clone if CLONE_VM is set but CLONE_THREAD|CLONE_SIGHAND are not.
Your processes both use the same stdout (that is, the C standard library FILE struct), which includes an accidentally shared buffer. That's undoubtedly causing problems.
As everyone suggests, it really seems to be a problem of (how shall I put it, in the case of clone()) process safety. With a rough sketch of a locking version of printf (using write(2)), the output is as expected.
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>
#define STACK_SIZE 1024*1024
// VERY rough attempt at a thread-safe printf
#include <stdarg.h>
#define SYNC_REALLOC_GROW 64
int sync_printf(const char *format, ...)
{
int n;
int size = 256;
char *p, *np;
va_list args;
if ((p = malloc(size)) == NULL)
return -1;
for (;;) {
va_start(args, format);
n = vsnprintf(p, size, format, args);
va_end(args);
if (n < 0) {
free(p);
return -1;
}
if (n < size)
break;
// output was truncated: grow the buffer to fit and format again
size = n + SYNC_REALLOC_GROW;
if ((np = realloc(p, size)) == NULL) {
free(p);
return -1;
} else {
p = np;
}
}
// write(2) should be thread-safe; the stdio lock is taken just in case
flockfile(stdout);
n = (int) write(fileno(stdout), p, n);
fflush(stdout);
funlockfile(stdout);
free(p);
return n;
}
int func(void *param)
{
sync_printf("I am func, pid %d\n", getpid());
return 0;
}
int main()
{
sync_printf("I am main, pid %d\n", getpid());
void *ptr = malloc(STACK_SIZE);
sync_printf("I am calling clone\n");
int res = clone(func, ptr + STACK_SIZE, CLONE_VM, NULL);
// works fine with sleep() call
// sleep(1);
if (res == -1) {
sync_printf("clone error: %d", errno);
} else {
sync_printf("I created child with pid: %d\n", res);
}
sync_printf("Main done, pid %d\n\n", getpid());
return 0;
}
For the third time: it's only a sketch. I had no time for a robust version, but that shouldn't stop you from writing one.
As evaitl points out, printf is documented to be thread-safe by glibc's documentation. BUT this typically assumes that you are using the designated glibc function to create threads (that is, pthread_create()). If you do not, you are on your own.
The lock taken by printf() is recursive (see flockfile). This means that if the lock is already taken, the implementation checks the owner of the lock against the locker. If the locker is the same as the owner, the locking attempt succeeds.
To distinguish between different threads, you need to set up TLS properly, which you do not do but pthread_create() does. My guess is that in your case the TLS variable that identifies the thread is the same for both threads, so both end up taking the lock.
TL;DR: please use pthread_create()
I can't figure out why the function returns a "No such process" error every time I run it, while simply using the same instructions inline produces the expected output.
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
void getregs(pid_t proc, struct user_regs_struct *regs);
int main() {
pid_t proc = fork();
if(proc == 0) {
if(ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) {
perror("traceme");
exit(0);
}
if(execl("child", "child", NULL) == -1) {
perror("execl");
exit(0);
}
} else {
wait(&proc);
struct user_regs_struct regs;
ptrace(PTRACE_GETREGS, proc, NULL, &regs);
printf("eax: %08x\n", (unsigned int)regs.eax);
getregs(proc, &regs);
ptrace(PTRACE_CONT, proc, NULL, NULL);
}
return 0;
}
void getregs(pid_t proc, struct user_regs_struct *regs) {
if(ptrace(PTRACE_GETREGS, proc, NULL, &regs) == -1) {
perror("GETREGS");
exit(1);
}
printf("eax: %08x\n", (unsigned int)regs->eax);
}
When I run this I get
~$ ./tracer
eax: 0000002f
GETREGS: No such process
I don't get why getregs() returns that error. It's almost as if it were out of scope somehow.
Also, something a little unrelated: EAX is always set to 0000002f no matter what process I try to execl(). Is this normal? I don't know whether I'm forking the child process properly. Should I ask a separate question on SO for this?
You hit this error because you are modifying the value of the process identifier (PID) contained in the variable proc by passing its address to the wait(2) syscall.
The wait syscall overwrites proc with the status of your child process (here, the status reporting that the child stopped after execve), not its PID. So when you later reference your child process in ptrace using proc, the value is no longer a valid PID and most likely refers to a process that does not exist or that you are not tracing, hence ESRCH ("No such process").
And as @lornix noticed, make sure that you pass the right pointer to ptrace in the getregs function.
void getregs(pid_t proc, struct user_regs_struct *regs) {
if(ptrace(PTRACE_GETREGS, proc, NULL, &regs) == -1) {
You need to pass regs itself to ptrace, not its address (remove the & in this case):
if(ptrace(PTRACE_GETREGS, proc, NULL, regs) == -1) {
You're calling getregs with the ADDRESS of regs, so getregs' regs is not a structure as in the main code; it's a pointer to a structure.
EDIT: figured it out
You're reusing (and overwriting) proc in the wait call; you shouldn't do that. The argument to wait is a pointer to a status value, not the PID of a particular child. wait waits for any child; see waitpid for a PID-specific wait.
Try:
int wait_status;
wait(&wait_status);
in place of the current wait function call.
Both your ptrace calls are behaving the same way. The difference is that you're ignoring the return value of the inline one, whereas the one in the function is checked.
The EAX value is a red herring: the structure is not initialized because the PTRACE_GETREGS failed.
The wait function does not take a process ID. It waits for any child process to change state (terminate or, for a traced child, stop) and stores the status in the integer passed in by pointer.
You want waitpid (if you want to wait for a specific child process). The simple function wait is useful when you know there is only one:
int status;
if (wait(&status) != -1) { ... }
I'm following the tutorial here and modified it a little for x86-64 (basically replacing eax with rax, etc.) so that it compiles:
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <stdio.h>
int main()
{ pid_t child;
long orig_eax;
child = fork();
if(child == 0) {
ptrace(PTRACE_TRACEME, 0, NULL, NULL);
execl("/bin/ls", "ls", NULL);
}
else {
wait(NULL);
orig_eax = ptrace(PTRACE_PEEKUSER,
child, 4 * ORIG_RAX,
NULL);
printf("The child made a "
"system call %ld\n", orig_eax);
ptrace(PTRACE_CONT, child, NULL, NULL);
}
return 0;
}
But it doesn't work as expected; it always says:
The child made a system call -1
What's wrong in the code?
ptrace returns -1 with errno set to EIO because the offset you're trying to read is not correctly aligned. From the ptrace man page:
PTRACE_PEEKUSER
Reads a word at offset addr in the child's USER area, which
holds the registers and other information about the process (see
<sys/user.h>). The word is returned as the result of the
ptrace() call. Typically the offset must be word-aligned,
though this might vary by architecture. See NOTES. (data is
ignored.)
On my 64-bit system, 4 * ORIG_RAX (4 * 15 = 60) is not 8-byte aligned. Try values such as 0 or 8 and it should work.
On 64-bit, the offset is 8 * ORIG_RAX, where 8 = sizeof(long).
I would like to create a file whose descriptor would have some customizable behavior. In particular, I'd like to create a file descriptor which, when written to, would prefix every line with the name and PID of the process (and maybe the time), but I can imagine it being useful for other things too.
I don't want to alter the writing program - for one thing, I want it to work for all programs on my system, even shell/perl/etc. scripts, and it would be impractical if not impossible to change the source code of everything.
Note that pipes wouldn't do in this case, because when the writing process fork()s, the newly created child shares the fd and is indistinguishable from its parent by the reading end of the pipe.
There are approaches that would work, but I think they are rather clumsy:
Create a kernel module that will create such fds. For example, you could open some /dev/customfd and then instruct the module to do some transformation etc. or send data to userspace or socket etc.
Use LD_PRELOAD to override the fd manipulation functions and do these kinds of things on the "special" fd.
However, both of these approaches are quite laborious, so I would like to know if there is a better way, or any infrastructure (like off-the-shelf libraries) that would help.
I'd prefer a solution which doesn't involve kernel changes, but I'm ready to accept them if necessary.
Just an idea: would FUSE be an answer?
You have a lot of options. As you mentioned, using LD_PRELOAD to wrap the write()/read() functions is a good approach.
I recommend using ptrace(2) to catch the desired system call and pass its arguments to your own function.
Example (for 32-bit x86):
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/reg.h> /* For ORIG_EAX, EBX etc. (32-bit x86) */
#include <sys/syscall.h> /* For SYS_write etc */
int main()
{ pid_t child;
long orig_eax, eax;
long params[3];
int status;
int insyscall = 0;
child = fork();
if(child == 0) {
ptrace(PTRACE_TRACEME, 0, NULL, NULL);
execl("/bin/ls", "ls", NULL);
}
else {
while(1) {
wait(&status);
if(WIFEXITED(status))
break;
orig_eax = ptrace(PTRACE_PEEKUSER,
child, 4 * ORIG_EAX, NULL);
if(orig_eax == SYS_write) {
if(insyscall == 0) {
/* Syscall entry */
insyscall = 1;
params[0] = ptrace(PTRACE_PEEKUSER,
child, 4 * EBX,
NULL);
params[1] = ptrace(PTRACE_PEEKUSER,
child, 4 * ECX,
NULL);
params[2] = ptrace(PTRACE_PEEKUSER,
child, 4 * EDX,
NULL);
printf("Write called with "
"%ld, %ld, %ld\n",
params[0], params[1],
params[2]);
}
else { /* Syscall exit */
eax = ptrace(PTRACE_PEEKUSER,
child, 4 * EAX, NULL);
printf("Write returned "
"with %ld\n", eax);
insyscall = 0;
}
}
ptrace(PTRACE_SYSCALL,
child, NULL, NULL);
}
}
return 0;
}