After reading on the web that I can't really determine which process runs first (child or parent), I planned to disable ASLR on my PC and run the debugger to see if I could work out the pattern of execution. My observations are below, along with a Gist containing the full GDB disassembly and the source code.
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
    fork();
    //fork();
    printf("LINUX\n");
    //printf("my pid is %d",(int) getpid());
    fork();
    printf("REDHAT\n");
    //printf("my pid is %d",(int) getpid());
    //fork();
    return 0;
}
This is the code I am talking about. When I disassemble it in GDB it gives me:
gdb-peda$ disas main
Dump of assembler code for function main:
0x000000000000068a <+0>: push rbp
0x000000000000068b <+1>: mov rbp,rsp
0x000000000000068e <+4>: call 0x560 <fork@plt>
0x0000000000000693 <+9>: lea rdi,[rip+0xaa] # 0x744
0x000000000000069a <+16>: call 0x550 <puts@plt>
0x000000000000069f <+21>: call 0x560 <fork@plt>
0x00000000000006a4 <+26>: lea rdi,[rip+0x9f] # 0x74a
0x00000000000006ab <+33>: call 0x550 <puts@plt>
0x00000000000006b0 <+38>: mov eax,0x0
0x00000000000006b5 <+43>: pop rbp
0x00000000000006b6 <+44>: ret
End of assembler dump.
So it gives me a fixed pattern for the execution, which I take to mean the program should always execute in a particular order. I ran disas main about 3 times to see if the order ever changes, and it does not, but when I actually run the generated binary it gives me different outputs:
root@localhost:~/os/fork analysis# ./forkagain
LINUX
REDHAT
LINUX
REDHAT
REDHAT
REDHAT
root@localhost:~/os/fork analysis# ./forkagain
LINUX
LINUX
REDHAT
REDHAT
REDHAT
REDHAT
This is inconsistent with the observation I made in the disassembly. Can someone please fill in the gaps in my understanding?
Fork Analysis Full
I tried disas main about 3 times to see if the order actually ever changes
The order is fixed at compile time, so it can never change without you recompiling the program.
In addition, the order is fixed by your program source -- the compiler is not allowed to re-order your output.
What you observe then is the indeterminism introduced by the OS as a result of calling fork -- after the fork there are no guarantees which process will run first, or for how long. The parent may run to completion, then the child. Or the child may run to completion first. Or they may both run with time-slicing, say one line at a time.
In addition, most non-ancient Linux systems today are running on multi-processor machines, and the two independent processes can run simultaneously after the fork.
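If you ever do need a deterministic order, the parent has to impose it explicitly, for example by waiting for the child. A minimal sketch (not part of the question's program):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0) {
            printf("child runs first\n");   /* child does its work... */
            return 0;
        }
        waitpid(pid, NULL, 0);              /* ...parent waits, then continues */
        printf("parent runs second\n");
        return 0;
    }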
An additional complication is that your program is not well-defined, because of stdio buffering. While you see 6 lines of output, it might be hard for you to explain this result:
./forkagain | wc -l
8
./forkagain > junk.out; cat junk.out
LINUX
REDHAT
LINUX
REDHAT
LINUX
REDHAT
LINUX
REDHAT
You should add fflush(stdout); before fork to avoid this complication.
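The reason for the extra lines: when stdout goes to a pipe or a file it is fully buffered rather than line buffered, so "LINUX\n" is still sitting in the stdio buffer when the second fork duplicates the process, and every copy flushes it again at exit. A minimal sketch of the suggested fix, applied to the question's program:

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        fork();
        printf("LINUX\n");
        fflush(stdout);   /* drain the buffer so the next fork can't duplicate it */
        fork();
        printf("REDHAT\n");
        return 0;
    }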
P.S. You should also un-learn the bad habit of running as root -- sooner or later you'll make a stupid mistake (like typing rm -rf * in the wrong directory), and will be really sorry you did it as root.
Each process executes in a perfectly defined order. The trick is that there is no guarantee that each process will execute in one tick (the slice of time the process occupies the execution unit), and no guarantee that two processes forked from the same process will get their tick in the order of forking.
If we denote the printing of LINUX as A and the printing of REDHAT as B, then you can get any sequence of As and Bs such that:
sequence begins with A
there are total of two As and four Bs
there are at least two Bs after each A
AABBBB
ABABBB
ABBABB
are all possible outputs on a preemptive multitasking OS.
P.S. And this answer is not complete without what Employed says.
Related
I want to count the total number of instructions executed when running /bin/ls.
I used 3 methods whose results differ heavily, and I don't have a clue why.
1. Instruction counting with ptrace
I wrote a piece of code that invokes an instance of ls and singlesteps through it with ptrace:
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/user.h>
#include <sys/reg.h>
#include <sys/syscall.h>

int main()
{
    pid_t child;
    child = fork(); //create child

    if(child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        char* child_argv[] = {"/bin/ls", NULL};
        execv("/bin/ls", child_argv);
    }
    else {
        int status;
        long long ins_count = 0;
        while(1)
        {
            //stop tracing if child terminated successfully
            wait(&status);
            if(WIFEXITED(status))
                break;

            ins_count++;
            ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
        }
        printf("\n%lld Instructions executed.\n", ins_count);
    }
    return 0;
}
Running this code gives me 516,678 instructions executed.
2. QEMU singlestepping
I simulated ls using qemu in singlestep mode and logged all incoming instructions into a log file using the following command:
qemu-x86_64 -singlestep -D logfile -d in_asm /bin/ls
According to qemu, ls executes 16,836 instructions.
3. perf
sudo perf stat ls
This command gave me 8,162,180 instructions executed.
I know that most of these instructions come from the dynamic linker and it is fine that they get counted. But why do these numbers differ so much? Shouldn't they all be the same?
Your method of counting instructions with qemu was wrong: the in_asm option only shows the instructions translated in a compiled block, so after QEMU's TB-chaining process it jumps directly to the already-translated block, which makes the qemu count lower than the other tools. A better way in practice is -d nochain,exec together with the -singlestep option.
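For example, something along these lines (the log file name is just an example) makes the exec trace reflect every block actually executed, not just its first translation:

    qemu-x86_64 -singlestep -d nochain,exec,in_asm -D logfile /bin/ls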
Still, there is also a difference in instruction counts between these tools. I tried running qemu from different directories to produce those logs; the guest program was statically linked, yet the log files show different instruction counts, so some glibc startup or init code probably gets involved with the environment arguments and causes this difference.
Why do these instruction counts differ so much? Because they really measure different things, and only the unit of measure is the same. It's as if you were weighing something you brought from the store, and one person weighed everything without packages or even stickers on it, another weighed it in its packaging and included the shopping bags too, and yet another also added the mud you brought into the house on your boots.
That's pretty much what is happening here: the instruction counts are not the instruction counts only of what's inside the ls binary, but can also include the libraries it uses, the services of the kernel loader needed to bring those libraries in, and finally the code executed in the process but in the kernel context. The methods you used all behave differently in that respect. So the question is: what do you need out of that measurement? If you need the "total effort", then certainly the largest number is what you want: this will include some of the overhead caused by the kernel. If you need the "I just want to know what happened in ls", then the smallest number is the one you want.
Your program using PTRACE_SINGLESTEP should count all user-space instructions executed in the process. A syscall instruction counts as one because you can't single-step into the kernel; that's opaque to ptrace.
That should be pretty similar to perf stat --all-user or perf stat -e instructions:u to count user-space instructions. (Probably counting the same within a few instructions out of however many millions). That perf option or :u event modifier tells it to program the HW performance counters to only count the event while the CPU is not at privilege level 0 (kernel mode); modern x86 CPUs have hardware support for this so perf doesn't have to run instructions inside the kernel on every transition to stop and restart counters.
Both of these include everything that happens in user-space, including ld-linux.so dynamic linker code that runs before execution reaches _start in a dynamic executable.
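For example, either of these should count roughly the same thing (assuming a perf recent enough to have --all-user; output format varies):

    perf stat -e instructions:u /bin/ls
    perf stat --all-user /bin/ls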
See also How do I determine the number of x86 machine instructions executed in a C program? which includes hand-written asm source for a static executable that only runs 2 instructions in user-space. perf stat --all-user counts 3 instructions for it on my Skylake. That Q&A also has a bunch of other discussion about what happens in a user-space process, and hopefully useful links.
Qemu counting is totally different because it does dynamic translation. See wen liang's answer and What instructions does qemu trace? which Peter Maydell linked in a comment on Kuba's answer here.
If you want to use a tool like this, you might want Intel's SDE, which uses Intel PIN dynamic instrumentation. It can histogram instruction types for you, as well as counting a total. See my answer on How do I determine the number of x86 machine instructions executed in a C program? for links.
When I use GDB to debug a process on ARM Linux I can make a call like call write(123,"abc",3).
How does GDB inject that call into the process and recover afterwards?
How does GDB inject that call into the process and recover afterwards?
GDB can read and write the inferior (being debugged) process memory using ptrace system call.
So it reads and saves in its own memory some chunk of instructions from inferior (say 100 bytes).
Then it overwrites this chunk with new instructions, which look something like:
r0 = 123
r1 = pointer to "abc"
r2 = 3
BLR write
BKPT
Now GDB saves the current inferior registers, sets the program counter to point to the chunk of instructions it just wrote, and resumes the inferior.
Inferior executes instructions until it reaches the breakpoint, at which point GDB regains control. It can now look at the return register to know what write returned and print it. GDB now restores the original instructions and original register values, and we are back as if nothing happened.
P.S. This is a general description of how "call function in inferior" works; I do not claim that this is exactly how it works.
There are also complications: if write calls back into the code that GDB overwrote, it wouldn't work. So in reality GDB uses some other mechanism to obtain suitable "scratch area" in the inferior. Also, the "abc" string requires scratch area as well.
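A very rough sketch of the ptrace calls a tracer would use for this (x86-64 shown for brevity and heavily simplified; real GDB is much more careful about scratch space, signals and restoring the original bytes):

    #include <stdio.h>
    #include <string.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <sys/wait.h>

    /* Assumes 'pid' is already attached and stopped, 'addr' is a scratch area in
       the inferior, and 'code' holds the instructions to inject (ending in a
       breakpoint). All names here are illustrative, not how GDB actually does it. */
    void call_in_inferior(pid_t pid, unsigned long addr,
                          const unsigned char *code, size_t len)
    {
        struct user_regs_struct saved, regs;
        ptrace(PTRACE_GETREGS, pid, NULL, &saved);        /* save current registers */

        for (size_t i = 0; i < len; i += sizeof(long)) {  /* copy code word by word */
            long word = 0;
            size_t n = (len - i < sizeof(long)) ? len - i : sizeof(long);
            memcpy(&word, code + i, n);
            ptrace(PTRACE_POKETEXT, pid, (void *)(addr + i), (void *)word);
        }

        regs = saved;
        regs.rip = addr;                                  /* point the pc at the chunk */
        ptrace(PTRACE_SETREGS, pid, NULL, &regs);

        ptrace(PTRACE_CONT, pid, NULL, NULL);             /* run until the breakpoint */
        waitpid(pid, NULL, 0);

        ptrace(PTRACE_GETREGS, pid, NULL, &regs);
        printf("return value: %lld\n", (long long)regs.rax);

        ptrace(PTRACE_SETREGS, pid, NULL, &saved);        /* put the registers back */
        /* restoring the original bytes at 'addr' is omitted here */
    }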
Kernels older than 4.6 use assembly stubs around critical system calls like fork, clone, exec, etc., which makes hooking them harder. Speaking particularly of execve, the following snippet from kernel 4.5 shows the entry stub of execve:
ENTRY(stub_execve)
call sys_execve
return_from_execve:
...
END(stub_execve)
The system call table contains this stub's address, and the stub in turn calls the original execve. So, to hook execve in this environment, we need to patch the call sys_execve in the stub with our hooking routine and, after doing our desired things, call the original execve. This can all be seen in action in execmon, a process-execution monitoring utility for Linux. I tested execmon successfully on Ubuntu 16.04 with kernel 4.4.
Starting from kernel 4.6, the above scheme protecting critical calls has changed. Now the stub looks like:
ENTRY(ptregs_\func)
leaq \func(%rip), %rax
jmp stub_ptregs_64
END(ptregs_\func)
where \func expands to sys_execve for execve calls. Again, the system call table contains this stub and the stub calls the original execve, but now in a more guarded way than simply doing call sys_execve.
This newer stub stores the called function's address in the RAX register and jumps to another stub, shown below (comments removed):
ENTRY(stub_ptregs_64)
cmpq $.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
jne 1f
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
popq %rax
jmp entry_SYSCALL64_slow_path
1:
jmp *%rax /* called from C */
END(stub_ptregs_64)
Please have a look at this to see the comments and the other referenced labels in this stub.
I have tried hard to come up with some logic to overcome this protection and patch the original calls with hooking functions, but with no success yet.
Would someone like to join me and help work this out?
I completely fail to see where you get the security angle from.
Neither the previous nor the current form of the func is "hardened".
You also never stated why you want to hook execve.
The standard hooking mechanism is with kprobes and you can check systemtap for an example consumer.
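For illustration, a minimal kprobes consumer is a short kernel module; the probed symbol name below is an assumption and varies by kernel version and architecture:

    /* Minimal kprobe sketch, assuming a kernel built with CONFIG_KPROBES. */
    #include <linux/module.h>
    #include <linux/kprobes.h>

    static int pre_handler(struct kprobe *p, struct pt_regs *regs)
    {
        pr_info("execve entered\n");   /* runs just before the probed function */
        return 0;
    }

    static struct kprobe kp = {
        .symbol_name = "__x64_sys_execve",  /* assumption: name differs by kernel */
        .pre_handler = pre_handler,
    };

    static int __init hook_init(void)
    {
        return register_kprobe(&kp);
    }

    static void __exit hook_exit(void)
    {
        unregister_kprobe(&kp);
    }

    module_init(hook_init);
    module_exit(hook_exit);
    MODULE_LICENSE("GPL");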
I had a look at the aforementioned execmon code and I find it to be of poor quality and not fit for learning. For instance, https://github.com/kfiros/execmon/blob/master/kmod/syscalls.c#L65:
accesses userspace memory directly (no get_user, copy_from_user etc.)
does it twice: first it computes the lengths (unbounded!) and then copies the data in. In particular, if someone makes the strings longer after the computation but before they get copied, this triggers a buffer overflow.
As I was learning about assembly, I used GDB the following way:
gdb ./a.out (a.out is a compiled C program that only prints hello world)
break main
run
info registers
Why can I see the registers used by my program when I am myself using the same CPU to print the registers? Shouldn't the use of GDB (or operating system) overwrite the registers and only show me the overwritten registers?
The only answer I can think of is the fact that my CPU is dual-core and that one of the cores is being used and the other is kept for the program.
The operating system maintains the state of the registers for each execution thread. When you are examining registers in gdb, the debugger is actually asking the OS to read the register value from the saved state. Your program is not running at that point in time, it's the debugger which is.
Let's assume there are no other processes on your system. Here is a simplified view of what happens:
Debugger launches and gets the cpu
Debugger asks the OS to load your program
Debugger asks the OS to place the breakpoint
Debugger asks the OS to start executing your program. The OS saves gdb register state and transfers control to your program.
Your program hits the breakpoint. The OS takes control, saves your program's register state, reloads gdb registers and gives cpu back to gdb.
Debugger asks the OS to read the program's registers from the saved state.
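Concretely, on Linux that last step is a ptrace request. A minimal sketch of a tracer reading a stopped child's saved registers (x86-64 assumed; the traced command is just an example):

    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t child = fork();
        if (child == 0) {
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            execl("/bin/true", "true", (char *)NULL);
            return 1;
        }

        /* the child stops on its exec; its register state is now saved by the kernel */
        waitpid(child, NULL, 0);

        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, child, NULL, &regs);   /* ask the OS for the saved state */
        printf("child rip = %#llx\n", (unsigned long long)regs.rip);

        ptrace(PTRACE_CONT, child, NULL, NULL);       /* let the child finish */
        waitpid(child, NULL, 0);
        return 0;
    }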
Note that this mechanism is part of the normal duties of a multitasking operating system, it's not specific to debugging. When the OS scheduler decides a different program should be executing, it saves the current state and loads another. This is called a context switch and it may happen many times per second, giving the illusion that programs execute simultaneously even if you only have a single cpu core.
Back in the old days of single-tasking OSes, the only things that could get in the way of your program's execution were interrupts. Interrupt handlers have exactly the problem you're talking about: your program is calculating something, the user presses a key - interrupt - and the interrupt service routine has to do some work but must not modify a single register in the process. That's the main reason the stack was invented in the first place. A typical 80x86 DOS interrupt service routine would look like this:
push ax
push cx
push dx
push bx
push si
push di
push bp
// no need to push sp
[do actual work, caller registers available on stack if needed]
pop bp
pop di
pop si
pop bx
pop dx
pop cx
pop ax
iret
This was so common that a new instruction pair, pusha and popa (push/pop all), was created to ease this task.
In today's CPUs, with address-space isolation between the operating system and applications, the CPU provides a task-state mechanism and allows the operating system to switch tasks (interrupts may still work similarly to what is outlined above, but can also be handled via task switching). All modern OSes use this kind of task-state system, where the CPU's registers are saved for a process while it is not being actively executed. As Jester already explained, gdb just asks the OS for these values for the process being debugged and then prints them.
I am using ptrace to count the syscalls of a program.
My program, given a program A, prints out the number of syscalls A makes (open, close, read, write).
The results of my program and of strace (with the -c option), with program A as the argument, were identical except for the open syscalls:
my program printed 15 while strace printed 3.
But I am guessing that, since strace prints some other syscalls as well, those might add up to the 15 open syscalls my program counted.
I am comparing against SYS_open when looking at the ORIG_EAX/RAX register via ptrace.
The syscalls that strace prints are here.
Update:
I compiled my program from the terminal and ran it from there, and the results were the same as strace's.
I am developing in netbeans.
Why did this happen?
It sounds like NetBeans is using ptrace to control things. (After all, how could it do breakpoints and single-stepping without it?) So NetBeans may be sending harmless signals to your program.
You can run strace -o /tmp/foo (without -c) to find out exactly what's going on. (Compare the output under NetBeans and without NetBeans to see what's different.)
You can also add "-e open,close" if you want to filter to specific calls.
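For reference, the kind of syscall-counting loop the question describes usually looks something like this (x86-64 sketch, not the asker's actual code). Note that PTRACE_SYSCALL stops the tracee twice per syscall, once at entry and once at exit, and that a modern ls may use openat rather than open:

    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t child = fork();
        if (child == 0) {
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            execl("/bin/ls", "ls", (char *)NULL);
            return 1;
        }

        int status;
        long stops = 0;
        waitpid(child, &status, 0);                    /* initial stop after execve */
        while (!WIFEXITED(status)) {
            ptrace(PTRACE_SYSCALL, child, NULL, NULL); /* run to next syscall entry/exit */
            waitpid(child, &status, 0);
            if (WIFEXITED(status))
                break;
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, child, NULL, &regs);
            if (regs.orig_rax == SYS_open)
                stops++;
        }
        printf("open entered about %ld times\n", stops / 2); /* entry + exit stops */
        return 0;
    }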