ptrace get changed memory by syscall - c

ptrace can get the registers and memory data when entry/exit syscall. But if linux syscall handler change some memory include some place in stack, How can I get to know which memory has been changed.

You cannot; but for example strace (which in turn uses ptrace) knows the semantics of most (all?) syscalls and can show you the memory changed.
For example, if the syscall-number is 0, strace knows, that the read()-syscall is invoked and that the kernel will write to the address specified in the second parameter. The number of bytes written there equals the return value of the syscall.
Now, the contents of these memory locations can be read with PTRACE_PEEK* and be displayed to you.
However, when it comes to custom syscalls with unknown or less-strict semantics (for example a proposed syscall write_to_random_memory_location()); you cannot determine memory changes with ptrace() in general (neither from kernel nor from userspace).
Depending on what you need to achieve, a general solution can only be to utilize some sort of virtualization (for example, what valgrind does in userspace) and simulate/watch all memory accesses.

Related

copy_from_user and segmentation

I was reading a paragraph from the "The Linux Kernel Module Programming Guide" and I have a couple of doubts related to the following paragraph.
The reason for copy_from_user or get_user is that Linux memory (on
Intel architecture, it may be different under some other processors)
is segmented. This means that a pointer, by itself, does not reference
a unique location in memory, only a location in a memory segment, and
you need to know which memory segment it is to be able to use it.
There is one memory segment for the kernel, and one for each of the
processes.
However it is my understanding that Linux uses paging instead of segmentation and that virtual addresses at and above 0xc0000000 have the kernel mapping in.
Do we use copy_from_user in order to accommodate older kernels?
Do the current linux kernels use segmentation in any way at all? If so how?
If (1) is not true, are there any other advantages to using copy_from_user?
Yeah. I don't like that explanation either. The details are essentially correct in a technical sense (see also Why does Linux on x86 use different segments for user processes and the kernel?) but as you say, linux typically maps the memory so that kernel code could access it directly, so I don't think it's a good explanation for why copy_from_user, etc. actually exist.
IMO, the primary reason for using copy_from_user / copy_to_user (and friends) is simply that there are a number of things to be checked (dangers to be guarded against), and it makes sense to put all of those checks in one place. You wouldn't want every place that needs to copy data in and out from user-space to have to re-implement all those checks. Especially when the details may vary from one architecture to the next.
For example, it's possible that a user-space page is actually not present when you need to copy to or from that memory and hence it's important that the call be made from a context that can accommodate a page fault (and hence being put to sleep).
Also, user-space data pointers need to be checked carefully to ensure that they actually point to user-space and that they point to data regions, and that the copy length doesn't wrap beyond the end of the valid regions, and so forth.
Finally, it's possible that user-space actually doesn't share the same page mappings with the kernel. There used to be a linux patch for 32-bit x86 that made the complete 4G of virtual address space available to user-space processes. In that case, kernel code could not make the assumption that a user-space pointer was directly accessible, and those functions might need to map individual user-space pages one at a time in order to access them. (See 4GB/4GB Kernel VM Split)

About sbrk() and malloc()

I've read the linux manual about sbrk() thoroughly:
sbrk() changes the location of the program break, which defines the end
of the process's data segment (i.e., the program break is the first
location after the end of the uninitialized data segment).
And I do know that user space memory's organization is like the following:
The problem is:
When I call sbrk(1), why does it say I am increasing the size of heap? As the manual says, I am changing the end position of "data segment & bss". So, what increases should be the size of data segment & bss, right?
The data and bss segments are a fixed size. The space allocated to the process after the end of those segments is therefore not a part of those segments; it is merely contiguous with them. And that space is called the heap space and is used for dynamic memory allocation.
If you want to regard it as 'extending the data/bss segment', that's fine too. It won't make any difference to the behaviour of the program, or the space that's allocated, or anything.
The manual page on Mac OS X indicates you really shouldn't be using them very much:
The brk and sbrk functions are historical curiosities left over from earlier days before the advent of virtual memory management. The brk() function sets the break or lowest address of a process's data segment (uninitialized data) to addr (immediately above bss). Data addressing is restricted between addr and the lowest stack pointer to the stack segment. Memory is allocated by brk in page size pieces; if addr is not evenly divisible by the system page size, it is increased to the next page boundary.
The current value of the program break is reliably returned by sbrk(0) (see also end(3)). The getrlimit(2) system call may be used to determine the maximum permissible size of the data segment; it will not be possible to set the break beyond the rlim_max value returned from a call to getrlimit, e.g. etext + rlp->rlim_max (see end(3) for the definition of etext).
It is mildly exasperating that I can't find a manual page for end(3), despite the pointers to look at it. Even this (slightly old) manual page for sbrk() does not have a link for it.
Notice that today sbrk(2) is rarely used. Most malloc implementations are using mmap(2) -at least for large allocations- to acquire a memory segment (and munmap to release it). Quite often, free simply marks a memory zone to be reusable by some future malloc (and does not release any memory to the Linux kernel).
(so practically, the heap of a modern linux process is made of several segments, so is more subtle than your picture; and multi-threaded processes have one stack per thread)
Use proc(5), notably /proc/self/maps and /proc/$pid/maps, to understand the virtual address space of some process. Try first to understand the output of cat /proc/self/maps (showing the address space of that cat command) and of cat /proc/$$/maps (showing the address space of your shell). Try also to look at the maps pseudo-file for your web browser (e.g. cat /proc/$(pidof firefox)/maps or cat /proc/$(pidof iceweasel)/maps etc...); I have more than a thousand lines (so process segments) in it.
Use strace(1) to understand the system calls done by a given command or process.
Take advantage that on Linux most (and probably all) C standard library implementations are free software, so you can study their source code. The source code of musl-libc is quite easy to read.
Read also about ELF, ASLR, dynamic linking & ld-linux(8), and the Advanced Linux Programming book then syscalls(2)

scan memory of calling process

I have to scan the memory space of a calling process in C. This is for homework. My problem is that I don't fully understand virtual memory addressing.
I'm scanning the memory space by attempting to read and write to a memory address. I can not use proc files or any other method.
So my problem is setting the pointers.
From what I understand the "User Mode Space" begins at address 0x0, however, if I set my starting point to 0x0 for my function, then am I not scanning the address space for my current process? How would you recommend adjusting the pointer -- if at all -- to address the parent process address space?
edit: Ok sorry for the confusion and I appreciate the help. We can not use proc file system, because the assignment is intended for us to learn about signals.
So, basically I'm going to be trying to read and then write to an address in each page of memory to test if it is R, RW or not accessible. To see if I was successful I will be listening for certain signals -- I'm not sure how to go about that part yet. I will be creating a linked list of structure to represent the accessibility of the memory. The program will be compiled as a 32 bit program.
With respect to parent process and child process: the exact text states
When called, the function will scan the entire memory area of the calling process...
Perhaps I am mistaken about the child and parent interaction, due to the fact we've been covering this (fork function etc.) in class, so I assumed that my function would be scanning a parent process. I'm going to be asking for clarification from the prof.
So, judging from this picture I'm just going to start from 0x0.
From a userland process's perspective, its address space starts at address 0x0, but not every address in that space is valid or accessible for the process. In particular, address 0x0 itself is never a valid address. If a process attempts to access memory (in its address space) that is not actually assigned to that process then a segmentation results.
You could actually use the segmentation fault behavior to help you map out what parts of the address space are in fact assigned to the process. Install a signal handler for SIGSEGV, and skip through the whole space, attempting to read something from somewhere in each page. Each time you trap a SIGSEGV you know that page is not mapped for your process. Go back afterward and scan each accessible page.
Do only read, however. Do not attempt to write to random memory, because much of the memory accessible to your programs is the binary code of the program itself and of the shared libraries it uses. Not only do you not want to crash the program, but also much of that memory is probably marked read-only for the process.
EDIT: Generally speaking, a process can only access its own (virtual) address space. As #cmaster observed, however, there is a syscall (ptrace()) that allows some processes access to some other processes' memory in the context of the observed process's address space. This is how general-purpose debuggers usually work.
You could read (from your program) the /proc/self/maps file. Try first the following two commands in a terminal
cat /proc/self/maps
cat /proc/$$/maps
(at least to understand what are the address space)
Then read proc(5), mmap(2) and of course wikipages about processes, address space, virtual memory, MMU, shared memory, VDSO.
If you want to share memory between two processes, read first shm_overview(7)
If you can't use /proc/ (which is a pity) consider mincore(2)
You could also non-portably try reading from (and perhaps rewriting the same value using volatile int* into) some address and catching SIGSEGV signal (with a sigsetjmp(3) in the signal handler), and do that in a -dichotomical- loop (in multiple of 4Kbytes) - from some sane start and end addresses (certainly not from 0, but probably from (void*)0x10000 and up to (void*)0xffffffffff600000)
See signal(7).
You could also use the Linux (Gnu libc) specific dladdr(3). Look also into ptrace(2) (which should be often used from some other process).
Also, you could study elf(5) and read your own executable ELF file. Canonically it is /proc/self/exe (a symlink) but you should be able to get its from the argv[0] of your main (perhaps with the convention that your program should be started with its full path name).
Be aware of ASLR and disable it if your teacher permits that.
PS. I cannot figure out what your teacher is expecting from you.
It is a bit more difficult than it seems at the first sight. In Linux every process has its own memory space. Using any arbitrary memory address points to the memory space of this process only. However there are mechanisms which allow one process to access memory regions of another process. There are certain Linux functions which allow this shared memory feature. For example take a look at
this link which gives some examples of using shared memory under Linux using shmget, shmctl and other system calls. Also you can search for mmap system call, which is used to map a file into a process' memory, but can also be used for the purpose of accessing memory of another process.

What is program break? Where does it start from,0x00?

int brk(void *end_data_segment);
void *sbrk(intptr_t increment);
Calling sbrk() with an increment of
0
can be used to find the current location of the program break.
What is program break? Where does it start from,0x00?
Oversimplifying:
A process has several segments of memory:
Code (text) segment, which contains the code to be executed.
Data segment, which contains data the compiler knows about (globals and statics).
Stack segment, which contains (drumroll...) the stack.
(Of course, nowadays it's much more complex. There is a rodata segment, a uninitialized data segment, mappings allocated via mmap, a vdso, ...)
One traditional way a program can request more memory in a Unix-like OS is to increment the size of the data segment, and use a memory allocator (i.e. malloc() implementation) to manage the resulting space. This is done via the brk() system call, which changes the point where the data segment "breaks"/ends.
A program break is end of the process's data segment. AKA...
the program break is the first
location after the end of the
uninitialized data segment
As to where it starts from, it's system dependent but probably not 0x00.
These days, sbrk(2) (and brk) are nearly obsolete system calls (and you can nearly forget about them and ignore the old notion of break; focus on understanding mmap(2)). Notice that the sbrk(2) man page says in its NOTES :
Avoid using brk() and sbrk(): the malloc(3) memory allocation package
is the portable and comfortable way of allocating memory.
(emphasis mine)
Most implementations of malloc(3) (notably the one in musl-libc) are rather using mmap(2) to require memory - and increase their virtual address space - from the kernel (look at that virtual address space wikipage, it has a nice picture). Some malloc-s use sbrk for small allocations, mmap for large ones.
Use strace(1) to find out the system calls (listed in syscalls(2)) done by some given process or command. BTW you'll then find that bash and ls (and probably many other programs) don't make a single call to sbrk.
Explore the virtual address space of some process by using proc(5). Try cat /proc/$$/maps and cat /proc/self/maps and even cat /proc/$$/smaps and read a bit to understand the output.
Be aware of ASLR & vdso(7).
And sbrk is not very thread friendly.
(my answer focuses on Linux)
You are saying that sbrk() is an obsolute system call and that we should use malloc(), but malloc(), according to her documentation, when allocating less memory than 128 KiB (32 pages) uses it. So we shouldn´t use sbrk() directly, but malloc() use it, if allocation is bigger than 128 KiB then malloc() uses mmap() that allocates private pages to the userspace.
Finally its a good idea to understand sbrk(), at least for understanding the "Program Break" concept.
Based on the following widely used diagram:
program break, which is also known as brk in many articles, points to the address of heap segment's end.
When you call malloc, it changes the address of program break.

How does mprotect() work?

I was stracing some of the common commands in the linux kernel, and saw mprotect() was used a lot many times. I'm just wondering, what is the deciding factor that mprotect() uses to find out that the memory address it is setting a protection value for, is in its own address space?
On architectures with an MMU1, the address that mprotect() takes as an argument is a virtual address. Each process has its own independent virtual address space, so there's only two possibilities:
The requested address is within the process's own address range; or
The requested address is within the kernel's address range (which is mapped into every process).
mprotect() works internally by altering the flags attached to a VMA2. The first thing it must do is look up the VMA corresponding to the address that was passed - if the passed address was within the kernel's address range, then there is no VMA, and so this search will fail. This is exactly the same thing happens if you try to change the protections on an area of the address space that is not mapped.
You can see a representation of the VMAs in a process's address space by examining /proc/<pid>/smaps or /proc/<pid>/maps.
1. Memory Management Unit
2. Virtual Memory Area, a kernel data structure describing a contiguous section of a process's memory.
This is about virtual memory. And about dynamic linker/loader. Most mprotect(2) syscalls you see in the trace are probably related to bringing in library dependencies, though malloc(3) implementation might call it too.
Edit:
To answer your question in comments - the MMU and the code inside the kernel protect one process from the other. Each process has an illusion of a full 32-bit or 64-bit address space. The addresses you operate on are virtual and belong to a given process. Kernel, with the help of the hardware, maps those to physical memory pages. These pages could be shared between processes implicitly as code, or explicitly for interprocess communications.
The kernel looks up the address you pass mprotect in the current process's page table. If it is not in there then it fails. If it is in there the kernel may attempt to mark the page with new access rights. I'm not sure, but it may still be possible that the kernel would return an error here if there were some special reason that the access could not be granted (such as trying to change the permissions of a memory mapped shared file area to writable when the file was actually read only).
Keep in mind that the page table that the processor uses to determine if an area of memory is accessible is not the one that the kernel used to look up that address. The processor's table may have holes in it for things like pages that are swapped out to disk. The tables are related, but not the same.

Resources