mmap() for Remote File - c

Currently I am implementing a version of mmap() whose objective is to map a remote file into memory on a client machine. For the implementation, I cannot use any built-in or third-party libraries. That said, I am unsure which of the following two options the implementation should be based on:
Reading the file contents from the server, writing them to a local file on the client machine, and calling the mmap() syscall on the file descriptor obtained from that local file, or
Allocating memory with sbrk() for each chunk of file data received on the client side.
Any suggestions will be greatly appreciated!

This is quite possible to do in Linux, and even in a thread-safe fashion for a multithreaded process, but there is one very difficult function you'd need to implement either yourself, or by using some library.
You would need to decode and emulate any memory-accessing instruction, using an interface similar to
static void emulate(mcontext_t *const context,
                    void (*fetch)(void *const data,
                                  const unsigned long addr,
                                  size_t bytes),
                    void (*store)(const unsigned long addr,
                                  const void *const data,
                                  size_t bytes));
The instruction to decode is at (void *)context->gregs[REG_EIP] on x86, and at (void *)context->gregs[REG_RIP] on x86-64. The function must skip the instruction by incrementing context->gregs[REG_EIP]/context->gregs[REG_RIP]/etc. by the number of bytes in the machine instruction. If you don't, SIGSEGV will just be raised again and again, with the program stuck on that instruction!
The function must use only the fetch and store callbacks to access the memory that caused the SEGV. In your case, they would be implemented as functions that contact the remote machine, asking it to perform the desired action on the specified bytes.
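As a rough sketch of what those two callbacks might look like (hedged: the names `remote_fetch`/`remote_store` are made up, the socket plumbing is omitted, and a local array stands in for the data held by the server):

```c
#include <string.h>

/* Stand-in for the server's copy of the file; the real callbacks would
   send a request over a connected socket and wait for the reply. */
static unsigned char remote_file[4096];

static void remote_fetch(void *const data, const unsigned long addr,
                         size_t bytes)
{
    /* In reality: "send READ addr,bytes; receive data" over the network. */
    memcpy(data, remote_file + addr, bytes);
}

static void remote_store(const unsigned long addr, const void *const data,
                         size_t bytes)
{
    /* In reality: "send WRITE addr,bytes,data; receive acknowledgement". */
    memcpy(remote_file + addr, data, bytes);
}
```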
Assuming you have the above three functions implemented, the rest is just about trivial. For simplicity, let's assume you have
static void *map_base;
static size_t map_size;
static void *map_ends; /* (char *)map_base + map_size */
static void sigsegv_handler(int signum, siginfo_t *info, void *context)
{
    if (info->si_addr >= map_base && info->si_addr < map_ends) {
        const int saved_errno = errno;
        emulate(&((ucontext_t *)context)->uc_mcontext,
                your_load_function, your_store_function);
        errno = saved_errno;
    } else {
        struct sigaction act;
        sigemptyset(&act.sa_mask);
        act.sa_handler = SIG_DFL;
        act.sa_flags = 0;
        if (sigaction(SIGSEGV, &act, NULL) == 0)
            raise(SIGSEGV);
        else
            raise(SIGKILL);
    }
}
static int install_sigsegv_handler(void)
{
    struct sigaction act;
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = sigsegv_handler;
    act.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &act, NULL) == -1)
        return errno;
    return 0;
}
If map_size was already obtained from the remote machine (and rounded up to sysconf(_SC_PAGESIZE)), then you just need to do
if (install_sigsegv_handler()) {
    /* Failed; see errno. Abort. */
}

map_base = mmap(NULL, map_size, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, (off_t)0);
if ((void *)map_base != MAP_FAILED)
    map_ends = (void *)((char *)map_base + map_size);
else {
    /* Failed; see errno. Abort. */
}
Now that I've scared everyone reading this out of their brains, I'm happy to also mention that there is a much easier, portable way to do this. It also tends to be more efficient.
This is not "memory mapping a remote file", but a co-operative scheme where multiple machines can share a mapping. From the user's perspective it's pretty much the same thing, but all parties using the mapping must participate in the work.
Instead of trying to catch every access to the mapped region, use page granularity and introduce the concept of page owner: each page of the mapping is accessible on at most one machine at a time, that machine owning said page.
Memory maps act on page-sized units (see sysconf(_SC_PAGESIZE)). You cannot set a specific byte or arbitrary byte range to be inaccessible or read-only -- unless it is aligned to page boundary. You can change any page to be readable and writable, readable only, or inaccessible (PROT_READ|PROT_WRITE, PROT_READ, and PROT_NONE, respectively; see mmap() and mprotect()).
The owner concept is quite simple. When a machine owns a page, it can freely read and write to the page; otherwise it cannot access it at all. (Note: If there is a file backing, updating the mapped file contents atomically is very difficult. I really recommend an approach where there is no backing file, or where the backing file is updated in page-sized chunks using fcntl()-based leases or locking.)
Simply put, each page in the mapping is PROT_READ|PROT_WRITE on exactly one machine, and PROT_NONE in all others.
When somebody tries to access a page it does not own, the SIGSEGV handler on that machine is triggered. It contacts the other machines and requests ownership of that particular page. The then-owner, receiving such a message, changes its mapping to PROT_NONE and sends the page to the new owner. The new owner updates the mapping, changing the protection to PROT_READ|PROT_WRITE, and returns from the SIGSEGV handler.
A couple of notes:
If the SIGSEGV handler returns before a change occurs in the mapping, nothing bad happens. The SIGSEGV signal simply gets immediately re-raised by the same instruction.
I recommend using a separate thread for receiving pages, and updating the local contents of the mapping. Then, the SIGSEGV handler only needs to make sure it has sent a request for ownership of that page, and sched_yield(), to not spin or "twiddle its thumbs" unnecessarily.
Program execution continues when the mapping is updated for that page. send() etc. are async-signal-safe, so you can send the request from the signal handler directly -- but note that you don't want to send the request on every time slice (100-1000 times a second!), just once in a while.
Remember: If the SIGSEGV signal handler does not resolve the problem, there is no harm done. The SIGSEGV just gets raised immediately again by the same instruction. However, I do warmly recommend using sched_yield(), so that other threads and processes on the machine get to use the CPU, instead of wasting CPU time raising a signal millions of times a second for nothing.
If writes are rare, but reads common, you can extend the ownership concept, to read-owner(s) and write-owner. Each page can be owned by any number of read-owners, as long as there is no write-owner. To modify the page, one needs to be write-owner, and that revokes any read-owners.
The logic is such that any thread can ask for read-ownership. If there is no write-owner, it is automatically granted; either the last write-owner or any existing read-owner sends the read-only page contents. If there is a write-owner, it must downgrade its ownership to read-owner and send the now read-only contents to the requester. To modify a page, one must already be a read-owner, and simply tells all other read-owners to relinquish their ownership, thereby becoming the sole write-owner.
In this case, the SIGSEGV handler is not much more complicated. If the page protection is PROT_NONE, it asks for read-ownership. If the page protection is PROT_READ, it already has read-ownership, and therefore must ask to upgrade it to write-ownership. Note: using this scheme, we do not need to decode the instruction to see whether it tried to fetch or store -- indeed, it does not even matter. In the worst case -- a write to a page not owned in any way by this thread -- SIGSEGV just gets raised twice: first to get read-ownership, and a second time to upgrade it to write-ownership.
Note that you cannot upgrade read-ownership to write-ownership in the SIGSEGV handler. If you did that, two threads on separate machines could upgrade their read-ownership at the same time, before the messages reach the other parties. All state changes can only occur after all necessary confirmation TCP messages have arrived.
(Since many-to-many message arbitration is quite complicated, it is almost always better to have a designated arbitrator (or "server"), which handles all the requests from each child. Page transfers can still be direct between members, although you do need to send a notification of each page transfer to the arbitrator/server, too.)
If there is no backing file -- i.e. it is MAP_ANONYMOUS -- you can replace the contents of any page atomically.
When receiving a page, you first get a new anonymous page using mmap(NULL, page, PROT_READ[|PROT_WRITE], MAP_PRIVATE|MAP_ANONYMOUS, -1, (off_t)0), and copy the new data into it. Then, you use mremap() to replace the old page with the new one. (The old page is effectively released as if munmap() was called, but this all happens atomically, so that no thread sees any intermediate state.)
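On Linux, that mremap() step could look roughly like this (a sketch; `replace_page` is a hypothetical name, and MREMAP_FIXED is Linux-specific):

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Atomically replace the page(s) at 'dest' with fresh contents: stage
   the data in a new anonymous mapping, then move it over the old page
   with mremap(). No thread ever sees a partially-updated page. */
static int replace_page(void *dest, const void *data, size_t size)
{
    void *tmp = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, (off_t)0);
    if (tmp == MAP_FAILED)
        return -1;
    memcpy(tmp, data, size);
    /* MREMAP_FIXED moves 'tmp' over 'dest', implicitly unmapping
       whatever was mapped there before, all in one atomic step. */
    if (mremap(tmp, size, size, MREMAP_MAYMOVE | MREMAP_FIXED,
               dest) == MAP_FAILED) {
        munmap(tmp, size);
        return -1;
    }
    return 0;
}
```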
This way you'll be sending just page-sized chunks around. For portability, you should actually use the least common multiple of all the page sizes involved, so that every machine can participate regardless of possible page size differences. (Fortunately, they're always powers of two, and very often 4096, although I do seem to recall architectures that used 512, 2048, 8192, 16384, 32768, 65536, and 2097152-byte pages, so please do not just hard-code your page size.)
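Because page sizes are always powers of two, the least common multiple is simply the largest one; a tiny helper (the function name is ours) makes this concrete:

```c
#include <unistd.h>

/* Page sizes are powers of two, so the least common multiple of a set
   of page sizes is simply the largest of them. Each machine would
   report its own sysconf(_SC_PAGESIZE) during the handshake. */
static long common_chunk_size(const long *page_sizes, int n)
{
    long lcm = page_sizes[0];
    for (int i = 1; i < n; i++)
        if (page_sizes[i] > lcm)
            lcm = page_sizes[i];
    return lcm;
}
```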
Overall, both approaches have their benefits. The first (requiring the instruction emulator) allows any number of clients to access a memory mapping on one server with no co-operation needed from any of the other mappings to the same file on the server. The second needs co-operation from all parties using the mapping, but reduces the access latencies for multiple consecutive accesses; using the read-owner/write-owner logic, you should get a very performant shared memory management.
If you have difficulty deciding between brk()/sbrk() on one hand and mmap() on the other, I do fear both of these approaches are just too complex for you at this point. You should understand the inherent limitations of memory mapping first -- page granularity et cetera -- and perhaps even some cache theory (since this is essentially caching data), so that you can manage the concepts involved with relative ease.
Believe me, trying to program something you cannot really grasp at the conceptual level leads to frustration. That said, grasping the concepts as you encounter them while programming, taking the time to learn them, is fine; you just need to spend the time and effort.
Questions?

Here's an idea:
When the caller requests to "remote mmap" a region or an entire file, you allocate memory for that entire size right away and return that pointer. Also store a record of the allocation internally.
Use SFTP or similar to open the remote file. Don't do anything with it yet, just make sure it exists and has the right size.
You install a signal handler for SIGSEGV.
You use mprotect(2) to set the entire allocated space to be inaccessible (PROT_NONE).
When your signal handler is called, use the siginfo_t argument's si_addr parameter to know if the segmentation fault is in the region you allocated in step 1. If not, pass the segmentation fault along, it's probably going to be fatal as they usually are in most programs.
Now you know you have a region of memory which has been requested but is not yet accessible. Populate the memory by reading from the remote file opened in step 2 and return from your signal handler.
What we achieve then is something like "page faults" where we load on demand the required parts of the remote file. Of course, if you know something about the access pattern (e.g. that the entire file will always be needed in some particular order, or will be needed by multiple processes over time) you can do better, perhaps simpler things.
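The steps above can be sketched in a single-machine demo (hedged: `remote_mmap` and `on_segv` are made-up names, a local buffer stands in for the remote file, and mprotect() is not formally async-signal-safe, although it works on Linux for faults caused by the faulting thread itself):

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *region;          /* the "remote mmap" returned to the caller */
static size_t region_size;
static char backing[1 << 16]; /* stand-in for the remote file (step 2) */

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr < region || addr >= region + region_size)
        _exit(1);  /* step 5: not our region; a real handler would chain */
    long ps = sysconf(_SC_PAGESIZE);
    char *page = (char *)((uintptr_t)addr & ~(uintptr_t)(ps - 1));
    /* Step 6: make the faulting page accessible, then "download" it. */
    mprotect(page, (size_t)ps, PROT_READ | PROT_WRITE);
    memcpy(page, backing + (page - region), (size_t)ps);
}

static char *remote_mmap(size_t size)  /* steps 1, 3 and 4 */
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    region_size = size;
    region = mmap(NULL, size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return region == MAP_FAILED ? NULL : region;
}
```

The first access to each page faults once, gets populated, and every later access runs at full speed.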

Related

How to determine if a pointer is in rodata [duplicate]

Can I tell if a pointer is in the rodata section of an executable?
As in, editing that pointer's data would cause a runtime system trap.
Example (using a C character pointer):
void foo(char const * const string) {
    if ( in_rodata( string ) ) {
        puts("It's in rodata!");
    } else {
        puts("That ain't in rodata");
    }
}
Now I was thinking that, maybe, I could simply compare the pointer to the rodata section.
Something along the lines of:
if ( string > start_of_rodata && string < end_of_rodata ) {
    // it's in rodata!
}
Is this a feasible plan/idea?
Does anyone have an idea as to how I could do this?
(Is there any system information that one might need in order to answer this?)
I am executing the program on a Linux platform.
I doubt that it could possibly be portable
If you don't want to mess with linker scripts or platform-specific memory-map query APIs, a proxy approach is fairly portable on platforms with memory protection, if you're willing to settle for knowing whether the location is writable, read-only, or neither. The general idea is to do a test read and a test write. If the first succeeds but the second fails, it's likely .rodata or a code segment. This doesn't tell you "it's rodata for sure" -- it may be a code segment, or some other read-only page, such as a read-only file memory mapping with copy-on-write disabled. Whether that matters depends on what you had in mind for this test -- what the ultimate purpose was.
Another caveat is: For this to be even remotely safe, you must suspend all other threads in the process when you do this test, as there's a chance you may corrupt some state that code executing on another thread may happen to refer to. Doing this from inside a running process may have hard-to-debug corner cases that will stop lurking and show themselves during a customer demo. So, on platforms that support this, it's always preferable to spawn another process that will suspend the first process in its entirety (all threads), probe it, write the result to the process's address space (to some result variable), resume the process and terminate itself. On some platforms, it's not possible to modify a process's address space from outside, and instead you need to suspend the process mostly or completely, inject a probe thread, suspend the remaining other threads, let the probe do its job, write an answer to some agreed-upon variable, terminate, then resume everything else from the safety of an external process.
For simplicity's sake, the below will assume that it's all done from inside the process. Even though "fully capable" self-contained examples that work cross-process would not be very long, writing this stuff is a bit tedious especially if you want it short, elegant and at least mostly correct - I imagine a really full day's worth of work. So, instead, I'll do some rough sketches and let you fill in the blanks (ha).
Windows
Structured exceptions get thrown e.g. due to protection faults or divide by zero. To perform the test, attempt a read from the address in question. If that succeeds, you know it's at least a mapped page (otherwise it'll throw an exception you can catch). Then try writing there - if that fails, then it was read-only. The code is almost boring:
static const int foo;
static int bar;

#if _WIN32
typedef struct ThreadState ThreadState;
ThreadState *suspend_other_threads(void) { ... }
void resume_other_threads(ThreadState *) { ... }

int check_if_maybe_rodata(void *p) {
    __try {
        (void) *(volatile char *)p;
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        return 0;  /* not even readable */
    }

    volatile LONG result = 0;
    ThreadState *state = suspend_other_threads();
    __try {
        InterlockedExchange(&result, 1);
        LONG saved = *(volatile LONG *)p;
        InterlockedExchange((volatile LONG *)p, saved);
        InterlockedExchange(&result, 0); // we succeeded writing there
    } __except (EXCEPTION_EXECUTE_HANDLER) {}
    resume_other_threads(state);
    return result;
}

int main() {
    assert(check_if_maybe_rodata(&foo));
    assert(!check_if_maybe_rodata(&bar));
}
#endif
Suspending the threads requires traversing the thread list, and suspending each thread that's not the current thread. The list of all suspended threads has to be created and saved, so that later the same list can be traversed to resume all the threads.
There are surely caveats, and WoW64 threads have their own API for suspension and resumption, but it's probably something that would, in controlled circumstances, work OK.
Unix
The idea is to leverage the kernel to check the pointer for us "at arms length" so that no signal is thrown. Handling POSIX signals that result from memory protection faults requires patching the code that caused the fault, inevitably forcing you to modify the protection status of the code's memory. Not so great. Instead, pass a pointer to a syscall you know should succeed in all normal circumstances to read from the pointed-to-address - e.g. open /dev/zero, and write to that file from a buffer pointed-to by the pointer. If that fails with EFAULT, it is due to buf [being] outside your accessible address space. If you can't even read from that address, it's not .rodata for sure.
Then do the converse: from an open /dev/zero, attempt a read to the address you are testing. If the read succeeds, then it wasn't read-only data. If the read fails with EFAULT that most likely means that the area in question was read-only since reading from it succeeded, but writing to it didn't.
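A sketch of that probe on Linux (the function name `probably_rodata` is made up; remember it only distinguishes readable-but-not-writable, which may also match code pages):

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if *p is readable but not writable (so possibly .rodata),
   0 otherwise. The kernel does the probing for us, so no SIGSEGV is
   raised in our process; a bad pointer just makes the syscall fail
   with EFAULT. */
static int probably_rodata(const void *p)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return 0;
    int readable = write(fd, p, 1) == 1;         /* kernel reads *p  */
    int writable = read(fd, (void *)p, 1) == 1;  /* kernel writes *p */
    close(fd);
    return readable && !writable;
}
```

Note that the writability probe really does write one byte (a zero from /dev/zero) into writable locations, so only use it on addresses you are allowed to clobber, or with all other threads suspended as discussed above.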
In all cases, it'd be most preferable to use native platform APIs to test the mapping status of the page on which the address you try to access resides, or even better - to walk the sections list of the mapped executable (ELF on Linux, PE on Windows), and see exactly what went where. It's not somehow guaranteed that on all systems with memory protection the .rodata section or its equivalent will be mapped read only, thus the executable's image as-mapped into the running process is the ultimate authority. That still does not guarantee that the section is currently mapped read-only. An mprotect or a similar call could have changed it, or parts of it, to be writable, even modified them, and then perhaps changed them back to read-only. You'd then have to either checksum the section if the executable's format provides such data, or mmap the same binary somewhere else in memory and compare the sections.
But I smell a faint smell of an XY problem: what is it that you're actually trying to do? I mean, surely you don't just want to check if an address is in .rodata out of curiosity's sake. You must have some use for that information, and it is this application that would ultimately decide whether even doing this .rodata check should be on the radar. It may be, it may be not. Based on your question alone, it's a solid "who knows?"

Is Linux kernel splice() zero copy?

I know splice() is designed for zero copy and uses the Linux kernel pipe buffer to achieve that. For example, if I want to copy data from one file descriptor (fp1) to another file descriptor (fp2), it doesn't need to copy data from "kernel space -> user space -> kernel space". Instead it just copies data within kernel space; the flow is "fp1 -> pipe_read -> pipe_write -> fp2". My question is: does the kernel need to copy data between "fp1 -> pipe_read" and "pipe_write -> fp2"?
The Wikipedia said that:
Ideally, splice and vmsplice work by remapping pages and do not actually copy any data,
which may improve I/O performance. As linear addresses do not necessarily correspond to
contiguous physical addresses, this may not be possible in all cases and on all hardware
combinations.
I have already traced the kernel source (3.12) for my question, and I found that the flow "fp1 -> write_pipe" in the end calls kernel_readv() in fs/splice.c, which then calls do_readv_writev() and finally aio_write():
static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
                            unsigned long vlen, loff_t offset)
/* *vec points to struct pages which belong to the pipe */
The flow "read_pipe -> fp2" in the end calls __kernel_write() and then fp2->f_op->write():
ssize_t __kernel_write(struct file *file, const char *buf, size_t count, loff_t *pos)
/* *buf is the pipe buffer */
And I thought both aio_write() and file->f_op->write() would perform real data copies, so does splice() really perform zero copy?
As I understand splice(), it will read pages of fd1 and the MMU will map these pages. The reference created by the mapping will be put into the pipe and handed over to fd2.
No real data should be copied in the process, as long as every participant has DMA available.
If no DMA is available you need to copy data.
splice most probably works zero-copy (there is no hard guarantee for that, but it almost certainly works that way for any reasonably recent hardware). Strictly following the docs, you would need to call it with SPLICE_F_MOVE so no actual copies are made, but I don't see how it would need to make one either way as long as there's DMA support (which is a rather fair assumption).
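For reference, the usual file-to-file path through a pipe looks like this (a sketch with abbreviated error handling; each splice() call has the pipe on one side, as the syscall requires):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Copy up to 'len' bytes from in_fd to out_fd without any user-space
   buffer: in_fd -> pipe -> out_fd. Returns bytes copied, or -1. */
static ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;
    if (pipe(p) < 0)
        return -1;
    while (len > 0) {
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n <= 0)
            break;
        ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                           SPLICE_F_MOVE);
        if (m <= 0)
            break;
        total += m;
        len -= (size_t)m;
    }
    close(p[0]);
    close(p[1]);
    return total;
}
```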
The same is not necessarily true with vmsplice involved since it (or a successive splice) only works zero-copy if the SPLICE_F_GIFT flag is provided (and in this case, I can see how it would not work otherwise, since the "source descriptor" is main memory) but this flag is broken in some and unsupported in other Linux versions, and badly documented on top.
For example, it is not clear what to do with the memory afterwards. The documentation used to say that you are not allowed to touch the gifted memory ever again; this was recently slightly reworded, but it isn't any less ambiguous. It remains unclear what is to become of the memory region. Following the documentation, you would have to leak the memory. There seems to be no notification mechanism that tells you when it is safe to free or reuse it.
aio_write is the userland (Glibc) implementation of asynchronous I/O which uses threads and the write syscall. This normally performs at least one copy from user space to kernel space.

How should I read Intel PCI uncore performance counters on Linux as non-root?

I'd like to have a library that allows 'self profiling' of critical sections of Linux executables. In the same way that one can time a section using gettimeofday() or RDTSC I'd like to be able to count events such as branch misses and cache hits.
There are a number of tools that do similar things (perf, PAPI, likwid) but I haven't found anything that matches what I'm looking for. Likwid comes closest, so I'm mostly looking at ways to modify its existing Marker API.
The per-core counter values are stored in MSRs (Model Specific Registers), but for current Intel processors (Sandy Bridge onward) the "uncore" measurements (memory accesses and other things that pertain to the CPU as a whole) are accessed over PCI.
The usual approach is that the MSRs are read using the msr kernel module, and the PCI counters (if supported) are read from the sysfs-pci hierarchy. The problem is that both of these require the reader to be running as root or have 'setcap cap_sys_rawio'. This is difficult (or impossible) for many users.
It's also not particularly fast. Since the goal is to profile small pieces of code, the 'skew' from reading each counter with a syscall is significant. It turns out that the MSR registers can be read by a normal user using RDPMC. I don't yet have a great solution for reading the PCI registers.
One way would be to proxy everything through an 'access server' running as root. This would work, but would be even slower (and hence less accurate) than using /proc/bus/pci. I'm trying to figure out how best to make the PCI 'configuration' space of the counters visible to a non-privileged program.
The best I've come up with is to have a server running as root, to which the client can connect at startup via a Unix local domain socket. As root, the server will open the appropriate device files, and pass the open file handle to the client. The client should then be able to make multiple reads during execution on its own. Is there any reason this wouldn't work?
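(For reference, the descriptor-passing step I have in mind is the standard SCM_RIGHTS technique over a Unix-domain socket; a condensed sketch of both directions, with minimal error handling:)

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Pass an open file descriptor over a Unix-domain socket. The kernel
   duplicates the descriptor into the receiving process, so the client
   can keep using it even after the server closes its own copy. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { &byte, 1 };
    union { struct cmsghdr h; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { 0 };
    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { &byte, 1 };
    union { struct cmsghdr h; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { 0 };
    int fd = -1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```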
But even if I do that, I'll still be using a pread() system call (or something comparable) for every access, of which there might be billions. If trying to time small sub-1000 cycle sections, this might be too much overhead. Instead, I'd like to figure out how to access these counters as Memory Mapped I/O.
That is, I'd like to have read-only access to each counter represented by an address in memory, with the I/O mapping happening at the level of the processor and IOMMU rather than involving the OS. This is described in the Intel Architectures Software Developer Vol 1 in section 16.3.1 Memory Mapped I/O.
This seems almost possible. In proc_bus_pci_mmap() the device handler for /proc/bus/pci seems to allow the configuration area to be mapped, but only by root, and only if I have CAP_SYS_RAWIO.
static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
{
    struct pci_dev *dev = PDE_DATA(file_inode(file));
    struct pci_filp_private *fpriv = file->private_data;
    int i, ret;

    if (!capable(CAP_SYS_RAWIO))
        return -EPERM;

    /* Make sure the caller is mapping a real resource for this device */
    for (i = 0; i < PCI_ROM_RESOURCE; i++) {
        if (pci_mmap_fits(dev, i, vma, PCI_MMAP_PROCFS))
            break;
    }

    if (i >= PCI_ROM_RESOURCE)
        return -ENODEV;

    ret = pci_mmap_page_range(dev, vma,
                              fpriv->mmap_state,
                              fpriv->write_combine);
    if (ret < 0)
        return ret;

    return 0;
}
So while I could pass the file handle to the client, it can't mmap() it, and I can't think of any way to share an mmap'd region with a non-descendent process.
(Finally, we get to the questions!)
So presuming I really want to have a pointer in a non-privileged process that can read from PCI configuration space without help from the kernel each time, what are my options?
1) Maybe I could have a root process open /dev/mem, and then pass that open file descriptor to the child, which can then mmap the part that it wants. But I can't think of any way to make that even remotely secure.
2) I could write my own kernel module, which looks a lot like linux/drivers/pci/proc.c but omits the check for the usual permissions. Since I can lock this down so that it is read-only and just for the PCI space that I want, it should be reasonably safe.
3) ??? (This is where you come in)
Maybe this answer is a little late. The answer is to use likwid.
As you said, reading the MSR/sysfs-pci interfaces has to be done as root. Building the likwid accessDaemon and giving it the rights to access the MSRs bypasses this issue. Of course, due to the inter-process communication, performance values can arrive with some delay, but this delay is not very high.
(For small code sections, the performance counters are somewhat imprecise in any case.)
Likwid can also work with uncore events.
Best

Catching when the linux kernel writes a page back to a memory mapped file?

I'm contemplating a system that would let me memory map files and transparently do type conversion on the data they contain. It seems it's possible to catch memory accesses by mmaping a second memory region and making it protected, then catching the segfault when a new page is accessed. This would let me handle the on-read type conversion I need.
However, to be read/write compatible, I'd need some way to catch when the OS is paging part of the memory back to disk so I could do the type conversion the other way before it's written.
Is there any capability for hooking the paging system in this way?
What you want is not possible, and reflects a fundamental misunderstanding of mmap. The event of file-backed maps being written back on disk is not relevant, because until this happens, any attempt to read the file will (and must, to conform to POSIX) be read from the modified in-memory copy of the page, not the outdated contents on disk. In other words, the writing back of modified pages to disk is completely transparent to applications, and assuming you never lose power or reboot, it would be completely possible that the modified page is never written back to disk.
Your design just doesn't work. You'll have to do something different if you want this kind of behavior.
Using a memory map and a SIGSEGV handler is a bit problematic. First, mprotect() is not async-signal safe, meaning mprotect() in a signal handler is not guaranteed to work. Second, synchronization of the necessary structures between the signal handler and more than one thread is quite complex (although possible using GCC __sync and/or __atomic built-ins) as you cannot use the standard locking primitives in signal handlers -- fortunately you can simply return from the signal handler; the kernel does not skip the offending instruction, so the same signal gets raised immediately afterwards.
I did write a small program to test an anonymous private unreserved memory map, using read() and write() to update the map. The problem is that other threads may access the map while the signal handler is updating it.
I think it might work if you use a temporary file for the currently active region, with an extra page before and after to hold partial records when the records cross page boundaries.
The actual data file would be represented by a private anonymous unreserved inaccessible map (PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE). A SIGSEGV signal handler catches accesses to that map. A page-aligned region of that map is unmapped and mapped from the temporary file (MAP_SHARED | MAP_FIXED | MAP_NORESERVE). The trick is that the temporary file can additionally be mapped (MAP_SHARED | MAP_NORESERVE) to another memory region, and the signal handler can simply unmap the temporary file within the main map, to stop other threads from accessing the region during conversion; the data is still available to your library functions in that other memory region (to be read from and written to using read() and write() on the actual data file). MAP_SHARED means the exact same pages (from the page cache) are used, and MAP_NORESERVE means the kernel does not reserve swap or RAM for them.
This approach should work well with respect to threads and locking, but it still suffers from mmap(), munmap(), and mremap() not being async-signal safe. However, if you do have a global variable accessed only atomically causing the signal handler to immediately return if application/library code is modifying the structures and/or maps, this should be reliable.

Reading Other Process' Memory in OS X?

I've been trying to understand how to read the memory of other processes on Mac OS X, but I'm not having much luck. I've seen many examples online using ptrace with PEEKDATA and such; however, that option doesn't exist on BSD [man ptrace].
int pid = fork();
if (pid > 0) {
    // mess around with child-process's memory
}
How is it possible to read from and write to the memory of another process on Mac OS X?
Use task_for_pid() or other methods to obtain the target process’s task port. Thereafter, you can directly manipulate the process’s address space using vm_read(), vm_write(), and others.
Matasano Chargen had a good post a while back on porting some debugging code to OS X, which included learning how to read and write memory in another process (among other things).
It has to work, otherwise GDB wouldn't:
It turns out Apple, in their infinite wisdom, had gutted ptrace(). The OS X man page lists the following request codes:
PT_ATTACH — to pick a process to debug
PT_DENY_ATTACH — so processes can stop themselves from being debugged
[...]
No mention of reading or writing memory or registers. Which would have been discouraging if the man page had not also mentioned PT_GETREGS, PT_SETREGS, PT_GETFPREGS, and PT_SETFPREGS in the error codes section. So, I checked ptrace.h. There I found:
PT_READ_I — to read instruction words
PT_READ_D — to read data words
PT_READ_U — to read U area data if you’re old enough to remember what the U area is
[...]
There’s one problem solved. I can read and write memory for breakpoints. But I still can’t get access to registers, and I need to be able to mess with EIP.
I know this thread is 100 years old, but for people coming here from a search engine:
xnumem does exactly what you are looking for, manipulate and read inter-process memory.
// Create new xnu_proc instance
xnu_proc *Process = new xnu_proc();

// Attach to pid (or process name)
Process->Attach(getpid());

// Manipulate memory
int i = 1337, i2 = 0;
i2 = Process->memory().Read<int>((uintptr_t)&i);

// Detach from process
Process->Detach();
If you're looking to be able to share chunks of memory between processes, you should check out shm_open(2) and mmap(2). It's pretty easy to allocate a chunk of memory in one process and pass the path (for shm_open) to another, and both can then go crazy together. This is a lot safer than poking around in another process's address space, as Chris Hanson mentions. Of course, if you don't have control over both processes, this won't do you much good.
(Be aware that the max path length for shm_open appears to be 26 bytes, although this doesn't seem to be documented anywhere.)
// Create shared memory block
void* sharedMemory = NULL;
size_t shmemSize = 123456;
const char* shmName = "mySharedMemPath";
int shFD = shm_open(shmName, (O_CREAT | O_EXCL | O_RDWR), 0600);
if (shFD >= 0) {
    if (ftruncate(shFD, shmemSize) == 0) {
        sharedMemory = mmap(NULL, shmemSize, (PROT_READ | PROT_WRITE), MAP_SHARED, shFD, 0);
        if (sharedMemory != MAP_FAILED) {
            // Initialize shared memory if needed
            // Send 'shmName' & 'shmemSize' to other process(es)
        } else handle error
    } else handle error
    close(shFD); // Note: sharedMemory still valid until munmap() called
} else handle error
...
Do stuff with shared memory
...
// Tear down shared memory
if (sharedMemory != NULL) munmap(sharedMemory, shmemSize);
if (shFD >= 0) shm_unlink(shmName);
// Get the shared memory block from another process
void* sharedMemory = NULL;
size_t shmemSize = 123456; // Or fetched via some other form of IPC
const char* shmName = "mySharedMemPath"; // Or fetched via some other form of IPC
int shFD = shm_open(shmName, O_RDONLY, 0600); // Can be R/W if you want
if (shFD >= 0) {
    sharedMemory = mmap(NULL, shmemSize, PROT_READ, MAP_SHARED, shFD, 0);
    if (sharedMemory != MAP_FAILED) {
        // Check shared memory for validity
    } else handle error
    close(shFD); // Note: sharedMemory still valid until munmap() called
} else handle error
...
Do stuff with shared memory
...
// Tear down shared memory
if (sharedMemory != NULL) munmap(sharedMemory, shmemSize);
// Only the creator should shm_unlink()
You want to do Inter-Process Communication with the shared memory method. For a summary of the other common methods, see here.
It didn't take me long to find what you need in this book, which contains all the APIs that are common to all UNIXes today (many more than I thought). You should buy it in the future. The book is a set of several hundred printed man pages, which are rarely installed on modern machines.
Each man page details a C function.
It didn't take me long to find shmat(), shmctl(), shmdt() and shmget() in it. I didn't search extensively; maybe there's more.
It looks a bit outdated, but: yes, it is the base user-space API of modern UNIX OSes, going back to the old 80's.
Update: most functions described in the book are part of the POSIX C headers, so you don't need to install anything. There are a few exceptions, like "curses", the original library.
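As a quick illustration of those four System V calls, here is a single-process round trip (a sketch only; real code would share a key obtained via ftok() or by agreement so a second process can shmget() the same segment):

```c
/* Minimal System V shared memory round trip: shmget() creates the
   segment, shmat() maps it, shmdt() detaches, and shmctl(IPC_RMID)
   marks it for destruction. */
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int shm_roundtrip(void)
{
    /* IPC_PRIVATE avoids choosing a key; for actual sharing, both
       processes would use the same ftok()-derived key instead. */
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    char *mem = shmat(id, NULL, 0);
    if (mem == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    strcpy(mem, "hello");
    int ok = (strcmp(mem, "hello") == 0);

    shmdt(mem);
    shmctl(id, IPC_RMID, NULL); /* destroyed once all users detach */
    return ok ? 0 : -1;
}
```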
I have found a short implementation of what you need (only one source file, main.c).
It is specially designed for XNU.
It is in the top ten results of a Google search with the keywords « dump process memory os x ».
The source code is here.
But from a strict virtual-address-space point of view, you should be more interested in this question: OS X: Generate core dump without bringing down the process? (see also this).
When you look at the gcore source code, it is quite complex to do this, since you need to deal with threads and their state...
On most Linux distributions, the gcore program is now part of the GDB package. I think the OS X version is installed with Xcode/the development tools.
Update: wxHexEditor is an editor which can edit devices. It can also edit process memory the same way it does regular files. It works on all UNIX machines.
Manipulating a process's memory behind its back is a Bad Thing and is fraught with peril. That's why Mac OS X (like any Unix system) has protected memory, and keeps processes isolated from one another.
Of course it can be done: There are facilities for shared memory between processes that explicitly cooperate. There are also ways to manipulate other processes' address spaces as long as the process doing so has explicit right to do so (as granted by the security framework). But that's there for people who are writing debugging tools to use. It's not something that should be a normal — or even rare — occurrence for the vast majority of development on Mac OS X.
In general, I would recommend that you use regular open() to open a temporary file. Once it's open in both processes, you can unlink() it from the filesystem and you'll be set up much like you would be if you'd used shm_open. The procedure is extremely similar to the one specified by Scott Marcy for shm_open.
The disadvantage to this approach is that if the process that will be doing the unlink() crashes, you end up with an unused file and no process has the responsibility of cleaning it up. This disadvantage is shared with shm_open, because if nothing shm_unlinks a given name, the name remains in the shared memory space, available to be shm_opened by future processes.
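A sketch of that open()+unlink() approach (the helper name is hypothetical, and mkstemp() stands in for plain open() to get a unique name): the name disappears from the filesystem right away, but the descriptor and any mappings stay valid, so a crash leaves no stale file. Note this variant unlinks immediately, so the segment would be shared via fork() or fd passing rather than by path.

```c
/* Create an anonymous-after-unlink shared mapping backed by a temp
   file. The file name vanishes immediately, but the mapping keeps
   the underlying data alive until munmap(). */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

void *anon_shared_file(size_t size)
{
    char path[] = "/tmp/shmdemo-XXXXXX";
    int fd = mkstemp(path);      /* create a uniquely named temp file */
    if (fd < 0)
        return MAP_FAILED;

    unlink(path);                /* name gone; data lives on via fd */

    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return MAP_FAILED;
    }

    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);                   /* the mapping keeps the file alive */
    return mem;
}
```

A child created with fork() after this call inherits the mapping and sees the same bytes, which is the fd-inheritance flavor of the technique described above.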

Resources