I wanted to use the /proc/<pid>/maps file in order to get info about the virtual memory of a process (especially about its shared libraries). Since macOS doesn't have one, I'm trying to find other ways. One of them seems to be the sysctl call, but I don't quite understand how to use it for this purpose. Are there any examples? I know it can also be done via some mach_vm interface calls, but the documentation is quite poor. Maybe you know other ways of reading process memory? I'm running Darwin (macOS), by the way.
Note: the purpose is to do this without using any utilities or fork/exec calls. I also don't want to require any pseudo-filesystem to be mounted.
macOS' virtual memory subsystem lives in the Mach-inherited part of the kernel, so those APIs are definitely the ones to use. For inspecting regions, look at mach_vm_region() (called vm_region() in the original Mach; you will find more documentation under that name). For reading memory, use mach_vm_read().
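For a concrete starting point, here's a hedged sketch that walks the current task's regions with mach_vm_region() and prints something /proc/<pid>/maps-like. To inspect another process you'd first obtain its task port with task_for_pid(), which requires privileges/entitlements:

```c
/* Minimal sketch: enumerate the memory regions of the current task.
   For another process, replace mach_task_self() with a task port
   obtained via task_for_pid() (needs root or entitlements). */
#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <stdio.h>

int main(void)
{
    mach_vm_address_t addr = 0;
    mach_vm_size_t size;
    vm_region_basic_info_data_64_t info;
    mach_msg_type_number_t count = VM_REGION_BASIC_INFO_COUNT_64;
    mach_port_t object_name;

    while (mach_vm_region(mach_task_self(), &addr, &size,
                          VM_REGION_BASIC_INFO_64,
                          (vm_region_info_t)&info, &count,
                          &object_name) == KERN_SUCCESS) {
        printf("%016llx-%016llx %c%c%c\n",
               (unsigned long long)addr,
               (unsigned long long)(addr + size),
               (info.protection & VM_PROT_READ)    ? 'r' : '-',
               (info.protection & VM_PROT_WRITE)   ? 'w' : '-',
               (info.protection & VM_PROT_EXECUTE) ? 'x' : '-');
        addr += size;
        count = VM_REGION_BASIC_INFO_COUNT_64;  /* reset for the next call */
    }
    return 0;
}
```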
You may also find the vmmap command-line utility useful for exploration.
Related
I am curious about how statically linked C executables would work in different environments. Let's say we compile our C code to target x86 macOS, and we statically include everything it uses in the executable as well (printf, strlen). What really stops this executable from running on Windows if we include every library it needs? I understand the file format could be different and break things, but other than that, would this technically be able to run?
I see where you're coming from: operating systems make us programmers think that libraries are the be-all and end-all of programming, that a call to a library is all you need to make complex things happen, and that everything is contained in them.
But the truth is, libraries mostly provide an abstraction layer. As an example, let's create a library called "hello_world.so" which prints "Hello World!" to the console. The library we created relies on stdio to handle the complex I/O stuff, but stdio itself depends on at least one other thing: the kernel (some specific targets work without a kernel, but those systems are outside the scope of this answer).
In the desktop world, things can get really complicated. We have several hundred processes all running at once even on an idle system, and all these apps need access to the hardware (possibly at once, too), so it was decided a controller was needed: some piece of software that would coordinate all the other software running on the same computer. This piece of software is usually called a kernel. On Windows it's the NT kernel, on macOS it's XNU, and on Linux it's... the Linux kernel!
On these systems, the biggest job of a library is to abstract the kernel, to make us believe printing text on a Linux or a Windows console works exactly the same way when actually it can be completely different! Libraries like stdio/time/etc. have different "implementations" but the same "interface": they look the same from the dev's point of view, but the way they achieve their goals can vary wildly. They can do conversions, calls to other hidden or non-hidden functions... All of this is completely portable from one OS to the other, though. Things start to go south for your idea when kernel calls start to show up.
Kernel calls are the way a program can "talk" to the kernel. They can be used to do literally thousands of different things, but for example there's one (or several) to ask for memory (usually reached through malloc), one to print to the console, one to ask if a network packet has arrived, one to talk to your GPU... And these system calls are completely different from one kernel to the other, sometimes even between two versions of the same kernel!
These "kernel calls" are the only thing preventing you from running statically-compiled linux programs on Windows.
PS: Even though all of the above is completely true and kernels can be as different from one another as they wish, due to the history of kernels and of computing in general, some kernels actually share the same interface (even though their implementations, as you guessed, can be nothing alike). The best example I know of is how most kernels today expose a UNIX-like interface.
It means that, even though I have never tested it myself, I think you should be able to statically link a Linux app and use it on Linux, most BSDs, and possibly even macOS.
The binary and libraries are specific to the operating system.
The TL;DR is that the linking process translates function calls into addresses that point into the operating system's specific libraries. There are some differences, like alignment, that happen at compile time, but what's responsible for your x86 instructions not running under a different OS is the linker.
Your compiler produces x86 instructions that are ready to execute but incomplete. The linker will go through every function call and assign it an address in the executable file, even for functions of the standard library.
The linker will create the executable file following a file format, which has a header with information like size, metadata, and the entry point. Windows and Unix have different specifications for executable files: Windows has PE and Unix has the ELF format, both for executables and libraries.
Through some hacks and non-trivial tricks it is possible to create an executable that can run on Windows and Unix (see αcτµαlly pδrταblε εxεcµταblε for how).
But even if you do all that, there is an obstacle that cannot be circumvented: the kernel. The kernel is, well, a kernel. It's the most important thing in an OS, and it provides a set of API calls that give basic, low-level access to computer resources, so functions like malloc are implemented using the kernel-specific API call: VirtualAlloc on Windows, and mmap (or brk) on POSIX systems.
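As a sketch of that divergence, here is what the same request - "give me a page of memory" - looks like through each kernel's own interface. This is exactly the kind of difference malloc hides:

```c
/* Minimal sketch: request one page of memory from the kernel,
   the Windows way and the POSIX way. */
#ifdef _WIN32
#include <windows.h>
#else
#include <stddef.h>
#include <sys/mman.h>
#endif

void *get_page(void)
{
#ifdef _WIN32
    /* NT kernel, via the Win32 layer */
    return VirtualAlloc(NULL, 4096, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
#else
    /* POSIX kernels (Linux, XNU, the BSDs) */
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#endif
}

int main(void)
{
    return get_page() ? 0 : 1;
}
```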
Main Answer
If your program does anything useful (print output, return a value, communicate on the network), it contains some form of system-call instruction. Each system-call instruction is a request to a particular operating system, and macOS system calls will not work on Windows and vice-versa.
The system-call instruction sends information to the operating system, including a number identifying which service is requested. The operating system then performs that service. When you build your program for macOS, it includes library routines that contain system-call instructions. If you execute those instructions on a Microsoft Windows system, Windows will not understand the macOS requests. It will interpret the information differently, and the program will not work.
So, in theory, there is nothing preventing you from writing your own program loader that reads a Mach-O executable file intended for macOS, loads its contents into memory, and transfers control to its entry point. But the program will not work, because of the system calls.
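For illustration, a hedged sketch (assuming Linux on x86-64 and GCC/Clang inline assembly) of what a system-call instruction looks like at the lowest level. The number placed in rax selects the service; macOS uses different numbers (offset into a BSD syscall class), and Windows uses a different scheme entirely:

```c
/* Invoke the `syscall` instruction by hand: service number in rax,
   arguments in rdi/rsi/rdx. 1 selects write on Linux/x86-64. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1),                      /* service number */
                        "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");     /* clobbered by syscall */
    return ret;
}

int main(void)
{
    raw_write(1, "service 1, please\n", 18);
    return 0;
}
```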
Supplement
You might consider translating all the system calls in the program. Changing the primary number that identifies the service request might be feasible; it might not require changing the executable too much. For example, if 37 is a "write to file" request on macOS, your program loader might change it to 48 on Windows. However, the system calls also require other data be passed, such as pointers to buffers, lengths, and so on, and there are likely many discrepancies in how those are passed, so macOS requests cannot be easily translated into Windows requests. Also, it can be technically challenging to identify all the places in a program where a certain instruction is used: some of the contents of memory of a loaded program are instructions and some are data. Most normal programs may be well-behaved and easy to analyze in this regard, but not all are.
Another potential issue is that programs may expect to have certain modes set in the processor, and the host operating system may or may not have set those modes as needed.
I'm working on an LKM which needs to retrieve and write a certain set of information to files. I looked up common ways to do so, but could not find a working one for Linux 4.x. I also found out that it is possible to retrieve system calls from memory and effectively call them.
As I currently see no better way, I'd be interested to know whether it'd be feasible to find the system call table and call open, read/write, and close this way.
This is strongly discouraged in most situations.
https://www.linuxjournal.com/article/8110 was a really good read for me the first time I thought I had to do this as well.
From within the Linux kernel, however, reading data out of a file for configuration information is considered to be forbidden. This is due to a vast array of different problems that could result if a developer tries to do this.
Indeed this is possible to do using system calls from within the kernel, but the practice of calling system calls from within the kernel is also generally discouraged. They're designed as interfaces for userspace applications to ask things of the kernel, not for the kernel to get itself to do work.
What kind of files do you want to manipulate from within the kernel? If the kind of file you'd like to manipulate is provided by the proc, sysfs, or dev filesystems, you can modify the contents of the file from within the kernel (since the kernel provides these to userspace itself) -- but this should be done by implementing the file's own handlers, NOT with file-manipulation calls. If it's a normal userspace file, you almost never want the kernel to be able to modify it.
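As a sketch of that sanctioned route, here is a minimal procfs entry for a Linux 4.x module, where the kernel itself implements the file's read handler (using the pre-5.6 file_operations interface to match your kernel version; "mymod" and the reported value are placeholders):

```c
/* Expose /proc/mymod whose contents are generated by the module,
   instead of having the module open and write files itself. */
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

static int mymod_show(struct seq_file *m, void *v)
{
    seq_printf(m, "state=%d\n", 42);   /* whatever the module reports */
    return 0;
}

static int mymod_open(struct inode *inode, struct file *file)
{
    return single_open(file, mymod_show, NULL);
}

static const struct file_operations mymod_fops = {
    .owner   = THIS_MODULE,
    .open    = mymod_open,
    .read    = seq_read,
    .llseek  = seq_lseek,
    .release = single_release,
};

static int __init mymod_init(void)
{
    proc_create("mymod", 0444, NULL, &mymod_fops);
    return 0;
}

static void __exit mymod_exit(void)
{
    remove_proc_entry("mymod", NULL);
}

module_init(mymod_init);
module_exit(mymod_exit);
MODULE_LICENSE("GPL");
```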
If you provide more specifics I'd be interested to hear them, but this is usually a bad idea.
My goal is to determine, when executing a command, precisely which files it reads and writes. On Linux I can do this using ptrace (with some work, akin to what strace does), and on FreeBSD and macOS I can do this with the ktrace system command. What would you use to obtain this information on Windows?
My research so far suggests that I either use the debugger interface (similar to ptrace in many ways) or perhaps ETW. A third alternative is to interpose a DLL to intercept system calls as they are made. Unfortunately, I don't have the experience to guess as to how challenging each of these approaches will be.
Any suggestions?
Unfortunately it seems there is no easy way to intercept file level operations on Windows.
Here are some hints:
you could try to use FileMon from Sysinternals if it is enough for your needs, or try to look at the source of the tool
you could make use of commercial software like Detours - beware, I never used that myself and I'm not sure it really meets your needs
If you want a better understanding and aren't afraid of doing it by hand, the Windows way of intercepting file I/O is a File System Filter Driver. In fact, there is a FilterManager embedded in the Windows system that can forward all file system calls to minifilters.
To build one, the interface with the system is provided by the FilterManager, and you "just" have to code and install the minifilter that does the actual filtering - beware again, I never tested that...
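For a rough idea of the shape of such a minifilter, here is a bare-bones, untested sketch against the WDK's FltRegisterFilter()/FltStartFiltering() API (a real driver also needs an INF file with a registered altitude):

```c
/* Minifilter skeleton: register a pre-operation callback on
   IRP_MJ_CREATE so every file-open attempt passes through FsPreCreate. */
#include <fltKernel.h>

PFLT_FILTER gFilterHandle;

FLT_PREOP_CALLBACK_STATUS
FsPreCreate(PFLT_CALLBACK_DATA Data, PCFLT_RELATED_OBJECTS FltObjects,
            PVOID *CompletionContext)
{
    UNREFERENCED_PARAMETER(Data);
    UNREFERENCED_PARAMETER(FltObjects);
    UNREFERENCED_PARAMETER(CompletionContext);
    /* Data->Iopb->TargetFileObject->FileName holds the path being opened */
    return FLT_PREOP_SUCCESS_NO_CALLBACK;   /* observe only, don't block */
}

NTSTATUS FsUnload(FLT_FILTER_UNLOAD_FLAGS Flags)
{
    UNREFERENCED_PARAMETER(Flags);
    FltUnregisterFilter(gFilterHandle);
    return STATUS_SUCCESS;
}

const FLT_OPERATION_REGISTRATION Callbacks[] = {
    { IRP_MJ_CREATE, 0, FsPreCreate, NULL },
    { IRP_MJ_OPERATION_END }
};

const FLT_REGISTRATION FilterRegistration = {
    sizeof(FLT_REGISTRATION), FLT_REGISTRATION_VERSION, 0,
    NULL,           /* no context registration */
    Callbacks,
    FsUnload,
    /* remaining optional callbacks left NULL */
};

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    UNREFERENCED_PARAMETER(RegistryPath);
    NTSTATUS status = FltRegisterFilter(DriverObject, &FilterRegistration,
                                        &gFilterHandle);
    if (NT_SUCCESS(status)) {
        status = FltStartFiltering(gFilterHandle);
        if (!NT_SUCCESS(status))
            FltUnregisterFilter(gFilterHandle);
    }
    return status;
}
```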
As you suggested, this is a fairly simple task to solve with API hooking via DLL injection.
This is a pretty good article about the technique: API hooking revealed
I believe you can find more recent articles about the issue.
However, you probably need to use C++ to implement such a utility. By the way, programs can protect themselves against DLL injection. For example, I wasn't able to use this approach on the trial version of Photoshop.
So, you may want to check if you can inject DLL files in the process you want with an existing solution before you start writing your own.
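For reference, the classic injection step looks roughly like this (a hedged sketch: "hook.dll" is a placeholder name, error handling is trimmed, and the usual CreateRemoteThread/LoadLibraryA approach assumes the injector and target have matching bitness):

```c
/* Load a DLL into a target process: write the DLL path into its
   address space, then start a remote thread at LoadLibraryA. */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int inject(DWORD pid, const char *dllPath)
{
    HANDLE proc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if (!proc) return -1;

    SIZE_T len = strlen(dllPath) + 1;
    LPVOID remote = VirtualAllocEx(proc, NULL, len,
                                   MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    WriteProcessMemory(proc, remote, dllPath, len, NULL);

    /* kernel32.dll is mapped at the same base in every process,
       so the local address of LoadLibraryA is valid remotely too */
    LPTHREAD_START_ROUTINE entry = (LPTHREAD_START_ROUTINE)
        GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA");

    HANDLE th = CreateRemoteThread(proc, NULL, 0, entry, remote, 0, NULL);
    if (!th) { CloseHandle(proc); return -1; }
    WaitForSingleObject(th, INFINITE);

    CloseHandle(th);
    CloseHandle(proc);
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 3) { fprintf(stderr, "usage: inject <pid> <dll>\n"); return 1; }
    return inject((DWORD)atoi(argv[1]), argv[2]);
}
```

The injected DLL would then patch the import tables (or use Detours-style trampolines) to intercept the file APIs, as the article describes.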
Please take a look at the article CDirectoryChangeWatcher - ReadDirectoryChangesW all wrapped up.
It is a very old, but still working, way to watch directory changes.
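A minimal synchronous sketch of that API (C:\temp is a placeholder path; a real watcher would likely use overlapped I/O, as the article's wrapper does):

```c
/* Watch a directory with ReadDirectoryChangesW and print the name
   of each file that is created, renamed, or written. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE dir = CreateFileW(L"C:\\temp", FILE_LIST_DIRECTORY,
                             FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                             NULL, OPEN_EXISTING,
                             FILE_FLAG_BACKUP_SEMANTICS,  /* needed for dirs */
                             NULL);
    if (dir == INVALID_HANDLE_VALUE) return 1;

    BYTE buf[4096];
    DWORD bytes;
    while (ReadDirectoryChangesW(dir, buf, sizeof buf, TRUE,
                                 FILE_NOTIFY_CHANGE_FILE_NAME |
                                 FILE_NOTIFY_CHANGE_LAST_WRITE,
                                 &bytes, NULL, NULL)) {
        FILE_NOTIFY_INFORMATION *fni = (FILE_NOTIFY_INFORMATION *)buf;
        for (;;) {
            wprintf(L"change: %.*s\n",
                    (int)(fni->FileNameLength / sizeof(WCHAR)), fni->FileName);
            if (!fni->NextEntryOffset) break;
            fni = (FILE_NOTIFY_INFORMATION *)((BYTE *)fni + fni->NextEntryOffset);
        }
    }
    CloseHandle(dir);
    return 0;
}
```

Note this only watches directory changes; unlike the filter-driver or hooking approaches above, it won't attribute accesses to a particular process.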
Microsoft owns a suite of tools called Sysinternals. There is a program called Process Monitor that will show you all the file accesses for a particular process. This is very likely what you want.
Check out this Stack Overflow question; it might help you:
Is there something like the Linux ptrace syscall in Windows?
Also, if you are running an older version like Windows XP, then you should check out Process Monitor.
Also, I would like you to check this out...
Monitoring certain system calls done by a process in Windows
I need to read the disk queue length (separately for read and write operations) on Mac OS X. I have already come to the conclusion that this can be done only via dtrace (I would be happy to be wrong here, but I did not find any other way). The only thing that provided this information is the iopending dtrace script. I need to be able to access the information it provides (or rather to implement its logic) in my C program. Usage of libdtrace is very cryptic (it being a private API), as is the dtrace business overall. Is there any example (besides a few I have found which don't answer my question - libdtrace buffered output and http://www.osdevcon.org/2008/files/osdevcon2008-petr.pdf) that can help me?
Using libdtrace directly can be a bit hairy since it's technically a private API, but you can find examples in other DTrace consumers. libdtrace is basically the same on all platforms that support it (Mac OS, Solaris, FreeBSD), and as a result the API is very stable. Solaris gets a few more updates, and IIRC Mac OS doesn't support all the features available on other platforms; on the other hand, this gives you more examples to work from.
You can either look at the source code of the dtrace command on one of those platforms, or you can look at the source code for some wrapper of the library such as node-libdtrace. I'd recommend the latter since it's just a wrapper that provides important high-level operations, which should make it simpler to figure out which code does what.
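To give a feel for the library, here is a hedged sketch of a minimal consumer modeled on the dtrace(1) source: compile a D program, enable it, and drain output through callbacks. The D script is a placeholder (iopending's real logic is more involved), and headers/option names can vary by platform:

```c
/* Minimal libdtrace consumer: compile, run, and consume a D program. */
#include <dtrace.h>
#include <stdio.h>

static int chew(const dtrace_probedata_t *data, void *arg)
{
    (void)data; (void)arg;
    return DTRACE_CONSUME_THIS;
}

static int chewrec(const dtrace_probedata_t *data,
                   const dtrace_recdesc_t *rec, void *arg)
{
    (void)data; (void)arg;
    /* rec == NULL marks the end of one probe's records */
    return rec == NULL ? DTRACE_CONSUME_NEXT : DTRACE_CONSUME_THIS;
}

int main(void)
{
    int err;
    dtrace_hdl_t *h = dtrace_open(DTRACE_VERSION, 0, &err);
    if (h == NULL) {
        fprintf(stderr, "dtrace_open: %s\n", dtrace_errmsg(NULL, err));
        return 1;
    }
    dtrace_setopt(h, "bufsize", "4m");    /* consumers must size buffers */
    dtrace_setopt(h, "aggsize", "4m");

    const char *prog = "io:::start { @pending[pid] = count(); }";

    dtrace_prog_t *p = dtrace_program_strcompile(h, prog,
        DTRACE_PROBESPEC_NAME, DTRACE_C_ZDEFS, 0, NULL);
    dtrace_proginfo_t info;
    if (p == NULL || dtrace_program_exec(h, p, &info) == -1 ||
        dtrace_go(h) != 0) {
        fprintf(stderr, "%s\n", dtrace_errmsg(h, dtrace_errno(h)));
        dtrace_close(h);
        return 1;
    }

    for (;;) {                            /* runs until the program exits */
        dtrace_sleep(h);
        if (dtrace_work(h, stdout, chew, chewrec, NULL)
                == DTRACE_WORKSTATUS_DONE)
            break;
    }
    dtrace_stop(h);
    dtrace_close(h);
    return 0;
}
```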
I have a very nice idea for a kernel patch, and I want to conduct some research and see code examples before I shape my idea.
I'm looking for interesting code examples that would demonstrate advanced usage of procfs (the Linux /proc file system). By interesting, I mean more than just reading a documented value.
My idea is to provide every process with an easy broadcast mechanism. For example, let's consider a process that runs multiple instances of rsync and wants to check the transfer status (how many bytes have been transferred so far) for each child. Currently, I don't know of any way that can be done.
I intend to provide the process with a minimal interface to write data to the procfs. That data would be placed under the PID directory. For example:
/procfs/1343/data_transfered/incoming
I can think of numerous advantages for this, mainly in the concurrency field.
By the way, if such a mechanism already exists, do tell...
Yes, I've written stuff that pokes around in /proc. I suspect you are unlikely to get linux kernel patches accepted that do anything with proc, unless they are just fixing something that is already there that was broken in some way.*
sysfs (mounted at /sys) seems to be where things are moving.
/proc was originally for process information, but a lot of misc. driver stuff ended up in there.
*well, maybe they'll take it if whatever you're doing has to do with processes, and isn't in a driver.
Go look at the source code of the procps package for code that uses /proc.
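For a taste of what that code does, here's a minimal sketch in the procps spirit that dumps a process's memory map straight out of /proc (pass a PID, or it defaults to itself):

```c
/* Read /proc/<pid>/maps and print each mapped region. */
#include <stdio.h>

int main(int argc, char **argv)
{
    char path[64], line[512];
    snprintf(path, sizeof path, "/proc/%s/maps", argc > 1 ? argv[1] : "self");

    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }

    /* each line: start-end perms offset dev inode pathname */
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);

    fclose(f);
    return 0;
}
```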
http://github.com/tialaramex/leakdice/tree/master
Uses proc to figure out the memory address layout of a process, and dump random pages from its heap (for reasons which are explained in its documentation).