C write/read detection on a memory block

I'd like to ask whether anyone has an idea of how to detect a write to an allocated memory address.
At first I used mprotect along with sigaction to force a segmentation fault whenever a write/read operation was made.
Two negative factors with this approach, among several:
it is difficult to recover cleanly from the segmentation fault;
the address passed to mprotect must be aligned to a page boundary, i.e. it is not possible to use memory obtained from a plain malloc.
To clarify the problem:
I am building an app in C for a cluster environment. At some point I allocate memory that I call a buffer on the local host and assign some data to it. This buffer is sent to a remote node, which follows the same procedure. At some point this buffer will be written/read on the remote node, but I don't know when (DMA will be used to write/read the buffer), and the local host must be notified about the buffer modification. As I said above, I have already tried some mechanisms, but none of them handles this well. For now I just want some ideas.
Every different idea here is welcome.
Thanks
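For reference, a minimal sketch of the mprotect()/sigaction() approach mentioned in the question, assuming the buffer comes from posix_memalign so it is page-aligned (names and sizes are illustrative only):

    #include <signal.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void  *buf;
    static size_t page;

    /* SIGSEGV handler: note the access, then re-enable the page so the
     * faulting instruction can complete when it is restarted. */
    static void on_fault(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)si; (void)ctx;   /* si->si_addr holds the touched address */
        write(STDOUT_FILENO, "buffer touched\n", 15);
        mprotect(buf, page, PROT_READ | PROT_WRITE);
    }

    int main(void)
    {
        page = (size_t)sysconf(_SC_PAGESIZE);
        if (posix_memalign(&buf, page, page) != 0)
            return 1;

        struct sigaction sa = {0};
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = on_fault;
        sigaction(SIGSEGV, &sa, NULL);

        mprotect(buf, page, PROT_NONE);   /* arm the trap */
        memset(buf, 0xAB, 16);            /* first access triggers the handler */
        return 0;
    }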

You could use hardware breakpoints. The downsides are that this is hardware-specific and only a limited number of breakpoints can be set. Also, most of the time such facilities are not task-specific, so if you run multiple instances of the program they will share the available 'slots'.
The x86 architecture has debug registers which can be used to set hardware memory breakpoints (see: http://en.wikipedia.org/wiki/X86_debug_register).
If you want to test this you could use GDB to set hardware breakpoints. You can use the 'watch' command of GDB to place a hardware memory breakpoint on a variable.
Note that debug registers and mprotect() are just methods to get the job you're asking about done; I don't think they are sound engineering practice for memory management (which is probably what you are trying to do here). Maybe you can explain a bit more about what you are trying to do at a higher level: http://catb.org/esr/faqs/smart-questions.html#goal

Related

OSDev: Why does my memory allocation function suddenly stop working in the AHCI initialization function?

After my kernel calls the AHCIInit() function inside the ArchInit() function, I get a page fault in one of the MemAllocate() calls. This only happens on real machines; I could not reproduce it on VirtualBox, VMware or QEMU.
I tried debugging the code, unit testing the memory allocator, and removing everything from the kernel except the memory manager and the AHCI driver itself; the only thing I discovered is that something is corrupting the allocation blocks, making MemAllocate() page fault.
The whole kernel source is at https://github.com/CHOSTeam/CHicago-Kernel, but the main files where the problem probably occurs are:
https://github.com/CHOSTeam/CHicago-Kernel/blob/master/mm/alloc.c
https://github.com/CHOSTeam/CHicago-Kernel/blob/master/arch/x86/io/ahci.c
I expected AHCIInit() to detect and initialize all the AHCI devices and the boot to continue until it reaches the session manager or the kernel shell, but on real computers it page faults before even initializing the scheduler (so no, the problem isn't my scheduler).
If it works in emulators but doesn't work on real hardware, then the first things I'd suspect are:
bugs in physical memory management. For example, physical memory manager initialization not rounding "starting address of usable RAM area" up to a page boundary, or not rounding "ending address of usable RAM area" down to a page boundary, causing a "half usable RAM, half not usable RAM" page to be allocated by the heap later (it works on emulators because the memory map provided by the firmware happens to describe areas that are nicely aligned anyway); a small rounding sketch follows this list.
a bug where RAM is assumed to contain zeros but may not be (where it works on emulators because they tend to leave almost all RAM full of zeros).
a race condition (where different timing causes different behavior).
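As a small illustration of the rounding described in the first item, assuming PAGE_SIZE is a power of two (e.g. 4096):

    #define PAGE_SIZE 4096UL

    /* Round the start of a usable RAM area up, and its end down, so the
     * allocator never hands out a page that is only partly usable. */
    static inline unsigned long round_up_to_page(unsigned long addr)
    {
        return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
    }

    static inline unsigned long round_down_to_page(unsigned long addr)
    {
        return addr & ~(PAGE_SIZE - 1);
    }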
However, this is a monolithic kernel, which means that you'll continually be facing "any piece of code in kernel-space did something that caused problems for any other piece of code anywhere else"; and there are a bunch of common bugs with memory usage (e.g. accidentally writing past the end of what you allocated). For this reason I'd want better tools to help diagnose problems, especially for the heap.
Specifically, for the heap I'd start with canaries (e.g. put a magic number like 0xFEEDFACE before each block of memory in the heap, and another different number after each block of memory in the heap; then check that the magic numbers are still present and correct where convenient, e.g. when blocks are freed or resized). Then I'd write a "check_heap()" function that scans through everything, checking as much as possible (the canaries, whether statistics like "number of free blocks" are actually correct, etc.). The idea is that whenever you suspect something might have corrupted the heap, you can insert a call to the "check_heap()" function and move that call around until you find out which piece of code caused the heap corruption.
I'd also suggest having a "what" parameter in your "kmalloc() or equivalent" (e.g. so you can do things like myFooStructure = kmalloc("Foo Structure", sizeof(struct foo));), where the provided "what string" is stored in the allocated block's meta-data. That way, later on (when you find out the heap was corrupted) you can display the "what string" associated with the block before the corruption, and you can (e.g.) list how many of each type of thing there currently is, to help determine what is leaking memory (e.g. if the number of "Foo Structure" blocks is continually increasing). Of course these things can be (should be?) enabled/disabled by compile-time options (e.g. #ifdef DEBUG_HEAP).
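As a rough illustration of the canary idea (the structure and names below are invented, not taken from the CHicago kernel):

    #include <stddef.h>
    #include <stdint.h>

    #define CANARY_HEAD 0xFEEDFACEu
    #define CANARY_TAIL 0xDEADC0DEu

    /* Layout: [header][payload of 'size' bytes][uint32_t tail canary] */
    typedef struct heap_header {
        uint32_t            head_canary;  /* CANARY_HEAD while the block is intact */
        size_t              size;         /* usable payload size                   */
        const char         *what;         /* e.g. "Foo Structure", for debugging   */
        struct heap_header *next;         /* linked list of all blocks             */
    } heap_header;

    static int block_ok(const heap_header *h)
    {
        const uint32_t *tail =
            (const uint32_t *)((const uint8_t *)(h + 1) + h->size);
        return h->head_canary == CANARY_HEAD && *tail == CANARY_TAIL;
    }

    /* Walk every block; call this wherever heap corruption is suspected. */
    int check_heap(const heap_header *first)
    {
        for (const heap_header *h = first; h != NULL; h = h->next)
            if (!block_ok(h))
                return 0;   /* corrupted: report h->what and h->size here */
        return 1;
    }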
The other thing I'd recommend is self tests. These are like unit tests, but built directly into the kernel itself and always present. For example, you could write code to pound the daylights out of the heap (e.g. allocate random sized pieces of memory and fill them with something until you run out of memory, then free half of them, then allocate more until you run out of memory again, etc; while calling the "check_heap()" function between each step); where this code could/should take a "how much pounding" parameter (so you could spend a small amount of time doing the self test, or a huge amount of time doing the self test). You could also write code to pound the daylights out of the virtual memory manager, and the physical memory manager (and the scheduler, and ...). Then you could decide to always do a small amount of self testing each time the kernel boots and/or provide a special kernel parameter/option to enable "extremely thorough self test mode".
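A toy user-space version of such a "pound the heap" self test might look like the following; a kernel would call its own kmalloc/kfree and check_heap() instead of the standard allocator:

    #include <stdlib.h>
    #include <string.h>

    void heap_self_test(unsigned rounds)
    {
        enum { SLOTS = 256 };
        void *p[SLOTS] = {0};

        for (unsigned r = 0; r < rounds; r++) {
            for (int i = 0; i < SLOTS; i++) {
                size_t n = (size_t)(rand() % 4096) + 1;  /* random sized pieces  */
                p[i] = malloc(n);
                if (p[i])
                    memset(p[i], 0xA5, n);               /* recognizable pattern */
            }
            /* check_heap(); -- verify canaries and statistics here */
            for (int i = 0; i < SLOTS; i += 2) {         /* free half of them    */
                free(p[i]);
                p[i] = NULL;
            }
            /* check_heap(); */
            for (int i = 1; i < SLOTS; i += 2) {         /* free the rest        */
                free(p[i]);
                p[i] = NULL;
            }
            /* check_heap(); */
        }
    }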
Don't forget that eventually (if/when the OS is released) you'll probably have to resort to "remote debugging via email" (e.g. where someone without any programming experience, who may not know very much English, sends you an email saying "OS not work"; and you have to try to figure out what is going wrong before the end user's "amount of hassle before giving up and not caring anymore" counter is depleted).

Getting as much uninitialized memory as possible

I'm trying to create a C/C++ program that dumps as much uninitialized memory as possible.
The program has to be run by a local user, i.e. in user mode.
Using malloc does not work; see:
Why does malloc initialize the values to 0 in gcc?
The goal is not to use this data as a seed for randomness.
Does the OS always make sure that you can't see "leftovers" from other processes?
If possible, I would like references to implementations or further explanation.
The most common multi-user operating systems (modern Windows, Linux, other Unix variants, VMS--probably all OSes with a concept of virtual memory) try to isolate processes from one another for security. If process A could read process B's leftover memory, it might get access to user data it shouldn't have, so these operating systems will clear pages of memory before they become available to a new process. You would probably have to have elevated privileges to get at uninitialized RAM, and the solution would likely depend on which operating system it was.
Embedded OSes, DOS, and ancient versions of Windows generally don't have the facilities for protecting memory. But they also don't have a concept of virtual memory or of strong process isolation. On these, just allocating memory through the usual methods (e.g., malloc) would give you uninitialized memory without you having to do anything special.
For more information on Windows, you can search for Windows zero page thread to learn about the OS thread whose only job is to write zeros in unused pages so that they can be doled out again. Also, Windows has a feature called superfetch which fills up unused RAM with files that Windows predicts you'll want to open soon. If you allocated memory and Windows decided to give you a superfetch page, there would be a risk that you'd see the contents of a file you don't have access to read. This is another reason why pages must be cleared before they can be allocated to a process.
You got uninitialized memory. It contains indeterminate values. In your case those values are all 0. Nothing unexpected. If you want pseudo-random numbers use a PRNG. If you want real random numbers/entropy, use a legitimate random source like your operating system's random number device (e.g. /dev/urandom) or API.
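For example, a minimal sketch of filling a buffer from /dev/urandom instead of hoping for leftovers in freshly mapped pages:

    #include <stdio.h>

    /* Returns 0 on success, -1 on failure. */
    int fill_random(void *buf, size_t len)
    {
        FILE *f = fopen("/dev/urandom", "rb");
        if (f == NULL)
            return -1;
        size_t got = fread(buf, 1, len, f);
        fclose(f);
        return got == len ? 0 : -1;
    }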
No operating system in its right mind is going to provide uninitialized memory to a process.
The closest thing you are going to find is the stack. That memory will have been initialized when mapped to the process but much of it will have been overwritten.
It's common sense. We don't need to document that 1+1=2 either.
An operating system that leaks secrets between processes would be useless for many applications, so an operating system that wants to be general-purpose will isolate processes. Keeping track of which pages might contain secrets and which are safe would be too much work and too error-prone, so we assume that every page that has ever been used is dirty and contains secrets. Initializing new pages with garbage is slower than initializing them with a single value, so random garbage isn't used. The most useful value is zero (for calloc or bss, for example), so new pages are zeroed to clear them.
There's really no other way to do it.
There might be special-purpose operating systems that don't do it and do leak secrets between processes (it might be necessary for real-time requirements, for example). Some older operating systems didn't have decent memory management and privilege isolation. Also, malloc will reuse memory previously freed within the same process, which is why malloc is documented as returning memory that may contain uninitialized garbage. But that doesn't mean you'll ever be able to obtain uninitialized memory from another process on a general-purpose operating system.
I guess a simple rule of thumb is: if your operating system ever asks you for a password it will not give uninitialized pages to a process and since zeroing is the only reasonable way to initialize pages, they will be zeroed.

copy_from_user and segmentation

I was reading a paragraph from the "The Linux Kernel Module Programming Guide" and I have a couple of doubts related to the following paragraph.
The reason for copy_from_user or get_user is that Linux memory (on Intel architecture, it may be different under some other processors) is segmented. This means that a pointer, by itself, does not reference a unique location in memory, only a location in a memory segment, and you need to know which memory segment it is to be able to use it. There is one memory segment for the kernel, and one for each of the processes.
However it is my understanding that Linux uses paging instead of segmentation and that virtual addresses at and above 0xc0000000 have the kernel mapping in.
1. Do we use copy_from_user in order to accommodate older kernels?
2. Do current Linux kernels use segmentation in any way at all? If so, how?
3. If (1) is not true, are there any other advantages to using copy_from_user?
Yeah. I don't like that explanation either. The details are essentially correct in a technical sense (see also Why does Linux on x86 use different segments for user processes and the kernel?) but, as you say, Linux typically maps the memory so that kernel code can access it directly, so I don't think it's a good explanation for why copy_from_user etc. actually exist.
IMO, the primary reason for using copy_from_user / copy_to_user (and friends) is simply that there are a number of things to be checked (dangers to be guarded against), and it makes sense to put all of those checks in one place. You wouldn't want every place that needs to copy data in and out from user-space to have to re-implement all those checks. Especially when the details may vary from one architecture to the next.
For example, it's possible that a user-space page is actually not present when you need to copy to or from that memory, and hence it's important that the call be made from a context that can accommodate a page fault (and hence can be put to sleep).
Also, user-space data pointers need to be checked carefully to ensure that they actually point to user-space and that they point to data regions, and that the copy length doesn't wrap beyond the end of the valid regions, and so forth.
Finally, it's possible that user-space actually doesn't share the same page mappings with the kernel. There used to be a linux patch for 32-bit x86 that made the complete 4G of virtual address space available to user-space processes. In that case, kernel code could not make the assumption that a user-space pointer was directly accessible, and those functions might need to map individual user-space pages one at a time in order to access them. (See 4GB/4GB Kernel VM Split)
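For illustration, a hedged sketch of the usual pattern in a character-device write handler; the names here are placeholders, but copy_from_user itself behaves as described above (it validates the user range, can sleep on a fault, and returns the number of bytes it could not copy):

    #include <linux/errno.h>
    #include <linux/fs.h>
    #include <linux/uaccess.h>

    static char kbuf[128];

    static ssize_t demo_write(struct file *filp, const char __user *ubuf,
                              size_t count, loff_t *ppos)
    {
        if (count > sizeof(kbuf))
            count = sizeof(kbuf);

        /* copy_from_user validates the user range, may sleep on a page
         * fault, and returns the number of bytes it could NOT copy. */
        if (copy_from_user(kbuf, ubuf, count))
            return -EFAULT;

        return count;
    }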

Linux - Duplicate a virtual memory address from malloc or move a virtual memory address

Short question:
Is it possible to map a buffer that has been malloc'd to have two ways (two pointers pointing to the same physical memory) of accessing the same buffer?
Or, is it possible to temporarily move a virtual memory address received by malloc? Or is it possible to point from one location in virtual space to another?
Background:
I am working with DirectFB, a surface management and 2D graphics compositing library. I am trying to enforce the locking protocol, which is: lock a surface, modify the memory only while it is locked (the pointer is to system memory allocated using malloc), and unlock the surface.
I am currently trying to track down a bug in an application that locks a surface, stores the pixel pointer, and modifies the surface later. This means that the library does not know when it is safe to read or write to a surface. I am trying to find a way to detect that the locking protocol has been violated. What I would like is a way to invalidate the pointer passed to the user after the unlock call is made. Even better, I would like the application to seg fault if it tries to access the memory after the unlock. This would stop in the debugger and give us an idea of which surface is involved, which routine is involved, who called it, etc.
Possible solutions:
1. Create a temporary buffer, pass the buffer pointer to the user, and on unlock copy the pixels to the actual buffer and delete the temporary buffer.
Pros: This is an implementable solution.
Cons: Performance is slow, since it requires an expensive copy, and the memory may or may not be available. There is also no way to guarantee that one temporary surface never overlaps another, which would allow an invalidated pointer to suddenly work again.
2. Make an additional mapping to the malloc'd surface and pass that to the user. On unlock, unmap the memory.
Pros: Very fast, no additional memory required.
Cons: Unknown if this is possible.
Gotchas: Need to set aside a reserved range of addresses that is never used by anything else (including malloc or the kernel). Also need to ensure that no two surfaces overlap, which could allow an old pointer to suddenly point to something valid and not seg fault when it should.
3. Take advantage of the fact that the library does not access the memory while it is locked by the user, and simply move the virtual address on lock and move it back on unlock.
Pros: Very fast, no additional memory required.
Cons: Unknown if this is possible.
Gotchas: Same as "2" above.
Is this feasible?
Additional info:
This is using Linux 2.6 and the standard C library.
The library is written in C.
The library and application run in user space.
There is a possibility of using a kernel module (to write a custom memory allocation routine), but the difficulty of writing a module in my current working climate would probably reduce the chances that I could actually implement this solution to near zero. But if this is the only way, it would be good to know.
The underlying processor is x86.
The function you want for creating multiple mappings of a page is shm_open.
You may only be using the memory within one process, but it's still "shared memory" - that is to say, multiple virtual mappings for the same underlying physical page will exist.
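A minimal sketch of that double-mapping idea, assuming a made-up name and a one-page buffer:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t size = 4096;        /* one page, for illustration */
        int fd = shm_open("/surface_buf", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, size) < 0)
            return 1;

        /* Two independent virtual mappings backed by the same pages. */
        char *a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        char *b = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (a == MAP_FAILED || b == MAP_FAILED)
            return 1;

        strcpy(a, "hello");
        printf("%s\n", b);               /* prints "hello": same physical memory */

        /* One view can later be revoked with munmap() while the other keeps
         * working.  Link with -lrt on older glibc. */
        shm_unlink("/surface_buf");
        return 0;
    }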
However, that's not what you want to do. What you should actually do is have your locking functions use the mprotect system call to render the memory unreadable on unlock and restore the permissions on lock; any access without the lock being held will cause a segfault. Of course, this'll only work with a single simultaneous accessing thread...
Another, possibly better, way to track down the problem would be to run your application in valgrind or another memory analysis tool. This will greatly slow it down, but allows you very fine control: you can have a valgrind script that will mark/unmark memory as accessible and the tool will kick you straight into the debugger when a violation occurs. But for one-off problem solving like this, I'd say install an #ifdef DEBUG-wrapped mprotect call in your lock/unlock functions.
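As a rough illustration of that last suggestion, a hedged sketch of #ifdef DEBUG-wrapped lock/unlock hooks; the function names are hypothetical, and the pixel buffer would have to be page-aligned (e.g. from mmap or posix_memalign) rather than a plain malloc:

    #include <stddef.h>
    #include <sys/mman.h>

    void *surface_lock(void *pixels, size_t len)
    {
    #ifdef DEBUG
        mprotect(pixels, len, PROT_READ | PROT_WRITE);  /* make it accessible */
    #endif
        return pixels;
    }

    void surface_unlock(void *pixels, size_t len)
    {
    #ifdef DEBUG
        mprotect(pixels, len, PROT_NONE);  /* stale pointers now fault at once */
    #endif
    }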

How can I emulate a memory I/O device for unit testing on linux?

How can I emulate a memory I/O device for unit testing on Linux?
I'm writing a unit test for some source code for embedded deployment.
The code is accessing a specific address space to communicate with a chip.
I would like to unit test (UT) this code on Linux.
The unit test must be able to run without human intervention.
I need to run the UT as a normal user.
The code being tested must be exactly the source code that runs on the target system.
Any ideas of where I could go for inspiration on how to solve this?
Can an ordinary user somehow tell the MMU that a particular memory allocation must be done at a specific address? Or that a data block must be in a particular memory area?
As I understand it:
SIGSEGV can't be used, since after returning from the handler the same memory-access code will be executed again and fail again (or, by accident, the memory area might actually have valid data in it, just not what I would like).
Thanks
Henry
First, make the address to be read an injected dependency of the code, instead of a hard-coded dependency. Now you don't have to worry about the location under test conditions; it can be anything you like.
Then, you may also need to inject a function to read/write from/to the magic address as a dependency, depending what you're testing. Now you don't have to worry about how it's going to trick the code being tested into thinking it's performing I/O. You can stub/mock/whatever the hardware I/O behavior.
It's quite difficult to test low-level code under the conditions you describe, whilst also keeping it super-efficient in non-test mode, because you don't want to introduce too many levels of indirection.
"Exactly the source code" can hide a multitude of sins, though, depending how you interpret it. For example, your "dependency injection" could be via a macro, so that the unit source is "the same", but you've completely changed what it does with a sneaky -D compiler option.
AFAIK you need to create a block device (I am not sure whether a character device will work). Create a kernel module that maps that memory range to itself.
Create read/write functions, so that whenever that memory range is touched, those read/write functions are called.
Register those read/write functions with the kernel, so that whenever there is a read/write to those addresses, the kernel is invoked and performs the read/write on behalf of the user.

Resources