What is the canonical way to execute code directly from a QEMU device? - c

I'm modeling a particular evaluation board, which has a leon3 processor and several banks of MRAM mapped to specific addresses. My goal is to start qemu-system-sparc using my bootloader ELF, and then jump to the base address of a MRAM bank to begin executing bare-metal programs therein. To this end, I have been able to successfully run my bootloader and jump to the first instruction, but QEMU immediately stops and exits without reporting any error/trap. I can also run the bare-metal programs in isolation by passing them in ELF format as a kernel to qemu-system-sparc.
Short version: Is there a canonical way to set up a device such that code can be executed from it directly? What steps do I need to take when compiling that code to allow it to execute correctly?
I modeled the MRAM as a device with a MemoryRegion, along with the appropriate read and write operations to expose a heap-allocated array with my program. In my board code (modified version of qemu/hw/sparc/leon3.c), writes to the MRAM address are mapped to the MemoryRegion of the device. Using printfs, I am reporting reads and writes in the style of the unimplemented device (qemu/hw/misc/unimp.c), and I have verified that I am reading and writing to the device correctly.
Unfortunately, this did not work with respect to running the code on the device. I can see the read immediately after the bootloader jumps to the base address of my device, but the instruction read doesn't actually do anything. The bootloader uses a void function pointer, which is tied to the address of the MRAM device to induce a jump.
Another approach I tried is creating an alias to my device starting from address 0; I thought perhaps that my binary has all its addresses set relative to zero, so by mapping writes from addresses [0, MRAM_SIZE) as an alias to my device base address, the code will end up reading the corresponding instructions in the device MemoryRegion.
This approach failed an assert in memory.c:
static void memory_region_add_subregion_common(MemoryRegion *mr,
hwaddr offsset,
MemoryRegion *subregion)
{
assert(!subregion->container);
subregion->container = mr;
subregion->addr = offset;
memory_region_update_container_subregions(subregion);
}
What do I need to do to coerce QEMU to execute the code in my MRAM device? Do I need to produce a binary with absolute addresses?

Older versions of QEMU were simply unable to handle execution from anything other than RAM or ROM, and attempting to do so would give a "qemu: fatal: Trying to execute code outside RAM or ROM" error. QEMU 3.1 and later fixed this limitation, and now can execute code from anywhere -- though execution from a device will be much much slower than executing from RAM.
You mention that you "modeled the MRAM as a device with a MemoryRegion, along with the appropriate read and write operations to expose a heap-allocated array". This sounds like it is probably the wrong approach -- it will work but be very slow. If the MRAM appears to the guest as being like RAM, then model it as RAM (ie with a RAM MemoryRegion). If it's like RAM for reading but writes need to do something other than just-write-to-the-memory (or need to do that some of the time), then model it using a "romd" region, the same way the existing pflash devices do. Nonetheless, modelling it as a device with pure read and write functions should work, it'll just be horribly slow.
The assertion you've run into is the one that says "you can't put a memory region into two things at once" -- the 'subregion' you've passed in is already being used somewhere else, but you've tried to put it into a second container. If you have a MemoryRegion that you need to have appear in two places in the physical memory map, then you need to: create the MemoryRegion; create an alias MemoryRegion that aliases the real one; map the actual MemoryRegion into one place; map the alias into the other. There are plenty of examples of this in existing board models in QEMU.
More generally, you need to figure out what the evaluation board hardware actually is, and then model that. If the eval board has the MRAM visible at multiple physical addresses, then yes, use an alias MR. If it doesn't, then the problem is somewhere else and you need to figure out what's actually happening, not try to bodge around it with aliases that don't exist on the real hardware. QEMU's debug logging (various -d suboptions, plus -D file to log to a file) can be useful for checking what the emulated CPU is really doing in this early bootup phase -- but watch out as the logs can be quite large and they are sometimes tricky to interpret unless you know a little about QEMU internals.

Related

Using OpenOCD to determine RAM usage in microcontroller (ARM Cortex-M3)

I'd like to see how much RAM is used by the firmware by writing a known pattern, and comparing RAM contents to see how much has been modified.
I've tried
reset halt
load_image pattern.bin 0xaddress
resume
(let target run for a bit)
halt
dump_image sram.bin 0xaddress 0xsize
but it appears I have obtained flash contents and cannot see the test pattern anywhere.
Am I using the proper commands? If I "verify" manually by loading and dumping, the data is identical.
Could halt affect the RAM contents? Otherwise, is it safe to assume that the application in fact initializes all of the RAM, making analysis difficult/impossible?
I should point out that I only have a "dump" of the firmware, i.e. I am not building it.
I had to do soft_reset_halt to get the PC to the reset vector address.
My version of OpenOCD warns me that the command is deprecated.
Then I was able to spot a few occurrences of my test pattern in the RAM dump.
Also, there are notable differences between the RAM image and the firmware, so it seems that the firmware is indeed using most of the RAM.
(this might not be an issue if your interface is using a physical reset line?)

Are other parts of physical memory accessed during a segfault?

As part of a learning project, I've worked a bit on Spectre and Meltdown PoCs to get myself more confortable with the concept. I have managed to recover previously accessed data using the clock timers, but now I'm wondering how do they actually read physical memory from that point.
Which leads to my question : in a lot of Spectre v1\v2 examples, you can read this piece of toy-code example:
if (x<y) {
z = array[x];
}
with x supposedly being equal to : attacked_adress - adress_of_array, which will effectively lead to z getting the value at attacked_adress.
In the example it's quite easy to understand, but in reality how do they even know what attacked_adress looks like ?
Is it a virtual address with an offset, or a physical address, and how do they manage to find where is the "important memory" located in the first place ?
In the example it's quite easy to understand, but in reality how do they even know what attacked_adress looks like ?
You are right, Spectre and Meltdown are just possibilities, not a ready-to-use attacks. If you know an address to attack from other sources, Spectre and Meltdown are the way to get the data even using a browser.
Is it a virtual address with an offset, or a physical address, and how do they manage to find where is the "important memory" located in the first place ?
Sure, it is a virtual address, since it is all happening in user space program. But prior to the recent kernel patches, we had a full kernel space mapped into each user space process. That was made to speedup system calls, i.e. do just a privilege context switch and not a process context switch for each syscall.
So, due to that design and Meltdown, it is possible to read kernel space from unprivileged user space application (for example, browser) on unpatched kernels.
In general, the easiest attack scenario is to target machines with old kernels, which does not use address randomization, i.e. kernel symbols are at the same place on any machine running the specific kernel version. Basically, we run specific kernel on a test machine, write down the "important memory addresses" and then just run the attack on a victims machine using those addresses.
Have a look at my Specter-Based Meltdown PoC (i.e. 2-in-1): https://github.com/berestovskyy/spectre-meltdown
It is much simpler and easier to understand than the original code from the Specre paper. And it has just 99 lines in C (including comments).
It uses the described above technique, i.e. for Linux 3.13 it simply tries to read the predefined address 0xffffffff81800040 which is linux_proc_banner symbol located in kernel space. It runs without any privileges on different machines with kernel 3.13 and successfully reads kernel space on each machine.
It is harmless, but just a tiny working PoC.

Significance of Reset Vector in Modern Processors

I am trying to understand how computer boots up in very detail.
I came across two things which made me more curious,
1. RAM is placed at the bottom of ROM, to avoid Memory Holes as in Z80 processor.
2. Reset Vector is used, which takes the processor to a memory location in ROM, whose contents point to the actual location (again ROM) from where processor would actually start executing instructions (POST instruction). Why so?
If you still can't understand me, this link will explain you briefly,
http://lateblt.tripod.com/bit68.txt
The processor logic is generally rigid and fixed, thus the term hardware. Software is something that can be changed, molded, etc. thus the term software.
The hardware needs to start some how, two basic methods,
1) an address, hardcoded in the logic, in the processors memory space is read and that value is an address to start executing code
2) an address, hardcoded in the logic, is where the processor starts executing code
When the processor itself is integrated with other hardware, anything can be mapped into any address space. You can put ram at address 0x1000 or 0x40000000 or both. You can map a peripheral to 0x1000 or 0x4000 or 0xF0000000 or all of the above. It is the choice of the system designers or a combination of the teams of engineers where things will go. One important factor is how the system will boot once reset is relesed. The booting of the processor is well known due to its architecture. The designers often choose two paths:
1) put a rom in the memory space that contains the reset vector or the entry point depending on the boot method of the processor (no matter what architecture there is a first address or first block of addresses that are read and their contents drive the booting of the processor). The software places code or a vector table or both in this rom so that the processor will boot and run.
2) put ram in the memory space, in such a way that some host can download a program into that ram, then release reset on the processor. The processor then follows its hardcoded boot procedure and the software is executed.
The first one is most common, the second is found in some peripherals, mice and network cards and things like that (Some of the firmware in /usr/lib/firmware/ is used for this for example).
The bottom line though is that the processor is usually designed with one boot method, a fixed method, so that all software written for that processor can conform to that one method and not have to keep changing. Also, the processor when designed doesnt know its target application so it needs a generic solution. The target application often defines the memory map, what is where in the processors memory space, and one of the tasks in that assignment is how that product will boot. From there the software is compiled and placed such that it conforms to the processors rules and the products hardware rules.
It completely varies by architecture. There are a few reasons why cores might want to do this though. Embedded cores (think along the lines of ARM and Microblaze) tend to be used within system-on-chip machines with a single address space. Such architectures can have multiple memories all over the place and tend to only dictate that the bottom area of memory (i.e. 0x00) contains the interrupt vectors. Then then allows the programmer to easily specify where to boot from. On Microblaze, you can attach memory wherever the hell you like in XPS.
In addition, it can be used to easily support bootloaders. These are typically used as a small program to do a bit of initialization, then fetch a larger program from a medium that can't be accessed simply (e.g. USB or Ethernet). In these cases, the bootloader typically copies itself to high memory, fetches below it and then jumps there. The reset vector simply allows the programmer to bypass the first step.

How can I emulate a memory I/O device for unit testing on linux?

How can I emulate a memory I/O device for unit testing on Linux?
I'm writing a unit test for some source code for embedded deployment.
The code is accessing a specific address space to communicate with a chip.
I would like to unit test(UT) this code on Linux.
The unit test must be able to run without human intervention.
I need to run the UT as a normal user.
The code must being tested must be exactly the source code being run on the target system.
Any ideas of where I could go for inspiration on how to solve this?
Can an ordinary user somehow tell the MMU that a particular memory allocation must be done at a specific address.
Or that a data block must be in a particular memory areas?
As I understand it:
sigsegv can't be used; since after the return from the handler the same mem access code will be called again and fail again. ( or by accident the memory area might actually have valid data in it, just not what I would like)
Thanks
Henry
First, make the address to be read an injected dependency of the code, instead of a hard-coded dependency. Now you don't have to worry about the location under test conditions, it can be anything you like.
Then, you may also need to inject a function to read/write from/to the magic address as a dependency, depending what you're testing. Now you don't have to worry about how it's going to trick the code being tested into thinking it's performing I/O. You can stub/mock/whatever the hardware I/O behavior.
It's quite difficult to test low-level code under the conditions you describe, whilst also keeping it super-efficient in non-test mode, because you don't want to introduce too many levels of indirection.
"Exactly the source code" can hide a multitude of sins, though, depending how you interpret it. For example, your "dependency injection" could be via a macro, so that the unit source is "the same", but you've completely changed what it does with a sneaky -D compiler option.
AFAIK you need to create a block device (I am not sure whether character device will work). Create a kernel module that maps that memory range to itself.
create read/write function, so whenever that memory range is touched, those read/write functions are called.
register those read/write function with the kernel, so that whenever there is read/write to those addresses, kernel is invoked and read/write functionality is performed by kernel on behalf of user.

Flow of Startup code in an embedded system , concept of boot loader?

I am working with an embedded board , but i don't know the flow of the start up code(C/assembly) of the same.
Can we discuss the general modules/steps acted upon by the start up action in the case of an embedded system.
Just a high level overview(algorithmic) is enough.All examples are welcome.
/Kanu__
CPU gets a power on reset, and jumps to a defined point: the reset vector, beginning of flash, ROM, etc.
The startup code (crt - C runtime) is run. This is an important piece of code generated by your compiler/libc, which performs:
Configure and turn on any external memory (if absolutely required, otherwise left for later user code).
Establish a stack pointer
Clear the .bss segment (usually). .bss is the name for the uninitialized (or zeroed) global memory region. Global variables, arrays, etc which don't have an initializing value (beyond 0) are located here. The general practice on a microcontroller is to loop over this region and set all bytes to 0 at startup.
Copy from the end of .text the non-const .data. As most microcontrollers run from flash, they cannot store variable data there. For statements such as int thisGlobal = 5;, the value of thisGlobal must be copied from a persistent area (usually after the program in flash, as generated by your linker) to RAM. This applies to static values, and static values in functions. Values which are left undefined are not copied but instead cleared as part of step 2.
Perform other static initializers.
Call main()
From here, your code is run. Generally, the CPU is left in an interrupts-off state (platform dependent).
Pretty open-ended question, but here are a few things I have picked up.
For super simple processors, there is no true startup code. The cpu gets power and then starts running the first instruction in its memory: no muss no fuss.
A little further up we have mcu's like avr's and pic's. These have very little start up code. The only thing that really needs to be done is to set up the interrupt jump table with appropriate addresses. After that it is up to the application code (the only program) to do its thing. The good news is that you as the developer doesn't generally have to worry about these things: that's what libc is for.
After that we have things like simple arm based chips; more complicated than the avr's and pic's, but still pretty simple. These also have to setup the interrupt table, as well as make sure the clock is set correctly, and start any needed on chip components (basic interrupts etc.). Have a look at this pdf from Atmel, it details the start up procedure for an ARM 7 chip.
Farther up the food chain we have full-on PCs (x86, amd64, etc.). The startup code for these is really the BIOS, which are horrendously complicated.
The big question is whether or not your embedded system will be running an operating system. In general, you'll either want to run your operating system, start up some form of inversion of control (an example I remember from a school project was a telnet that would listen for requests using RL-ARM or an open source tcp/ip stack and then had callbacks that it would execute when connections were made/data was received), or enter your own control loop (maybe displaying a menu then looping until a key has been pressed).
Functions of Startup Code for C/C++
Disables all interrupts
Copies any initialized data from ROM to RAM
Uninitialized data area is set to zero.
Allocates space for and initializes the stack
Initializes the processor’s stack pointer
Creates and initializes the heap
Executes the constructors and initializers for all global variables (C++ only)
Enables interrupts
Calls main
Where is "BOOT LOADER" placed then? It should be placed before the start-up code right?
As per my understanding, from the reset vector the control goes to the boot loader. There the code waits for a small period of time during which it expects for data to be flashed/downloaded to the controller/processor. If it does not detect a data then the control gets transferred to the next step as specified by theatrus. But my doubt is whether the BOOT LOADER code can be re-written. Eg: Can a UART bootloader be changed to a ETHERNET/CAN bootloader or is it that data sent using any protocol are converted to UART using a gateway and then flashed.

Resources