I'm planning to run an RTOS e.g Nuttx as a Process of another RTOS e.g FreeRTOS such that freertos tasks and the Nuttx running as a Freertos task would co-exist.
Would this be feasible implementation given that the underlying hardware is an ARM cortex A8 single core processor? What changes could be required if the implementation is not based on VM concept?
Your requirement, in a nutshell, is to allow a GUEST RTOS to completely work within the realms of an underlying HOST RTOS. First answer would be to use virtualization extension, but A8 processor does not have that, hence will rule this option out. Without Virtualization extensions you have to resort to one of the following methods and would require a lot of code changes.
Option 1 - Port your GUEST OS API's
Take all your GUEST OS API's and replace their implementation, so that it mimics the required API behavior by making use of HOST OS's API's. Technically now your GUEST OS will not have a scheduler, and will be reduced to a porting layer on top of your HOST OS. This method is used by companies when they need their software solutions to work across multiple RTOS's. They would write their software solution based on an RTOS. When a customer comes to them with a requirement to run the software on their RTOS, they would simply port the RTOS API implementations on to the customer's RTOS.
Option 2 - Para-virtualization
Your guest RTOS user and kernel space should both work inside the userspace of your host RTOS. Let us break the problem into a few parts.
Handling Privileged Instructions
When your Guest OS, while executing in "Kernel mode" tries to execute a privileged instruction, will cause an undef instruction abort. You have to modify the undef instruction abort handler of your host kernel to trap/intercept these instructions and act on them. Every single privileged instructions has to be trapped/intercepted and 'simulated'. There are some instructions that wouldn't trap but would need to be handled by modifying code. Eg. If your kernel code reads CPSR to confirm the execution mode, CPSR would say the mode is User mode. (This instruction wouldn't cause an instruction abort, so you could not follow the trap and simulate model. The only way is to identify, search and replace these instructions in your GUEST OS codebase.)
Memory Management Unit
If a privilege violation happens the Data Abort will be triggered to your host OS. It has to be forwarded to your guest OS.
Interrupts
You would have to replace your GUEST OS's interrupt controller driver with dummy SVC calls that would call into your HOST OS to setup interrupts.
Timers
You would have to modify your GUEST timer driver to account for 'lost' ticks when you were running your HOST OS tasks.
Hardware Drivers
All other hardware drivers used by your GUEST OS have to be modified to allow device sharing between GUEST and HOST.
Schedulers
Your GUEST OS scheduler now works inside (and thus is at the mercy of ) another scheduler (HOST OS Scheduler).
It is feasible.
You need to separate resources: memory, timers, IRQs, etc. So that, "Host" OS (FreeRTOS) don't even "know" about resources used by "Guest" OS (Nuttx).
For Cortex-A8 you may want to use IRQ for FreeRTOS and FIQ for GuestOS. It will let you not to rewrite IRQ controller (but again, make sure Host does not control FIQ after GuestOS started).
And some changes might be required for context switch: you need to differ Host-Host context switch, Host-Guest (and Guest-Host) and Guest-Guest context switch.
Though not direct answer to your question, address this problem at design level, do a separation of code that depends hardware (create API) and make the application level code independent of the underlying OS or runtime i.e rather depend on particular implementation let it depend on the API.
where ever needed port the hardware (OS) dependent code to the underlying OS/Runtime
Related
I am working on a project where I have a router with ARMv7 processor (Cortex A15) and OpenWRT OS. I have a shell on the router and can load kernel modules with insmod.
My goal is to write a kernel module in C which changes the HVBAR register and then executes the hvc instruction to get the processor in the hyp mode.
This is a scientific project where I want to check if I can place my own hypervisor on a running system. But before I start to write my own hypervisor I want to check if and how I can bring the processor in the hyp mode.
According to this picture take from armv7-a manual B.9.3.4 the system must be in insecure mode, not in user mode and the SCR.HCE bit must be set to 1.
My question is how I can prepare the processor with a C kernel module and inline assembly and then execute the hvc instruction. I want to do this with a kernel module because then I start in PL1. This pseudocode describes what I want to achieve:
call smc // to get in monitor mode
set SRC.HCE to 1 // to enable hvc instruction
set SRC.NS to 1 // to set the system to not secure
call hvc #0 // call the hvc instruction to produce a hypervisor exception
The easiest way to elevate privilege is to start off in the needed privilege mode already: You've a root shell. Is the boot chain verified? Could you replace bootloader or kernel, so your code naturally runs in PL2 (HYP) mode? If so, that's probably the easiest way to do it.
If you can't replace the relevant part of the boot chain, the details of writing the rootkit depend a lot on information about your system left out: In which mode is Linux started? Is KVM support enabled and active? Was PL2 initialized? Was it locked? Is there "secure" firmware you can exploit?
The objective is always the same: have HVBAR point at some code you can control and do a hvc. Depending on your environment, solutions may range from spraying as much RAM as possible with your code and hope (perhaps after some reboots) an uninitialized HVBAR would point at an instruction you control to inhibiting KVM from running and accessing the early hypervisor stub to install yourself instead.
Enumerating such exploits is a bit out of scope for a StackOverflow answer; this is rather dissertation material. Indeed, there's a doctoral thesis exactly on this topic:
Strengthening system security on the ARMv7 processor architecture with hypervisor-based security mechanisms
This question may seem slightly vague, however I am researching upon how interrupt systems work and their latency times. I am trying to achieve an understanding of how architecture facilities such as FIQ in ARM help decrease latency times. How does this differ from using a operating system that does not have access or can not provide access to this facilities? For example - Windows RT is made for ARM etc, and this operating system is not able to be ported to other architectures.
Simply put - how is interrupt latency different in dedicated architectures that have dedicated operating systems as compared to operating systems that can be ported across many different architectures (Linux for example)?
Sorry for the rant - I'm pretty confused as you can probably tell.
I'll start with your Windows RT example, Windows RT is a port of Windows to the ARM architecture. It is not a 'dedicated operating system'. There are (probably) many OSes that only run on only 1 architecture, but that is more a function of can't be arsed to port them due to some reason.
What does 'port' really mean though?
Windows has a kernel (we'll call is NT here, doesn't matter) and that NT kernel has a bunch of concepts that need to be implemented. These concepts are things like timers, memory virtualisation, exceptions etc...
These concepts are implemented differently between architectures, so the port of the kernel and drivers (I will ignore the rest of the OS here, often that is a recompile only) will be a matter of using the available pieces of silicon to implement the required concepts. This implementation is a called 'port'.
Let's zoom in on interrupts (AKA exceptions) on an ARM that has FIQ and IRQ.
In general an interrupt can occur asynchronously, by that I mean at any time. The CPU is generally busy doing something when an IRQ is asserted so that context (we'll call it UserContext1) needs to be stored before the CPU can use any resources in use by UserContext1. Generally this means storing registers on the stack before using them.
On ARM when an IRQ occurs the CPU will switch to IRQ mode. Registers r13 and r14 have there own copy for IRQ mode, the rest will need to be saved if they are used - so that is what happens. Those stores to memory take some time. The IRQ is handled, UserContext1 is popped back off the stack then IRQ mode is exited.
So the latency in this case might be the time from IRQ assertion to the time the IRQ vector starts executing. That going to be some set number of clock cycles based upon what the CPU was doing when the IRQ happened.
The latency before the IRQ handling can occur is the time from the IRQ assert to the time the CPU has finished storing the context.
The latency before user mode code can execute depends on too much stuff in the OS/Kernel to explain here, but the minimum boils down to the time from the IRQ assertion to the return after restoring UserContext1 + the time for the OS context switch.
FIQ - If you are a hard as nails programmer you might only need to use 7 registers to completely handle your interrupt servicing. I mentioned that IRQ mode has its own copy of 2 registers, well FIQ mode has its own copy of 7 registers. Yup, that's 28 bytes of context that doesn't need to be pushed out into the stack (actually one of them is the link register so it's really 6 you have). That can remove the need to store UserContext1 then restore UserContext1. Thus the latency can be reduced by up to the length of time needed to do that save/restore.
None of this has much to do with the OS. The OS can choose to use or not use these features. The OS can choose to make guarantees regarding how long it will take to execute the OSes concept of an interrupt handler, or it may not. This is one of the basic concepts of an RTOS, the contract about how long before the handler will run.
The OS is designed for some purpose (and that purpose may be 'general') - that target design goal will have a lot more affect on latency than haw many target the OS has been ported to.
Go have a read about something like freertos than buy some hardware and try it. Annotate the code to figure out the latencies you really want to look at. IT will likely be the best way to get your ehad around it.
(*Multi-CPU systems do it the same with but with some synchronization and barrier functions and a sprinkling of complexity)
I have found some interesting information regarding CPU virtualization for ARM and I'm wondering if you guys could help me understand more about it.
Basically, folks at some company called SierraWare have developed an ARM secure-mode OS called SierraTEE that (they say) virtualizes a guest OS like Linux/Android running in non-secure mode, needing only the Security Extensions. A piece of information from one of their presentation documents has caught my attention, specifically at page 19 of this PDF http://www.sierraware.com/sierraware_tee_hypervisor_overview.pdf they state:
Integrity checks for Rootkits and Kernel Hacks:
Monitor Syscall interrupt and interrupt handler. This will ensure that core syscalls are not tampered with.
By "Syscall interrupt" I understand SVC (=old SWI) instruction executions (correct me if I'm wrong), but by "monitoring" I'm not really sure because it could be real-time monitoring, from-time-to-time monitoring or on certain-events monitoring. In my mind they could monitor the SVC handler to prevent tampering-with by either:
Inspect SVC handler from time to time (timer interrupt for instance, since IRQs and FIQs can be routed to monitor mode) - PatchGuard-like approach, doesn't seem very useful to me
Inspect SVC handler on SVC instruction execution (=certain-events monitoring)
Trap SVC handlers memory region write-access (=real-time monitoring)
Regarding approach 2: would it be possible to trap non-secure SVC instruction executions from secure-mode?
Regarding approach 3: would it be possible to hook non-secure memory-region writes by using only the Security Extensions?
Thanks very much in advance
"Monitor" here may refer to the Monitor mode, the new mode added by the Security Extensions.
I'm not very familiar with the Security Extensions but I imagine it should be possible to mark specific memory regions as secure so any access to them will result in a Monitor mode trap, which can then handle the access and resume the non-secure code execution.
However, I just found this notice in the ARM ARM (B1.8.7 Summaries of asynchronous exception behavior):
In an implementation that includes the Security Extensions but does
not include the Virtualization Extensions, the following
configurations permit the Non-secure state to deny service to the
Secure state. Therefore, ARM recommends that, wherever possible, these
configurations are not used:
Setting SCR.IRQ to 1. With this configuration, Non-secure PL1 software can set CPSR.I to 1, denying the required routing of IRQs to
Monitor mode.
Setting SCR.FW to 1 when SCR.FIQ is set to 1. With this configuration, Non-secure PL1 software can set CPSR.F to 1, denying
the required routing of FIQs to Monitor mode.
The changes introduced by the Virtualization Extensions remove these
possible denials of service.
So it would seem it's not possible to achieve perfect virtualization with just Security Extensions.
For example, virt-what shows if you are running inside hardware virtualization "sandbox".
How to detect if you are running in ARM "TrustZone" sandbox?
TrustZone maybe different than what you think. There is a continuum of modes. From 'a simple API of trusted functions' to 'dual OSs' running in each world.
If there was more context given to the question, it would be helpful. Is this for programatically determining or for reverse engineering considerations? For the current Linux user-space, the answer is no.
Summary
No current user space utility.
Time based analysis.
Code based analysis.
CPU exclusion and SCR.
ID_PRF1 bits [7:4].
virt-what is not a fool-proof way of discovering if you are running under a hyper-visor. It is a program written for linux user-space. Mostly, these are shell scripts which examine /proc/cpuinfo, etc. procfs is a pseudo-file system which runs code in the kernel context and reports to user space. There is no such detection of TrustZone in the main line ARM linux. By design, ARM has made it difficult to detect. An design intent is to have code in the normal world run unmodified.
Code analysis
In order to talk to the secure world, the normal world needs SMC instructions. If your user space has access to kernel code or the vmlinux image, you can try to analyze the code sections for an SMC instruction. However, this code maybe present in the image, but never active. At least this says whether the Linux kernel has some support for TrustZone. You could write a kernel module which would trap any execution of an SMC instruction, but there are probably better solutions.
Timing analysis
If an OS is running in the secure world, some time analysis would show that some CPU cycles have been stolen if frequency scaling is not active. I think this is not an answer in the spirit of the original question. This relies on knowing that the secure world is a full-blown OS with a timer (or at least pre-emptible interrupts).
CPU exclusion and SCR
The SCR (Secure configuration register) is not available in the normal world. From the ARM Cortex-A5 MPcore manual (pg4-46),
Usage constraints The SCR is:
• only accessible in privileged modes
• only accessible in Secure state.
An attempt to access the SCR from any state other than secure privileged
results in an Undefined instruction exception.
ID_PRF1 bits [7:4].
On some Cortex-A series, the instruction,
mrc p15, 0, r0, c0, c1, 1
will get a value where bits [7:4] indicate whether the CPU supports Security Extensions, also known as TrustZone. A non-zero value indicates it is supported. Many early CPUs may not support this CP15 register . So, it is much like the SCR and handling the undefined instruction. Also, it doesn't tell you that code is active in the TrustZone mode.
Summary
It is possible that you could write a kernel module which would try this instruction and handle the undefined exception. This would detect a normal versus secure world. However, you would have to exclude CPUs which don't have TrustZone at all.
If the device is not an ARMv6 or better, then TrustZone is impossible. A great deal of Cortex-A devices have TrustZone in the CPU, but it is not active.
The combined SMC test and a CPU id, is still not sufficient. Some boot loaders run in the secure world and then transition to the normal world. So secure is only active during boot.
Theoretically, it is possible to know, especially with more knowledge of the system. There maybe many signs, such as spurious interrupts from the GIC, etc. However, I don't believe that any user space linux tool exists as of Jan 2014. This is a typical war of escalation between virus/rootkit writers and malware detection software.TZ Rootkits
You have not specified any details of the processor (A8, A9, A15?) or the execution mode (user/kernel/monitor) from where you want to detect the processor state.
As per the ARM documentation, the current state of the processor as Secure (aka the TrustZone sandbox) or Non-secure can be detected by reading the Secure Configuration Register and checking for the NS bit.
To access the Secure Configuration Register: MRC p15, 0, <Rd>, c1, c1, 0
Bit 0 being set corresponds to the processor being in non-secure mode and vice-versa.
You can check the processor's datasheet, and find those registers which behaves different between Normal world and Secure world. Generally, in Secure World, when you read these registers you will just get null. But get data in Normal world. And also, some registers that you can just access in Secure world, if you are in Secure World, you can access it, but in Normal World your access will be rejected.
Any way, there are many ways to distinguish Normal World and Secure World. JUST READ THE DATASHEET IN DETAIL.
To what extent are interrupts supported in Win32 beyond processor definitions? For example, x86 machines define at least 18 interrupts, including traps such as the breakpoint trap (INT 3). The other 19-255 interrupts are left open by Intel as software defined interrupts. Are any of these used by Windows/WinAPI or are they just open and free for applications to use as they please? If Windows uses them, where can I find the relevant documentation? I looked on MSDN and could not find anything.
(BTW I am doing compiler, debugger and other system-level programming, so please don't lecture me on your opinions about the advisability of using interrupts in the first place.)
In Win32 apps, there's probably just one interrupt used commonly, int 2Eh. It's used as the system call entry point. It's analogous to int 21h in DOS. The rest of the interrupts aren't used by apps.
Apps, however, can handle some CPU exceptions (and debug breaks) via Structured Exception Handling (SEH)/Vectored Exception Handling (VEH). Windows catches CPU exceptions originating in apps and reflects them back into the apps, if and however possible (Windows is not perfect in imitating the CPU exception model).
Windows uses device interrupts internally and does not let apps mess with them. The x86 CPU handles interrupts in the most privileged mode, where the kernel runs.
Nowadays many device interrupts aren't associated with fixed interrupt vectors and are configurable and you need to work with the various things like PCI to query or change the settings.
If you want to work with devices and interrupts directly, you need to write a kernel-mode driver for Windows. There's the Device Driver Kit (DDK) and books like Windows Internals that can get you started.
Still, if you're looking for specifics of device XYZ and its interrupt programming, you aren't going to find everything or much on MSDN or in the DDK because you'll need hardware-specific information, something that's outside of Microsoft's control. The kernel provides the functionality necessary to do I/O and handle interrupts, but it's ultimately up to device drivers to use them one way or the other.