ARM TrustZone - Behaviour of the scheduler in Secure and Non-Secure OS

Can someone please explain: after the CPU is taken to secure mode (the monitor program sets NS = 0), how does the secure OS get scheduled?
Is it that, now that the CPU is in secure mode, the timer tick interrupt is handled by the secure OS and not by the non-secure world?

Monitor mode setting NS=0 determines which CP15 registers are visible from monitor mode (see: monitor mode IFAR/IFSR...). When monitor mode then switches to another mode with NS=0, that mode is the secure world version, meaning the banked CP15 registers are the secure copies. The NS bit is also clear on bus cycles.
If NS=1 is set, then when monitor mode switches, the banked CP15 registers are the normal world copies; in particular, the normal world MMU will be active. The NS bit is also set on bus cycles. See: TZ vs hypervisor.
How does the secure OS get scheduled?
Monitor mode does this. The SCR (CP15 c1, c1, 0) has bits which determine whether exceptions use the monitor vector table or that of the current CPU world (secure or normal). If you are in the normal world and you wish for a timer to interrupt that world, you need monitor mode to handle it.
You can set up monitor mode in two possible ways:
Have all secure interrupts as FIQs.
Trap all interrupts to monitor.
The first choice is recommended. In this mode, the monitor code must ensure that SCR#FIQ (bit 2) is set in the normal world but clear in the secure world. SCR#IRQ (bit 1) is set when running the secure OS (if you want normal-world interrupts to pre-empt the secure OS) and clear in the normal world.
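As a minimal sketch of that SCR handling (assuming an ARMv7-A core, GCC inline assembly, and helper names invented for illustration), the monitor might program the bits like this before returning to each world:

```c
/* Sketch only: SCR values the monitor might apply for the "secure
 * interrupts as FIQ" model. SCR is CP15 c1, c1, 0 and is writable only
 * from secure privileged modes (monitor included). */
#include <stdint.h>

#define SCR_NS   (1u << 0)   /* 0 = secure world, 1 = normal world */
#define SCR_IRQ  (1u << 1)   /* 1 = take IRQs to monitor           */
#define SCR_FIQ  (1u << 2)   /* 1 = take FIQs to monitor           */

static inline void scr_write(uint32_t v)
{
    __asm__ volatile("mcr p15, 0, %0, c1, c1, 0" : : "r"(v) : "memory");
}

/* Before returning to the normal world: secure FIQs (e.g. the secure
 * timer) must trap to monitor so the secure OS can be scheduled. */
static void monitor_prepare_normal_world(void)
{
    scr_write(SCR_NS | SCR_FIQ);
}

/* Before returning to the secure OS: FIQs are taken locally by the secure
 * OS; IRQs trap to monitor so the normal world can pre-empt it. */
static void monitor_prepare_secure_world(void)
{
    scr_write(SCR_IRQ);          /* NS = 0, FIQ = 0 */
}
```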
So when the secure timer raises an FIQ, it traps to monitor mode, which does a world switch (see Ref1 below) and runs the secure OS timer code. This secure timer may cause the secure world to reschedule. The way the normal and secure world schedulers interact is up to software; there is no generic answer. It depends on:
Monitor mode
The secure OS.
The normal world OS.
Mainly, ARM TrustZone does not handle secure OS scheduling by itself. You need to write software that uses the primitives provided to implement this; TrustZone only facilitates different ways of implementing it. See the TrustZone Whitepaper.
See: How to develop programs for TrustZone for some alternative setups.
Ref1: A world switch saves/restores all general-purpose CPU registers for all used modes. I.e., on a normal-to-secure world switch, R0-R15 (and all banked copies), plus possibly NEON/VFP state, must be saved to a normal-world store. Similarly, the registers must be reloaded for the secure world. The monitor-mode SP provides a good anchor for accessing these world contexts; it should be set up during secure boot, before the normal world initializes. This is much like a traditional OS context switch. SCR#NS (bit 0) is set appropriately; you may do this before or after the register switching, depending on how you save the registers (i.e., by mode switch or by srs).
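For illustration only, the per-world store that the monitor SP anchors could be laid out roughly like the structures below; the field layout is invented here, and the actual save/restore is normally hand-written assembly (ldm/stm of the banked registers, or srs/rfe):

```c
/* Sketch of a per-world register store anchored by the monitor-mode SP,
 * as described in Ref1. Layout is illustrative; a real monitor may also
 * save VFP/NEON state and switch it lazily. */
#include <stdint.h>

struct world_context {
    uint32_t usr[15];                 /* r0-r14 of the usr/sys file        */
    uint32_t svc[3];                  /* sp_svc, lr_svc, spsr_svc          */
    uint32_t irq[3];                  /* sp_irq, lr_irq, spsr_irq          */
    uint32_t abt[3], und[3];          /* abort and undef banked registers  */
    uint32_t fiq[8];                  /* r8_fiq-r14_fiq plus spsr_fiq      */
    uint32_t pc, cpsr;                /* resume address and mode           */
};

struct monitor_frame {
    struct world_context normal;      /* normal-world CPU state            */
    struct world_context secure;      /* secure-world CPU state            */
};

/* The monitor SP is pointed at a monitor_frame during secure boot. A world
 * switch stores the outgoing world into its half, reloads the incoming
 * world from the other half, and flips SCR.NS along the way. */
```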

Related

ARM TrustZone monitor mode switch design

The basic world switch flow is:
set FIQ to monitor mode
normal world -> FIQ triggered
-> enter monitor mode (switch to the Secure world, restore the Secure world context)
-> now in Secure world SYS mode
-> the FIQ is still pending, so the FIQ handler is entered in the Secure world
In steps 3 and 4, after we restore the target context, the ARM will take the pending exception and enter the FIQ handler by itself.
Is that behaviour correct (if we don't branch to the FIQ handler from the monitor mode vector table)?
We need a flow like the one below:
(This is the no-world-switch case: just enter monitor mode to check whether a world switch is needed, then enter the IRQ exception directly from monitor mode. We need this because of a hardware limitation: our chip only has IRQ.)
set IRQ to monitor mode
normal world user mode -> IRQ triggered
-> enter monitor, do whatever we want to hook, check if we need a context switch, prepare the SPSR/LR for IRQ mode
-> enter normal world IRQ mode, handle the IRQ
-> IRQ done, return back to user mode
For the non-world-switch case, we would like the normal world OS to know nothing about monitor mode, as though it had entered IRQ mode directly and returned from IRQ mode.
For the world-switch case, just do the switch in monitor mode.
Or should the IRQ handling simply be done in monitor mode?
E.g.:
normal world OS usr mode -> irq -> usr mode
normal world OS usr mode -> monitor to irq handler -> usr mode
Is this flow possible and well designed?
Is the flow possible and well designed?
It is possible. 'Well designed' is subjective; it has several failings or non-ideal aspects. I guess your system doesn't have a GIC, which is a TrustZone-aware interrupt controller. The GIC has banked registers which allow the normal world OS to use it (almost) as if it were in the secure world.
It is not clear from your question whether you want the secure world to have interrupts; I guess so from the statement 'for non-world switch case...'. If you only have interrupts handled by the normal world, things are simple: don't branch to monitor mode on an IRQ (or FIQ). There is a register to set this behaviour (the SCR, Secure Configuration Register).
For the dual world interrupt case, you have two issues.
You need to trust the normal world OS.
Interrupt latency will be increased.
You must always take the interrupt in monitor mode. The monitor must check the interrupt controller source to see which world the interrupt belongs to, and may need to do a world switch depending on that world. This increases interrupt latency. As well, both the normal and secure worlds will be dealing with the same interrupt controller registers, so you have malicious security concerns and non-malicious race conditions, with multiple interrupt drivers trying to manipulate the same registers (read-modify-write). Generally, if your chip doesn't have a GIC but the CPU supports TrustZone, then your system hasn't been well thought through for TrustZone use. The L1/L2 cache controllers must also be TrustZone-aware, and you possibly have issues there as well.
If you have Linux (or some other open-source OS) in the normal world, it would be better to replace the normal world interrupt driver with a 'virtual' interrupt driver. The normal world virtual IRQ code would use the SMC instruction to set virtual registers and register IRQ routines for specific interrupts. The secure world/monitor IRQ code would then branch directly to the decoded IRQ routine.
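A rough sketch of the normal-world side of such a 'virtual' interrupt driver is below. The SMC function IDs, helper names and register convention are made up for illustration; a real design would likely follow the ARM SMC Calling Convention, and the secure side must validate everything it is passed.

```c
/* Sketch: normal-world 'virtual' IRQ driver that never touches the
 * interrupt controller itself and asks the secure world over SMC instead. */
#include <stdint.h>

#define SMC_REGISTER_IRQ  0x83000001u   /* hypothetical function ID */
#define SMC_ENABLE_IRQ    0x83000002u   /* hypothetical function ID */

static inline uint32_t smc_call(uint32_t fn, uint32_t a0, uint32_t a1)
{
    register uint32_t r0 __asm__("r0") = fn;
    register uint32_t r1 __asm__("r1") = a0;
    register uint32_t r2 __asm__("r2") = a1;
    /* Needs the Security Extensions enabled in the toolchain,
     * e.g. ".arch_extension sec" or -march=armv7-a+sec. */
    __asm__ volatile("smc #0" : "+r"(r0) : "r"(r1), "r"(r2) : "memory");
    return r0;                          /* status from the secure world */
}

/* The normal world registers a routine; the secure/monitor IRQ code decodes
 * the interrupt source and dispatches (or signals back) to that routine. */
int virt_request_irq(unsigned int irq, void (*handler)(unsigned int))
{
    if (smc_call(SMC_REGISTER_IRQ, irq, (uint32_t)handler) != 0)
        return -1;
    return (int)smc_call(SMC_ENABLE_IRQ, irq, 0);
}
```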
With a GIC, set the Group 0 (secure world) interrupts as FIQ and Group 1 (normal world) as IRQ using the GICC_CTLR FIQEn bit. I.e., you classify the interrupts with the GIC distributor as either secure or normal (and therefore FIQ or IRQ).
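For illustration, a possible GICv2 grouping sequence run from the secure world; the distributor and CPU-interface base addresses below are placeholders and are platform specific:

```c
/* Sketch: classify an interrupt as Group 0 (secure -> FIQ) or Group 1
 * (normal -> IRQ) and enable FIQ signalling for Group 0 on a GICv2. */
#include <stdint.h>

#define GICD_BASE        0x2C001000u   /* placeholder distributor base   */
#define GICC_BASE        0x2C002000u   /* placeholder CPU-interface base */
#define GICD_IGROUPR(n)  (*(volatile uint32_t *)(GICD_BASE + 0x080 + 4 * (n)))
#define GICC_CTLR        (*(volatile uint32_t *)(GICC_BASE + 0x000))
#define GICC_CTLR_FIQEN  (1u << 3)     /* signal Group 0 as FIQ          */

static void gic_classify_irq(unsigned int id, int secure)
{
    uint32_t reg = GICD_IGROUPR(id / 32);
    if (secure)
        reg &= ~(1u << (id % 32));     /* Group 0: secure world, FIQ */
    else
        reg |=  (1u << (id % 32));     /* Group 1: normal world, IRQ */
    GICD_IGROUPR(id / 32) = reg;
}

static void gic_route_group0_to_fiq(void)
{
    GICC_CTLR |= GICC_CTLR_FIQEN;      /* secure banked copy of GICC_CTLR */
}
```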
You have to work through the scheduling issues and how you want the different OSes to pre-empt each other. The normal (easiest) choice is to always let the secure OS run, but this means that some Linux (normal world) interrupts may be very delayed by the secure world (RTOS) main-line code.

RTOS within an RTOS

I'm planning to run an RTOS, e.g. NuttX, as a process of another RTOS, e.g. FreeRTOS, such that the FreeRTOS tasks and NuttX (running as a FreeRTOS task) would co-exist.
Would this be a feasible implementation, given that the underlying hardware is an ARM Cortex-A8 single-core processor? What changes would be required if the implementation is not based on the VM concept?
Your requirement, in a nutshell, is to allow a GUEST RTOS to work completely within the realms of an underlying HOST RTOS. The first answer would be to use the Virtualization Extensions, but the A8 processor does not have them, which rules that option out. Without Virtualization Extensions you have to resort to one of the following methods, both of which require a lot of code changes.
Option 1 - Port your GUEST OS APIs
Take all your GUEST OS APIs and replace their implementation so that they mimic the required API behaviour by making use of the HOST OS's APIs. Technically, your GUEST OS will no longer have a scheduler; it is reduced to a porting layer on top of your HOST OS. This method is used by companies when they need their software solutions to work across multiple RTOSes: they write their software solution against one RTOS, and when a customer asks to run it on a different RTOS, they simply port the RTOS API implementations onto the customer's RTOS.
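As a hedged illustration of this option, here is a guest-style semaphore backed by real FreeRTOS primitives; the guest_sem_* names are stand-ins for whatever API the guest RTOS actually exposes:

```c
/* Sketch: the guest OS semaphore API re-implemented as a thin wrapper over
 * FreeRTOS, so the 'guest' no longer has its own scheduler or primitives. */
#include "FreeRTOS.h"
#include "semphr.h"

typedef struct {
    SemaphoreHandle_t host_sem;       /* guest semaphore backed by the host */
} guest_sem_t;

int guest_sem_init(guest_sem_t *s, unsigned int initial)
{
    s->host_sem = xSemaphoreCreateCounting(0xFFFF, initial);
    return (s->host_sem != NULL) ? 0 : -1;
}

int guest_sem_wait(guest_sem_t *s)
{
    /* The guest 'blocks' by blocking the hosting FreeRTOS task. */
    return (xSemaphoreTake(s->host_sem, portMAX_DELAY) == pdTRUE) ? 0 : -1;
}

int guest_sem_post(guest_sem_t *s)
{
    return (xSemaphoreGive(s->host_sem) == pdTRUE) ? 0 : -1;
}
```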
Option 2 - Para-virtualization
Your guest RTOS user and kernel space should both work inside the userspace of your host RTOS. Let us break the problem into a few parts.
Handling Privileged Instructions
When your guest OS, while executing in "kernel mode", tries to execute a privileged instruction, it will cause an undefined-instruction abort. You have to modify the undefined-instruction abort handler of your host kernel to trap/intercept these instructions and act on them. Every single privileged instruction has to be trapped/intercepted and 'simulated'. There are also some instructions that do not trap but still need to be handled by modifying code. E.g., if your kernel code reads the CPSR to confirm the execution mode, the CPSR will report user mode. (This instruction does not cause an abort, so you cannot follow the trap-and-simulate model; the only way is to identify, search for and replace these instructions in your GUEST OS codebase.)
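A sketch of the trap-and-simulate idea is below; the names are invented, only a single MRC form is decoded, and a real layer needs a full decoder plus the search-and-replace pass for the instructions (like CPSR reads) that never trap:

```c
/* Sketch: the host's undefined-instruction handler inspects the faulting
 * opcode and, if it is a privileged CP15 access from the guest 'kernel',
 * emulates it against the guest's virtual copy of that register. */
#include <stdint.h>

struct trap_frame { uint32_t r[16]; uint32_t spsr; };   /* illustrative */

static uint32_t guest_sctlr;          /* guest's virtualised SCTLR copy */

static int emulate_cp15(uint32_t insn, struct trap_frame *tf)
{
    /* MRC p15, 0, Rt, c1, c0, 0 (read SCTLR): 0x?E110F10 with Rt in [15:12]. */
    if ((insn & 0x0FFF0FFFu) == 0x0E110F10u) {
        uint32_t rt = (insn >> 12) & 0xFu;
        tf->r[rt]  = guest_sctlr;     /* hand back the guest's value        */
        tf->r[15] += 4;               /* step past the emulated instruction */
        return 0;
    }
    return -1;                        /* not an instruction we emulate      */
}
```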
Memory Management Unit
If a privilege violation happens, a Data Abort will be delivered to your host OS; it has to be forwarded to your guest OS.
Interrupts
You would have to replace your GUEST OS's interrupt controller driver with dummy SVC calls that call into your HOST OS to set up interrupts.
Timers
You would have to modify your GUEST timer driver to account for the ticks 'lost' while your HOST OS tasks were running.
Hardware Drivers
All other hardware drivers used by your GUEST OS have to be modified to allow device sharing between GUEST and HOST.
Schedulers
Your GUEST OS scheduler now works inside (and thus is at the mercy of) another scheduler (the HOST OS scheduler).
It is feasible.
You need to separate resources: memory, timers, IRQs, etc., so that the "Host" OS (FreeRTOS) does not even "know" about the resources used by the "Guest" OS (NuttX).
For the Cortex-A8 you may want to use IRQ for FreeRTOS and FIQ for the guest OS. That lets you avoid rewriting the interrupt controller driver (but again, make sure the host does not control FIQ after the guest OS has started).
Some changes might also be required for context switching: you need to distinguish Host-Host, Host-Guest (and Guest-Host), and Guest-Guest context switches.
Though not a direct answer to your question: address this problem at the design level. Separate the code that depends on the hardware (create an API), and make the application-level code independent of the underlying OS or runtime, i.e. rather than depending on a particular implementation, let it depend on the API.
Wherever needed, port the hardware (OS) dependent code to the underlying OS/runtime.

Low interrupt latency via dedicated architectures and operating systems

This question may seem slightly vague; however, I am researching how interrupt systems work and their latency times. I am trying to understand how architectural facilities such as FIQ on ARM help decrease latency. How does this differ from using an operating system that does not have, or cannot provide, access to these facilities? For example, Windows RT is made for ARM, and this operating system cannot be ported to other architectures.
Simply put: how is interrupt latency different on dedicated architectures with dedicated operating systems, compared to operating systems that can be ported across many different architectures (Linux, for example)?
Sorry for the rant - I'm pretty confused as you can probably tell.
I'll start with your Windows RT example. Windows RT is a port of Windows to the ARM architecture; it is not a 'dedicated operating system'. There are (probably) many OSes that run on only one architecture, but that is more a matter of nobody having bothered to port them for some reason.
What does 'port' really mean though?
Windows has a kernel (we'll call it NT here; the name doesn't matter), and that NT kernel has a bunch of concepts that need to be implemented. These concepts are things like timers, memory virtualisation, exceptions, etc.
These concepts are implemented differently between architectures, so the port of the kernel and drivers (I will ignore the rest of the OS here; often that is a recompile only) is a matter of using the available pieces of silicon to implement the required concepts. This implementation is called a 'port'.
Let's zoom in on interrupts (AKA exceptions) on an ARM that has FIQ and IRQ.
In general an interrupt can occur asynchronously, by which I mean at any time. The CPU is generally busy doing something when an IRQ is asserted, so that context (we'll call it UserContext1) needs to be stored before the CPU can use any resources in use by UserContext1. Generally this means storing registers on the stack before using them.
On ARM, when an IRQ occurs the CPU switches to IRQ mode. Registers r13 and r14 have their own copies in IRQ mode; the rest need to be saved if they are used, so that is what happens. Those stores to memory take some time. The IRQ is handled, UserContext1 is popped back off the stack, then IRQ mode is exited.
So the latency in this case might be the time from IRQ assertion to the time the IRQ vector starts executing. That is going to be some number of clock cycles, depending on what the CPU was doing when the IRQ happened.
The latency before the IRQ handling can occur is the time from the IRQ assert to the time the CPU has finished storing the context.
The latency before user-mode code can execute depends on too much of the OS/kernel to explain here, but the minimum boils down to the time from the IRQ assertion to the return after restoring UserContext1, plus the time for the OS context switch.
FIQ - If you are a hard-as-nails programmer you might only need seven registers to completely handle your interrupt servicing. I mentioned that IRQ mode has its own copy of two registers; well, FIQ mode has its own copy of seven registers. Yup, that's 28 bytes of context that doesn't need to be pushed onto the stack (actually one of them is the link register, so it's really six you get to use). That can remove the need to store and then restore UserContext1, so the latency can be reduced by up to the time needed for that save/restore.
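For illustration, a minimal FIQ handler in that style using GCC's ARM interrupt attribute; the device addresses are placeholders, and keeping the body tiny is what lets it live in the FIQ-banked registers:

```c
/* Sketch: an FIQ handler small enough to stay within the FIQ-banked
 * registers (r8-r12, sp_fiq, lr_fiq), so nothing from the interrupted
 * context needs to be pushed to the stack. */
#include <stdint.h>

#define TIMER_CLEAR (*(volatile uint32_t *)0x48040044u)   /* placeholder */
#define TICK_COUNT  (*(volatile uint32_t *)0x80001000u)   /* placeholder */

void __attribute__((interrupt("FIQ"))) fiq_handler(void)
{
    TIMER_CLEAR = 1u;      /* acknowledge the timer source          */
    TICK_COUNT += 1u;      /* leave a tick count for the main line  */
}
```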
None of this has much to do with the OS. The OS can choose to use or not use these features. The OS can choose to make guarantees about how long it will take to execute the OS's concept of an interrupt handler, or it may not. This is one of the basic concepts of an RTOS: the contract about how long it takes before the handler will run.
The OS is designed for some purpose (and that purpose may be 'general'); that target design goal will have a lot more effect on latency than how many targets the OS has been ported to.
Go have a read about something like FreeRTOS, then buy some hardware and try it. Annotate the code to figure out the latencies you really want to look at. It will likely be the best way to get your head around it.
(Multi-CPU systems do it much the same way, but with some synchronization and barrier functions and a sprinkling of complexity.)

ARM TrustZone, Hypervisor: Hypervisor functionality WITHOUT Virtualization Extensions

I have found some interesting information regarding CPU virtualization for ARM and I'm wondering if you guys could help me understand more about it.
Basically, folks at a company called SierraWare have developed an ARM secure-mode OS called SierraTEE that (they say) virtualizes a guest OS like Linux/Android running in non-secure mode, needing only the Security Extensions. A piece of information from one of their presentation documents caught my attention; specifically, on page 19 of this PDF http://www.sierraware.com/sierraware_tee_hypervisor_overview.pdf they state:
Integrity checks for Rootkits and Kernel Hacks:
Monitor Syscall interrupt and interrupt handler. This will ensure that core syscalls are not tampered with.
By "Syscall interrupt" I understand SVC (=old SWI) instruction executions (correct me if I'm wrong), but by "monitoring" I'm not really sure because it could be real-time monitoring, from-time-to-time monitoring or on certain-events monitoring. In my mind they could monitor the SVC handler to prevent tampering-with by either:
Inspect SVC handler from time to time (timer interrupt for instance, since IRQs and FIQs can be routed to monitor mode) - PatchGuard-like approach, doesn't seem very useful to me
Inspect SVC handler on SVC instruction execution (=certain-events monitoring)
Trap SVC handlers memory region write-access (=real-time monitoring)
Regarding approach 2: would it be possible to trap non-secure SVC instruction executions from secure-mode?
Regarding approach 3: would it be possible to hook non-secure memory-region writes by using only the Security Extensions?
Thanks very much in advance
"Monitor" here may refer to the Monitor mode, the new mode added by the Security Extensions.
I'm not very familiar with the Security Extensions, but I imagine it should be possible to mark specific memory regions as secure so that any access to them results in a Monitor mode trap, which can then handle the access and resume the non-secure code execution.
However, I just found this notice in the ARM ARM (B1.8.7 Summaries of asynchronous exception behavior):
In an implementation that includes the Security Extensions but does not include the Virtualization Extensions, the following configurations permit the Non-secure state to deny service to the Secure state. Therefore, ARM recommends that, wherever possible, these configurations are not used:
Setting SCR.IRQ to 1. With this configuration, Non-secure PL1 software can set CPSR.I to 1, denying the required routing of IRQs to Monitor mode.
Setting SCR.FW to 1 when SCR.FIQ is set to 1. With this configuration, Non-secure PL1 software can set CPSR.F to 1, denying the required routing of FIQs to Monitor mode.
The changes introduced by the Virtualization Extensions remove these possible denials of service.
So it would seem it's not possible to achieve perfect virtualization with just Security Extensions.

How to determine if an ARM processor is running in the usual locked-down "world" or in the Secure "world"?

For example, virt-what shows if you are running inside a hardware virtualization "sandbox".
How do you detect if you are running in the ARM "TrustZone" sandbox?
TrustZone may be different from what you think. There is a continuum of modes, from 'a simple API of trusted functions' to 'dual OSes' running in each world.
If there were more context given in the question, it would be helpful. Is this for determining programmatically or for reverse-engineering considerations? For current Linux user space, the answer is no.
Summary
No current user space utility.
Time based analysis.
Code based analysis.
CPU exclusion and SCR.
ID_PFR1 bits [7:4].
virt-what is not a fool-proof way of discovering whether you are running under a hypervisor. It is a program written for Linux user space; mostly it is shell scripts which examine /proc/cpuinfo, etc. procfs is a pseudo-file system which runs code in the kernel context and reports to user space. There is no such detection of TrustZone in mainline ARM Linux. By design, ARM has made it difficult to detect; a design intent is to have code in the normal world run unmodified.
Code analysis
In order to talk to the secure world, the normal world needs SMC instructions. If your user space has access to the kernel code or the vmlinux image, you can try to analyze the code sections for an SMC instruction. However, this code may be present in the image but never active. At least this says whether the Linux kernel has some support for TrustZone. You could write a kernel module to trap any execution of an SMC instruction, but there are probably better solutions.
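For example, a naive scan for the ARM-state SMC encoding might look like the sketch below; it ignores Thumb code and, as noted, says nothing about whether the instruction is ever executed:

```c
/* Sketch: count SMC instructions in a buffer of ARM (A32) code words.
 * SMC #imm4 encodes as cond:0001_0110_0000_0000_0000_0111:imm4,
 * i.e. (insn & 0x0FFFFFF0) == 0x01600070. */
#include <stddef.h>
#include <stdint.h>

static size_t count_smc(const uint32_t *code, size_t n_words)
{
    size_t hits = 0;
    for (size_t i = 0; i < n_words; i++)
        if ((code[i] & 0x0FFFFFF0u) == 0x01600070u)
            hits++;
    return hits;
}
```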
Timing analysis
If an OS is running in the secure world, timing analysis would show that some CPU cycles have been stolen, provided frequency scaling is not active. I think this is not an answer in the spirit of the original question: it relies on knowing that the secure world is a full-blown OS with a timer (or at least pre-emptible interrupts).
CPU exclusion and SCR
The SCR (Secure Configuration Register) is not available in the normal world. From the ARM Cortex-A5 MPCore manual (page 4-46):
Usage constraints. The SCR is:
• only accessible in privileged modes
• only accessible in the Secure state.
An attempt to access the SCR from any state other than secure privileged results in an Undefined Instruction exception.
ID_PFR1 bits [7:4]
On some Cortex-A series CPUs, the instruction
mrc p15, 0, r0, c0, c1, 1
will return a value where bits [7:4] indicate whether the CPU supports the Security Extensions, also known as TrustZone. A non-zero value indicates support. Many early CPUs may not support this CP15 register, so it is much like the SCR: you must handle the undefined instruction. Also, it doesn't tell you whether code is actually active in the secure world.
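A small sketch of such a probe follows; it must run from a privileged mode (e.g. inside a kernel module), and on older cores the MRC itself may be undefined, so a real probe still needs an undefined-instruction handler around it:

```c
/* Sketch: read ID_PFR1 and test the Security Extensions field, bits [7:4]. */
#include <stdint.h>

static int cpu_has_security_extensions(void)
{
    uint32_t id_pfr1;
    __asm__ volatile("mrc p15, 0, %0, c0, c1, 1" : "=r"(id_pfr1));
    return ((id_pfr1 >> 4) & 0xFu) != 0;   /* non-zero: TrustZone supported */
}
```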
Summary
It is possible that you could write a kernel module which tries this instruction and handles the undefined exception. This would detect the normal versus secure world; however, you would have to exclude CPUs which don't have TrustZone at all.
If the device is not ARMv6 or better, then TrustZone is impossible. A great many Cortex-A devices have TrustZone in the CPU, but it is not active.
The combined SMC test and CPU ID check are still not sufficient. Some boot loaders run in the secure world and then transition to the normal world, so the secure world is only active during boot.
Theoretically it is possible to know, especially with more knowledge of the system. There may be many signs, such as spurious interrupts from the GIC, etc. However, I don't believe any user-space Linux tool exists as of Jan 2014. This is a typical war of escalation between virus/rootkit writers and malware-detection software. See: TZ Rootkits.
You have not specified any details of the processor (A8, A9, A15?) or the execution mode (user/kernel/monitor) from which you want to detect the processor state.
As per the ARM documentation, the current state of the processor, Secure (aka the TrustZone sandbox) or Non-secure, can be detected by reading the Secure Configuration Register and checking the NS bit.
To access the Secure Configuration Register: MRC p15, 0, <Rd>, c1, c1, 0
Bit 0 being set corresponds to the processor being in the non-secure state, and vice versa.
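As a sketch, a read of that bit could look like the snippet below; note that this MRC is only permitted in secure privileged modes, so from the normal world it takes an Undefined Instruction exception instead of returning a value (which is itself a usable signal):

```c
/* Sketch: read SCR and return the NS bit (0 = secure, 1 = non-secure).
 * Only legal in secure privileged modes, per the constraints quoted above. */
#include <stdint.h>

static int scr_ns_bit(void)
{
    uint32_t scr;
    __asm__ volatile("mrc p15, 0, %0, c1, c1, 0" : "=r"(scr));
    return (int)(scr & 1u);
}
```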
You can check your processor's datasheet and find the registers which behave differently between the Normal world and the Secure world. Generally, such a register reads as zero in one world but returns real data in the other. There are also registers that you can only access in the Secure world: from the Secure world the access succeeds, but from the Normal world it will be rejected.
Anyway, there are many ways to distinguish the Normal world from the Secure world. Just read the datasheet in detail.
