Are there any memory layout docs for ARM64?

Are there any memory layout docs for ARM64? - arm

Is there any reference material for where various images should be loaded on ARM-A (aarch64) architectures? For instance, is there anything that says where in memory a bootloader should be placed, where the hypervisor should start, or where trustzone should begin?

The closest reference for this would be SBSA, however:
Note: Compliant software must not make any assumptions about the
memory map that might prejudice compliant hardware. For example, the
full physical address space must be supported. There must be no
dependence on memory or peripherals being located at certain physical
locations

Related

How does memory mapped I/O (MMIO) work on ARM architectures?

I would like to understand how the MMIO works on ARM architecture.
I realized that ARM provides 1:1 mapping from physical address to specific peripheral.
For example, to manage the GPIOX on arm, for example in Raspberry Pi, the processor accesses the specific physical addresses (seems that preconfigured by the manufacturer?) without configuring some registers beforehand.
I thought that there are some specific BAR register that arbitrates the read/write request to specific physical addresses to peripherals. However, when I check the spec for BCM2835 (Raspberry pi 3), the physical addresses are translated by another MMU called VC/ARM MMU. Is it a common design to have another MMU that translates the physical addresses to bus addresses in ARM architecture?
Also, I was wondering how the SMMU (IOMMU in x86) is utilized in this concept. I found one article mentioning that the VC/ARM MMU is an example of SMMU but I think that is not true? When I check some monitor code & kernel driver code implementation (not raspberry pi), it seems that the SMMU is also mapped to specific physical addresses and the monitor/kernel uses those addresses to initialize and communicate with SMMU. If the arm architecture utilize VC/ARM MMU as an SMMU, how that physical address mapping for SMMU itself can be accessed to initialize the SMMU..?
Lastly, I thought that all peripherals are managed by the SMMU. If some peripherals are always mapped to fixed physical addresses, what is the role of SMMU? Why some peripherals are communicated with PE (CPU) through fixed physical addresses, and some are communicated with SMMU..? How exactly the peripherals and PE (Processors) can communicate in ARM architecture..?

I am just answering your questions. There are many things wrong with some assumptions you have, which Old Timer tries to clarify.
Is it a common design to have another MMU that translates the physical addresses to bus addresses in ARM architecture?
No. Broadcom has an architecture license. They designed there own HDL code for the ARM CPU and system details can be different. In fact, it is quite common for even direct license (using ARM HDL) that the systems differ from vendor to vendor.
If the arm architecture utilize VC/ARM MMU as an SMMU, how that physical address mapping for SMMU itself can be accessed to initialize the SMMU?
If some peripherals are always mapped to fixed physical addresses, what is the role of SMMU? Why some peripherals are communicated with PE (CPU) through fixed physical addresses, and some are communicated with SMMU..? How exactly the peripherals and PE (Processors) can communicate in ARM architecture..?
The typical solution to this is 'TrustZone'. Here, the access is defined as having a 'secure' or 'normal' access. The master (CPU) tags the access and either the peripheral or a bus access controller permits or prohibits access to the peripheral. These are best statically mapped at boot time and locked.
By defining permitted use cases, the peripheral/master access patterns can be defined for the system. The complication is 'dynamic' peripherals and masters. The ARM TrustZone CPU is dynamic. A 'world switch' changes the CPU access and there are attacks on the communication interface between 'normal' and 'secure' worlds.
The ARM AXI/AHB buses were originally designed for embedded devices. PCs in contrast have dynamic buses ISA->EISA->PCMIA->PCI (etc.) These addresses are typically dynamic. Note However, this bus structure has an expense. So some of your question is like asking why isn't ARM just like an x86 PC. They had different goals and they are different. You can't put a new grahpics card into your cell phone.
Reference:
Handling ARM Trustzones
Explanation of arm Bus architecture
ARM Differences from PC
Note: It is from the modular nature of the PC system which was part of why the PC dominated Apple, Commodore, Atari, etc. in my opinion (the other aspect was piracy), contrary to others who think the answer is Microsoft. The hardware matters.

Microprocessor and Microcontroller Compilation Differences

I am trying to understand the differences in the C/C++ compilation process (Compiler/linker/locator etc..) between a microcontroller and a microprocessor.
For example, for a microcontroller, we can provide the linker script to specify the actual physical memory location the program should be executed. However, in a microprocessor where there are multiple programs running, we are unable to provide the actual addresses to load the program.
I would like to know how this compilation handles in a microprocessor and a microcontroller.
Thanks a lot!

I am trying to understand the differences in the C/C++ compilation process (Compiler/linker/locator etc..) between a microcontroller and a microprocessor.
None by itself.
The difference might be that on a microcontroller you typically don't have an operating system that supports any run-time loading of shared libraries, but that's not necessarily the case (NuttX and others).
For example, for a microcontroller, we can provide the linker script to specify the actual physical memory location the program should be executed.
You can do the same with a microprocessor.
You're trying to make a distinction where there is none: a microcontroller is just a microprocessor with an embedded target market and typically, integrated memory in-package. That's it.
However, in a microprocessor where there are multiple programs running,
this can (and doesn't have to) be the case on both microcontrollers and microprocessors.
You mean "on microprocessors, we typically use a multitasking operating system. Can we, on such an operating system..."
we are unable to provide the actual addresses to load the program.
That's not true. Often, such operating systems offer address space randomization and you can compile relocateable code – but the same can (and is!) done for microcontrollers.

The terms microcontroller and microprocessor have no reliable definition, they are used rather randomly and intermixed by different engineers and manufacturers. The consensus is that microcontrollers are simpler, have less resources and are meant more for "real-time-y" embedded tasks. Microprocessor are slightly more complex, have more resources and are meant more for "general" tasks. Various terms like MMU, embedded Flash/RAM, external Flash/RAM are thrown around. If this sounds vague - it is. Don't rely on those terms.
You need to look at specific features on a micro which enable your abilities as a software engineer. The most basic one is MMU - this defines if you can have virtual memory or not. This in turn defines whether it supports an OS which runs processes in separate memory regions or it's all one big continuous pile of memory with addressing hardwired (in which case you still get an OS but it has much less to do). Linking depends a lot on that distinction.
Then a system which runs processes in isolated memory regions typically needs to load the process code into RAM before executing it, which requires (much) more RAM, which is typically solved by an external RAM chip, which requires an MMU.
But the classic definition of a microcontroller is: no MMU, embedded Flash and/or RAM. Classic microprocessor is: MMU, external storage and RAM. There are more exceptions than rules.

How to access a physical address which is larger than 4G in Trusted firmware provided by Linaro in juno board

I am using Trusted Firmware Image(LSK and Android filesystem) provided by Linaro on Juno board r1. In my case, I just want to some trivial test in EL3, e.g., reading specific memory.
To make things easier, I didn't do anything until the system is completely booted. Then I load a kernel module which sends a SMC instruction and the SMC exception will be handled in BL3-1 by a custom handler. In the handler, I disabled MMU for EL3 and attempted to access a physical address directly. But I found that if the physical address is larger than 0xffffffff(4G), the content I got will be all 0. The physical address lower than 0xffffffff works perfectly. And If I mapped that physical address into a virtual address smaller than 0xffffffff(Linaro's EL3 only support virtual address lower than 0xffffffff), it also works.
So, why I can not get the correct content of a physical address larger than 0xffffffff after I disabled MMU in EL3?
Is there anyone know the details?
Thanks a lot for helping me!

The stock Juno firmware sets up the TrustZone controller such that 16MB of DRAM from 0x0ff000000 to 0x0ffffffff is reserved for the Secure world, and the rest is Non-Secure only. The specific TZC configuration is such that the Non-Secure regions just read as zero and ignore writes, rather than aborting, for Secure accesses.
Thus it's not that you can't access physical addresses above 4GB per se, it's just that the only thing that happens to be up there is DRAM, which is the thing the secure world has specifically denied itself access to the majority of.
The relevant part of the firmware source can be found here.

How does kernel restrict processes to their own memory pool?

This is purely academical question not related to any OS
We have x86 CPU and operating memory, this memory resembles some memory pool, that consist of addressable memory units that can be read or written to, using their address by MOV instruction of CPU (we can move memory from / to this memory pool).
Given that our program is the kernel, we have a full access to whole this memory pool. However if our program is not running directly on hardware, the kernel creates some "virtual" memory pool which lies somewhere inside the physical memory pool, our process consider it just as the physical memory pool and can write to it, read from it, or change its size usually by calling something like sbrk or brk (on Linux).
My question is, how is this virtual pool implemented? I know I can read whole linux source code and maybe one year I find it, but I can also ask here :)
I suppose that one of these 3 potential solutions is being used:
Interpret the instructions of program (very ineffective and unlikely): the kernel would just read the byte code of program and interpret each instruction individually, eg. if it saw a request to access memory the process isn't allowed to access it wouldn't let it.
Create some OS level API that would need to be used in order to read / write to memory and disallow access to raw memory, which is probably just as ineffective.
Hardware feature (probably best, but have no idea how that works): the kernel would say "dear CPU, now I will send you instructions from some unprivileged process, please restrict your instructions to memory area 0x00ABC023 - 0xDEADBEEF" the CPU wouldn't let the user process do anything wrong with the memory, except for that range approved by kernel.
The reason why am I asking, is to understand if there is any overhead in running program unprivileged behind the kernel (let's not consider overhead caused by multithreading implemented by kernel itself) or while running program natively on CPU (with no OS), as well as overhead in memory access caused by computer virtualization which probably uses similar technique.

You're on the right track when you mention a hardware feature. This is a feature known as protected mode and was introduced to x86 by Intel on the 80286 model. That evolved and changed over time, and currently x86 has 4 modes.
Processors start running in real mode and later a privileged software (ring0, your kernel for example) can switch between these modes.
The virtual addressing is implemented and enforced using the paging mechanism (How does x86 paging work?) supported by the processor.

On a normal system, memory protection is enforced at the MMU, or memory management unit, which is a hardware block that configurably maps virtual to physical addresses. Only the kernel is allowed to directly configure it, and operations which are illegal or go to unmapped pages raise exceptions to the kernel, which can then discipline the offending process or fetch the missing page from disk as appropriate.
A virtual machine typically uses CPU hardware features to trap and emulate privileged operations or those which would too literally interact with hardware state, while allowing ordinary operations to run directly and thus with moderate overall speed penalty. If those are unavailable, the whole thing must be emulated, which is indeed slow.

Read from any ram address directly?

Im doing some debugging on hardware with a Linux OS.
Now there no way for me to know if any of it works unless I can check the allocated ram that I asked it to write to.
Is there some way that I can check what is in that block or RAM from an external program running in the same OS?
If I could write a little program in C to do that how will I go about it since I cant just go and assign pointers custom addresses ?
Thanks

I think the best way to do what you are asking for is to use a debugger. And you cannot read another programme's memory unless you execute your code in a privileged space (i.e. the kernel), and privileged from the point of view of the CPU. And this because each programme is running in its own virtual memory space (for security concerns) and even the kernel is running in a virtual memory space but it has the privilege to map any physical memory block inside the virtual memory space it is current running. Anyway, I will not explain more in depth how an modern OS manage memory with the underneath hardware, it would be long.
You should really look at using a debugger. Once you environment with your debugger is ready, you should put a break after that memory block allocation so the debugger will stop the programme there and so you can inspect that freshly allocated memory block as you wish. Depending on whether you use an IDE or not, it can be very easy to use a debugger ;)

/dev/mem could come to use. It is a device file that is an image of the physical memory (including non-RAM memory). It's generally used to read/write memory of peripheral devices.
By mmap()ing to it, you could access physical memory.
See this linux documentation project page

memedit is a handy utility to display and change memory content for testing purposes.
It's main purpose is to display SoC hardware registers but it could be used to display RAM. It is based on mmap() mechanism. It could be good starting point to write custom application.