Which data bus is used after physical remap to RAM in STM32F4? - arm

STM32F4 controllers (with ARM Cortex M4 CPU) allow a so called physical remap of the lowest addresses in the memory space (0x00000000 to 0x03FFFFFF) using the SYSCFG_MEMRMP register. What I do understand is that the register selects which memory (FLASH/RAM/etc.) is aliased to the lowest addresses and therefore from which memory the reset vector and stack pointer is fetched after reset.
The documentation [1] also mentions that
In remap mode, the CPU can access the external memory via ICode bus
instead of System bus which boosts up the performance.
This means that after a remap e.g. to RAM an instruction fetched from within the alias address space (0x00000000 to 0x03FFFFFF) the ICode bus will be used.
Now my question: After such a remap operation e.g. to RAM, will a fetch to the non-aliased location of the RAM use the system bus or the ICode bus?
The background of the question is that I want to write a linker script for an image executing from RAM only (under control of a debugger). To which memory area should the .text section go? The alias space or the physical space?
[1] ST DocID018909 Rev 7
Thanks to Sean I could find the answer in the ARM® Cortex®‑M4 Processor Technical Reference Manual section 2.3.1 Bus interfaces:
ICode memory interface: Instruction fetches from Code memory space,
0x00000000 to 0x1FFFFFFC, are performed over the [sic!: this] 32-bit AHB-Lite bus.
DCode memory interface: Data and debug accesses to Code memory space,
0x00000000 to 0x1FFFFFFF, are performed over the [sic!: this] 32-bit AHB-Lite bus.
System interface: Instruction fetches and data and debug accesses to
address ranges 0x20000000 to 0xDFFFFFFF and 0xE0100000 to 0xFFFFFFFF
are performed over the [sic!: this] 32-bit AHB-Lite bus.
This also makes clear, that the flash memory of STM32F4 MCUs located at 0x08000000 is always accessed (by the CPU core) using the ICode/DCode busses, regardless if it is remapped. This is because both, the original location and the remapped location are within the code memory space (0x00000000 to 0x1FFFFFFF).
However, if the code is located in SRAM at 0x20000000 then access to the remapped location at 0x00000000 uses the ICode/DCode busses while access to the original location (outside the code memory space) uses the system bus.

The choice of bus interface on the core depends on the addresses accessed. If you access an instruction at 0x00000004, this is done on the ICode bus. An access to 0x20000004 is done using the System bus.
What the REMAP function does is change the physical memory system so that an access to 0x00000004 (ICode bus) will use the same RAM as you can also access on the system bus. Any access to 0x20000004 will be unaffected, and still be generated on the System bus by the core.

Related

Are memory mapped registers separate registers on the bus?

I will use the TM4C123 Arm Microcontroller/Board as an example.
Most of it's I/O registers are memory mapped so you can get/set their values using
regular memory load/store instructions.
My questions is, is there some type of register outside of cpu somewhere on the bus which is mapped to memory and you read/write to it using the memory region essentially having duplicate values one on the register and on memory, or the memory IS the register itself?
There are many buses even in an MCU. Bus after bus after bus splitting off like branches in a tree. (sometimes even merging unlike a tree).
It may predate the intel/motorola battle but certainly in that time frame you had segmented vs flat addressing and you had I/O mapped I/O vs memory mapped I/O, since motorola (and others) did not have a separate I/O bus (well one extra...address...signal).
Look at the arm architecture documents and the chip documentation (arm makes IP not chips). You have load and store instructions that operate on addresses. The documentation for the chip (and to some extent ARM provides rules for the cortex-m address space) provides a long list of addresses for things. As a programmer you simply line up the address you do loads and stores with and the right instructions.
Someones marketing may still carry about terms like memory mapped I/O, because intel x86 still exists (how????), some folks will continue to carry those terms. As a programmer, they are number one just bits that go into registers, and for single instructions here and there those bits are addresses. If you want to add adjectives to that, go for it.
If the address you are using based on the chip and core documentation is pointing at an sram, then that is a read or write of memory. If it is a flash address, then that is the flash. The uart, the uart. timer 5, then timer 5 control and status registers. Etc...
There are addresses in these mcus that point at two or three things, but not at the same time. (address 0x00000000 and some number of kbytes after that). But, again, not at the same time. And this overlap at least in many of these cortex-m mcus, these special address spaces are not going to overlap "memory" and peripherals (I/O). But instead places where you can boot the chip and run some code. With these cortex-ms I do not think you can even use the sort of mmu to mix these spaces. Yes definitely in full sized arms and other architectures you can use a fully blow mcu to mix up the address spaces and have a virtual address space that lands on a physical address space of a different type.

Confusion of virtual memory

Consider a sample below.
char* p = (char*)malloc(4096);
p[0] = 'a';
p[1] = 'b';
The 4KB memory is allocated by calling malloc(). OS handles the memory request by the user program in user-space. First, OS requests memory allocation to RAM, then RAM gives physical memory address to OS. Once OS receives physical address, OS maps the physical address to virtual address then OS returns the virtual address which is the address of p to user program.
I wrote some value(a and b) in virtual address and they are really written into main memory(RAM). I'm confusing that I wrote some value in virtual address, not physical address, but it is really written to main memory(RAM) even though I didn't care about them.
What happens in behind? What OS does for me? I couldn't found relevant materials in some books(OS, system programming).
Could you give some explanation? (Please omit the contents about cache for easier understanding)
A detailed answer to your question will be very long - and too long to fit here at StackOverflow.
Here is a very simplified answer to a little part of your question.
You write:
I'm confusing that I wrote some value in virtual address, not physical address, but it is really written to main memory
Seems you have a very fundamental misunderstanding here.
There is no memory directly "behind" a virtual address. Whenever you access a virtual address in your program, it is automatically translated to a physical address and the physical address is then used for access in main memory.
The translation happens in HW, i.e. inside the processor in a block called "MMU - Memory management unit" (see https://en.wikipedia.org/wiki/Memory_management_unit).
The MMU holds a small but very fast look-up table that tells how a virtual address is to be translated into a physical address. The OS configures this table but after that, the translation happens without any SW being involved and - just to repeat - it happens whenever you access a virtual memory address.
The MMU also takes some kind of process ID as input in order to do the translation. This is need because two different processes may use the same virtual address but they will need translation to two different physical addresses.
As mentioned above the MMU look-up table (TLB) is small so the MMU can't hold a all translations for a complete system. When the MMU can't do a translation, it can make an exception of some kind so that some OS software can be triggered. The OS will then re-program the MMU so that the missing translation gets into the MMU and the process execution can continue. Note: Some processors can do this in HW, i.e. without involving the OS.
You have to understand that virtual memory is virtual, and it can be more extensive than physical memory RAM, so it is mapped differently. Although they are actually the same.
Your programs use virtual memory addresses, and it is your OS who decides to save in RAM. If it fills up, then it will use some space on the hard drive to continue working.
But the hard drive is slower than the RAM, that's why your OS uses an algorithm, which could be Round-Robin, to exchange pages of memory between the hard drive and RAM, depending on the work being done, ensuring that the data that are most likely to be used are in fast memory. To swap pages back and forth, the OS does not need to modify virtual memory addresses.
Summary overlooking a lot of things
You want to understand how virtual memory works. There's lots of online resources about this, here's one I found that seems to do a fair job of trying to explain it without getting too crazy in technical details, but also doesn't gloss over important terms.
https://searchstorage.techtarget.com/definition/virtual-memory
For Linux on x86 platforms, the assembly equivalent of asking for memory is basically a call into the kernel using int 0x80 with some parameters for the call set into some registers. The interrupt is set at boot by the OS to be able to answer for the request. It is set in the IDT.
An IDT descriptor for 32 bits systems looks like:
struct IDTDescr {
uint16_t offset_1; // offset bits 0..15
uint16_t selector; // a code segment selector in GDT or LDT
uint8_t zero; // unused, set to 0
uint8_t type_attr; // type and attributes, see below
uint16_t offset_2; // offset bits 16..31
};
The offset is the address of the entry point of the handler for that interrupt. So interrupt 0x80 has an entry in the IDT. This entry points to an address for the handler(also called ISR). When you call malloc(), the compiler will compile this code to a system call. The system call returns in some register the address of the allocated memory. I'm pretty sure as well that this system call will actually use the sysenter x86 instruction to switch into kernel mode. This instruction is used alongside an MSR register to securely jump into kernel mode from user mode at the address specified in the MSR (Model Specific Register).
Once in kernel mode, all instructions can be executed and access to all hardware is unlocked. To provide with the request the OS doesn't "ask RAM for memory". RAM isn't aware of what memory the OS uses. RAM just blindly answers to asserted pins on it's DIMM and stores information. The OS just checks at boot using the ACPI tables that were built by the BIOS to determine how much RAM there is and what are the different devices that are connected to the computer to avoid writing to some MMIO (Memory Mapped IO). Once the OS knows how much RAM is available (and what parts are usable), it will use algorithms to determine what parts of available RAM every process should get.
When you compile C code, the compiler (and linker) will determine the address of everything right at compilation time. When you launch that executable the OS is aware of all memory the process will use. So it will set up the page tables for that process accordingly. When you ask for memory dynamically using malloc(), the OS determines what part of physical memory your process should get and changes (during runtime) the page tables accordingly.
As to paging itself, you can always read some articles. A short version is the 32 bits paging. In 32 bits paging you have a CR3 register for each CPU core. This register contains the physical address of the bottom of the Page Global Directory. The PGD contains the physical addresses of the bottom of several Page Tables which themselves contain the physical addresses of the bottom of several physical pages (https://wiki.osdev.org/Paging). A virtual address is split into 3 parts. The 12 bits to the right (LSB) are the offset in the physical page. The 10 bits in the middle are the offset in the page table and the 10 MSB are the offset in the PGD.
So when you write
char* p = (char*)malloc(4096);
p[0] = 'a';
p[1] = 'b';
you create a pointer of type char* and making a system call to ask for 4096 bytes of memory. The OS puts the first address of that chunk of memory into a certain conventional register (which depends on the system and OS). You should not forget that the C language is just a convention. It is up to the operating system to implement that convention by writing a compatible compiler. It means that the compiler knows what register and what interrupt number to use (for the system call) because it was specifically written for that OS. The compiler will thus take the address stored into this certain register and store it into this pointer of type char* during runtime. On the second line you are telling the compiler that you want to take the char at the first address and make it an 'a'. On the third line you make the second char a 'b'. In the end, you could write an equivalent:
char* p = (char*)malloc(4096);
*p = 'a';
*(p + 1) = 'b';
The p is a variable containing an address. The + operation on a pointer increments this address by the size of what is stored in that pointer. In this case, the pointer points to a char so the + operation increments the pointer by one char (one byte). If it was pointing to an int then it would be incremented of 4 bytes (32 bits). The size of the actual pointer depends on the system. If you have a 32 bits system then the pointer is 32 bits wide (because it contains an address). On a 64 bits system the pointer is 64 bits wide. A static memory equivalent of what you did is
char p[4096];
p[0] = 'a';
p[1] = 'b';
Now the compiler will know at compile time what memory this table will get. It is static memory. Even then, p represents a pointer to the first char of that array. It means you could write
char p[4096];
*p = 'a';
*(p + 1) = 'b';
It would have the same result.
First, OS requests memory allocation to RAM,…
The OS does not have to request memory. It has access to all of memory the moment it boots. It keeps its own database of which parts of that memory are in use for what purposes. When it wants to provide memory for a user process, it uses its own database to find some memory that is available (or does things to stop using memory for other purposes and then make it available). Once it chooses the memory to use, it updates its database to record that it is in use.
… then RAM gives physical memory address to OS.
RAM does not give addresses to the OS except that, when starting, the OS may have to interrogate the hardware to see what physical memory is available in the system.
Once OS receives physical address, OS maps the physical address to virtual address…
Virtual memory mapping is usually described as mapping virtual addresses to physical addresses. The OS has a database of the virtual memory addresses in the user process, and it has a database of physical memory. When it is fulfilling a request from the process to provide virtual memory and it decides to back that virtual memory with physical memory, the OS will inform the hardware of what mapping it choose. This depends on the hardware, but a typical method is that the OS updates some page table entries that describe what virtual addresses get translated to what physical addresses.
I wrote some value(a and b) in virtual address and they are really written into main memory(RAM).
When your process writes to virtual memory that is mapped to physical memory, the processor will take the virtual memory address, look up the mapping information in the page table entries or other database, and replace the virtual memory address with a physical memory address. Then it will write the data to that physical memory.

Mapping Reserved High Memory to User Space via remap_pfn_range

Arch=x86_64
I am working through a DMA solution following the process outlined in this question,
Direct Memory Access in Linux
My call to ioremap successfully returns with an address, pt.
In my call to remap_pfn_range I use, virt_to_phys(pt) >> PAGE_SHIFT, to specify the pfn of the area generated by the ioremap call.
When the userspace application using mmap executes and the call to remap_pfn_range is made, the machine crashes. I assume the mapping is off and I am forcing the system to use memory that is already allocated (screen glitches before exit), however I'm not clear on where the mismatch is occurring. The system has 4 Gigs of Ram and I reserved 2Gigs by using the kernel boot option mem=2048M.
I use BUFFER_SIZE=1024u*1024u*1024u and BUFFER_OFFSET=2u*1024u*1024u*1024u.
putting these into pt=ioremap(BUFFER_SIZE,BUFFER_OFFSET) I believe pt should equal a virtual address to the physical memory located at the 2GB boundary up to the 3GB boundary. Is this assumption accurate?
When I execute my kernel module, but I change my remap_pfn_range to use vma->vm_pgoff>>PAGE_SHIFT as the target pfn the code executes with no error and I can read and write to the memory. However this is not using the reserved physical memory that I intended.
Since everything works when using vma->vm_pgoff>>PAGE_SHIFT I believe my culprit is between my ioremap and the remap_pfn_range
Thanks for any suggestions!
The motivation behind the use of this kernel module is the need for large contiguous buffers for DMA from a PCI device. In this application, recompiling the kernel isn't an option so I'm trying to accomplish it with a module + hardware.
My call to ioremap successfully returns with an address, pt.
In my call to remap_pfn_range I use, virt_to_phys(pt) >> PAGE_SHIFT,
to specify the pfn of the area generated by the ioremap call.
This is illegal, because ioremap reserves virtual region in vmalloc area. The virt_to_phys() is OK only for linearly mapped part of memory.
putting these into pt=ioremap(BUFFER_SIZE,BUFFER_OFFSET) I believe pt
should equal a virtual address to the physical memory located at the
2GB boundary up to the 3GB boundary. Is this assumption accurate?
That is not exactly true, for example on my machine
cat /proc/iomem
...
00001000-0009ebff : System RAM
...
00100000-1fffffff : System RAM
...
There may be several memory banks, and the memory not obligatory will start at address 0x0 of physical address space.
This might be usefull for you Dynamic DMA mapping Guide

Linux process virtual address space's address range

I'm on 32bit machine. From what I understand, User space's address ranges from 0x00000000 to 0xbfffffff, and kernel's ranges from 0xc0000000 to 0xffffffff.
But when I used pmap to see a process's memory allocation, I see that the library are loaded in around 0xf7777777. Please see the attached screenshot. Does it mean those libraries are loaded in kernel space? And when I used mmap(), I got the address from 0xe0000000. So, mmap() got memory from kernel space?
I'm on 32bit machine. From what I understand, User space's address
ranges from 0x00000000 to 0xbfffffff, and kernel's ranges from
0xc0000000 to 0xffffffff.
Not exactly. Kernel memory space starts at 0xC0000000, but it doesn't have to fill the entire GB. In fact, it fills up to virtual address 0xF7FFFFFF. This covers 896MB of physical memory. Virtual addresses 0xF8000000 and up are used as a 128MB window for the kernel to map any region of physical memory beyond the 896MB limit.
All user processes share the same memory map for virtual addresses 0xC0000000 and beyond, so if the kernel does not use its entire GB of virtual space, it may reuse part of it to map commonly used shared libraries, so every process can see them.

ARM CM3 STM32F207 SRAM (RAM) space increase with SRAM2 ? BKPSRAM?

increase SRAM space on STM32F207
Hi,
I use the STM32F207ZFT but I haven't enough RAM space in SRAM1 for my application:
Question1: if I don't use the SRAM2 region for the DMA's use (SRAM2 / 16KB: 0x2001 C000 - 0x2001 FFFF), could I use this memory area for normal RAM purpose (for extend the BSS area) in order to be contiguous with the SRAM1 (to increase the overall RAM size for uninitialized variables, initialized to 0)?
Question2: could I use the backup SRAM (BKPSRAM / 4KB: 0x4002 4000 - 0x4002 4FFF) for storing some data buffers or some data arrays, as we could do it by using the BSS RAM area?
Independently of its low consumption (on Vbat pin), is that the characteristics of this BKPSRAM are comparable to the SRAM1 area (access time ...)?
Best regards,
Disclaimer: I'm very familiar with STM32F1xx and somewhat familiar with STM32F4xx, but never used STM32F2xx.
Regarding the first question: from reading the manual (in particular, section 2.3.1), there's nothing special about SRAM2, except that an address in SRAM2 can be accessed at the same time as an address in SRAM1 is being accessed. Other than that, I can't spot any restrictions.
Regarding the second question: BKPSRAM is connected to the bus matrix through the AHB1 bus. In principle this bus runs at the same clock as the core, so if there is no bus contention the speed should be similar. If there are any wait states or anything that could delay accesses to BKPSRAM, I was unable to find it on the manual. Of course, if you have, say, a DMA transaction that is heavily accessing a peripheral connected to AHB1, then you would have bus contention which might delay accesses to BKPSRAM, whereas SRAM1 and SRAM2 have a direct connection, not shared by anything else, to the bus matrix. To summarize: you shouldn't have any problems.

Resources