In the ARM architecture, what happens if a memory access lies partly inside and partly outside an MPU region?
Let's say I perform a 32-bit LDR from an unaligned location, 0xA03E, and I have an MPU region starting at 0xA000 and ending at 0xA03F.
The access at 0xA03E then covers bytes 0xA03E and 0xA03F inside the region, and bytes 0xA040-0xA041 outside it (hitting the background region).
Will the memory controller issue two reads: one 16-bit read starting from 0xA03E and another starting from 0xA040?
If one of them faults (data abort, ...), will the overall LDR fail?
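For reference, a minimal C sketch of the access in question (the address is illustrative, and it assumes the compiler emits a single unaligned LDR, which ARMv7-M permits only for Normal memory with CCR.UNALIGN_TRP clear):
#include <stdint.h>

/* 32-bit read from the unaligned address 0xA03E: bytes 0xA03E-0xA03F lie
 * inside the MPU region, bytes 0xA040-0xA041 fall into the background region. */
uint32_t read_across_mpu_boundary(void)
{
    volatile uint32_t *p = (volatile uint32_t *)0xA03Eu;  /* unaligned pointer */
    return *p;  /* typically compiled to a single LDR on Cortex-M3/M4 */
}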
I am reading the book, Professional CUDA C Programming. On page 159, it says:
Aligned memory accesses occur when the first address of a device
memory transaction is an even multiple of the cache granularity being
used to service the transaction (either 32 bytes for L2 cache or 128
bytes for L1 cache).
I am wondering why aligned memory accesses in CUDA need even multiples of the cache granularity rather than just multiples of the cache granularity.
So, I checked the cuda-c-programming-guide from NVIDIA. It says:
Global memory resides in device memory and device memory is accessed
via 32-, 64-, or 128-byte memory transactions. These memory
transactions must be naturally aligned: Only the 32-, 64-, or 128-byte
segments of device memory that are aligned to their size (i.e., whose
first address is a multiple of their size) can be read or written by
memory transactions.
So it seems that an even multiple of the cache granularity is unnecessary for an aligned memory access, doesn't it?
The quoted sentence from the book seems to be incorrect in two senses:
A memory access has an alignment of N if it is an access to an address that is a multiple of N. That's irrespective of CUDA. What seems to be discussed here is memory access coalescence.
As you suggest, and AFAIK, coalescence requires "multiples of" the cache granularity, not "even multiples of".
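As a small illustration of that definition, a sketch in plain C (nothing CUDA-specific; the helper name is made up):
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An N-byte memory transaction is "naturally aligned" when its start address
 * is a multiple of N: any multiple, not only an even one. */
static bool naturally_aligned(uintptr_t addr, size_t segment_size)
{
    return (addr % segment_size) == 0;
}

/* naturally_aligned(0x180, 128) -> true  (0x180 = 3 * 128, an odd multiple)
 * naturally_aligned(0x180, 32)  -> true
 * naturally_aligned(0x1A0, 128) -> false (0x1A0 is not a multiple of 128) */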
STM32F4 controllers (with an ARM Cortex-M4 CPU) allow a so-called physical remap of the lowest addresses in the memory space (0x00000000 to 0x03FFFFFF) using the SYSCFG_MEMRMP register. What I understand is that the register selects which memory (FLASH/RAM/etc.) is aliased to the lowest addresses, and therefore from which memory the reset vector and initial stack pointer are fetched after reset.
The documentation [1] also mentions that
In remap mode, the CPU can access the external memory via ICode bus
instead of System bus which boosts up the performance.
This means that after a remap, e.g. to RAM, an instruction fetch from within the alias address space (0x00000000 to 0x03FFFFFF) will use the ICode bus.
Now my question: after such a remap operation, e.g. to RAM, will a fetch from the non-aliased location of the RAM use the System bus or the ICode bus?
The background of the question is that I want to write a linker script for an image executing from RAM only (under control of a debugger). To which memory area should the .text section go? The alias space or the physical space?
[1] ST DocID018909 Rev 7
Thanks to Sean, I found the answer in the ARM® Cortex®-M4 Processor Technical Reference Manual, section 2.3.1 "Bus interfaces":
ICode memory interface: Instruction fetches from Code memory space,
0x00000000 to 0x1FFFFFFC, are performed over the [sic!: this] 32-bit AHB-Lite bus.
DCode memory interface: Data and debug accesses to Code memory space,
0x00000000 to 0x1FFFFFFF, are performed over the [sic!: this] 32-bit AHB-Lite bus.
System interface: Instruction fetches and data and debug accesses to
address ranges 0x20000000 to 0xDFFFFFFF and 0xE0100000 to 0xFFFFFFFF
are performed over the [sic!: this] 32-bit AHB-Lite bus.
This also makes clear that the flash memory of STM32F4 MCUs, located at 0x08000000, is always accessed (by the CPU core) over the ICode/DCode buses, regardless of whether it is remapped, because both the original location and the remapped location lie within the Code memory space (0x00000000 to 0x1FFFFFFF).
However, if the code is located in SRAM at 0x20000000, then accesses to the remapped location at 0x00000000 use the ICode/DCode buses, while accesses to the original location (outside the Code memory space) use the System bus.
The choice of bus interface on the core depends on the addresses accessed. If you access an instruction at 0x00000004, this is done on the ICode bus. An access to 0x20000004 is done using the System bus.
What the REMAP function does is change the physical memory system so that an access to 0x00000004 (ICode bus) will use the same RAM as you can also access on the system bus. Any access to 0x20000004 will be unaffected, and still be generated on the System bus by the core.
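For completeness, a hedged sketch of performing such a remap to SRAM on an STM32F4 (register and bit names are taken from ST's CMSIS device header stm32f4xx.h; verify the MEM_MODE encoding against the reference manual [1] for your exact part):
#include "stm32f4xx.h"

/* Map embedded SRAM to 0x00000000. MEM_MODE = 0b11 selects "embedded SRAM"
 * on STM32F4 parts; check the reference manual before relying on this. */
static void remap_sram_to_zero(void)
{
    RCC->APB2ENR |= RCC_APB2ENR_SYSCFGEN;                 /* enable SYSCFG clock */
    SYSCFG->MEMRMP = (SYSCFG->MEMRMP & ~SYSCFG_MEMRMP_MEM_MODE)
                   | SYSCFG_MEMRMP_MEM_MODE_0
                   | SYSCFG_MEMRMP_MEM_MODE_1;
    __DSB();                                              /* make the remap take */
    __ISB();                                              /* effect before fetch */
}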
Arch=x86_64
I am working through a DMA solution following the process outlined in this question,
Direct Memory Access in Linux
My call to ioremap successfully returns with an address, pt.
In my call to remap_pfn_range I use virt_to_phys(pt) >> PAGE_SHIFT to specify the pfn of the area returned by the ioremap call.
When the userspace application calls mmap and the call to remap_pfn_range is made, the machine crashes. I assume the mapping is off and I am forcing the system to use memory that is already allocated (the screen glitches before exit); however, I'm not clear on where the mismatch is occurring. The system has 4 GB of RAM, and I reserved 2 GB using the kernel boot option mem=2048M.
I use BUFFER_SIZE=1024u*1024u*1024u and BUFFER_OFFSET=2u*1024u*1024u*1024u.
Putting these into pt = ioremap(BUFFER_SIZE, BUFFER_OFFSET), I believe pt should be a virtual address for the physical memory from the 2 GB boundary up to the 3 GB boundary. Is this assumption accurate?
When I execute my kernel module but change my remap_pfn_range call to use vma->vm_pgoff >> PAGE_SHIFT as the target pfn, the code executes with no error and I can read and write the memory. However, this does not use the reserved physical memory that I intended.
Since everything works when using vma->vm_pgoff >> PAGE_SHIFT, I believe the culprit lies between my ioremap and the remap_pfn_range call.
Thanks for any suggestions!
The motivation behind the use of this kernel module is the need for large contiguous buffers for DMA from a PCI device. In this application, recompiling the kernel isn't an option so I'm trying to accomplish it with a module + hardware.
My call to ioremap successfully returns with an address, pt.
In my call to remap_pfn_range I use virt_to_phys(pt) >> PAGE_SHIFT
to specify the pfn of the area returned by the ioremap call.
This is illegal, because ioremap reserves a virtual region in the vmalloc area, while virt_to_phys() is valid only for the linearly mapped part of memory.
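A short sketch of that distinction (kernel APIs as in mainline Linux; the physical address passed to ioremap is only a placeholder):
#include <linux/io.h>
#include <linux/mm.h>
#include <linux/slab.h>

static void phys_lookup_examples(void)
{
    void *lin = kmalloc(PAGE_SIZE, GFP_KERNEL);               /* linear mapping   */
    void __iomem *cookie = ioremap(0x80000000UL, PAGE_SIZE);  /* placeholder addr */

    phys_addr_t ok = virt_to_phys(lin);   /* valid: lowmem, linearly mapped       */
    /* virt_to_phys(cookie) would be meaningless: the cookie lives in the         */
    /* vmalloc/ioremap area, not in the linear mapping.                           */

    iounmap(cookie);
    kfree(lin);
    (void)ok;
}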
Putting these into pt = ioremap(BUFFER_SIZE, BUFFER_OFFSET), I believe pt should be a virtual address
for the physical memory from the 2 GB boundary up to the 3 GB boundary. Is this assumption accurate?
That is not exactly true; for example, on my machine:
cat /proc/iomem
...
00001000-0009ebff : System RAM
...
00100000-1fffffff : System RAM
...
There may be several memory banks, and RAM does not necessarily start at address 0x0 of the physical address space.
This might be useful for you: Dynamic DMA mapping Guide.
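A hedged sketch of what the mmap handler could look like when the reserved physical address is passed to remap_pfn_range directly (RESERVED_PHYS and RESERVED_SIZE are placeholders for the region hidden from the kernel with mem=2048M; error handling is trimmed):
#include <linux/fs.h>
#include <linux/mm.h>

#define RESERVED_PHYS (2UL * 1024 * 1024 * 1024)  /* 2 GB boundary, placeholder */
#define RESERVED_SIZE (1UL * 1024 * 1024 * 1024)  /* 1 GB buffer, placeholder   */

static int mydrv_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;

    if (size > RESERVED_SIZE)
        return -EINVAL;

    /* Use the reserved physical address itself, not virt_to_phys(ioremap(...)). */
    return remap_pfn_range(vma, vma->vm_start,
                           RESERVED_PHYS >> PAGE_SHIFT,
                           size, vma->vm_page_prot);
}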
I am working on an embedded application.
While trying to configure the DMA controller, I came across the statement:
You must provide an area
of system memory to contain the channel control data structure
What do they mean by system memory? Data SRAM? Code SRAM? Somewhere else?
You can find that statement at the beginning of section 8.4.3 here:
http://www.silabs.com/support%20documents/technicaldocs/efm32wg-rm.pdf
My understanding, as I read it, is that the system memory is allocated from the on-chip RAM (Fig. 3.1), not from externally memory-mapped devices or flash program memory.
Table 3.2 shows that the chips have flash and internal RAM; I'd say the system memory is the RAM, not the flash memory.
Section 3.2 lists 32 KB of on-chip RAM among the chip's features.
Check Figure 5.2 for the system address space. You'll need to point the DMA_CTRLBASE and DMA_ALTCTRLBASE registers at an area within that region, 0x20000000 - 0x20007FFF, that is large enough to fit the control structure. Hope that helps.
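A hedged sketch of such an allocation in C (DMA_DESCRIPTOR_TypeDef and the DMA register block come from Silicon Labs' device headers; the 256-byte alignment and the behaviour of ALTCTRLBASE should be verified against the reference manual for your channel count):
#include "em_device.h"

#define DMA_CHANNELS 12  /* assumption: adjust for the channels you actually use */

/* The channel control block must live in RAM (0x20000000 region) and be
 * aligned as the uDMA requires; 256 bytes is typical for up to 16 channels. */
static DMA_DESCRIPTOR_TypeDef dmaControlBlock[DMA_CHANNELS * 2]
    __attribute__((aligned(256)));

static void dma_ctrlbase_init(void)
{
    DMA->CTRLBASE = (uint32_t)dmaControlBlock;  /* primary descriptors           */
    /* On EFM32, ALTCTRLBASE is derived from CTRLBASE by the controller itself.  */
}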
DMA means Direct Memory Access. The idea is that a module like the ADC does not raise an interrupt to ask the CPU to read its new data; instead, the module writes directly into the microcontroller's memory, which saves CPU time. For configuration you must create a control structure in memory and pass its pointer to the module, which allows the module to change the memory contents directly. So it is memory allocated by you (e.g. on the heap), which is part of your SRAM.
I'm writing a kernel from scratch and am confused about what will happen once I initialize paging and map my kernel to a different location in virtual memory. My kernel is loaded to physical address 0x100000 on startup, but I plan on mapping it to the virtual address 0xC0100000 so I can leave virtual address 0x100000 available for VM86 processes (more specifically, I plan on mapping physical addresses 0x100000 through 0x40000000 to virtual addresses 0xC0100000 through 0xFFFFF000).
Anyway, I have a bitmap to keep track of my page frames located at physical address 0x108000, with the address stored in a uint32_t pointer.
My question is, what will happen to this pointer when I initialize paging? Will it still point to my bitmap located at physical address 0x108000, or will it point to whatever the virtual address 0x108000 is mapped to in my page table? If the latter is true, how do I get around the problem that my pointers will not be correct once paging is enabled? Will I have to update my pointers, or am I going about this completely wrong?
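For what it's worth, a hedged sketch of the translation this boils down to in a typical higher-half layout (the macro names and the fixed 0xC0000000 offset are illustrative, not taken from any particular kernel):
#include <stdint.h>

#define KERNEL_VIRT_BASE 0xC0000000u  /* illustrative higher-half offset */

#define PHYS_TO_VIRT(p) ((void *)((uintptr_t)(p) + KERNEL_VIRT_BASE))
#define VIRT_TO_PHYS(v) ((uintptr_t)(v) - KERNEL_VIRT_BASE)

static uint32_t *frame_bitmap;  /* held 0x108000 before paging was enabled */

void after_paging_enabled(void)
{
    /* Once paging is on, the CPU treats every pointer as virtual, so the
     * physical address 0x108000 must be rebased to 0xC0108000 (or the old
     * range must stay identity-mapped during the transition). */
    frame_bitmap = PHYS_TO_VIRT(0x108000u);
}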