ARM - Memory map leakages

Let's assume that we are using an MCU with an ARM Cortex-M4 core, 256KB of flash and 64KB of RAM. This CPU has the standard Cortex-M memory map: a 512MB code region starting at 0x00000000, a 512MB SRAM region starting at 0x20000000, and peripheral and external regions above that.
If I understand it correctly, the memory map tells us the maximum size of each kind of memory, which constrains the MCU vendor, and where the CPU will look for it. For example, we cannot use a Cortex-M4 with more than 512MB of flash, right?
In our situation, we have 64KB of RAM, and the limit is 512MB. My question is: does the CPU know about that? Does it have any safety mechanism that will catch an attempt to access beyond that 64KB of RAM (say, a stack overflow) by halting in some way? Or will the CPU behave more like "these are my boundaries, and I will move around within them if necessary"? I know that compilers can provide some information that makes the programmer aware of this.

If I understand it correctly, the memory map tells us the maximum size of each kind of memory, which constrains the MCU vendor, and where the CPU will look for it.
Yes.
For example, we cannot use a Cortex-M4 with more than 512MB of flash, right?
Normally the flash is the region between address 0x0 and 0x1FFFFFFF - 512MB indeed (512*1024*1024 = 0x20000000), which is a ridiculously large size for a Cortex-M.
My question is - does CPU know about that?
Yes and no. The physical memory will exist where the vendor placed it. To some extent this can be remapped through the linker script.
The Cortex-M does not have an MMU with support for virtual memory (only an optional MPU), meaning all addresses are physical addresses. It does, however, trap various invalid accesses through hardware exceptions. From ARM/Keil AN209, Using Cortex-M3/M4/M7 Fault Exceptions:
Fault exception handlers
Fault exceptions trap illegal memory accesses and illegal program behavior. The following conditions are detected by fault exception handlers:
HardFault: is the default exception and can be triggered because of an error during exception processing, or because an exception cannot be managed by any other exception mechanism.
MemManage: detects memory access violations to regions that are defined by the Memory Protection Unit (MPU); for example, code execution from a memory region with read/write access only.
BusFault: detects memory access errors on instruction fetch, data read/write, interrupt vector fetch, and register stacking (save/restore) on interrupt (entry/exit).
UsageFault: detects execution of undefined instructions and unaligned memory accesses for load/store multiple instructions. When enabled, divide-by-zero and other unaligned memory accesses are detected as well.
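On a Cortex-M3/M4 the three specific faults must be enabled explicitly, otherwise everything escalates to HardFault. A minimal sketch using the standard CMSIS register names (the device header and the handler body are placeholders for your own):

#include "stm32f4xx.h"   /* placeholder: any CMSIS device header will do */

void enable_fault_handlers(void)
{
    /* HardFault is always enabled; MemManage, BusFault and UsageFault
     * must be switched on, or they escalate to HardFault. */
    SCB->SHCSR |= SCB_SHCSR_MEMFAULTENA_Msk
                | SCB_SHCSR_BUSFAULTENA_Msk
                | SCB_SHCSR_USGFAULTENA_Msk;

    /* Optionally trap divide-by-zero and unaligned accesses as well. */
    SCB->CCR |= SCB_CCR_DIV_0_TRP_Msk | SCB_CCR_UNALIGN_TRP_Msk;
}

/* Handler name used by most CMSIS startup files. */
void HardFault_Handler(void)
{
    /* Inspect SCB->CFSR and SCB->HFSR here to see what went wrong. */
    for (;;) { }   /* halt so a debugger can take a look */
}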

No, the CPU does not know - you specify the memory map in the linker script, and the link will fail if your code and/or data cannot be located within the stated available memory.
If you specify the memory map incorrectly, the linker may locate code or data in non-existent memory, and when you load the image, parts of it will be missing. For the flash programming, the programming tool will very likely fail if it is set to read back and verify the code.
Also, if you dynamically load code to non-existent memory, or access memory not allocated by the linker at run time, the results are non-deterministic, other than that it won't do anything useful.
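To see where that knowledge actually lives: on a typical bare-metal toolchain, the memory map reaches the program only through symbols the linker script exports, which the startup code then consumes. A sketch with commonly used (but vendor-specific, so assumed) symbol names:

#include <stdint.h>

/* Assumed to be defined by the linker script - typical names from
 * vendor startup code; yours may differ. */
extern uint32_t _sidata;   /* load address of .data in flash */
extern uint32_t _sdata;    /* start of .data in RAM          */
extern uint32_t _edata;    /* end of .data in RAM            */
extern uint32_t _sbss;     /* start of .bss in RAM           */
extern uint32_t _ebss;     /* end of .bss in RAM             */

int main(void);

void Reset_Handler(void)
{
    uint32_t *src = &_sidata;
    uint32_t *dst = &_sdata;

    /* Copy initialized data from flash to RAM... */
    while (dst < &_edata)
        *dst++ = *src++;

    /* ...and zero the uninitialized data. */
    for (dst = &_sbss; dst < &_ebss; )
        *dst++ = 0;

    main();
}

If _edata pointed past the end of physical RAM, this loop would happily write into nothing - the CPU itself never checks any of it.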

The CPU cannot know, as everyone has said. The MCU vendor buys the processor IP from ARM, as well as IP from other vendors, and creates some of its own - if nothing else, the glue logic that holds the modules together. The flash itself is likely from some third party.
Some chip designers wrap addresses around; this is not uncommon in hardware or software. For example, the part may have 16KB of flash starting at 0x08000000 - that is the chip company's decision; ARM has little to do with it other than defining the wide ranges you found (likely for caching and other options within their domain). 16K is 16384 bytes, or 0x4000, so 14 bits of address. There is likely an address decoder that sees some number of upper bits (0x08...) and sends the request to the flash logic; at the flash logic it would not surprise me to see only the lower 14 address bits used, meaning that if you were to address 0x08000000 and 0x08008000, you might get the same 0x0000 offset/address in the flash.
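If you want to check a specific part empirically, a simple probe (using the hypothetical 16K example above; addresses are illustrative) is to compare reads through both addresses:

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical: a 16K (0x4000) flash at 0x08000000 whose decoder may
 * or may not strip the upper address bits. */
static bool flash_aliases(void)
{
    volatile uint32_t *base   = (volatile uint32_t *)0x08000000u;
    volatile uint32_t *mirror = (volatile uint32_t *)0x08008000u;

    /* If the decoder ignores the upper bits, both reads hit the same
     * flash words; if it raises a fault instead, we never get here.
     * Compare several words, since a single match could be chance. */
    return base[0] == mirror[0] && base[1] == mirror[1];
}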
Some engineers may choose to look at those upper bits and declare a fault.
You have to examine this on a case-by-case basis - not just "an STM32", for example, but each family of STM32, basically per datasheet. (And there is no reason to expect this level of detail to be documented by the chip vendor.)
The ARM Cortex-M, as with all processors, is very, very stupid: it does what the bits tell it to do. It is our responsibility to feed it a sequential trail of working instructions, just like laying track in front of a train - you can lay a lot of track in the wrong place, with gaps, etc. If it is not laid per the rules of the train, the train will crash or fail in some way.
The others have mentioned the linker script, and to be clear, the linker script does not just magically know what chip you have. Ultimately you, the programmer, are responsible for telling the toolchain to build programs that follow the rules of the CPU AND the chip in order to be successful. That means the right architecture's instructions (or a subset: Cortex-M0 (ARMv6-M) instructions will run on a Cortex-M4 (ARMv7-M)), and a linker script that defines addresses for the read-only and read/write areas that match the chip (not the core - the chip, as the chip vendor is in charge of that definition). Then, barring 100 other ways you can fail, it will run.
You are ultimately responsible, but most folks grab an SDK or sandbox of some sort and hope for the best - blind faith in others. The GNU and LLVM tools are fully capable of being used by you directly, without these third parties, but then you are fully responsible for getting everything right.

Related

I want to understand the syntax of '__attribute__((space(dma)));'

I have to write a stub for:
extern ECAN1MSGBUF ecan1msgBuf __attribute__((space(dma)));
Can someone explain to me what this declaration does, how it works, and how I can write/use a stub for it in a test program? I don't have the hardware at home and must write a test, but Xcode reports a warning: unknown attribute 'space' ignored. Otherwise I work with the MPLAB X compiler/debugger with access to the hardware - there, of course, the problem does not occur.
DMA space on dsPICs is dual-ported RAM that can be accessed without competing for memory bandwidth with the ALU (the actual CPU).
However, on some dsPIC33E parts (*), DMA space is beyond the 32KB mark, which requires EDS addressing. If so, you might want to view the sample code I posted about dsPIC33E CAN at http://www.microchip.com/forums/m790729.aspx#792226
Note that you can also use non-DMA-space memory; the DMA-space memory is just more optimal.
(*) the ones with 56K of memory, which are generally the 512KB-flash parts of the GP and MU series.
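As for the stub itself, one hedged approach for a host-side test build is to compile the attribute away whenever the XC16 compiler is not in use (the guard macro and plain buffer are assumptions about your test setup):

/* Hypothetical host-side stub; assumes the ECAN1MSGBUF typedef from
 * the project headers is visible. On the dsPIC target, XC16 understands
 * space(dma) and places the buffer in DMA RAM; host compilers do not,
 * so we provide an ordinary buffer with the same name and type. */
#if defined(__XC16__)
extern ECAN1MSGBUF ecan1msgBuf __attribute__((space(dma)));
#else
ECAN1MSGBUF ecan1msgBuf;   /* plain RAM is fine for unit tests */
#endif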

Does a compiler have to consider the kernel memory space when laying out memory?

I'm trying to reconcile a few concepts.
I know that virtual memory is shared (mapped) between the kernel and all user processes, which I read about here. I also know that when the compiler generates addresses for code and data, the kernel must load them at the correct virtual addresses for that process.
To constrain the scope of the question, I'll just mean gcc when I mention 'the compiler'.
So does the compiler need to track each new release of an OS, to know not to place code or data at the high memory addresses reserved for the kernel? As in, someone writing that piece of the compiler must know those details of how the kernel plans to load the program (lest the compiler put executable code in high memory)?
Or am I confusing different concepts? I got a bit confused going through this tutorial, especially at the very bottom where it shows OS code at low memory addresses, because I thought Linux uses high memory for the kernel.
The compiler doesn't determine the address ranges in memory at which things are placed. That's handled by the OS.
When the program is first executed, the loader places the various portions of the program and its libraries in memory. For memory that's allocated dynamically, large chunks are allocated from the OS and then sometimes divided into smaller chunks.
The OS loader knows where to load things, and the OS's virtual memory allocation logic knows how to find safe, empty spaces in the address space the process uses.
I'm not sure what you mean by the "high memory addresses reserved for the kernel". If you're talking about a 2G/2G or 3G/1G split on a 32-bit operating system, that is a fundamental design element of those OSes that use it. It doesn't change with versions.
If you're talking about high physical memory, then no. Compilers don't care about physical memory.
Linux gives each application its own memory space, distinct from the kernel. The page table contains the translations between this memory space and physical RAM, and the kernel sets up the page table so there's no interference.
That said, the compiler usually doesn't even care where the program is loaded in memory. Why would it?
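You can watch the loader do its job: the same binary typically prints different addresses from run to run (thanks to ASLR), which could not happen if the compiler had baked absolute locations into the executable. A small sketch:

#include <stdio.h>
#include <stdlib.h>

int global_data = 42;           /* lands in the .data segment */

int main(void)
{
    int stack_var = 0;
    int *heap_var = malloc(sizeof *heap_var);

    /* All of these addresses are chosen at load/run time by the OS;
     * the compiler only emitted relocatable references to them. */
    printf("code:  %p\n", (void *)main);
    printf("data:  %p\n", (void *)&global_data);
    printf("heap:  %p\n", (void *)heap_var);
    printf("stack: %p\n", (void *)&stack_var);

    free(heap_var);
    return 0;
}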

ARM bare-metal with MMU: writes to a non-cachable, non-bufferable mapped area fail

I am using an ARM Cortex-A9 CPU with 2 cores, but I use just one core; the other simply sits in a busy loop. I set up the MMU table using sections (1MB per entry) like this:
0x00000000-0x14ffffff => 0x00000000-0x14ffffff (non-cachable, non-bufferable)
0x15000000-0x24ffffff => 0x15000000-0x24ffffff (cachable, bufferable)
0x25000000-0x94ffffff => 0x25000000-0x94ffffff (non-cachable, non-bufferable)
0x95000000-0xa4ffffff => 0x15000000-0x24ffffff (non-cachable, non-bufferable)
0xa5000000-0xffffffff => 0xa5000000-0xffffffff (non-cachable, non-bufferable)
It is rather simple: I just want a mirror of the 256MB of memory for non-cachable access. However, when I do several writes to the non-cachable memory section at 0x95000000-0xa4ffffff, I find that the writes are not actually committed until I explicitly flush the cache.
Am I doing something wrong, or is this kind of mapping not valid? If that is the case, I don't understand how Linux's ioremap works on ARM. It would be good if anyone could give me some explanation. Thanks very much.
First of all: the Cortex-A9 is an ARMv7-A processor, and the terms non-cacheable/non-bufferable/cacheable/bufferable are no longer accurate descriptions of the mappings.
The actual mapping type is determined by TEX[2:0], C and B bits.
So I am actually having to guess a bit here as to what your mappings actually are.
And my guess is that you have the majority of your mappings set as Strongly-ordered, and the mirrored region as Normal Write-Back cacheable.
Having multiple virtual mappings with different memory types pointing to the same physical location is generally not a good idea in the ARM architecture. It used to be explicitly banned, but the latest version of the ARMv7-AR Architecture reference manual (DDI 0406C.b) has a (fairly long) section dedicated to the implications of "Mismatched memory attributes".
I would recommend finding a different way of achieving your goal.
Simply changing the mapping of the uncached regions to Normal Non-cacheable would be a good start. There is no valid reason for using Strongly-ordered mappings for RAM.
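For reference, here is roughly what those entries look like in the short-descriptor translation table format (field positions per DDI 0406C; the helper names are mine, and domain 0 with full access permissions is assumed):

#include <stdint.h>

/* Short-descriptor first-level "section" entry (ARMv7-A, 1MB per entry).
 * Assumes domain 0 is configured as "client" in DACR. */
#define SECTION      0x2u              /* bits[1:0] = 0b10            */
#define B_BIT        (1u << 2)
#define C_BIT        (1u << 3)
#define TEX(x)       ((uint32_t)(x) << 12)
#define AP_RW        (0x3u << 10)      /* AP[1:0] = 0b11: full access */

/* Normal memory, non-cacheable: TEX=0b001, C=0, B=0 */
#define NORMAL_NC    (SECTION | TEX(1) | AP_RW)
/* Normal memory, write-back write-allocate: TEX=0b001, C=1, B=1 */
#define NORMAL_WB    (SECTION | TEX(1) | C_BIT | B_BIT | AP_RW)
/* Strongly-ordered: TEX=0b000, C=0, B=0 - what the old "non-cachable,
 * non-bufferable" setting decodes to, and the likely culprit here. */
#define STRONGLY_ORD (SECTION | AP_RW)

/* Map one 1MB section: the first-level table is indexed by VA[31:20]. */
static void map_section(uint32_t *ttb, uint32_t va, uint32_t pa,
                        uint32_t attrs)
{
    ttb[va >> 20] = (pa & 0xFFF00000u) | attrs;
}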

External Memory(ies) Usage for Max Speed

I haven't used external memory OR ARM core micros before; all of the micros I've used have had internal FLASH and separate data/program address spaces. So forgive me if these questions are very basic, but I could use a "sanity check" to make sure I'm not missing something important:
I have an existing program that when compiled for one micro has the following memory table (IAR for Cortex M3):
40 620 bytes of readonly code memory
1 215 bytes of readonly data memory
126 900 bytes of readwrite data memory
I am moving to a micro that has NO internal FLASH and 128kB of internal SRAM, as it has a very high processor speed that I need. My plan is to use external NOR FLASH (let's say 512kB for the sake of argument) and at least one DDR2 external RAM (again, assume 512kB+ for sake of argument).
I'd like to copy the contents of the external FLASH into internal SRAM at boot-up (a bootloader is provided in a separate internal FLASH space), and execute code out of SRAM. What I'm still not clear on is whether the 128kB of internal SRAM is sufficient to allow for this. Can I simply use the DDR2 external RAM to house all "data", and execute the program code out of SRAM? Is there a speed compromise this way? Speed is my #1 priority in this application. Is there a way to do this that will result in quicker execution?
Thanks
Depending on how the ARM is connected to those memories (flash, SRAM, DRAM), you should be able to use them however you like: run what you can from SRAM and keep other stuff in DRAM. DRAM is in general much slower than SRAM, but you might have a cache to help, and other factors may make one memory or the other better or worse. If you are using the GNU tools, you can certainly craft a linker script and bootloader that take the various segments (.text, .data, .rodata, etc.), set them up wherever you want them (SRAM, DRAM, specific places in each, etc.), and then let the ARM have at it. For IAR I don't know much, but no doubt they have some mechanism for doing the same thing.

How to use external memory on a microcontroller

In the past, I've worked a lot with 8-bit AVRs and MSP430s, where both the RAM and flash were stored on the chip directly. When you compile and download your program, it sort of "just works" and you don't need to worry about where and how variables are actually stored.
Now I'm starting a project where I'd like to add some external memory to a microcontroller (a TI Stellaris LM3S9D92, if that matters), but I'm not entirely sure how you get your code to use the external RAM. I can see how you configure the external bus pretty much like any other peripheral, but what confuses me is how the processor keeps track of when to talk to the external memory and when to talk to the internal one.
From what I can tell, the external RAM is mapped to the same address space as the internal SRAM (internal starts at 0x20000000 and external starts at 0x60000000). Does that mean if I wrote something like this:
int *x = (int *)0x20000000; /* internal SRAM base */
int *y = (int *)0x60000000; /* external RAM base  */
Would x and y point to the first 4 bytes (assuming 32-bit ints) of internal and external RAM respectively? If so, what if I did something like this:
int x[999999999999]; //some super big array that uses all the internal ram
int y[999999999999]; //this would have to be in external ram or it wouldn't fit
I imagine that I'd need to tell the toolchain about the boundaries of each type of memory - or do I have it all wrong and the hardware figures it out on its own? Do linker scripts deal with this? I know they have something to do with memory mapping, but I don't know what exactly. After reading about how to set up an ARM cross-compiler, I get the feeling that something like WinAVR (avr-gcc) was doing a lot of this for me behind the scenes so I wouldn't have to deal with it.
Sorry for rambling a bit but I'd really appreciate it if someone could tell me if I'm on the right track with this stuff.
Update
For any future readers: I found this after another few hours of googling: http://www.bravegnu.org/gnu-eprog/index.html. Combined with the answers here, it helped me a lot.
Generally that is exactly how it works. You have to set up the hardware properly, and/or the hardware may already have things hardcoded at fixed addresses.
You could ask the same question: how does the hardware know that when I write a byte to address 0x21000010 (I just made that up) it is the UART transmit holding register, and that the write means I want to send a byte out the UART? The answer: because it is hardcoded in the logic that way. Or the logic might have an offset - the UART might be able to move; it might live at some control register's contents plus 0x10. Change that control register (which itself has some hardcoded address) from 0x21000000 to 0x90000000, then write to 0x90000010, and another byte goes out the UART.
I would have to look at that particular part, but if it does support external memory, then in theory that is all there is to it: know which addresses in the processor's address space are mapped to that external memory, and reads and writes to them will cause external memory accesses.
Intel-based computers (PCs) tend to like one big flat address space. Use the lspci command on your Linux box (if you have one), or some other command on Windows or a Mac, and you will find that your video card has been given a chunk of address space. If you got through the protection of the CPU/operating system and were to write to an address in that space, the write would go right out of the processor, through the PCIe controllers, and into the video card - either causing havoc or maybe just changing the color of a pixel. You have already dealt with this on your AVRs and MSP430s: some addresses in the address space are flash and some are RAM, and there is some logic outside the CPU core that looks at the CPU core's address bus and decides where to send each access. So far that flash bank, RAM bank, and logic are all self-contained within the boundaries of the chip. It is not too far a stretch beyond that for the logic to respond to an address by creating an external memory cycle, and, when it is done or the result comes back on a read, to complete the internal memory cycle and go on to the next thing.
Does that make any sense or am I making it worse?
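Putting the pieces together for this particular part: once the external bus is configured, you can either form a pointer into the external window directly (the int *y idea from the question) or let the linker place data there. A hedged sketch - the section name is an assumption that must match a region in your linker script, and 0x60000000 is the external-RAM base from the question:

#include <stdint.h>

/* Hypothetical: assumes the linker script defines an ".extram" output
 * section located in the external RAM region at 0x60000000. */
static int frame_buffer[64 * 1024] __attribute__((section(".extram")));

/* Or, for a quick experiment, just form a pointer into the region. */
volatile uint32_t * const ext_ram = (volatile uint32_t *)0x60000000u;

void probe_external_ram(void)
{
    ext_ram[0] = 0xDEADBEEFu;          /* goes out over the external bus */
    frame_buffer[0] = (int)ext_ram[0]; /* placed there by the linker     */
}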
You can use the reserved word register to suggest to the compiler that it keep that variable in a CPU register rather than in memory:
register int iInside;
Use caution; the compiler knows how many registers are available, and when they are all in use the hint won't matter.
Use register variables only for things that are going to be used very, very frequently, such as counters.
