External Memory(ies) Usage for Max Speed

I haven't used external memory OR ARM core micros before; all of the micros I've used have had internal FLASH and separate data/program address spaces. So forgive me if these questions are very basic, but I could use a "sanity check" to make sure I'm not missing something important:
I have an existing program that when compiled for one micro has the following memory table (IAR for Cortex M3):
40 620 bytes of readonly code memory
1 215 bytes of readonly data memory
126 900 bytes of readwrite data memory
I am moving to a micro that has NO internal FLASH and 128kB of internal SRAM, as it has a very high processor speed that I need. My plan is to use external NOR FLASH (let's say 512kB for the sake of argument) and at least one DDR2 external RAM (again, assume 512kB+ for sake of argument).
I'd like to copy the contents of the external FLASH into internal SRAM at boot-up (a bootloader is provided in a separate internal FLASH space), and execute code out of SRAM. What I'm still not clear on is if the 128kB of internal SRAM is sufficient to allow for this. Can I simply use the DDR2 external RAM to house all "data", and execute the program code out of SRAM? Is there a speed compromise this way? Speed is my #1 priority in this application. Is there a way to do this that will result in quicker execution?
Thanks

How you can use these memories depends on how the ARM is connected to them (flash, SRAM, DRAM), but in general you can use them however you like: run what you can from SRAM and put the rest in DRAM. DRAM is in general much slower than SRAM, but you might have a cache to help, and other factors may make one memory or the other better or worse. If you are using GNU tools, you can certainly craft a linker script and bootloader that take the various sections (.text, .data, .rodata, etc.) and place them wherever you want (SRAM, DRAM, specific places in each), and then let the ARM have at it. I don't know much about IAR, but no doubt it has some mechanism for doing the same thing.
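As a rough sanity check on the numbers in the question, the sketch below adds up the sections from the IAR memory table and compares them with the 128kB of internal SRAM. The function and macro names are made up for illustration; only the byte counts come from the question.

```c
#include <stdint.h>

/* Sizes taken from the IAR memory table in the question. */
#define CODE_BYTES    40620u   /* readonly code memory  */
#define RODATA_BYTES   1215u   /* readonly data memory  */
#define RWDATA_BYTES 126900u   /* readwrite data memory */

#define SRAM_BYTES (128u * 1024u)  /* internal SRAM on the new part */

/* Bytes left in SRAM after placing code and read-only data there. */
uint32_t sram_left_after_code(void)
{
    return SRAM_BYTES - (CODE_BYTES + RODATA_BYTES);
}

/* Returns 1 if the read/write data also fits in what is left, else 0. */
int rwdata_fits_in_sram(void)
{
    return RWDATA_BYTES <= sram_left_after_code();
}
```

With these figures the code and constants fit in SRAM with room to spare, but the read/write data does not fit alongside them, so at least part of it would have to live in the external RAM.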

Related

ARM - Memory map leakages

Let's assume that we are using an MCU with an ARM Cortex-M4, 256KB of FLASH and 64KB of RAM. This CPU has a memory map like the one shown below:
If I understand correctly, the memory map tells us the maximum size of each memory, which limits the MCU vendor, and where the CPU will look for it. For example, we cannot use a Cortex-M4 with more than 512MB of FLASH, right?
In this case we have 64KB of RAM, and the limit is 512MB. My question is: does the CPU know about that? Does it have any safety mechanism that prevents access beyond that 64KB of RAM (for example on a stack overflow), such as halting? Or does the CPU simply assume "these are my boundaries, I will move around within them as needed"? I know that compilers may provide some information to warn the programmer.

If I understand correctly, the memory map tells us the maximum size of each memory, which limits the MCU vendor, and where the CPU will look for it.
Yes.
For example, we cannot use a Cortex-M4 with more than 512MB of FLASH, right?
Normally the flash is the part between address 0x0 and 0x1FFFFFFF. Meaning 512MB indeed (1024*1024*512=0x20000000). Which is a ridiculously large size for a Cortex M.
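To make the 512MB figure concrete, here is a small sketch that maps an address to its region in the architectural Cortex-M (ARMv7-M) address map. The enum and function names are invented for this example; the region boundaries are the architectural ones, and what actually exists inside each region is up to the chip vendor.

```c
#include <stdint.h>

typedef enum {
    REGION_CODE,        /* 0x00000000 - 0x1FFFFFFF */
    REGION_SRAM,        /* 0x20000000 - 0x3FFFFFFF */
    REGION_PERIPHERAL,  /* 0x40000000 - 0x5FFFFFFF */
    REGION_EXT_RAM,     /* 0x60000000 - 0x9FFFFFFF */
    REGION_EXT_DEVICE,  /* 0xA0000000 - 0xDFFFFFFF */
    REGION_SYSTEM       /* 0xE0000000 - 0xFFFFFFFF */
} cm_region_t;

/* Every region is a multiple of 512MB (0x20000000), so the top three
   address bits are enough to pick the region. */
cm_region_t cm_region_of(uint32_t addr)
{
    switch (addr >> 29) {
    case 0:  return REGION_CODE;
    case 1:  return REGION_SRAM;
    case 2:  return REGION_PERIPHERAL;
    case 3:
    case 4:  return REGION_EXT_RAM;
    case 5:
    case 6:  return REGION_EXT_DEVICE;
    default: return REGION_SYSTEM;
    }
}
```

For example, 0x1FFFFFFF is still in the code region while 0x20000000 is the first SRAM address.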
My question is: does the CPU know about that?
Yes and no. The physical memory exists wherever the vendor placed it. The linker script lets you choose, to some extent, where your code and data go within it.
The Cortex-M does not have an MMU with virtual memory support (only an optional MPU), meaning all addresses are physical. It does, however, keep track of various invalid accesses through hardware exceptions. From ARM/Keil AN209, Using Cortex-M3/M4/M7 Fault Exceptions:
Fault exception handlers
Fault exceptions trap illegal memory accesses and illegal program behavior. The following conditions are detected by fault exception handlers:
HardFault: is the default exception and can be triggered because of an error during exception processing, or because an exception cannot be managed by any other exception mechanism.
MemManage: detects memory access violations to regions that are defined in the Memory Management Unit (MPU); for example, code execution from a memory region with read/write access only.
BusFault: detects memory access errors on instruction fetch, data read/write, interrupt vector fetch, and register stacking (save/restore) on interrupt (entry/exit).
UsageFault: detects execution of undefined instructions, unaligned memory access for load/store multiple. When enabled, divide-by-zero and other unaligned memory accesses are detected.
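On Cortex-M3/M4/M7 the cause of a configurable fault is recorded in the Configurable Fault Status Register (CFSR, at 0xE000ED28), which packs the MemManage, BusFault and UsageFault status fields into one word. The sketch below only classifies a CFSR value that has already been read; the enum and function names are hypothetical, and a real handler would read the register through a volatile pointer or the vendor's CMSIS header.

```c
#include <stdint.h>

/* CFSR field layout on ARMv7-M. */
#define CFSR_MMFSR_MASK 0x000000FFu  /* MemManage status, bits 7:0    */
#define CFSR_BFSR_MASK  0x0000FF00u  /* BusFault status, bits 15:8    */
#define CFSR_UFSR_MASK  0xFFFF0000u  /* UsageFault status, bits 31:16 */

typedef enum {
    FAULT_NONE,
    FAULT_MEMMANAGE,
    FAULT_BUS,
    FAULT_USAGE
} fault_class_t;

/* Report the first fault class that has any status bit set.
   Several fields can be non-zero at once; this sketch checks them
   in MemManage, BusFault, UsageFault order. */
fault_class_t classify_cfsr(uint32_t cfsr)
{
    if (cfsr & CFSR_MMFSR_MASK) return FAULT_MEMMANAGE;
    if (cfsr & CFSR_BFSR_MASK)  return FAULT_BUS;
    if (cfsr & CFSR_UFSR_MASK)  return FAULT_USAGE;
    return FAULT_NONE;
}
```

For instance, a CFSR value with only bit 25 (DIVBYZERO) set is classified as a UsageFault.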

No, the CPU does not know. You specify the memory map in the linker script, and the link will fail if your code and/or data cannot be located in the stated available memory.
If you specify the memory map incorrectly, the linker may locate code/data in non-existent memory, and when you load it, parts will be missing. When programming flash, the programming tool will very likely fail if it is set to read back and verify the code.
Also, if you dynamically load code to non-existent memory, or access memory not allocated by the linker at run-time, the results are unpredictable, other than that it won't do anything useful.

The CPU cannot know, as everyone has said. The MCU vendor buys the processor IP from ARM, buys IP from other vendors, and creates some of its own, if nothing else the glue that holds the modules together. The flash itself is likely from some third party.
Some chip designers let addresses wrap around; this is not uncommon in hardware or software. For example, the part may have 16 Kbytes of flash starting at 0x08000000. That is the CHIP company's decision; ARM has little to do with it beyond what you have found, that it defines wide address ranges (likely for caching and other options within its domain). 16K is 16384 bytes or 0x4000, so 14 bits of address. There is likely an address decoder that sees some number of upper bits (0x08...) and sends the request to the flash logic. At the flash logic, it would not surprise me to see only the lower 14 address bits used and the rest stripped off, meaning that if you were to address 0x08000000 and 0x08008000 you may get the same 0x0000 offset in the flash.
Some engineers may choose to look at those upper bits and declare a fault.
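The two design choices described above can be modelled in a few lines. This is only a model of what the decode logic might do for the 16 Kbyte example; the names are invented here, and real parts have to be checked one by one.

```c
#include <stdint.h>

#define FLASH_BASE 0x08000000u
#define FLASH_SIZE 0x4000u      /* 16 Kbytes, 14 address bits */

/* "Wrap around" design: only the low 14 bits reach the flash array,
   so addresses above the implemented size alias onto it. */
uint32_t flash_offset_wrapping(uint32_t addr)
{
    return addr & (FLASH_SIZE - 1u);
}

/* "Strict" design: the decoder also checks the upper bits.
   Returns 1 if the access is inside the implemented flash, else 0
   (where real hardware might raise a fault). */
int flash_access_ok_strict(uint32_t addr)
{
    return addr >= FLASH_BASE && (addr - FLASH_BASE) < FLASH_SIZE;
}
```

In the wrapping model 0x08000000 and 0x08008000 both land on offset 0x0000, while the strict model rejects 0x08008000.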
You have to examine this on a case-by-case basis: not just for STM32 in general, for example, but for each STM32 family, basically for every datasheet. (And there is no reason to expect the chip vendor to document this level of detail.)
The ARM Cortex-M, like all processors, is very, very stupid: it does what the bits tell it to do. It is our responsibility to feed it a sequential trail of working instructions, just like laying track in front of a train. You can lay a lot of track in the wrong place, with gaps, etc. If the track does not follow the rules of the train, the train will crash or fail in some way.
The others have mentioned the linker script. To be clear, the linker script does not magically know what chip you have. Ultimately you, the programmer, are responsible for telling the toolchain to build programs that follow the rules of the CPU AND the CHIP. That means the right architecture's instructions or a subset (Cortex-M0 instructions (ARMv6-M) will run on a Cortex-M4 (ARMv7-M)), and a linker script that defines addresses for the read-only and read/write areas that match the chip (not the core; the chip vendor is in charge of that definition). Then, barring 100 other ways you can fail, it will run.
You are ultimately responsible, but most folks grab an SDK or sandbox of some sort and hope for the best, with blind faith in others. The GNU and LLVM tools can be used directly without these third parties, but then you are fully responsible for getting everything right.

.text area of memory layout

My question is related to the memory layout in an embedded system.
I learned that when we flash (or burn) an executable file, it sits in either ROM or FLASH, depending on the hardware we use.
But I also learned from the memory layout of a C program that the program image contains the .text area (i.e. the compiled code).
My question is:
1) Is the code we burn into flash/ROM the same code that sits in RAM (as depicted in the .text area of the program layout)?
2) Are two copies created, one in flash/ROM and one in RAM?

1) Depending on your hardware, the .text (and other segments) may be accessible directly in flash/ROM, or (in the case of serial flash), it may need to be copied into RAM to be executable.
2) The version in flash/ROM is the only version, UNTIL execution starts. Then (depending on the answers in 1), some start up code MAY copy the ROM into RAM to execute it, OR it may execute directly from the flash/ROM. Once executing, the C start up code MAY copy (or initialise) some of the non-code segments into RAM (e.g. .data, .bss etc).
Older, slower processors, may execute from ROM (think 8086/6502 era), whereas more modern processors (Pentium+ era, FPGA etc) would run incredibly slowly if executing from flash, so they will copy the executable into RAM (and even then, the currently executing code will be cached into the processor itself, so there may be a 3rd copy of the code).
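The "C start up code MAY copy (or initialise)" step usually boils down to two loops. The sketch below takes the addresses as parameters so it can be tried on a PC; in a real project they would come from symbols defined in the linker script, and the function name here is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the initial values of .data from ROM/flash into RAM, then
   zero .bss. All sizes are in 32-bit words. */
void startup_init_ram(uint32_t *data_dst, const uint32_t *data_src,
                      size_t data_words,
                      uint32_t *bss, size_t bss_words)
{
    size_t i;

    for (i = 0; i < data_words; i++)
        data_dst[i] = data_src[i];   /* second copy now lives in RAM */

    for (i = 0; i < bss_words; i++)
        bss[i] = 0u;                 /* .bss has no image in flash */
}
```

The same kind of loop, pointed at .text instead of .data, is what a loader would use when the code itself has to run from RAM.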

RAM usage AT32UC3B0512

I'm searching for a way to see the RAM usage of my application running on an at32uc3b0512.
avr32-size.exe foo.elf tells me:
   text    data     bss     dec     hex filename
 263498   11780   86524  361802   5854a foo.elf
According to 'google', RAM usage is .data + .bss. But .data + .bss is already (11780+86524)/1024 = 96kb, which would mean that my RAM is full (at32uc3b0512 -> 96kb SRAM). But the application works as desired. Am I wrong???

The chip you are using has 96kB of RAM and that is also the sum of your .bss and .data sections. This does not mean that all of your RAM is being used up, rather it is merely showing how the RAM is being allocated.
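The arithmetic behind this can be written down as a small sketch. The helper names are invented; the usual convention assumed here is that .data and .bss occupy RAM, while .text plus the initial values of .data occupy flash.

```c
#include <stdint.h>

#define SRAM_BYTES (96u * 1024u)   /* SRAM size from the question */

/* Statically allocated RAM: initialised plus zero-initialised data. */
uint32_t static_ram_bytes(uint32_t data, uint32_t bss)
{
    return data + bss;
}

/* Flash image: code plus the initial values of .data. */
uint32_t flash_image_bytes(uint32_t text, uint32_t data)
{
    return text + data;
}
```

With the figures from the size output, 11780 + 86524 = 98304 bytes, which is exactly 96 * 1024. Whether stack and heap are already counted inside those sections depends on the linker script, which the question does not show.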

The program on an MCU is usually located in FLASH. This is not true if you have some OS present and load the program into memory at runtime from somewhere like an SD card, but not all MCUs can do that, and I suspect that is not your case.
The program FLASH is 512 KByte (I guess from your IC's number).
The SRAM (your chip has 96 KByte) is used for the C engine/OS, stack and heap. The C engine is something like an OS: it handles dynamic allocations, heap, stack and subroutine calls, and includes the RTL used during compilation and, of course, dummy interrupt subroutines for unused interrupts...
When you compile a program to ELF/HEX or whatever, the compiler/linker tells you only:
how big the program code and data are (located in program FLASH memory)
how big your static variables are
The rest is unknown until runtime itself.
So if you need to know how big a chunk of memory you take, you need to extract it at runtime:
by some RTL call to get the memory status
or by estimating it yourself, based on knowledge of what your program does, how much dynamic memory is used, heap/stack thrashing/usage, recursion levels, etc...
Or you can try to allocate more and more memory until you hit out of memory, count how big a chunk you allocated altogether, and then release it, of course. The used memory is then ~ 96KB - altogether_allocated_memory (+/- granularity ...).
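The "allocate until it fails" idea could look roughly like the sketch below. It uses plain malloc/free and chains the blocks together so they can be released afterwards. The max_blocks cap is an addition of this sketch (so it also terminates on a PC, where malloc rarely fails), and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Each probed block starts with a link to the previous one. */
struct probe_node {
    struct probe_node *next;
};

/* Allocate blocks of block_size bytes until malloc fails or max_blocks
   is reached, then free them all. Returns the total bytes obtained. */
size_t probe_free_heap(size_t block_size, size_t max_blocks)
{
    struct probe_node *head = NULL;
    size_t count = 0;

    if (block_size < sizeof(struct probe_node))
        block_size = sizeof(struct probe_node);

    while (count < max_blocks) {
        struct probe_node *n = malloc(block_size);
        if (n == NULL)
            break;              /* out of heap: stop probing */
        n->next = head;
        head = n;
        count++;
    }

    while (head != NULL) {      /* release everything again */
        struct probe_node *next = head->next;
        free(head);
        head = next;
    }

    return count * block_size;
}
```

The result is only as fine-grained as block_size, and it measures the heap only; stack usage has to be estimated separately.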

ARM bare-metal with MMU: write to non-cachable,non-bufferable mapped area fail

I am using an ARM Cortex-A9 CPU with 2 cores, but I only use 1 core; the other just sits in a busy loop. I set up the MMU table using sections (1MB per entry) like this:
0x00000000-0x14ffffff => 0x00000000-0x14ffffff (non-cachable, non-bufferable)
0x15000000-0x24ffffff => 0x15000000-0x24ffffff (cachable, bufferable)
0x25000000-0x94ffffff => 0x25000000-0x94ffffff (non-cachable, non-bufferable)
0x15000000-0x24ffffff => 0x95000000-0xa4ffffff (non-cachable, non-bufferable)
0xa5000000-0xffffffff => 0xa5000000-0xffffffff (non-cachable, non-bufferable)
It is rather simple. I just want a mirror of the 256MB memory for non-cacheable access. However, when I do several writes to the non-cacheable section at 0x95000000-0xa4ffffff, I find that the writes do not actually land until I explicitly flush the cache.
Am I doing something wrong, or is this kind of mapping not valid? If it is not, I don't understand how Linux's ioremap works on ARM. It would be good if anyone could explain this to me. Thanks very much.

First of all: the Cortex-A9 is an ARMv7-A processor. The terms non-cacheable/non-bufferable/cacheable/bufferable are no longer correct descriptions of the mappings.
The actual mapping type is determined by TEX[2:0], C and B bits.
So I am actually having to guess a bit here as to what your mappings actually are.
And my guess is that you have the majority of your mappings set as Strongly-ordered, and the mirrored region as Normal Write-Back cacheable.
Having multiple virtual mappings with different memory types pointing to the same physical location is generally not a good idea in the ARM architecture. It used to be explicitly banned, but the latest version of the ARMv7-AR Architecture reference manual (DDI 0406C.b) has a (fairly long) section dedicated to the implications of "Mismatched memory attributes".
I would recommend finding a different way of achieving your goal.
Simply changing the mapping of the uncached regions to Normal Non-cacheable would be a good start. There is no valid reason for using Strongly-ordered mappings for RAM.
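For reference, this is roughly how the TEX, C and B bits sit in an ARMv7-A short-descriptor 1MB section entry. The sketch assumes TEX remap is disabled (SCTLR.TRE = 0), domain 0, and full read/write access; the macro and function names are made up for illustration, and the asker's actual table code is not shown in the question.

```c
#include <stdint.h>

#define SECT_TYPE    0x2u                  /* bits 1:0 = 0b10: section  */
#define SECT_B       (1u << 2)             /* B bit                     */
#define SECT_C       (1u << 3)             /* C bit                     */
#define SECT_AP_RW   (3u << 10)            /* AP[1:0] = 0b11, AP[2] = 0 */
#define SECT_TEX(x)  (((uint32_t)(x) & 7u) << 12)   /* TEX[2:0]         */

/* Memory types with TEX remap disabled. */
#define MEM_STRONGLY_ORDERED (SECT_TEX(0))
#define MEM_NORMAL_NONCACHE  (SECT_TEX(1))
#define MEM_NORMAL_WBWA      (SECT_TEX(1) | SECT_C | SECT_B)

/* Build the descriptor for one 1MB section at phys_addr. */
uint32_t section_entry(uint32_t phys_addr, uint32_t mem_type)
{
    return (phys_addr & 0xFFF00000u) | mem_type | SECT_AP_RW | SECT_TYPE;
}

/* Index into the first-level table for a virtual address. */
uint32_t section_index(uint32_t virt_addr)
{
    return virt_addr >> 20;
}
```

Under these assumptions, the first megabyte of the mirror would be entry 0x950 of the table, pointing at physical 0x15000000 with the Normal Non-cacheable type.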

How to use external memory on a microcontroller

In the past, I've worked a lot with 8 bit AVR's and MSP430's where both the RAM and flash were stored on the chip directly. When you compile and download your program, it sort of "just works" and you don't need to worry about where and how variables are actually stored.
Now I'm starting a project where I'd like to be able to add some external memory to a microcontroller (a TI Stellaris LM3S9D92 if that matters) but I'm not entirely sure how you get your code to use the external RAM. I can see how you configure the external bus pretty much like any other peripheral but what confuses me is how the processor keeps track of when to talk to the external memory and when to talk to the internal one.
From what I can tell, the external RAM is mapped to the same address space as the internal SRAM (internal starts at 0x20000000 and external starts at 0x60000000). Does that mean if I wrote something like this:
int *x = (int *)0x20000000;
int *y = (int *)0x60000000;
Would x and y point to the first 4 bytes (assuming 32 bit ints) of internal and external RAM respectively? If so, what if I did something like this:
int x[999999999999]; //some super big array that uses all the internal ram
int y[999999999999]; //this would have to be in external ram or it wouldn't fit
I imagine that I'd need to tell something about the boundaries of where each type of memory is or do I have it all wrong and the hardware figures it out on its own? Do linker scripts deal with this? I know they have something to do with memory mapping but I don't know what exactly. After reading about how to set up an ARM cross compiler I get the feeling that something like winavr (avr-gcc) was doing a lot of stuff like this for me behind the scenes so I wouldn't have to deal with it.
Sorry for rambling a bit but I'd really appreciate it if someone could tell me if I'm on the right track with this stuff.
Update
For any future readers I found this after another few hours of googling http://www.bravegnu.org/gnu-eprog/index.html. Combined with answers here it helped me a lot.

Generally that is exactly how it works. You have to properly set up the hardware, and/or the hardware may already have things hardcoded at fixed addresses.
You could ask the same question another way: how does the hardware know that when I write a byte to address 0x21000010 (I just made that up), that address is the UART transmit holding register and the write means I want to send a byte out the UART? The answer is that it is hardcoded in the logic that way. Or the logic might use an offset: the UART might be movable, living at the contents of some other control register plus 0x10. Change that control register (which itself has some hardcoded address) from 0x21000000 to 0x90000000, then write to 0x90000010, and another byte goes out the UART.
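The movable UART example can be modelled in software to show what "the logic decides" means. Everything in this sketch is a made-up model (the struct, the function, and the addresses, which are the invented ones from the paragraph above), not a description of any real part.

```c
#include <stdint.h>

#define UART_THR_OFFSET 0x10u   /* transmit holding register offset */

/* A toy model of the address decode logic for one peripheral. */
struct soc_model {
    uint32_t uart_base;   /* contents of the "where is the UART" register */
    uint8_t  last_tx;     /* last byte that went out the UART             */
    unsigned tx_count;    /* how many bytes have gone out                 */
};

/* Model of a byte write on the bus. Returns 1 if the address decoded
   to the UART transmit register, 0 if nothing claimed the access. */
int soc_write8(struct soc_model *soc, uint32_t addr, uint8_t value)
{
    if (addr == soc->uart_base + UART_THR_OFFSET) {
        soc->last_tx = value;
        soc->tx_count++;
        return 1;
    }
    return 0;
}
```

With uart_base at 0x21000000 a write to 0x21000010 sends a byte; after changing uart_base to 0x90000000 the same write hits nothing, and 0x90000010 is the address that transmits.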
I would have to look at that particular part, but if it does support external memory, then in theory that is all you have to do: know which addresses in the processor's address space are mapped to that external memory, and reads and writes to them will cause external memory accesses.
Intel-based computers (PCs) tend to like one big flat address space. Use the lspci command on your Linux box (if you have one), or some other command on Windows or a Mac, and you will find that your video card has been given a chunk of address space. If you got through the protection of the CPU/operating system and wrote to an address in that space, the write would go right out of the processor, through the PCIe controllers and into the video card, either causing havoc or maybe just changing the color of a pixel. You have already dealt with this on your AVRs and MSP430s. Some addresses in the address space are flash and some are RAM; there is some logic outside the CPU core that looks at the core's address bus and decides where to send each access. So far that flash bank, RAM bank and logic have all been self-contained within the boundaries of the chip. It is not too far of a stretch beyond that for the logic to respond to an address by creating an external memory cycle; when that is done, or the result comes back on a read, it completes the internal memory cycle and you go on to the next thing.
Does that make any sense or am I making it worse?

You can use the reserved word register to suggest to the compiler that it keep a variable in a CPU register rather than in memory:
register int iInside;
Use caution: it is only a hint. The compiler knows how many registers are available, and once they are all in use the keyword won't matter.
Use register variables only for things that are going to be used very, very frequently, such as counters.
