Microcontroller executing code from external SRAM

I've done some research on ARM: executing instructions that have been loaded into external SRAM is slow.
I'm wondering whether there are any microcontrollers that execute code from external SRAM as fast as from internal SRAM.
I'm also planning to attach 1G of external SRAM. Do microcontrollers support that much memory?
Thanks.

SRAM [1] (static RAM) is an expensive technology compared with DRAM (dynamic random-access memory), so most computing devices have a small amount of SRAM and a relatively much larger amount of DRAM. SRAM is generally used as small, separate storage directly coupled to the CPU, or to implement caches. In the case of MCUs, manufacturers try to give you different ways of making use of such layered memory.
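To put the 1G figure in context: a 32-bit ARM core can address at most 4 GiB in total. The sketch below encodes the default ARMv7-M (Cortex-M3/M4/M7) system address map, with the private peripheral bus and vendor system space folded into one "System" entry for brevity. The region boundaries are architectural, but how much of the External RAM window a particular part's external memory controller actually decodes is device-specific and usually far less, so treat this as an illustration and check the datasheet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry of the default ARMv7-M (Cortex-M3/M4/M7) memory map. */
typedef struct {
    const char *name;
    uint32_t    base;
    uint32_t    last;   /* inclusive end address */
} mem_region;

/* PPB and vendor-system space are folded into a single "System" entry. */
static const mem_region cortex_m_map[] = {
    { "Code",            0x00000000u, 0x1FFFFFFFu },
    { "SRAM",            0x20000000u, 0x3FFFFFFFu },
    { "Peripheral",      0x40000000u, 0x5FFFFFFFu },
    { "External RAM",    0x60000000u, 0x9FFFFFFFu },
    { "External device", 0xA0000000u, 0xDFFFFFFFu },
    { "System",          0xE0000000u, 0xFFFFFFFFu },
};
#define MAP_LEN (sizeof cortex_m_map / sizeof cortex_m_map[0])

/* Region size in bytes; 64-bit because the last region ends at 0xFFFFFFFF. */
static uint64_t region_size(const mem_region *r) {
    return (uint64_t)r->last - r->base + 1u;
}

/* Find the region that contains a 32-bit address. */
static const mem_region *region_of(uint32_t addr) {
    for (size_t i = 0; i < MAP_LEN; i++) {
        if (addr >= cortex_m_map[i].base && addr <= cortex_m_map[i].last)
            return &cortex_m_map[i];
    }
    return NULL; /* not reached: the map covers all 4 GiB */
}

/* Size of a named region, or 0 if the name is unknown. */
static uint64_t region_size_by_name(const char *name) {
    for (size_t i = 0; i < MAP_LEN; i++) {
        if (strcmp(cortex_m_map[i].name, name) == 0)
            return region_size(&cortex_m_map[i]);
    }
    return 0;
}
```

The External RAM window works out to exactly 1 GiB, so 1G is the architectural ceiling for memory-like external RAM on such a core, not a typical supported size.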

Related

How does an ARM CPU load the bootloader?

As far as I know, the CPU can access RAM directly. The device's RAM is empty at startup, and the CPU doesn't know where to load the bootloader from in order to execute it. I'd think it can't do anything at all, since the call stack should be empty too.
So how is the bootloader program copied into RAM for further execution?
This must happen on embedded devices such as smartphones. On x86 PCs, as far as I know, the BIOS is responsible for loading the MBR from disk into RAM.

A bootloader in RAM is a secondary bootloader; invariably there is code in a ROM of some kind containing a primary bootstrap that loads the secondary bootstrap. Often that ROM is mask ROM on the chip itself.
Typically, on an ARM application processor such as a Cortex-A, the primary bootstrap loads code into RAM from NAND flash or an SD card. ARM Cortex-M parts, by contrast, often run code directly from ROM (usually on-chip flash) in any case.
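The primary-bootstrap flow described above (look for a secondary image on external media, copy it into RAM, check it, then run it) can be sketched in portable C. Everything specific here is hypothetical: the header layout, magic value, additive checksum and boot-source callbacks are invented for illustration, since each vendor's ROM defines its own image format and boot order, and real ROM code would branch into RAM instead of returning.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image format: every vendor's boot ROM defines its own. */
#define IMG_MAGIC 0x544F4F42u  /* arbitrary marker value */

typedef struct {
    uint32_t magic;
    uint32_t length;    /* payload bytes that follow the header */
    uint32_t checksum;  /* additive checksum of the payload */
} img_header;

/* A boot medium modelled as a read callback; on real hardware this would
 * be a NAND, SD or eMMC driver inside the ROM. */
typedef struct {
    const char *name;
    int (*read)(void *ctx, size_t offset, void *dst, size_t len); /* 0 = ok */
    void *ctx;
} boot_source;

/* In-memory medium so the sketch can run on a host. */
typedef struct { const uint8_t *data; size_t size; } mem_medium;

static int mem_read(void *ctx, size_t offset, void *dst, size_t len) {
    const mem_medium *m = ctx;
    if (offset > m->size || len > m->size - offset) return -1;
    memcpy(dst, m->data + offset, len);
    return 0;
}

static uint32_t sum32(const uint8_t *p, size_t n) {
    uint32_t s = 0;
    while (n--) s += *p++;
    return s;
}

/* Try each boot source in order and copy the first valid image into RAM.
 * Returns the index of the source used, or -1 if none held a valid image. */
static int load_secondary(const boot_source *srcs, size_t nsrc,
                          uint8_t *ram, size_t ram_size) {
    for (size_t i = 0; i < nsrc; i++) {
        img_header h;
        if (srcs[i].read(srcs[i].ctx, 0, &h, sizeof h) != 0) continue;
        if (h.magic != IMG_MAGIC || h.length > ram_size) continue;
        if (srcs[i].read(srcs[i].ctx, sizeof h, ram, h.length) != 0) continue;
        if (sum32(ram, h.length) != h.checksum) continue;
        return (int)i;
    }
    return -1;
}

/* Host-side helper that builds header + payload into out. */
static size_t make_image(uint8_t *out, const uint8_t *payload, uint32_t len) {
    img_header h = { IMG_MAGIC, len, sum32(payload, len) };
    memcpy(out, &h, sizeof h);
    memcpy(out + sizeof h, payload, len);
    return sizeof h + len;
}
```

With a blank NAND listed first and a valid image on the SD card second, load_secondary() skips the NAND and loads from the SD card, which mirrors the "try the next boot device" behaviour many ROMs implement.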

You have the same problem with an x86, and in general the same solution (as with any other processor): you put a ROM/flash in the address space where the processor boots and/or where its vector table is.
There are some other solutions, such as other logic that reads from some non-volatile storage and places the code in the boot/vector space, or logic that provides an interface through which another processor/computer downloads code into the board/chip and then releases reset.
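On a Cortex-M, "ROM/flash where the vector table is" is literal, and it also answers the worry about the empty stack: at reset the core reads the initial main stack pointer from word 0 of the vector table and the reset handler address from word 1, so no RAM setup is needed before the first instruction. The host-side model below assumes the table sits at address 0 (the reset default on e.g. Cortex-M3/M4); it illustrates the architectural behaviour and is not firmware.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the core derives from the first two words of the vector table. */
typedef struct {
    uint32_t msp;    /* initial main stack pointer */
    uint32_t pc;     /* first instruction address, Thumb bit cleared */
    bool     valid;  /* false if the core would fault instead */
} reset_state;

/* Model of the reset fetch: word 0 -> MSP, word 1 -> PC. */
static reset_state simulate_reset(const uint32_t *vector_table) {
    reset_state s;
    /* SP is word aligned: bits [1:0] of the loaded value are forced to 0. */
    s.msp = vector_table[0] & ~3u;
    uint32_t vec = vector_table[1];
    /* Cortex-M executes Thumb code only: bit 0 of the reset vector must be 1,
       otherwise the core faults on the first instruction. */
    s.valid = (vec & 1u) != 0;
    s.pc = vec & ~1u;
    return s;
}
```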

Using the LZ4HC algorithm on an ARM Cortex-M3 processor

I have an embedded system with flash memory on the board that stores a large amount of data. The main controller is an ARM Cortex-M3 processor, and I'm supposed to compress the data in one part of the flash and store the compressed data in another part of the flash.
Since the amount of SRAM is limited in this kind of system, how exactly can I use the LZ4HC algorithm? I can't compress all the data at once as we would on a PC, so I guess I have to do it on small chunks, block by block (for example, every 512 or 4096 bytes of data). I'm just not sure how, and I couldn't fully understand the library functions.
Is it even possible to do this block by block?
I couldn't find any examples, and the open-source code does not come with good documentation. Actually, I think there is no documentation at all.

I'd recommend using a library designed specifically for embedded systems. Such libraries usually use little memory and are designed to compress a small chunk of data at a time.
If you don't have to use LZ4HC, or if you want to implement your own library, a good starting point is heatshrink, an LZSS-based library for embedded systems.
There is also an LZ4 decompression implementation in assembly for ARM processors here.
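Block-by-block compression is possible as long as each block is compressed independently and stored with enough framing to find and decode it later. The sketch below shows only that framing. To stay self-contained it uses a trivial run-length encoder as a stand-in for LZ4HC; in real code the call to rle_compress() would become LZ4_compress_HC() from lz4hc.h. The 512-byte block size, 2-byte header and raw-fallback flag are assumptions of this sketch, not part of the LZ4 format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512u     /* assumption: roughly one flash page/sector */
#define RAW_FLAG   0x8000u  /* header bit: block stored uncompressed */

/* Stand-in compressor: (count, byte) run-length pairs.
 * Returns the compressed size, or 0 if it does not fit in dst_cap. */
static size_t rle_compress(const uint8_t *src, size_t n,
                           uint8_t *dst, size_t dst_cap) {
    size_t out = 0;
    for (size_t i = 0; i < n; ) {
        uint8_t v = src[i];
        size_t run = 1;
        while (i + run < n && src[i + run] == v && run < 255) run++;
        if (out + 2 > dst_cap) return 0;
        dst[out++] = (uint8_t)run;
        dst[out++] = v;
        i += run;
    }
    return out;
}

/* Compress src block by block. Each block is a 2-byte little-endian header
 * (payload length, plus RAW_FLAG if stored raw) followed by the payload,
 * so blocks can be decoded one at a time. RAM use is one BLOCK_SIZE scratch
 * buffer regardless of n. Returns bytes written, or 0 if dst is too small. */
static size_t compress_blocks(const uint8_t *src, size_t n,
                              uint8_t *dst, size_t dst_cap) {
    static uint8_t scratch[BLOCK_SIZE];
    size_t out = 0;
    for (size_t off = 0; off < n; off += BLOCK_SIZE) {
        size_t len = (n - off < BLOCK_SIZE) ? n - off : BLOCK_SIZE;
        /* Keep the compressed form only if it is strictly smaller. */
        size_t clen = rle_compress(src + off, len, scratch, len - 1);
        const uint8_t *payload = clen ? scratch : src + off;
        size_t plen = clen ? clen : len;
        uint16_t hdr = (uint16_t)(plen | (clen ? 0u : RAW_FLAG));
        if (out + 2 + plen > dst_cap) return 0;
        dst[out++] = (uint8_t)(hdr & 0xFFu);
        dst[out++] = (uint8_t)(hdr >> 8);
        memcpy(dst + out, payload, plen);
        out += plen;
    }
    return out;
}

/* Inverse of compress_blocks(); returns bytes written, or 0 on error. */
static size_t decompress_blocks(const uint8_t *src, size_t n,
                                uint8_t *dst, size_t dst_cap) {
    size_t in = 0, out = 0;
    while (in + 2 <= n) {
        uint16_t hdr = (uint16_t)(src[in] | (src[in + 1] << 8));
        size_t plen = hdr & 0x7FFFu;
        in += 2;
        if (in + plen > n) return 0;
        if (hdr & RAW_FLAG) {
            if (out + plen > dst_cap) return 0;
            memcpy(dst + out, src + in, plen);
            out += plen;
        } else {
            for (size_t k = 0; k + 1 < plen; k += 2) {
                size_t run = src[in + k];
                if (out + run > dst_cap) return 0;
                memset(dst + out, src[in + k + 1], run);
                out += run;
            }
        }
        in += plen;
    }
    return out;
}
```

On the target, each framed block would be written to the destination flash area as it is produced instead of into a large dst buffer. Also check LZ4HC's own working-state size for your library version (for example via LZ4_sizeofStateHC()); it is large compared with a typical Cortex-M3's SRAM, which is part of why a small-footprint library such as heatshrink is attractive here.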

Is there any Cortex-A based MCU whose on-chip mask-ROM primary bootloader can be changed?

I want to study the internals of Cortex-A, but the internal flash of the AM335x and S5PV210 cannot be changed, so I want to know whether there is any Cortex-A based MCU whose on-chip mask-ROM primary bootloader can be changed.
Please recommend some, if there are any.
Please forgive my poor English, thank you!

There is usually no flash in a Cortex-A. The ROM code is usually, well..., in a read-only memory. When there is a bug in this code, a new wafer mask must be produced to fix it, but since millions of parts are produced, the cost reduction is significant, and ROM avoids data-retention issues.

Flash memory is by definition rewritable; the parts you mentioned simply have no on-chip flash.
Parts that have on-chip flash typically execute code directly from it, and because flash memory is relatively slow, on-chip flash is usually found on lower-frequency processors running below about 200 MHz.
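The speed gap can be put in numbers: if a flash read takes t_acc nanoseconds and the core runs at f MHz, each uncached fetch needs ceil(t_acc * f / 1000) clock cycles, i.e. one fewer wait states. The 25 ns access time used below is an illustrative assumption, not a figure from any particular datasheet, and real parts hide part of this cost with prefetch buffers and flash accelerators.

```c
#include <stdint.h>

/* Wait states needed for one flash read:
 * cycles = ceil(t_acc_ns * f_mhz / 1000), wait states = cycles - 1. */
static uint32_t flash_wait_states(uint32_t t_acc_ns, uint32_t f_mhz) {
    uint32_t cycles = (t_acc_ns * f_mhz + 999u) / 1000u;
    return cycles > 0u ? cycles - 1u : 0u;
}
```

With t_acc = 25 ns, a 24 MHz core needs no wait states, 72 MHz needs 1, 168 MHz needs 4, and a 1 GHz application core would need 24, which is why fast processors copy code into faster RAM (or rely on caches) instead of executing straight from flash.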
Fast "applications" processors do not normally have on-chip flash, because it takes up a large amount of die space and has insufficient capacity for the kind of applications and OS (such as Linux, Android or Windows) typically used on such processors. Instead they often have an on-chip mask-ROM primary bootloader that loads a secondary bootloader from external media such as NOR flash, NAND flash, SD card, eMMC, etc. The secondary bootloader then boots the OS and/or application code.
Code on such processors is loaded into and executed from SDRAM, which is much faster than flash. Also, the boot media are not always memory-mapped, so code on them cannot be executed directly in any case.

Memory space of ARM microprocessors

In ARM microprocessors, is the only available memory space the 37 or so general and status registers, or is there a separate accessible memory space within the microprocessor chip?
For example, in the Atmel AVR microcontroller, to my understanding, memory is mapped internally within the same chip, with data memory, program memory and EEPROM memory. Does the same apply to ARM microprocessors, or does a microcontroller with an ARM microprocessor require separate external memory?

Your interpretation of the Atmel AVR architecture is not quite correct.
Of course it's possible to integrate memory of virtually any kind on the same die as the CPU core. However, that doesn't mean you can compare flash memory available on one such integrated system to registers on another.
A CPU core needs a memory interface, and that's all that counts: flash is slower than registers. So if you connect flash to an ARM processor, it will behave similarly (the same order of magnitude in speed) to the on-board flash of the AVR.
Besides, ARM is solely IP (a design concept) licensed to numerous companies, which build efficient peripherals and sometimes also memory around the core. So you will find chips with an ARM core and on-board memory on the market.
(I simplified things a bit in the above description but I was focusing on trying to point out where I think you misunderstand how the two processors compare.)

The link below covers in detail how memory management is done in ARM processors. Hope it helps.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0471c/CHDDJIFI.html

Can I run a program from an SD card instead of flash on an evaluation board (embedded programming)?

I have an evaluation board (Olimex STM32-P103) which has an SD card connector. I want to put my program on an SD card instead of in the microcontroller's internal flash, and run it from there.
I don't know whether this is possible, given the bootloader issue.
P.S. My goal is to run Linux on this board and then port my application to it.

To run programs from an SD card in general, you should know that you can't run them "right away". You have to load them into executable memory somewhere in your address space, which is done by a (more or less) simple bootloader. In the simplest case, the bootloader can read a specific binary from the SD card and copy it into memory.
That being said, you should think about this considering you only have 20K of RAM and 128K of flash on your board. So where should your program go? Or better: why not flash the program into the 128K of flash from the very beginning? In particular, you should know that Linux is rather "hungry" in terms of memory.
If your goal is to run a "normal" Linux on this board, I'm afraid you're screwed. This is because, as far as I know, Linux needs an MMU to run, and the chip on this board does not provide one (as far as I could find out without access to ST's datasheets).
If you're lucky you can go with uClinux. I'm not sure whether a finished port exists for the STM32, but a short Google search for "STM32 uCLinux" turns up some resources. But even if you manage to run uClinux, I'm afraid there won't be much left in your system for your application, so the result might be a bit disappointing.
Depending on why you want Linux on this MCU, there may be other solutions, such as FreeRTOS in combination with the lwIP stack (if networking is needed), or a FAT library like FullFAT if you are looking to read SD cards and so on.
Edit: One thing I'd like to add is that booting from an SD card is typically something you do with "bigger" (not much, but somewhat) systems that have enough RAM to hold the whole image you'd like to run and still have some space left for the data you want to process.

You're going to need some code in the STM's on-board flash (typically called a "boot loader") that implements this, since the "bare metal" very likely can't boot from an SD card.
You're going to have to build that code, which figures out how to use the STM's on-board peripherals to talk to the SD card, finds the file you want to run in the file system (which you also have to implement), and loads it.
I wanted to include a link to the STM standard peripheral library, but it seems to be down (being moved). :/
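Once such a loader has copied an image into memory, it is worth sanity-checking the image's vector table before handing over control, because erased flash or a bad read typically shows up as 0xFFFFFFFF words. The sketch below assumes the STM32F103-class layout of this board (20 KB of SRAM at 0x20000000); the function name and the exact checks are illustrative, not taken from ST's library. A loader that passes this check would then typically point SCB->VTOR at the image, load the MSP with CMSIS's __set_MSP() and branch to the entry point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for an STM32F103-class part; check your datasheet. */
#define RAM_BASE 0x20000000u
#define RAM_SIZE (20u * 1024u)

/* Hypothetical check of a loaded Cortex-M image's first two vector words. */
static bool image_looks_valid(const uint32_t *vectors,
                              uint32_t load_addr, uint32_t image_size) {
    uint32_t sp = vectors[0];
    uint32_t reset = vectors[1];
    /* Full-descending stack: initial SP must be in (RAM_BASE, RAM end]. */
    if (sp <= RAM_BASE || sp > RAM_BASE + RAM_SIZE) return false;
    /* Reset handler must be a Thumb address (bit 0 set). */
    if ((reset & 1u) == 0) return false;
    /* Entry point must lie inside the image that was just loaded. */
    uint32_t entry = reset & ~1u;
    return entry >= load_addr && entry < load_addr + image_size;
}
```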

The data on the SD card is not memory mapped, so cannot be executed directly.
It is possible to dynamically load the data from the card into RAM for execution. Wind River's VxWorks RTOS supports loading and linking object modules dynamically. I know of no other OS that scales down to a Cortex-M and directly supports this, but it would be possible to write your own loader.
However, I would suggest that for the microcontroller you are using, the idea is ill-advised. Optimal performance on Cortex-M is achieved when the code is in on-chip flash and data is in RAM, allowing data and instruction fetches to occur simultaneously on separate buses (Harvard architecture). If you execute code from RAM, performance will be severely hit, since data and instructions must then be fetched sequentially over the same bus.
The board is entirely unsuited to running Linux: with only 128 KB of program flash and 20 KB of RAM, it is not at all feasible. Even the smallest Linux distribution requires 600 KB of RAM, plus whatever the application code needs. uClinux can just about run on higher-end STM32 parts with external RAM and flash, but that would suffer from the same bus-contention performance hit, and Linux without an MMU is missing the one major benefit of using Linux at all. The part on your board lacks an external memory interface, so it cannot be expanded to support Linux.
If you need an OS, consider an RTOS such as uC/OS-II, FreeRTOS or embOS.

As others have said, you cannot execute your code directly from the SD card.
But, as on those "Linux boards", you can load the stored kernel/program into memory-mapped external SDRAM and execute it from there.
You'll still need to write that "bootloader" and store it in the internal flash.
In my opinion, that is a lot of work for a limited application.
If you want to write your application in a Linux environment and then port it to such a small target, I would rather design the application using dependency injection, or even use an emulator.
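In C, dependency injection usually means handing the hardware-facing functions to the application through a struct of function pointers, so the same application logic can be built against a fake board on a Linux host and against real drivers on the STM32. A minimal sketch; the interface (read_sensor, set_led) and the fake board are hypothetical.

```c
/* Hardware interface the application depends on; it never calls drivers
 * directly, so the implementation can be swapped per platform. */
typedef struct {
    int  (*read_sensor)(void *ctx);       /* returns a raw reading */
    void (*set_led)(void *ctx, int on);
    void *ctx;                            /* driver or fake state */
} board_io;

/* Application logic: plain C, identical on host and target. */
static void app_step(const board_io *io, int threshold) {
    int v = io->read_sensor(io->ctx);
    io->set_led(io->ctx, v > threshold);
}

/* Fake board for running and testing on a Linux host. */
typedef struct { int next_reading; int led; } fake_board;

static int fake_read(void *ctx) { return ((fake_board *)ctx)->next_reading; }
static void fake_led(void *ctx, int on) { ((fake_board *)ctx)->led = on; }
```

On the target, a second board_io instance would wrap the real ADC and GPIO drivers; app_step() itself does not change.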