I defined a auto char array as following :
char buffer[100];
When i compile it,the compiler return following error :
error: (1250) could not find space (100 bytes) for variable _buffer
But when i change it to:
static char buffer[100];
The program is compiled successfully.
Note 1:
My target device is 16f1829.
Note 2 :
Compiler version is 1.30.
All PIC16's have RAM banks which are 80 bytes of usable RAM per bank. This is described in section 3.2.4 Common RAM in the datasheet.
The problem you are seeing is not related to the size of the stack but rather the size of each item which can be allocated on the stack.
On XC8 auto variables cannot individually be larger than one bank of ram, which means the largest auto variable possible will be 80 bytes.
This is described in detail in the XC8 Compilers Users Guide in section 5.5.2.2.3 as follows :
Unlike with non-auto variables, it is not efficient to access auto variables within the compiled stack using the linear memory of enhanced mid-range devices. For all devices, including PIC18 and Enhanced mid-range PIC MCUs, each component of the compiled stack must fit entirely within one bank of data memory on the target device (however, you can have more than one component, each allocated to a different bank). This limits the size of objects within the stack to the maximum free space of the bank in which it is allocated. The more auto variables in the stack; the more restrictive the space is to large objects. Recall that SFRs on mid-range devices are usually present in each data bank, so the maximum amount of GPR available in each bank is typically less than the bank size for these devices.
Yes and it's not really a stack because xc8 functions are non-reentrant.
That means rather than use an actual stack to hold auto variables and parameters, it uses pre-allocated space in RAM.
Look at the call tree maps. You may have duplicated calls because if they can be called from both an interrupt and normal runtime they will be in different call trees and will therefore require separate allocations of memory (i.e. they will take up DOUBLE the space; potentially TRIPLE if you have three call tress)
Related
Let's say I declare a global array in my C program (i.e. it's not allocated in stack memory).
What is the largest array I can create? Is it just a limitation of available memory on the machine? Or is there some other OS setting that typically controls this?
There is no single largest bound on an array size in C. In any particular implementation, it may be limited by actually available memory (either main memory or swap space on disk, according to operating system features and settings), policy set by the system operator, capability of the address space, limitations of the compiler, and possibly other factors.
The limit may be available memory in one system and address space in another system.
I am trying to understand a few basics concepts regarding the memory layout for a 8051 MCU architecture. I would be grateful if anyone could give me some clarifications.
So, for a 8051 MCU we have several types of memories:
IRAM - (idata) - used for general purpose registers and SFRs
PMEG - (code) - used to store code - (FLASH)
XDATA
on chip (data) - cache memory for data (RAM) /
off-chip (xdata) - external memory (RAM)
Questions:
So where is the stack actually located?
I would assume in IRAM (idata) but it's quite small (30-7Fh)- 79 bytes
What does the stack do?
Now, on one hand I read that it stores the return addresses when we call a function (e.g. when I call a function the return address is stored on the stack and the stack pointer is incremented).
http://www.alciro.org/alciro/microcontroladores-8051_24/subrutina-subprograma_357_en.htm
On the other hand I read that the stack stores our local variables from a function, variables which are "deleted" once we return from that function.
http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html
If I use dynamic memory allocation (heap), will that memory always be reserved in off-chip RAM (xdata), or it depends on compiler/optimization?
The 8051 has its origin in the 1970ies/early 80ies. As such, it has very limited ressources. The original version did (for instance) not even have XRAM, that was "patched" aside later and requires special (ans slow) accesses.
The IRAM is the "main memory". It really includes the stack (yes, there are only few bytes). The rest is used for global variables ("data" and "bss" section: initialized and uninitialized globals and statics). The XRAM might be used by a compiler for the same reason.
Note that with these small MCUs you do not use many local variables (and if, only 8bit types). A clever compiler/linker (I actually used some of these) can allocate local variables statically overlapping - unless there is recursion used (very unlikely).
Most notably, programs for such systems mostly do not use a heap (i.e. dynamic memory allocation), but only statically allocated memory. At most, they might use a memory pool, which provides blocks of fixed size and does not merged blocks.
Note that the IRAM includes some special registers which can be bit-addressed by the hardware. Normally, you would use a specialized compiler which can exploit these functions. Very likely some features require special assembler functions (these might be provided in a header as C-functions just generating the corresponding machine code instruction), called intrinsics.
The different memory areas might also require compiler-extensions to be used.
You might have a look at sdcc for a suitable compiler.
Note also that the 8051 has an extended harvard architecture (code and data seperated with XRAM as 3rd party).
Regarding your 2nd link: This is a very generalized article; it does not cover MCUs like the 8051 (or AVR, PIC and the like), but more generalized CPUs like x86, ARM, PowerPC, MIPS, MSP430 (which is also a smaller MCU), etc. using an external von Neumann architecture (internally most (if not all) 32+ bitters use a harvard architecture).
I don't have direct experience with your chips, but I have worked with very constrained systems in the past. So here is what I can answer:
Question 1 and 2: The stack is more than likely set within a very early startup routine. This will set a register to tell it where the stack should start. Typically, you want this in memory that is very fast to access because compiled code loves pushing and popping memory from the stack all the time. This includes return addresses in calls, local variable declarations, and the occasional call to directly allocate stack memory (alloca).
For your 3rd question, the heap is set wherever your startup routine set it to.
There is no particular area that a heap needs to live. If you want it to live in external memory, then it can be set there. You want it in your really small/fast area, you can do that too, though that is probably a very bad idea. Again, your chip's/compiler's manual or included code should show you an overloaded call to malloc(). From here, you should be able to walk backwards to see what addresses are being passed into its memory routines.
Your IRAM is so dang small that it feels more like Instruction RAM - RAM where you would put a subroutine or two to make running code from them more efficient. 80 bytes of stack space will evaporate very quickly in a typical C function call framework. Actually, for sizes like this, you might have to hand assemble stuff to get the most out of things, but that may be beyond your scope.
If you have other questions, let me know. This is the kind of stuff I like doing :)
Update
This page has a bunch of good information on stack management for your particular chip. It appears that the stack for this chip is indeed in IRAM and is very very constrained. It also appears that assembly level coding on this chip would be the norm as this amount of RAM is quite small indeed.
Heck, this is the first system I've seen in many years that has bank switching as a way to access more RAM. I haven't done that since the Color Gameboy's Z80 chip.
Concerning the heap:
There is also a malloc/free couple
You have to call init_mempool(), which is indicated in compiler documentation but it is somewhat uncommon.
The pseudo-code below to illustrate this.
However I used it only this way and did not try heavy used of malloc/free like you may find in dynamic linked list management, so I have no idea of the performance you get out of this.
//A "large" place in xdata to be used as heap
static char xdata heap_mem_pool [1000];
//A pointer located in data and pointing to something in xdata
//The size of the pointer is then 2 bytes instead of 3 ( the 3rd byte
//store the area specification data, idata, xdata )
//specifier not mandatory but welcome
char xdata * data shared_memory;
//...
u16 mem_size_needed;
init_mempool (heap_mem_pool, sizeof(heap_mem_pool));
//..
mem_size_needed = calcute_needed_memory();
shared_memory = malloc(mem_size_needed);
if ( 0 == shared_memory ) return -1;
//...use shared_memory pointer
//free if not needed anymore
free(shared_memory);
Some additionnal consequences about the fact that in general no function is reentrant ( or with some effort ) due to this stackless microcontroller.
I will call "my system" the systemI am working on at the present time: C8051F040 (Silab) with Keil C51 compiler ( I have no specific interest in these 2 companies )
The (function return address) stack is located low in the iram (idata on my system).
If it start at 30(dec) it means you have either global or local variables in your code that you requested to be in data RAM ( either because you choose a "small" memory model or because you use the keyword data in the variable declaration ).
Whenever you call a function the return 2 bytes address of the caller function will be save in this stack ( 16 bits code space ) and that's all: no registers saving, no arguments pushed onto the (non-existing)(data) stack. Your compiler may also limit the functions call depth.
Necessary arguments and local variables ( and certainly saved registers ) are placed somewhere in the RAM ( data RAM or XRAM )
So now imagine that you want to use the same innocent function ( like memcpy() ) both in your interrupt and in you normal infinite loop, it will cause sporadic bugs. Why ?
Due to the lack of stack, the compiler must share RAM memory places for arguments, local variables ... between several functions THAT DO NOT BELONG to the same call tree branch
The pitfall is that an interrupt is its own call tree.
So if an interrupt occurs while you were executing e.g the memcpy() in your "normal task", you may corrupt the execution of memcpy() because when going out of the interrupt execution, the pointers dedicated to the copy performed in the normal task will have the (end) value of the copy performed in the interrupt.
On my system I get a L15 linker error when the compiler detects that a function is called by more than one independant "branch"
You may make a function reentrant with the addition of the reentrant keyword and requiring the creation of an emulated stack on the top of the XRAM for example. I did not test on my system because I am already lacking of XRAM memory which is only 4kB.
See link
C51: USING NON-REENTRANT FUNCTION IN MAIN AND INTERRUPTS
In the standard 8051 uC, the stack occupies the same address space as register bank 1(08H to 0FH) by default at start up. That means, the stack pointer(SP register) will be having a value of 07H at startup(incremented to 08H when stack is PUSHed). This probably limits the stack memory to 8 bytes, if register bank 2(starting from 10H) is occuppied. If register banks 2 and 3 are not used, even that can be taken up by the stack(08H to 1FH).
If in a given program we need more than 24 bytes (08 to 1FH = 24 bytes) of stack, we can change the SP to point to RAM locations 30 – 7FH. This is done with the instruction “MOV SP, #xx”. This should clarify doubts surrounding the 8051 stack usage.
When setting up a fixed size array in a kernel, such as:
int my_array[100];
In which memory space does the array end up?
In particular, I would like to find out if such an array may be stored in the register file or shared memory on >= 2.0 devices and, if so, what the requirements are.
For Fermi (and probably earlier architectures), for an an array to be stored in the register file, the following conditions must be met:
The array is only indexed with constants
There are registers available
Hopefully, the compiler also does some analysis to determine impact on overall performance
The reason for (1) is that register indexes are encoded directly within the SASS instructions. There is no way to address registers indirectly.
The main factors that limits the number of registers for (2) are:
The SASS instructions contain only 6 bits for register indexing, which limits the number of registers that can be used in a kernel to 64. The actual number is 63 so one is reserved for something.
An SM has a block of registers that are shared by all threads that are concurrently in flight.
Registers are also needed for holding variables, so the compiler must balance register usage for best overall performance.
A potential workaround for (1) is loop unrolling. If a loop uses a loop counter as an index into an array, unrolling the loop (with #pragma unroll or manually) causes the array indexes to become constants as there is now a separate SASS instruction for each array access.
Based in part on this NVIDIA presentation: Local Memory and Register Spilling. The document also goes into detail about how the locations of variables and arrays affect performance.
Local arrays within a kernel, as the one you have defined are allocated in registers and in local memory when there are not enough register.
If you want allocate the array in shared memory you must specify it as follow:
__shared__ int my_array[100];
I would like to be able to debug how much total memory is being used by C program in a limited resource environment of 256 KB memory (currently I am testing in an emulator program).
I have the ability to print debug statements to a screen, but what method should I use to calculate how much my C program is using (including globals, local variables [from perspective of my main function loop], the program code itself etc..)?
A secondary aspect would be to display the location/ranges of specific variables as opposed to just their size.
-Edit- The CPU is Hitachi SH2, I don't have an IDE that lets me put breakpoints into the program.
Using the IDE options make the proper actions (mark a checkobx, probably) so that the build process (namely, the linker) will generate a map file.
A map file of an embedded system will normally give you the information you need in a detailed fashion: The memory segments, their sizes, how much memory is utilzed in each one, program memory, data memory, etc.. There is usually a lot of data supplied by the map file, and you might need to write a script to calculate exactly what you need, or copy it to Excel. The map file might also contain summary information for you.
The stack is a bit trickier. If the map file gives that, then there you have it. If not, you need to find it yourself. Embedded compilers usually let you define the stack location and size. Put a breakpoint in the start of you program. When the application stops there zero the entire stack. Resume the application and let it work for a while. Finally stop it and inspect the stack memory. You will see non-zero values instead of zeros. The used stack goes until the zeros part starts again.
Generally you will have different sections in mmap generated file, where data goes, like :
.intvect
.intvect_end
.rozdata
.robase
.rosdata
.rodata
.text .... and so on!!!
with other attributes like Base,Size(hex),Size(dec) etc for each section.
While at any time local variables may take up more or less space (as they go in and out of scope), they are instantiated on the stack. In a single threaded environment, the stack will be a fixed allocation known at link time. The same is true of all statically allocated data. The only run-time variable part id dynamically allocated data, but even then sich data is allocated from the heap, which in most bare-metal, single-threaded environments is a fixed link-time allocation.
Consequently all the information you need about memory allocation is probably already provided by your linker. Often (depending on your tool-chain and linker parameters used) basic information is output when the linker runs. You can usually request that a full linker map file is generated and this will give you detailed information. Some linkers can perform stack usage analysis that will give you worst case stack usage for any particular function. In a single threaded environment, the stack usage from main() will give worst case overall usage (although interrupt handlers need consideration, the linker is not thread or interrupt aware, and some architectures have separate interrupt stacks, some are shared).
Although the heap itself is typically a fixed allocation (often all the available memory after the linker has performed static allocation of stack and static data), if you are using dynamic memory allocation, it may be useful at run-time to know how much memory has been allocated from the heap, as well as information about the number of allocations, average size of allocation, and the number of free blocks and their sizes also. Because dynamic memory allocation is implemented by your system's standard library any such analysis facility will be specific to your library, and may not be provided at all. If you have the library source you could implement such facilities yourself.
In a multi-threaded environment, thread stacks may be allocated statically or from the heap, but either way the same analysis methods described above apply. For stack usage analysis, the worst-case for each thread is measured from the entry point of each thread rather than from main().
I am new at embedded system programming. I am working on a device that uses an 8051 chipset. I have noticed in the sample programs that when defining variables, sometimes they use the keyword xdata. like this...
static unsigned char xdata PatternSize;
while other times the xdata keyword is omitted.
My understanding is that the xdata keyword instructs the compiler that that variable is to be stored in external, flash, memory.
In what cases should I store variables externally with xdata? Accessing those variables takes longer, right? Values stored using xdata do not remain after a hard reset of the device do they?
Also, I understand that the static keyword means that the variable will persist through every call to the function it is defined in. Do static and xdata have to be used together?
The 8051 architecture has three separate address spaces, the core RAM uses an 8 bit address, so can be up to 256 bytes, XDATA is a 16bit address space (64Kbytes) with read/write capability, and the program space is a 16bit address space with execution and read-only data capability. Because of its small address range and close coupling to the core, addressing the core RAM is more efficient in terms of code space and access cycles
The original 8051 core had tiny-on-chip RAM (an address space of 256 bytes but some variants had half that in actual memory), and XDATA referred to off-chip data memory (as opposed to program memory). However most modern 8051 architecture devices have on-chip XDATA and program memory.
So you might use the core memory when performance is critical and XDATA for larger memory objects. However the compiler should in most cases make this decision for you (check your compilr's manual, it will describe in detail how memory is allocated). The instruction set makes it efficient to implement the stack in core memory, whereas static and dynamically allocated data would usually be more sensibly allocated in XDATA. If the compiler has an XDATA keyword, then it will override the compiler's strategy, and should only be used when the compiler's strategy somehow fails since it will reduce the portability of the code.
[edit] Note also that the core memory includes a 32byte bit-addressable region, the bit-addressing instructions use an 8bit address into this region to access individual bits directly. The region exists within the 256byte byte addressable core memory, so is both bit and byte addressable[/edit]
xdata tells the compiler that the data is stored in external RAM so it has to use a different instruction to read and write that memory instead of internal RAM.
Accessing external data does take longer. I usually put interrupt variables in internal RAM and most large arrays in external RAM.
As to the state of the external RAM after a hard reset (not power cycle): That would depend on the hardware setup. Does a reset line go to the external chip? Also some chips come with XDATA within the CPU chip. Read that again. Some chips have an 8051 CPU plus some amount of XDATA within the IC.
static and xdata do not overlap. Static tells the compiler how to allocate a variable (on a stack or at a memory location). Xdata tells the compiler how to get to that variable. Static can also restrict the name space of that variable to just that file. You can have an xdata static variable that is local to just a function, and have a static variable that is local to a function but uses internal RAM.
An important point not yet mentioned is that because different instructions are used to access different memory areas, the hardware has no unified concept of a "pointer". Any address which is known to be in DATA/IDATA space may be uniquely identified with a one-byte pointer; likewise any address which is known to be in PDATA space. Any address which is known to be in CODE space may be identified with a two-byte pointer; likewise any address which is known to be in XDATA space. In many cases, though, a routine like memcpy won't know in advance which memory space should be used with the passed-in pointers. To accommodate that, 8x51 compilers generally use a three-byte pointer type which may be used to access things in any memory space (one byte selects which type of instructions should be used with the pointer, and the other bytes hold the value). A pointer declaration like:
char *ptr;
will define a three-byte pointer which can point to any memory space. Changing the declaration to
char xdata *data ptr;
will define a two-byte pointer which is stored in DATA space, but which can only point to things in the XDATA space. Likewise
char data * data ptr;
will define a two-byte pointer which is stored in DATA space, but which can only point to things in the DATA and IDATA spaces. Code which uses pointers that point to a known data space will be much faster (possibly by a factor of ten) than code which uses the "general-purpose" three-byte pointers.
How and when to use xData memory area depends on the system architecture. Some systems may have RAM at this address while others could have ROM or Flash. In either case, access will be slower than accessing internal RAM, ROM or Flash.
Generally speaking, large items, constant items and lesser used items should go into xData. There are no standard rules as to what goes in xData, as it depends on the architecture.
The 8051 has a 128 byte range of scratch pad "pseudo-registers" that (most) compilers use as the default for declared variables. But obviously this area is very small, and you want to be able to put variables in the 16 bit memory address space too. That's what the xdata (i.e. "external data") specifier is for. What to put where depends, obviously, on what the data is and how you plan on using it.
Basically, I think this is the wrong question. You need to understand your CPU architecture first before learning how to use the C compiler's 8051-specific features.