I am working on an embedded system, so memory is precious for me.
One recurring issue is that I run out of memory space when attempting to compile a program for it. This is usually fixed by limiting the number of typedefs, etc., that can take up a lot of space.
There is a macro generator that I use to create a file with a lot of #define's in it.
Some of these are simple values, others are boundary checks, e.g.:
#define SIGNAL1 (float)0.03f
#define SIGNAL1_ISVALID(value) ((value >= 0.0f) && (value <= 10.0f))
Now, I don't use all of these defines. I use some, but not actually the majority.
I have been told that they don't actually take up any memory if they are not used, but I was unsure on this point. I'm hoping that by cutting out the unused ones that I can free up some extra memory (but again, I was told this is pointless).
Do unused #define's take up any memory space?
No, #defines take up no space unless they are used. #defines work like find/replace: whenever the preprocessor sees the macro name, it replaces it with the definition before the code is actually compiled.
So, if you have:
float f = SIGNAL1;
The compiler will actually compile the statement:
float f = (float)0.03f;
It will never see the name SIGNAL1; it won't show up in a debugger, etc.
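If you want to verify this on your own toolchain, most compilers can dump the preprocessed source. A minimal sketch, assuming GCC (substitute your cross-compiler) and a generated header named signals.h (a placeholder name):

#include "signals.h"   /* the generated file full of #defines */

float f = SIGNAL1;     /* used: expands to (float)0.03f */
/* SIGNAL1_ISVALID and every unused macro are never mentioned,
 * so they leave no trace in the preprocessed output. */

Running gcc -E main.c (or your compiler's preprocess-only flag) shows the source after macro expansion; the unused #defines have simply vanished.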
This is usually fixed by limiting the number of typedefs, etc that can take up a lot of space.
You seem somewhat confused, because typedefs do not take up space at runtime. They are merely aliases for data types. You may have instances of large structures (typedef'd or otherwise), but it is the instance that takes space, not the type definition. I wonder what 'etc' might cover in this statement.
Macro instances are replaced in the source code with their definition, and code is generated accordingly; an unused macro results in no generated code at all.
Things that take up space are:
Executable code (functions/member functions)
Data instantiation (including C++ object instances)
The amount of space allocated to the stack (or stacks in a multi-threaded system).
What is left is typically available for dynamic memory allocation (RAM), or is simply unused (in the case of non-volatile storage such as Flash/EPROM).
Reducing memory usage is primarily a case of selecting/designing efficient data structures, using appropriate data types, and efficient code and algorithm design. It is best to target the area that will get the greatest benefit. To see the size of objects and code in your application, get the linker to generate a map file. That will tell you which are the largest functions, as well as the sizes of global and static objects.
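For a GNU toolchain (an assumption; other linkers have equivalent switches), the map file is typically requested with a single linker flag:

gcc main.c -o app.elf -Wl,-Map=app.map,--cref

The --cref cross-reference listing is optional, but handy for tracing which module pulled in which symbol.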
Source file text length is not a good guide to code size. Much C code is declarative (typically header files are entirely declarative) and generates no memory-occupying code or data.
An embedded system does not necessarily imply small memory, so you should specify. I have worked on systems with 64MB of RAM and 2MB of Flash, and even that is modest compared with many systems. A typical microcontroller with on-chip resources, however, will generally have much less (especially SRAM, which takes up a lot of chip area). Whether your system is a Harvard or von Neumann architecture is also relevant here, since in a Harvard architecture data and code spaces are separate, so we need to know which one you are short of. Even on a von Neumann machine, the code/data split still matters if the code runs from ROM, or if it is copied from ROM to RAM at run-time (i.e. different types of memory, even if they are in the same address space).
Clifford
Well, yes and no.
No, unused #defines won't increase the size of the resulting binary.
Yes, all #defines (whether used or unused) must be processed by the preprocessor when building the binary.
By your question it's a bit ambiguous how you use the compiler, but it almost seems that you try to build directly on an embedded device; have you tried a cross-compiler? :)
Unused #defines don't take up space in the resulting executable. They do take up memory in the compiler itself whilst compiling.
In our GCC-based embedded C system we are using the -ffunction-sections and -fdata-sections options to allow the linker, when linking the final executable, to remove unused (unreferenced) sections. This has worked well for years.
In the same system, most of the data structures and buffers are allocated statically (often as static variables at file scope).
Of course we have bugs, sometimes nasty ones, where we would like to quickly exclude the possibility of buffer overflows.
One idea we have is to place canaries between the individual bss and data sections - each section containing exactly one symbol, because of -fdata-sections - much like the compiler does for function stacks when stack-smashing protection is activated. These canaries could be checked from the host by reading the canary addresses "from time to time".
Modifying the linker script (placing the sections manually and adding a canary word between them) seems feasible, but does it make sense?
Is there a project or an article in the wild? Using my keywords I couldn't find anything.
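For concreteness, a minimal sketch of the checking side of this idea (the pattern, the names, and the assumption that the linker script has placed one guard word after each guarded section are all hypothetical):

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define CANARY_PATTERN 0xDEADBEEFu   /* arbitrary magic value */

/* The linker script would place each of these directly after the
 * data/bss section it guards; here they are ordinary objects. */
static volatile uint32_t canary_a = CANARY_PATTERN;
static volatile uint32_t canary_b = CANARY_PATTERN;

static volatile uint32_t *const canaries[] = { &canary_a, &canary_b };

/* Called "from time to time", e.g. from a host command handler. */
bool canaries_intact(void)
{
    for (size_t i = 0; i < sizeof canaries / sizeof canaries[0]; i++) {
        if (*canaries[i] != CANARY_PATTERN)
            return false;   /* something overran its section */
    }
    return true;
}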
Canaries are mostly useful for the stack, since it expands and collapses beyond the programmer's direct control. The things you have in data/bss do not behave like that: either they are static variables, or, in the case of buffers, they should stay within their fixed size, which should be enforced with defensive programming in place alongside the algorithm, rather than with unorthodox tricks.
Also, stack canaries are used specifically in RAM-based, PC-like systems that don't know any better way. In embedded systems, they aren't very meaningful. Some useful things you can do instead:
Memory-map the stack so that it grows towards a memory area where writes yield a hardware exception. For example, if your MCU can separate executable memory from data memory, it can raise an exception if you try to execute code in the data area or write to the executable area.
Ensure that everything in your program dealing with buffers performs its error checks and does not write out of bounds. Static analysis tools are usually decent at spotting out-of-bounds bugs; even some compilers can do this.
Add lots of defensive programming with static asserts. Check sizes of structs, buffers, etc. at compile time; it's free (see the sketch after this list).
Run-time defensive programming. For example, if (x == good) { ... } else if (x == bad) { ... } is missing an else, and switch (x) { case A: ... } is missing a default. "But it can't go there in theory!" No, but in practice it can: runaway code caused by bugs (very likely), flash data-retention failures (100% likely, eventually) or EMI upsets in RAM (quite unlikely).
And so on.
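A minimal sketch of those last two points, assuming C11 for static_assert (all names are invented for illustration):

#include <assert.h>    /* static_assert (C11) */
#include <stdint.h>

#define RX_BUF_SIZE 64u

typedef struct {
    uint8_t data[RX_BUF_SIZE];
    uint8_t len;
} rx_buf_t;

/* Compile-time check: costs nothing at run-time. */
static_assert(sizeof(rx_buf_t) <= 68u, "rx_buf_t grew unexpectedly");

typedef enum { STATE_IDLE, STATE_RX, STATE_TX } state_t;

void handle_state(state_t s)
{
    switch (s) {
    case STATE_IDLE: /* ... */ break;
    case STATE_RX:   /* ... */ break;
    case STATE_TX:   /* ... */ break;
    default:
        /* "Can't happen" - but runaway code or corrupted RAM
         * can put any value in s, so trap it here. */
        break;
    }
}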
I've been working with a program and I've been trying to conserve bytes and storage space.
I have many variables in my C program, but I wondered if I could reduce the program's size by making some of the variables that don't change throughout the program const or final.
So my questions are these:
Are any bytes saved by marking static variables as constant?
If bytes are saved by doing this, why are they saved? How does the program store the variable differently if it is constant, and why does this way need less storage space?
If bytes are not saved by defining variables as constant, then why would a developer define a variable this way in the first place? Could we not just leave out the const in case we need to change the variable later (especially if there is no downside to doing so)?
Do only some IDEs/languages save bytes with constant variables?
Thanks for any help, it is greatly appreciated.
I presume you're working on a deeply embedded system (like a Cortex-M processor).
For these, you know that SRAM is a scarce resource whereas you have plenty of FLASH memory.
Then, as much as you can, use the const keyword for any variable that doesn't change. Doing this tells the compiler to store the variable in FLASH memory and not in SRAM.
For example, to store a text on your system you can do this:
const char* const txtMenuRoot[] = { "Hello, this is the root menu", "Another text" };
Then not only is the text stored in FLASH, but so are the pointers to it.
All your questions depend heavily on the compiler and environment. A C compiler intended for embedded environments can do a great job of saving memory, while others may not.
Are any bytes saved by marking static variables as constant?
Yes, it may be possible. But note that const, generally, isn't intended to specify how to store a variable; rather, its purpose is to help the programmer and the compiler better understand the source code (and when the compiler "understands better", it can produce better object code). Some compilers can use that information to store the variable in read-only memory, or to delete it and turn it into literals in the object code. In the context of your question, though, a #define may be more suitable.
If bytes are saved by doing this, why are they saved? How does the program store the variable differently if it is constant, and why does this way need less storage space?
Variables declared in source code can go to different places in the object code, and to different places when an object file is loaded into memory and executed. Note that, again, this differs between architectures; for example, on a small 8/16-bit MCU (the kind of CPU used in electronic devices), there is generally no "loading" of an object file at all. So the value of a variable has to be stored somewhere in any case. But at a low level the compiler can use literals instead of addresses, and this usually saves some memory. Suppose you declare a constant variable GAIN=5 in the source code. When that variable is used in some formula, the compiler could emit something like "LD R12,GAIN" (load register R12 with the contents of the address GAIN, where the variable GAIN is stored). But it can instead emit "LD R12,#5" (load the value 5 into R12). In both cases an instruction is needed, but in the second case no variable storage is involved. This is a saving, and it can also be faster.
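In C terms, a sketch of the same idea (what the compiler actually emits depends on the target and the optimization level):

static const int GAIN = 5;   /* candidate for folding into an immediate */

int apply_gain(int sample)
{
    /* With optimization enabled, most compilers emit the equivalent
     * of "multiply by immediate 5" here: no load from a GAIN address,
     * and often no storage allocated for GAIN at all. */
    return sample * GAIN;
}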
If bytes are not saved by defining variables as constant, then why would a developer define a variable this way in the first place? Could we not just leave out the const in case we need to change the variable later (especially if there is no downside to doing so)?
As mentioned earlier, the const keyword is meant to better define what operations will be performed on the variable. This is useful to programmers for clarity: it clearly states that a variable is not intended to be modified, especially when the variable is a formal parameter. In some environments there is actually read-only memory, which can only be read and not written, and if a variable (perhaps a "system variable") is marked const, everything is clear to the programmer -and- to the compiler, which can warn if it encounters code trying to modify that variable.
Do only some IDEs/languages save bytes with constant variables?
Definitely yes. But don't say IDEs: they are only editors. As for languages, things are complicated: it depends entirely on the implementation and optimization. This kind of saving is likely done only by compilers (not interpreters), and it depends a lot on the compiler's optimization options and capabilities.
Think of const this way (there is no such thing as final or constant in C, so I'll just ignore those). If it's possible for the compiler to save memory, it will (especially when you compile optimizing for size). const gives the compiler more information about the properties of an object, and the compiler can make smarter decisions when it has more information; the extra information never prevents it from making exactly the same decisions it would have made without it.
It can't hurt and may help and it also helps the programmers working with the code to easier reason about it. Both the compiler and the programmer are helped, no one gets hurt. It's a win-win.
Compilers can reduce the memory used based on their knowledge of the code, and const helps the compiler understand the real behaviour of the code (if you activate warnings, you can get suggestions about where to put const).
But a struct can contain unused bytes due to the alignment restrictions of the hardware used, and the compiler cannot reorder the members of a struct; that can only be done by changing the code:
struct wide
{
    int_least32_t i1;
    int_least8_t  b;
    int_least32_t i2;
};

struct compact
{
    int_least32_t i1,
                  i2;
    int_least8_t  b;
};
Due to the alignment restrictions, struct wide can have empty space between the members 'b' and 'i2'.
This is not the case in struct compact, because the members are listed from the widest (which may require the greatest alignment) to the smallest.
In some cases the compact layout even leads to faster code.
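One caveat: for the two structs above, trailing padding can make sizeof come out the same; the reordering pays off most clearly when several small members can be grouped together. A quick hedged sketch to check sizes on your own target (results depend on the ABI):

#include <stdio.h>

struct bad  { char a; int i; char b; };   /* typically 12 bytes on a 32-bit ABI */
struct good { int i; char a; char b; };   /* typically 8 bytes on a 32-bit ABI  */

int main(void)
{
    printf("bad: %zu, good: %zu\n", sizeof(struct bad), sizeof(struct good));
    return 0;
}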
Is there any way to check for, or prevent, the stack crossing into the RAM data (.data or .bss) areas on memory-limited (RAM/ROM) embedded systems built around microcontrollers? There are tools that do this, but they come with very costly license fees, like C-STAT and C-RUN in IAR.
You need no external tools to view and re-map your memory layout. The compiler/linker you are using should provide means of doing so. How to do this is of course very system-specific.
What you do is open the system-specific linker file, in which all memory segments have been pre-defined with defaults for the given microcontroller. You should find the various RAM segments listed there; de facto standard names are .stack, .data, .bss and .heap.
Each such segment has an address range specified. Change the addresses and you move the segments. However, these linker files usually have some obscure syntax that you need to study before you touch anything. If you are (un)lucky, it uses GNU linker scripts, which are a well-documented, though rather complex, standard.
There could also be some manufacturer-supplied start-up code that sets the stack pointer. You might have to modify that code manually, in addition to tweaking the linker file.
Regarding the stack: check the CPU core manual to see whether the stack pointer moves upwards or downwards on your system. Downwards is most common, but the alternative exists. Ensure that, in the direction the stack grows, there is no other read/write data segment that it can overwrite upon stack overflow. Ideally the stack should overflow into non-mapped memory, where the access causes a CPU hardware interrupt/exception, as sketched below.
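For illustration, a minimal GNU-linker-script-style fragment (all names, origins and sizes are invented; your part's real script will differ) that parks a descending stack at the bottom of RAM, so an overflow falls into unmapped addresses instead of .data/.bss:

MEMORY
{
  FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 128K
  RAM  (rwx) : ORIGIN = 0x20000000, LENGTH = 16K
}

_stack_size = 0x400;
/* The stack occupies the lowest 1KB of RAM; .data/.bss go above it.
 * Growing down past ORIGIN(RAM) hits unmapped memory and faults. */
_stack_top  = ORIGIN(RAM) + _stack_size;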
In small micros that lack the necessary hardware support for this, a very simple method is to have a periodic task (either under a multitasker or via a regular timed interrupt) check a "threshold" RAM address, which you must have initialized to some "magic" pattern, like 0xAA55.
Once the periodic task sees the contents of this address change, you have a problem!
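A minimal sketch of that technique (the section name, pattern and placement are assumptions; the guard word has to sit between the stack limit and your data, which is the linker's job):

#include <stdint.h>

#define STACK_GUARD_MAGIC 0xAA55u

/* The linker must place this just beyond the stack limit; the
 * ".stack_guard" section name is hypothetical and toolchain-specific. */
static volatile uint16_t stack_guard
    __attribute__((section(".stack_guard"))) = STACK_GUARD_MAGIC;

/* Call from a timer interrupt or housekeeping task. */
void stack_guard_check(void)
{
    if (stack_guard != STACK_GUARD_MAGIC) {
        /* The stack has grown into the guard word: latch an error,
         * log it, reset - whatever your system's policy dictates. */
    }
}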
In microcontrollers with limited resources, it is always a good idea to prevent stack overflow via simple memory usage optimizations:
Reduce overall RAM usage by storing read-only variables in non-volatile (e.g. flash) memory. A good target for this is the constant strings in your code, like the ones used in printf() format strings, for example. This can free a lot of memory for your stack to grow into. Check your compiler documentation for how to allocate these variables in flash.
Avoid recursive calls - they are not a good idea in resource-constrained or safety-critical systems, as you have little control over how the stack grows.
Avoid passing large parameters by value in function calls; pass them as const pointers/references whenever possible (e.g. for structs or classes), as in the sketch after this list.
Minimize unnecessary usage of local variables. Look particularly for the large ones, like local buffers for example. Often you can find ways to just remove them, or to use a shared resource instead without compromising your code.
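A hedged sketch of the flash-string and const-pointer points above (whether plain const ends up in flash is toolchain-dependent; some compilers need an extra qualifier or pragma):

#include <stdint.h>

/* const: most embedded toolchains can keep this in flash, not RAM. */
static const char menu_header[] = "--- MAIN MENU ---";

typedef struct { int32_t gains[16]; int32_t offset; } calib_t;

/* Pass large structs by const pointer, not by value: only a pointer
 * goes on the stack instead of a ~68-byte copy of the struct. */
int32_t apply_calib(const calib_t *cal, int32_t raw, uint8_t channel)
{
    return raw * cal->gains[channel] + cal->offset;
}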
I would like to be able to debug how much total memory is being used by my C program in a limited-resource environment of 256 KB of memory (currently I am testing in an emulator program).
I have the ability to print debug statements to a screen, but what method should I use to calculate how much memory my C program is using (including globals, local variables [from the perspective of my main function loop], the program code itself, etc.)?
A secondary aspect would be to display the location/ranges of specific variables as opposed to just their size.
-Edit- The CPU is a Hitachi SH2; I don't have an IDE that lets me put breakpoints into the program.
Using the IDE options, take the proper action (mark a checkbox, probably) so that the build process (namely, the linker) will generate a map file.
The map file of an embedded system will normally give you the information you need in a detailed fashion: the memory segments, their sizes, how much memory is utilized in each one, program memory, data memory, etc. There is usually a lot of data in the map file, and you might need to write a script to calculate exactly what you need, or copy it into Excel. The map file might also contain summary information for you.
The stack is a bit trickier. If the map file gives it, then there you have it. If not, you need to find it yourself. Embedded compilers usually let you define the stack location and size. Put a breakpoint at the start of your program; when the application stops there, zero the entire stack. Resume the application and let it work for a while. Finally, stop it and inspect the stack memory: you will see non-zero values where the stack has been used. The used stack extends until the zeroed part begins again.
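Since you mentioned you have no breakpoint-capable IDE, the same fill-and-inspect trick can be done from code. A minimal sketch, assuming GCC-style builtins and linker-provided symbols for the stack bounds (the symbol names here are invented; check your linker file):

#include <stdint.h>
#include <stddef.h>

#define STACK_FILL 0xA5A5A5A5u

/* Hypothetical linker-provided symbols bounding the stack region. */
extern uint32_t __stack_start__;   /* lowest stack address  */
extern uint32_t __stack_end__;     /* highest stack address */

/* Call very early, while little of the stack is in use; paints
 * everything below the current frame with the fill pattern. */
void stack_paint(void)
{
    uint32_t *p  = &__stack_start__;
    uint32_t *sp = (uint32_t *)__builtin_frame_address(0);
    while (p < sp - 16)            /* keep a safety margin */
        *p++ = STACK_FILL;
}

/* Call later and print the result: bytes never touched so far. */
size_t stack_unused(void)
{
    const uint32_t *p = &__stack_start__;
    size_t n = 0;
    while (p < &__stack_end__ && *p == STACK_FILL) {
        n += sizeof *p;
        p++;
    }
    return n;
}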
Generally you will have different sections in the generated map file, where the data goes, like:
.intvect
.intvect_end
.rozdata
.robase
.rosdata
.rodata
.text
... and so on, with other attributes like Base, Size (hex) and Size (dec) for each section.
While at any time local variables may take up more or less space (as they go in and out of scope), they are instantiated on the stack. In a single-threaded environment, the stack will be a fixed allocation known at link time. The same is true of all statically allocated data. The only run-time variable part is dynamically allocated data, but even then such data is allocated from the heap, which in most bare-metal, single-threaded environments is itself a fixed link-time allocation.
Consequently, all the information you need about memory allocation is probably already provided by your linker. Often (depending on your toolchain and the linker parameters used) basic information is output when the linker runs. You can usually request that a full linker map file be generated, and this will give you detailed information. Some linkers can perform stack usage analysis that will give you the worst-case stack usage for any particular function. In a single-threaded environment, the stack usage from main() gives the worst-case overall usage (although interrupt handlers need consideration: the linker is not thread- or interrupt-aware, and some architectures have separate interrupt stacks while others share the main stack).
Although the heap itself is typically a fixed allocation (often all the available memory after the linker has performed static allocation of stack and static data), if you are using dynamic memory allocation, it may be useful at run-time to know how much memory has been allocated from the heap, as well as information about the number of allocations, average size of allocation, and the number of free blocks and their sizes also. Because dynamic memory allocation is implemented by your system's standard library any such analysis facility will be specific to your library, and may not be provided at all. If you have the library source you could implement such facilities yourself.
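If your library provides no such facility, a thin wrapper is one way to collect those statistics yourself. A sketch, under the assumption that all allocations in the application go through these functions (note the size header can disturb alignment for types needing more than sizeof(size_t) alignment):

#include <stdlib.h>
#include <stddef.h>

static size_t   alloc_bytes;   /* total bytes currently allocated */
static unsigned alloc_count;   /* number of live allocations      */

/* Stash the request size ahead of the block so free can account. */
void *traced_malloc(size_t n)
{
    size_t *p = malloc(n + sizeof(size_t));
    if (p == NULL)
        return NULL;
    *p = n;
    alloc_bytes += n;
    alloc_count++;
    return p + 1;
}

void traced_free(void *ptr)
{
    if (ptr == NULL)
        return;
    size_t *p = (size_t *)ptr - 1;
    alloc_bytes -= *p;
    alloc_count--;
    free(p);
}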
In a multi-threaded environment, thread stacks may be allocated statically or from the heap, but either way the same analysis methods described above apply. For stack usage analysis, the worst-case for each thread is measured from the entry point of each thread rather than from main().
The limited stack size of budget PICs is a problem area, and I have adjusted my code to accommodate this reality. I currently follow a rough paradigm of grouping closely related functions into a module and declaring all variables static at file scope within the module (to reduce the number of variables stored in the auto psect; issues of mutability are only relevant in ISRs, which I account for). I don't do this because it is good practice, but because, in reality, you have a finite amount of space in which to allocate all the local function variables that exist in an entire project. In the embedded world of 8/16-bit chips, is this an appropriate method, provided I take the necessary precautions? I also do things like allocate more than 256 bytes of RAM for Ethernet buffers (I know the standard MTU is 1500, but we have a custom situation and very limited RAM) and have to access that memory via pointers so I can avoid the semantics of memory banking. Am I doing it wrong? My app works, but I am 100% open to suggestions for improvement.
I know this was asked 4 years ago, but it still has not been properly answered. I believe what the OP is asking is whether their approach to working around a limitation of the HiTech PICC18 C compiler is valid and/or best practice. As mentioned in a later comment, the limitation (a rather bad one, and not well advertised by HiTech) is that the compiler only allows up to 256 bytes of auto variables. Actually, the limitation is worse than that: it is a total of 256 bytes for local variables and parameters combined. The linker warning when this is exceeded is pretty cryptic, too. Provided that functions are on different branches of the call tree, the compiler can overlay the variables to reuse the space, which means you can effectively have more than 256 bytes. But note that the interrupt handler (or handlers, if you use the priority scheme) has its own call tree that shares the 256-byte local/parameter block.
Locals
The two solutions to reduce the space required for locals are: make the locals global, or make them static. Making them static keeps the scope the same and, provided the function is not called from interrupts, is safe (reentrancy is not allowed by the compiler anyway). This is probably the preferred option. The drawback is that the compiler cannot reuse those variables' locations to reduce overall memory consumption. Moving the variables to global scope allows reuse, but the reuse must be managed by the programmer. Probably the best balance is to make simple variables static, but to make large chunks of memory, like string buffers, global and reuse them carefully.
Be careful with initialisation:

void foo(void)
{
    int myvar = 5;    /* automatic: initialised on every call */
}

must change to

void foo(void)
{
    static int myvar;
    myvar = 5;        /* static: initialised once, so assign explicitly each call */
}
Parameters
If you go passing large lots of data down the call tree in parameters, you will quickly run into the same 256-byte limitation. Your best option here may be to pass a pointer to a globally allocated struct (or structs) of "options", as sketched below. Alternatively, you can have global settings variables that are set by the top caller and read by callees down the tree. It really depends on the design of the software which approach is better.
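A sketch of the pointer-to-options approach (all names invented for illustration):

#include <stdint.h>

/* One globally allocated parameter block instead of many
 * by-value parameters competing for the 256-byte pool. */
typedef struct {
    uint16_t baud_div;
    uint8_t  channel;
    uint8_t  flags;
} uart_opts_t;

static uart_opts_t g_uart_opts;

static void uart_apply(const uart_opts_t *opt)
{
    /* Only a single pointer travels down the call tree. */
    /* ... configure the hardware from *opt ... */
    (void)opt;
}

void uart_setup(void)
{
    g_uart_opts.baud_div = 26;   /* hypothetical divisor */
    g_uart_opts.channel  = 1;
    g_uart_opts.flags    = 0;
    uart_apply(&g_uart_opts);
}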
I've struggled with the same issues as the OP, and I think the best option in the long run is to move away from the HiTech compiler. The optimisation decision the compiler writers took, to allocate all locals/parameters in one block, is only really appropriate for the PICs with very small RAM. For larger PICs you will run out of local/parameter space long before you hit the RAM size of the device, and then you have to start hacking your code around to fit the compiler, which is perverse.
In summary: yes, your approach is valid. But do consider simply making locals static where appropriate, since, in general, the narrower the scope, the safer your code.
Whereas the C18 compiler used some FSRs (pointers) to manage the data stack, the newer XC8 compiler from Microchip uses a compiled stack, so you know at compile time exactly how much space the stack takes up. You will also know exactly where each stack variable is stored. I read all about this in the XC8 user's guide, and it sounds great. That feature should make this question moot, assuming you are using XC8.
My experience with compilers/linkers for chips with limited memory is that, as long as you avoid recursive functions and inform the compiler of that, the compiler is very capable of determining the minimal amount of stack space needed.
I have even seen compilers that give each variable with automatic storage a globally fixed address (no stack at all), where several variables got allocated to overlapping memory, as long as their lifetimes did not overlap.
The general advice when doing (speed or space) optimisations is: take measurements to prove that your optimisation actually has a positive effect.
Since you are nearly out of memory, you have to count every byte of RAM. Using local (auto) variables allows the memory to be reused where it is needed (locally in the function). When you move the variables to global static address space, you give each variable a unique location, which is a waste of address space.
The Microchip compiler allows different variables to share the same address. I don't have the docs at hand, but this can be done with a pragma.
What you need is an analysis of your RAM requirements. If you see that the stack cannot hold all the variables, but auto variables would reduce the global memory use, you should consider increasing the stack size via the startup code and the linker script.
Best practice is to choose hardware that fits the requirements.
There are microcontrollers around that cost only a few dollars more but save hundreds or thousands of dollars in development costs. If this is a hobby project, your effort may not count; but in the real world you will often find hardware that was designed only with a view to hardware cost.
The PIC18 in particular is not the best example of compact code, which can also be a problem for the flash memory.
This might sound obvious, but try not to use 16-bit variables on 8-bit processors. 16-bit variables are fine and needed on bigger architectures, but on limited 8-bit architectures, 16-bit arithmetic is a quick way to deplete both RAM and ROM in no time.
If you increment a 16-bit variable, the compiler pulls in a 16-bit increment routine that in most cases consumes a lot of space.
Also, try not to divide or multiply, as on some controllers these are implemented in software.
Personally, I always go for char and, when in need of a division by a power of two, use a right shift 'n' times to divide by 2^n, as in the sketch below.
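A tiny sketch (exact for unsigned values; for signed values a shift rounds differently from '/', so stick to unsigned):

#include <stdint.h>

/* Divide by 2^3 with a shift: typically a couple of instructions
 * on an 8-bitter, versus a software-division library call. */
uint8_t div_by_8(uint8_t x)
{
    return x >> 3;   /* same as x / 8 for unsigned x */
}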
hope this helps!
A bit late, but you should also have a closer look at the C18 compiler user guide (if you were using this compiler).
You can decrease stack usage dramatically by statically allocating local variables (overriding the auto default). Even better, you can use the overlay storage qualifier, which allows variables with non-overlapping lifetimes to be placed at the same address, minimizing RAM usage. (The C18 compiler must operate in Non-Extended mode for this.)
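From memory of the C18 user's guide (treat the exact syntax as an assumption and verify against your version), the qualifier is applied like this:

/* MPLAB C18 'overlay' qualifier (Non-Extended mode only): locals in
 * functions on disjoint branches of the call tree may share storage.
 * This is a vendor extension, not portable C. */
void task_a(void)
{
    overlay char buf[32];   /* may share its address with task_b's buf */
    /* ... */
}

void task_b(void)
{
    overlay char buf[32];
    /* ... */
}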