The limited stack size of budget PICs is a problem area and I have adjusted my code to accommodate this reality. I currently adopt a rough paradigm of grouping closely related functions into a module and declaring all variables global static in the module (to reduce the amount of variables stored in the auto psect, and issues of mutability are only relevant in ISRs, which I account for.) I don't do this because it is good practice, but the reality is you have a finite amount of space to allocate all local function vars that exist in an entire project. In the embedded world of 8/16 bit chips, is this an appropriate method, provided I'm sure to take necessary precautions? I also do things like allocate > 256 bytes of RAM for Ethernet (I know it should be 1500 as standard MTU, but we have a custom situation and very limited RAM) buffers and have to access that memory via pointers so I can avoid the semantics of memory banking. Am I doing it wrong? My app works, but I am 100% open to suggestions for improvement. [c]
I know this was asked 4 years ago but it still has not been properly answered. I believe what the OP is asking is is their approach to working around a limitation of the HiTech PICC18 C compiler valid and/or best practice. As mentioned in a later comment the limitation (a rather bad one and not well advertised by Hitech) is "the Hi-Tech compiler only allows up 256 bytes of auto variables". Actually the limitation is worse than that as it is a total of 256 bytes for local variables and parameters. The linker warning when this is exceeded is pretty cryptic too. Provided that functions are on different branches of the call tree then the compiler can overlap the variables to reuse the space. This means that you can effectively have more than 256 bytes. But note that the interrupt handler (or handlers if you use the priority scheme) has it's own call tree that shares the 256 byte local/param block.
Locals
The two solutions to reduce the space required for locals are: make the locals global or make them static. Making them static keeps the scope the same and provided the function is not called from interrupts is safe (rentrancy is not allowed by the compiler anyway). This is probably the preferred option. The drawback is that the compiler can not reuse those variable's locations to reduce overall memory consumption. Moving the variables to global scope allows reuse, but the reuse management must be managed by the programmer. Probably the best balance is to make simple variables static but to make large chunks of memory like string buffers global and carefully reuse them.
Be careful with initialisation.
foo()
{
int myvar = 5;
}
must change to
foo()
{
static int myvar;
myvar = 5;
}
Parameters
If you go around passing large lots of data down the call tree in parameters you will quickly run into the same 256 byte limitation. Your best option here may be to pass a pointer to a globally allocated struct/s of "options".Alternatively you can have global settings variables that are set by the top caller and read by callees down the tree. It really depends on the design of the software which approach is better.
I've struggled with the same issues as the OP and I think the best option in the long run is to move away from using the Hitech compiler. The optimisation decision the compiler writers took to allocate all locals/params in one block is only really appropriate for the very small ram size PICS. For large PICS you will run out of local/param far before you hit the ram size of the device. Then you have to start hacking your code around to fit the compiler which is perverse.
In summary... Yes your approach is valid. But do consider simply making locals static if that is appropriate as, in general, reducing the scope makes your code safer.
Whereas the C18 compiler used some FSRs (pointers) to manage the data stack, it sounds like the new XC8 compiler from Microchip uses a compiled stack, so you should know exactly how much space is taken up by the stack at compile time. You will also know exactly where each stack variable is stored. I read all about this in the XC8 user's guide and it sounds great. That feature should make this question be moot, assuming you are using XC8.
My experience with compilers/linkers for chips with limited memory is that, as long as you don't use recursive functions and inform the compiler about that, then the compiler is very capable of determining the minimal amount of stack-space that is needed.
I have even seen compilers that give each variable with automatic storage a globally fixed address (no stack at all), where several variables got allocated to overlapping memory, as long as their lifetimes did not overlap.
The general advise when doing (speed or space) optimisations is: make measurements to prove that your optimisation actually has a positive effect.
Since you are nearly out of memory, you have to count each byte of RAM. Using local variables (auto) allows to reuse the memory where you need it (local in the function). When you move the variables to global static address space, you give each variable a unique space. That's wast of address space.
The Microchip compiler allows that different variables share the same address. I don't have the docs at hand, but this can be done by pragma.
But what you need is a analysis of RAM requirements. When you see, that the stack cannot hold all variables but the auto variables would reduce the global memory use, you should consider to increase the stack size using startup code and the linker script.
Best practive is to choose a hardware that fits the requirements.
There are microcontrollers around the cost only some dollars more, but save hundereds or thousand of dollars development costs. If this is a hobby development your effort may not count. But in real world you can often find hardware that is designed only with view of hardware costs.
Especially the PIC18 is not the best example for compact code, what also can be a problem with the flash memory.
This migth sound obvious, but try not to use 16 bits variables on 8 bit precessors. 16 bits variables are fine and needed on bigger arquitectures, but in limited (8 bit) architectures a 16 bit aritmetic is a quick way for depleting both RAM and ROM memories in no time.
If you try to increment a 16 bits variable, the compiler would include a 16 bits increment library, that consumes in most cases a lot of space.
Also, try not to divide or multiply, as for some controllers they are software implemented.
Personally, I go alwais for char and when in need of a divide operation, use rotate rigth 'n' times to divide by 2 n times.
hope this helps!
A bit late, but you should also have a closer look at the C18 compiler user guide (if you were using this compiler).
You could decrease the stack dramatically by statically allocating local variables (overriding the auto keyword). Even better, you can use the overlay storage identifier, which allows different non-overlapping lifetimes variables to be placed at the same address, minimizing RAM. (C18 compiler must operate in Non-Extended mode).
Related
Like we do with macros:
#undef SOMEMACRO
Can we also undeclare or delete the variables in C, so that we can save a lot of memory?
I know about malloc() and free(), but I want to delete the variables completely so that if I use printf("%d", a); I should get error
test.c:4:14: error: ‘a’ undeclared (first use in this function)
No, but you can create small minimum scopes to achieve this since all scope local variables are destroyed when the scope is exit. Something like this:
void foo() {
// some codes
// ...
{ // create an extra minimum scope where a is needed
int a;
}
// a doesn't exist here
}
It's not a direct answer to the question, but it might bring some order and understanding on why this question has no proper answer and why "deleting" variables is impossible in C.
Point #1 What are variables?
Variables are a way for a programmer to assign a name to a memory space. This is important, because this means that a variable doesn't have to occupy any actual space! As long as the compiler has a way to keep track of the memory in question, a defined variable could be translated in many ways to occupy no space at all.
Consider: const int i = 10; A compiler could easily choose to substitute all instances of i into an immediate value. i would occupy 0 data memory in this case (depending on architecture it could increase code size). Alternatively, the compiler could store the value in a register and again, no stack nor heap space will be used. There's no point in "undefining" a label that exists mostly in the code and not necessarily in runtime.
Point #2 Where are variables stored?
After point #1 you already understand that this is not an easy question to answer as the compiler could do anything it wants without breaking your logic, but generally speaking, variables are stored on the stack. How the stack works is quite important for your question.
When a function is being called the machine takes the current location of the CPU's instruction pointer and the current stack pointer and pushes them into the stack, replacing the stack pointer to the next location on stack. It then jumps into the code of the function being called.
That function knows how many variables it has and how much space they need, so it moves the frame pointer to capture a frame that could occupy all the function's variables and then just uses stack. To simplify things, the function captures enough space for all it's variables right from the start and each variable has a well defined offset from the beginning of the function's stack frame*. The variables are also stored one after the other.
While you could manipulate the frame pointer after this action, it'll be too costly and mostly pointless - The running code only uses the last stack frame and could occupy all remaining stack if needed (stack is allocated at thread start) so "releasing" variables gives little benefit. Releasing a variable from the middle of the stack frame would require a defrag operation which would be very CPU costly and pointless to recover few bytes of memory.
Point #3: Let the compiler do its job
The last issue here is the simple fact that a compiler could do a much better job at optimizing your program than you probably could. Given the need, the compiler could detect variable scopes and overlap memory which can't be accessed simultaneously to reduce the programs memory consumption (-O3 compile flag).
There's no need for you to "release" variables since the compiler could do that without your knowledge anyway.
This is to complement all said before me about the variables being too small to matter and the fact that there's no mechanism to achieve what you asked.
* Languages that support dynamic-sized arrays could alter the stack frame to allocate space for that array only after the size of the array was calculated.
There is no way to do that in C nor in the vast majority of programming languages, certainly in all programming languages that I know.
And you would not save "a lot of memory". The amount of memory you would save if you did such a thing would be minuscule. Tiny. Not worth talking about.
The mechanism that would facilitate the purging of variables in such a way would probably occupy more memory than the variables you would purge.
The invocation of the code that would reclaim the code of individual variables would also occupy more space than the variables themselves.
So if there was a magic method purge() that purges variables, not only the implementation of purge() would be larger than any amount of memory you would ever hope to reclaim by purging variables in your program, but also, in int a; purge(a); the call to purge() would occupy more space than a itself.
That's because the variables that you are talking about are very small. The printf("%d", a); example that you provided shows that you are thinking of somehow reclaiming the memory occupied by individual int variables. Even if there was a way to do that, you would be saving something of the order of 4 bytes. The total amount of memory occupied by such variables is extremely small, because it is a direct function of how many variables you, as a programmer, declare by hand-typing their declarations. It would take years of typing on a keyboard doing nothing but mindlessly declaring variables before you would declare a number of int variables occupying an amount of memory worth speaking of.
Well, you can use blocks ({ }) and defining a variable as late as possible to limit the scope where it exists.
But unless the variable's address is taken, doing so has no influence on the generated code at all, as the compiler's determination of the scope where it has to keep the variable's value is not significantly impacted.
If the variable's address is taken, failure of escape-analysis, mostly due to inlining-barriers like separate compilation or allowing semantic interpositioning, can make the compiler assume it has to keep it alive till later in the block than strictly neccessary. That's rarely significant (don't worry about a handful of ints, and most often a few lines of code longer keeping it alive are insignificant), but best to keep it in mind for the rare case where it might matter.
If you are that concerned about the tiny amount of memory that is on the stack, then you're probably going to be interested in understanding the specifics of your compiler as well. You'll need to find out what it does when it compiles. The actual shape of the stack-frame is not specified by the C language. It is left to the compiler to figure out. To take an example from the currently accepted answer:
void foo() {
// some codes
// ...
{ // create an extra minimum scope where a is needed
int a;
}
// a doesn't exist here
}
This may or may not affect the memory usage of the function. If you were to do this in a mainstream compiler like gcc or Visual Studio, you would find that they optimize for speed rather than stack size, so they pre-allocate all of the stack space they need at the start of the function. They will do analysis to figure out the minimum pre-allocation needed, using your scoping and variable-usage analysis, but those algorithms literally wont' be affected by extra scoping. They're already smarter than that.
Other compilers, especially those for embedded platforms, may allocate the stack frame differently. On these platforms, such scoping may be the trick you needed. How do you tell the difference? The only options are:
Read the documentation
Try it, and see what works
Also, make sure you understand the exact nature of your problem. I worked on a particular embedded project which eschewed the stack for everything except return values and a few ints. When I pressed the senior developers about this silliness, they explained that on this particular application, stack space was at more of a premium than space for globally allocated variables. They had a process they had to go through to prove that the system would operate as intended, and this process was much easier for them if they allocated everything up front and avoided recursion. I guarantee you would never arrive at such a convoluted solution unless you first knew the exact nature of what you were solving.
As another solution you could look at, you could always build your own stack frames. Make a union of structs, where each struct contains the variables for one stack frame. Then keep track of them yourself. You could also look at functions like alloca, which can allow for growing the stack frame during the function call, if your compiler supports it.
Would a union of structs work? Try it. The answer is compiler dependent. If all variables are stored in memory on your particular device, then this approach will likely minimize stack usage. However, it could also substantially confuse register coloring algorithms, and result in an increase in stack usage! Try and see how it goes for you!
All,
I have C functions that are called many times a second as they are part of a control loop on a PIC18 board. These functions have variables that only need method scope, but I was wondering what if any overhead there was to constantly allocating these variables vs. using a global or at least higher scoped variable. (Thought of typedef'ing a struct to pass around from a higher scope to avoid global variable use if performance dictates not using method local varables)
There are some good threads on here that cover this topic, but I have yet to see a definitive answer as most preach best practices which I agree and would follow as long as there are not performance gains to be had as every microsecond counts.
One thread mentioned using file scoped static variables as a substitute for global variables, but I can't help wonder if even that is necessary.
What does everyone think?
Accessing a local variable requires doing something like *(SP + offset) (where SP is the stack-pointer), whereas accessing a static (which includes globals) requires something like *(address).
From what I recall, the PIC instruction set has very limited addressing modes. So it's very likely that accessing the global will be faster, at least for the first time it's accessed. Subsequent accesses may be identical if the compiler holds the computed address in a register.
As #unwind said in the comments, you should take a look at the compiler output, and profile to confirm. I would only sacrifice clarity/maintainability if you've proved that it's worthwhile in terms of the runtime of your program.
While I've not used every single PIC compiler in existence, there are two styles. The style I've used allocates all local variables statically by analyzing the program's call graph. If every possible call were in fact performed, the amount of stack memory consumed by locals would match what would be required by static allocation, with a couple of caveats (describing the behavior of HiTech's PICC-18 "standard" compiler--others may vary)
Variadic functions are handled by defining local-variable storage in the scope of the caller, and passing a two-byte pointer to that storage to the function being called.
For every different signature of indirect function pointer, the compiler generates a "pseudo-function" in the call graph; everything that calls a function of that signature calls the pseudo-function, and that pseudo-function calls every function with that signature that has its address taken.
In this style of compiler, consecutive accesses to local variables will be just as fast as consecutive accesses to globals. Other than global and static variables explicitly-declared as "near", however, which must total no more than 64-128 bytes (varies with different models of PIC), the global and static variables for each module are located separately from local variables, and bank-switching instructions are needed to access things in different banks.
Some compilers which I have not used employ the "enhanced instruction set" option. This option gobbles up 96 bytes of the "near" bank (or all of it, on PICs with less than 96 bytes) and uses it to access 96 bytes relative to the FSR2 register. This would be a wonderful concept if it used the first 16, or maybe 32, bytes as a stack frame. Using 96 bytes means giving up all of the "near" storage, which is a pretty severe limitation. Nonetheless, compilers which use this instruction set can access local variables on a stack just as fast, if not faster, than global variables (no bank-switch required). I really wish Microchip had an option to only set aside 16 bytes or so for the stack frame, leaving a useful amount of 'common bank' RAM, but nonetheless some people have good luck with that mode.
I would imagine that this depends a lot on which compiler you are using. I don't know PIC but I'm guessing some (all?) PIC compilers will optimize the code so that local variables are stored in CPU registers whenever possible. If so, then local variables will likely be equally fast as globals.
Otherwise if the local variable is allocated on the stack the global may be a bit faster to access (see Oli's answer).
Is dividing the work into 5 functions as opposed to one big function more memory efficient in C since at a given time there are fewer variables in memory, as the stack-frame gets deallocated more often? Does it depend on the compiler, and optimization? if so in what compilers is it faster?
Answer given there are a lot of local variables and the stack frames comes from a centralized main and not created on the top of each other.
I know other advantages of breaking out the function into smaller functions. Please answer this question, only in respect to memory usage.
It might reduce "high water mark" of stack usage for your program, and if so that might reduce the overall memory requirement of the program.
Yes, it depends on optimization. If the optimizer inlines the function calls, you might well find that all the variables of all the functions inlined are wrapped into one big stack frame. Any compiler worth using is capable of inlining[*], so the fact that it can happen doesn't depend on compiler. Exactly when it happens, will differ.
If your local variables are small, though, then it's fairly rare for your program to use more stack than has been automatically allocated to you at startup. Unless you go past what you're given initially, how much you use makes no difference to overall memory requirements.
If you're putting great big structures on the stack (multiple kilobytes), or if you're on a machine where a kilobyte is a lot of memory, then it might make a difference to overall memory usage. So, if by "a lot of local variables" you mean few dozen ints and pointers then no, nothing you do makes any significant difference. If by "a lot of local variables" you mean a few dozen 10k buffers, or if your function recurses very deep so that you have hundreds of levels of your few dozen ints, then it's a least possible it could make a difference, depending on the OS and configuration.
The model that stack and heap grow towards each other through general RAM, and the free memory in the middle can be used equally by either one of them, is obsolete. With the exception of a very few, very restricted systems, memory models are not designed that way any more. In modern OSes, we have so-called "virtual memory", and stack space is allocated to your program one page at a time. Most of them automatically allocate more pages of stack as it is used, up to a configured limit that's usually very large. A few don't automatically extend stack (Symbian last I used it, which was some years ago, didn't, although arguably Symbian is not a "modern" OS). If you're using an embedded OS, check what the manual says about stack.
Either way, the only thing that affects total memory use is how many pages of stack you need at any one time. If your system automatically extends stack, you won't even notice how much you're using. If it doesn't, you'll need to ensure that the program is given sufficient stack for its high-water mark, and that's when you might notice excessive stack use.
In short, this is one of those things that in theory makes a difference, but in practice that difference is almost always insignificant. It only matters if your program uses massive amounts of stack relative to the resources of the environment it runs in.
[*] People programming in C for PICs or something, using a C compiler that is basically a non-optimizing assembler, are allowed to be offended that I've called their compiler "not worth using". The stack on such devices is so different from "typical" systems that the answer is different anyway.
I think in most cases the area of memory allocated for the stack (for the entire program) remains constant. The amount in use will change based on the depth of call stack and that amount would be less when fewer variables are used (but note that function calls push the return address and stack pointer also).
Also it depends on how the functions are called. If two functions are called in series, for example, and the stack of the first is popped before the call to the second, then you'll be using less of the stack..but if the first function calls the second then you're back to where you were with one big function (plus the function call overhead).
There's no memory allocation on stack - just moving the stack pointer towards next value. While stack size itself is predefined. So there's no difference in memory usage (apart of situations when you get stack overflow).
Yes, in the same vein that using a finer coat of paint on a jet plane increases its aerodynamic properties. Ok, that's a bad analogy, but the point is that if there is ever a question of making things clear and telegraphic or trying to use more functions, go with telegraphic. In most cases these are not mutually exclusive anyway as the beginners tend to give subroutines or functions too much to do.
In terms of memory I think that if you are truly splitting up up work (f, then g, then h) then you will see some minute available memory increases but if these are interdependent then you will not.
As #Joel Burget says, memory management is not really a consideration in code structuring.
Just my take.
Splitting a huge function into smaller ones does have its benefits, among them is potentially more optimized memory usage.
Say, you have this function.
void huge_func(int input) {
char a[1024];
char b[1024];
// do something with input and a
// do something with input and b
}
And you split it to two.
void func_a(int input) {
char a[1024];
// do something with input and a
}
void func_b(int input) {
char b[1024];
// do something with input and b
}
Calling huge_func will take at least 2048 bytes of memory, and calling func_a then func_b achieves the same outcome with about half less memory. However, if inside func_a you call func_b, the amount of memory used is about the same as huge_func. Essentially, as what #sje397 wrote.
I might be wrong to say this but I do not think there is any compiler optimization that could help you reduce the usage of stack memory. I believe the layout of stack memory must ensure that sufficient memory is reserved for all declared variables, whether used or not.
My function will be called thousands of times. If i want to make it faster, will changing the local function variables to static be of any use? My logic behind this is that, because static variables are persistent between function calls, they are allocated only the first time, and thus, every subsequent call will not allocate memory for them and will become faster, because the memory allocation step is not done.
Also, if the above is true, then would using global variables instead of parameters be faster to pass information to the function every time it is called? i think space for parameters is also allocated on every function call, to allow for recursion (that's why recursion uses up more memory), but since my function is not recursive, and if my reasoning is correct, then taking off parameters will in theory make it faster.
I know these things I want to do are horrible programming habits, but please, tell me if it is wise. I am going to try it anyway but please give me your opinion.
The overhead of local variables is zero. Each time you call a function, you are already setting up the stack for the parameters, return values, etc. Adding local variables means that you're adding a slightly bigger number to the stack pointer (a number which is computed at compile time).
Also, local variables are probably faster due to cache locality.
If you are only calling your function "thousands" of times (not millions or billions), then you should be looking at your algorithm for optimization opportunities after you have run a profiler.
Re: cache locality (read more here):
Frequently accessed global variables probably have temporal locality. They also may be copied to a register during function execution, but will be written back into memory (cache) after a function returns (otherwise they wouldn't be accessible to anything else; registers don't have addresses).
Local variables will generally have both temporal and spatial locality (they get that by virtue of being created on the stack). Additionally, they may be "allocated" directly to registers and never be written to memory.
The best way to find out is to actually run a profiler. This can be as simple as executing several timed tests using both methods and then averaging out the results and comparing, or you may consider a full-blown profiling tool which attaches itself to a process and graphs out memory use over time and execution speed.
Do not perform random micro code-tuning because you have a gut feeling it will be faster. Compilers all have slightly different implementations of things and what is true on one compiler on one environment may be false on another configuration.
To tackle that comment about fewer parameters: the process of "inlining" functions essentially removes the overhead related to calling a function. Chances are a small function will be automatically in-lined by the compiler, but you can suggest a function be inlined as well.
In a different language, C++, the new standard coming out supports perfect forwarding, and perfect move semantics with rvalue references which removes the need for temporaries in certain cases which can reduce the cost of calling a function.
I suspect you're prematurely optimizing, however, you should not be this concerned with performance until you've discovered your real bottlenecks.
Absolutly not! The only "performance" difference is when variables are initialised
int anint = 42;
vs
static int anint = 42;
In the first case the integer will be set to 42 every time the function is called in the second case ot will be set to 42 when the program is loaded.
However the difference is so trivial as to be barely noticable. Its a common misconception that storage has to be allocated for "automatic" variables on every call. This is not so C uses the already allocated space in the stack for these variables.
Static variables may actually slow you down as its some aggresive optimisations are not possible on static variables. Also as locals are in a contiguous area of the stack they are easier to cache efficiently.
There is no one answer to this. It will vary with the CPU, the compiler, the compiler flags, the number of local variables you have, what the CPU's been doing before you call the function, and quite possibly the phase of the moon.
Consider two extremes; if you have only one or a few local variables, it/they might easily be stored in registers rather than be allocated memory locations at all. If register "pressure" is sufficiently low that this may happen without executing any instructions at all.
At the opposite extreme there are a few machines (e.g., IBM mainframes) that don't have stacks at all. In this case, what we'd normally think of as stack frames are actually allocated as a linked list on the heap. As you'd probably guess, this can be quite slow.
When it comes to accessing the variables, the situation's somewhat similar -- access to a machine register is pretty well guaranteed to be faster than anything allocated in memory can possible hope for. OTOH, it's possible for access to variables on the stack to be pretty slow -- it normally requires something like an indexed indirect access, which (especially with older CPUs) tends to be fairly slow. OTOH, access to a global (which a static is, even though its name isn't globally visible) typically requires forming an absolute address, which some CPUs penalize to some degree as well.
Bottom line: even the advice to profile your code may be misplaced -- the difference may easily be so tiny that even a profiler won't detect it dependably, and the only way to be sure is to examine the assembly language that's produced (and spend a few years learning assembly language well enough to know say anything when you do look at it). The other side of this is that when you're dealing with a difference you can't even measure dependably, the chances that it'll have a material effect on the speed of real code is so remote that it's probably not worth the trouble.
It looks like the static vs non-static has been completely covered but on the topic of global variables. Often these will slow down a programs execution rather than speed it up.
The reason is that tightly scoped variables make it easy for the compiler to heavily optimise, if the compiler has to look all over your application for instances of where a global might be used then its optimising won't be as good.
This is compounded when you introduce pointers, say you have the following code:
int myFunction()
{
SomeStruct *A, *B;
FillOutSomeStruct(B);
memcpy(A, B, sizeof(A);
return A.result;
}
the compiler knows that the pointer A and B can never overlap and so it can optimise the copy. If A and B are global then they could possibly point to overlapping or identical memory, this means the compiler must 'play it safe' which is slower. The problem is generally called 'pointer aliasing' and can occur in lots of situations not just memory copies.
http://en.wikipedia.org/wiki/Pointer_alias
Using static variables may make a function a tiny bit faster. However, this will cause problems if you ever want to make your program multi-threaded. Since static variables are shared between function invocations, invoking the function simultaneously in different threads will result in undefined behaviour. Multi-threading is the type of thing you may want to do in the future to really speed up your code.
Most of the things you mentioned are referred to as micro-optimizations. Generally, worrying about these kind of things is a bad idea. It makes your code harder to read, and harder to maintain. It's also highly likely to introduce bugs. You'll likely get more bang for your buck doing optimizations at a higher level.
As M2tM suggests, running a profiler is also a good idea. Check out gprof for one which is quite easy to use.
You can always time your application to truly determine what is fastest. Here is what I understand: (all of this depends on the architecture of your processor, btw)
C functions create a stack frame, which is where passed parameters are put, and local variables are put, as well as the return pointer back to where the caller called the function. There is no memory management allocation here. It usually a simple pointer movement and thats it. Accessing data off the stack is also pretty quick. Penalties usually come into play when you're dealing with pointers.
As for global or static variables, they're the same...from the standpoint that they're going to be allocated in the same region of memory. Accessing these may use a different method of access than local variables, depends on the compiler.
The major difference between your scenarios is memory footprint, not so much speed.
Using static variables can actually make your code significantly slower. Static variables must exist in a 'data' region of memory. In order to use that variable, the function must execute a load instruction to read from main memory, or a store instruction to write to it. If that region is not in the cache, you lose many cycles. A local variable that lives on the stack will most surely have an address that is in the cache, and might even be in a cpu register, never appearing in memory at all.
I agree with the others comments about profiling to find out stuff like that, but generally speaking, function static variables should be slower. If you want them, what you are really after is a global. Function statics insert code/data to check if the thing has been initialized already that gets run every time your function is called.
Profiling may not see the difference, disassembling and knowing what to look for might.
I suspect you are only going to get a variation as much as a few clock cycles per loop (on average depending on the compiler, etc). Sometimes the change will be dramatic improvement or dramatically slower, and that wont necessarily be because the variables home has moved to/from the stack. Lets say you save four clock cycles per function call for 10000 calls on a 2ghz processor. Very rough calculation: 20 microseconds saved. Is 20 microseconds a lot or a little compared to your current execution time?
You will likely get more a performance improvement by making all of your char and short variables into ints, among other things. Micro-optimization is a good thing to know but takes lots of time experimenting, disassembling, timing the execution of your code, understanding that fewer instructions does not necessarily mean faster for example.
Take your specific program, disassemble both the function in question and the code that calls it. With and without the static. If you gain only one or two instructions and this is the only optimization you are going to do, it is probably not worth it. You may not be able to see the difference while profiling. Changes in where the cache lines hit could show up in profiling before changes in the code for example.
I am working on an embedded system, so memory is precious for me.
One issue that has been recurring is that I've been running out of memory space when attempting to compile a program for it. This is usually fixed by limiting the number of typedefs, etc that can take up a lot of space.
There is a macro generator that I use to create a file with a lot of #define's in it.
Some of these are simple values, others are boundary checks
ie
#define SIGNAL1 (float)0.03f
#define SIGNAL1_ISVALID(value) ((value >= 0.0f) && (value <= 10.0f))
Now, I don't use all of these defines. I use some, but not actually the majority.
I have been told that they don't actually take up any memory if they are not used, but I was unsure on this point. I'm hoping that by cutting out the unused ones that I can free up some extra memory (but again, I was told this is pointless).
Do unused #define's take up any memory space?
No, #defines take up no space unless they are used - #defines work like find/replace; whenever the compiler sees the left half, it'll replace it with the right half before it actually compiles.
So, if you have:
float f = SIGNAL1;
The compiler will literally interpret the statement:
float f = (float)0.03f;
It will never see the SIGNAL1, it won't show up in a debugger, etc.
This is usually fixed by limiting the number of typedefs, etc that can take up a lot of space.
You seem somewhat confused because typedef's do not take up space at runtime. They are merely aliases for data types. Now you may have instances of large structures (typedef'd or otherwise), but it is the instance that takes space, not the type definition. I wonder what 'etc' might cover in this statement.
Macro instances are replaced in the source code with their definition, and code generated accordingly, an unused macro does not result in any generated code.
Things that take up space are:
Executable code (functions/member functions)
Data instantiation (including C++ object instances)
The amount of space allocated to the stack (or stacks in a multi-threaded system).
What is left is typically available for dynamic memory allocation (RAM), or is unused or made for non-volatile storage (Flash/EPROM).
Reducing memory usage is primarily a case of selecting/designing efficient data structures, using appropriate data types, and efficient code and algorithm design. It is best to target the area that will get the greatest benefit. To see the size of objects and code in your application, get the linker to generate a map file. That will tell you which are the largest functions, as well as the sizes of global and static objects.
Source file text length is not a good guide to code size. Large amounts of C code is declarative (typically header files are all declarative), and do not generate memory occupying code or data.
An embedded system does not necessarily imply small memory, so you should specify. I have worked on systems with 64Mb RAM and 2Mb Flash, and even that is modest compared with many systems. A typical micro-controller with on-chip resources however, will generally have much less (especially SRAM which takes up a lot of chip area). Also whether your system is Harvard or Von Neumann architecture is relevant here, since in a Harvard architecture data and code spaces are separate, so we need to know what it is you are short of. If Von Neumann, the code/data usage is still relevant if the code is running from ROM, or is it is copied from ROM to RAM at run-time (i.e. different types of memory, even if they are in the same address space).
Clifford
Well, yes and no.
No, unused #defines won't increase the size of the resulting binary.
Yes, all #defines (whether used or unused) must be known by the compiler when building the binary.
By your question it's a bit ambiguous how you use the compiler, but it almost seems that you try to build directly on an embedded device; have you tried a cross-compiler? :)
unused #defines doesn't take up space in the resulting executable. They do take up memory in the compiler itself whist compiling.