C performance on a PIC board global variables vs. method local - c

All,
I have C functions that are called many times a second as part of a control loop on a PIC18 board. These functions have variables that only need method scope, but I was wondering what overhead, if any, there is in repeatedly allocating these variables vs. using a global or at least higher-scoped variable. (I've thought of typedef'ing a struct to pass around from a higher scope, to avoid global variables if performance dictates against method-local variables.)
There are some good threads on here that cover this topic, but I have yet to see a definitive answer; most preach best practices, which I agree with and would follow as long as there are no performance gains to be had, because every microsecond counts here.
One thread mentioned using file scoped static variables as a substitute for global variables, but I can't help wonder if even that is necessary.
What does everyone think?

Accessing a local variable requires doing something like *(SP + offset) (where SP is the stack-pointer), whereas accessing a static (which includes globals) requires something like *(address).
From what I recall, the PIC instruction set has very limited addressing modes. So it's very likely that accessing the global will be faster, at least for the first time it's accessed. Subsequent accesses may be identical if the compiler holds the computed address in a register.
As @unwind said in the comments, you should look at the compiler output and profile to confirm. Only sacrifice clarity/maintainability if you've proved that it's worthwhile in terms of your program's runtime.
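If it helps, here is a minimal sketch (the names are made up) you can feed to your PIC toolchain and compare, in whatever assembly listing or map-file output it provides, how the global and the local are actually addressed:

unsigned char gain_global;                  /* file scope: fixed address, possibly in another bank */

unsigned char control_step(unsigned char input)
{
    unsigned char gain_local = 3;           /* function scope: stack slot or compiled stack frame */
    return (unsigned char)(input * gain_local + gain_global);
}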

While I've not used every single PIC compiler in existence, there are two styles. The style I've used allocates all local variables statically by analyzing the program's call graph. If every possible call were in fact performed, the amount of stack memory consumed by locals would match what static allocation requires, with a couple of caveats (the following describes the behavior of HiTech's PICC-18 "standard" compiler; others may vary):
Variadic functions are handled by defining local-variable storage in the scope of the caller, and passing a two-byte pointer to that storage to the function being called.
For every different signature of indirect function pointer, the compiler generates a "pseudo-function" in the call graph; everything that calls a function of that signature calls the pseudo-function, and that pseudo-function calls every function with that signature that has its address taken.
In this style of compiler, consecutive accesses to local variables will be just as fast as consecutive accesses to globals. Apart from global and static variables explicitly declared "near" (which must total no more than 64-128 bytes, varying with the PIC model), however, each module's global and static variables are located separately from local variables, and bank-switching instructions are needed to access things in different banks.
Some compilers which I have not used employ the "enhanced instruction set" option. This option gobbles up 96 bytes of the "near" bank (or all of it, on PICs with less than 96 bytes) and uses it to access 96 bytes relative to the FSR2 register. This would be a wonderful concept if it used the first 16, or maybe 32, bytes as a stack frame. Using 96 bytes means giving up all of the "near" storage, which is a pretty severe limitation. Nonetheless, compilers which use this instruction set can access local variables on a stack just as fast, if not faster, than global variables (no bank-switch required). I really wish Microchip had an option to only set aside 16 bytes or so for the stack frame, leaving a useful amount of 'common bank' RAM, but nonetheless some people have good luck with that mode.

I would imagine that this depends a lot on which compiler you are using. I don't know PIC but I'm guessing some (all?) PIC compilers will optimize the code so that local variables are stored in CPU registers whenever possible. If so, then local variables will likely be equally fast as globals.
Otherwise if the local variable is allocated on the stack the global may be a bit faster to access (see Oli's answer).

Related

Difference between global variables and variables declared in main in ARM C

I've been trying to use C in Keil to write some test code for my TM4C123G, which uses an ARM microcontroller. I have no clue about ARM assembly, but I have written some assembly code for an AVR microcontroller in the past.
Where are the values of the variables stored, if we declare a variable in C as global, as opposed to declaring it in main?
Are there general guidelines as to whether we should declare a variable global as opposed to in main (when it comes to writing C for a microcontroller)?
Globals in ARM cause an offset to be placed in the calling function's "pool" area. In general, each global needs its own offset. If you do decide to use globals, place them in a single structure so that all the variables can be accessed via a single pool offset.
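A hedged sketch of that suggestion (the names are illustrative): group the related globals in one struct so they can all be reached from a single base address.

struct app_state {
    volatile unsigned int tick_count;
    int setpoint;
    int measured;
};

static struct app_state g_state;            /* one symbol, so one pool offset for the base */

void control_update(void)
{
    g_state.measured = g_state.setpoint - g_state.measured;   /* both fields reached via the same base */
}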
Variables defined at the function level live on the stack. These, within reason, can be accessed with a simpler offset from the stack register and tend to be a touch more efficient opcode wise, and perhaps cache wise, depending on what kind of system you are using.
When to use either? Outside of the global vs. local holy wars of maintainability, it comes down to what else your code needs to do. If some variable is used in a ton of places and you don't want to pass it around as a parameter (a sign that perhaps your design needs work...), then a global would be the way to go. If you need to access the variable across multiple assembler modules (that design thing again...), then a global may also be fine.
Most of the time, though, I would go with local variables. They are much more thread-safe and, as noted above, tend to be more efficient. They are easier to maintain as well, because their scope is much more confined, which also helps compilers do their jobs better.
The compiler will put global variables in the "data" or "bss" memory segment. This allocation is permanent, so the variable never loses its value. Function-local variables are allocated dynamically on the stack and disappear when the function returns.
Use global variables when the variable must keep its value between function calls, and must be accessible by code in multiple functions and/or files. Use function-local variables for everything else.
There are also "static" variables. They function the same way as global variables, but have a more limited namespace. They are used as file-local variables and function-local variables which keep their value between function calls.
Globals are fine; they have a place, especially in embedded targets like microcontrollers where you are extremely tight on resources. Globals make it easy to manage your resources, whereas locals are very dynamic and it is difficult at best to stay out of trouble.
But... as far as assembly goes, there are no rules; there is not really a notion of local versus global, that is strictly a higher-level-language thing. There may be implementations of assembly languages that allow/force such things, but understand that C is a standard (like others) that crosses platforms and is not specific to one. Assembly, by contrast, not only has no standard but is also processor-specific. The assembly language is defined by the assembler (the program that parses it and turns it into machine code), and anyone and their brother can bang out an assembler and make up whatever language or rules they want.
In general, if you are trying to hand-implement C code in assembly instead of having the compiler do it (or at least having the compiler do it first to see what it does): your globals, unless optimized out, are going to get a home in RAM. Locals may or may not get a place on the stack, depending on optimization; they may just live temporarily in registers. That depends on the processor, the number of registers available, and the trade-off between preserving registers that contained something else and keeping the local in a register instead.
You should just take some simple functions (not necessarily complete programs) and see what the compiler does.
If you want to write in pure assembly rather than converting C, you would still face the same dilemma: do I keep something in a register for a long time, do I put it on the stack for the duration of this chunk of code, or do I assign it an address where it lives forever?
I suggest you be open-minded and try to understand the whys and why-nots. A lot of these kinds of rules (no globals, small functions, etc.) are preached not because of facts but out of faith: someone I trust told me, so I preach it as well. Sometimes you can dig in and find that the fear is real, or that it is not, or that it applied 30 years ago but not anymore, and so on. An alternative to a global, for example, is a local at the main() or top level that you keep passing down as you nest in, which basically makes it a global from a resource perspective. In fact, depending on the compiler (an extremely popular one in particular), that one main-level local that is passed down actually consumes resources at each nesting level, using significantly more RAM than if it had just been declared a global. The other side is not memory consumption but access: who can mess with that variable and who can't. Locals make that quite easy and allow a fair amount of laziness; you can be messy. With globals you have to take care not to mess them up. Note that static locals are also globals from a resource perspective; they sit in the same .data space for the duration of the program.
There is a reason, many reasons why C cannot die. Nothing has come along that can replace it. There is a reason why it is basically the first compiler for each new processor/instruction set.
Write some simple functions, compile them, disassemble them, and see what is produced. Take some of your embedded/bare-metal applications for these platforms, disassemble them, and see what the compiler has done. Globals and static locals that can't be optimized out get an address in .data. Sometimes some or all of the incoming parameters get stack locations, and sometimes some or all of the local variables consume stack as well; it depends on the compiler and on whether you choose to optimize. It also depends on the architecture: a CISC that can do memory-and-register or memory-to-memory operations doesn't have to move stuff in and out of registers all the time, while a RISC often does have to work through registers, but also often has a lot more of them available. The compiler knows that and manages the home for the variables accordingly.
Everyone is going to put globals and static locals in RAM (unless they can be optimized out). Traditional x86 brings the parameters in on the stack anyway and is going to put the locals there too and just access them there. A MIPS or an ARM is going to try to minimize the amount of RAM used and lean on registers, trying to keep some of the variables exclusively in registers and not consume stack.
An assembly programmer, pre-compilers, used a lot of the equivalent of globals: pick an address for each variable. Now, post-compilers, you can choose to just do it that way (set up all of your variables and hardly use the stack other than for returning from calls), or you can program as if you were a compiler: make some rules, or simply follow the compiler's convention, keep some disposable registers and others preserved, and lean on the stack for preservation and for items local to the function.
At the end of the day though assembly does not have a notion of global vs local any more than it has a notion of signed vs unsigned vs pointer vs array or anything like that that a high level language would have.

Optimization for global and static variables

I read some topics on optimization, and it is mentioned that global variables cannot be stored in registers; hence, if we need to optimize, we use a register variable to hold the global data and modify that register variable. Does this apply to static variables too?
For auto storage, what if we store auto variables in register variables? Won't access from a register be faster than from the stack?
Both global variables and static variables exist in the data segment, which includes the data, BSS, and heap sections. If the static variable is initialized to 0 or not initialized to anything, it goes in the BSS section. If it is given a non-zero initialization value, then it is in the "data" section. See:
http://en.wikipedia.org/wiki/Data_segment
As for auto vs. register variables: register does not guarantee that the variable will be stored in a register; it merely provides a hint from the programmer. See:
http://www.lix.polytechnique.fr/~liberti/public/computing/prog/c/C/CONCEPT/storage_class.html
Yes, it is (much) faster to access a register than to access the stack memory, but this optimization nowadays is left up to the compiler (the problem of register allocation) as well as the CPU architecture (which has a great many optimizations too complex to explain here).
Unless you're programming for a really simple or old architecture and/or using a really outdated compiler, you probably should not worry about this kind of optimization.
Global variables' values can be held in registers as long as the compiler can prove there is no other access to the stored value. For values that can't be held in a register themselves, declaring a pointer with the restrict keyword declares that the value isn't being accessed by any other means for that pointer's lifetime; just don't give away any copies and the compiler will take care of the rest. For scalars, declaring thistype localval = globalval; works at least as well (or even better) if you're not changing the value or you've got good control over scope exits.
You can only use the restrict declaration if the value really won't be accessed otherwise. Optimizers these days can, for instance, deduce from your declaration that the object won't be accessed in one function that a code path which does access it in another won't be executed, and from that deduce the value of the expression used to take that code path, and so on. "If you lie to the compiler, it will have its revenge" is more true today than ever.
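As a hedged sketch of both ideas (C99, illustrative names): copy the global into a local once, and use restrict to promise the compiler that the pointers are the only way the data is accessed.

extern int global_gain;

int scale_block(const int *restrict in, int *restrict out, int n)
{
    int gain = global_gain;                 /* the compiler is free to keep this in a register */
    for (int i = 0; i < n; i++)
        out[i] = in[i] * gain;              /* no reload of the global inside the loop */
    return n;
}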

C variable allocation time and space

If I have a test.c file with the following:

#include ...

int global = 0;

int main() {
    int local1 = 0;
    while (1) {
        int local2 = 0;
        // Do some operation with one of them
    }
    return 0;
}
So if I had to use one of these variables in the while loop, which one would be preferred?
Maybe I'm being a little vague here, but I want to know if the difference in time/space allocation is actually relevant.
If you are wondering whether declaring a variable inside a loop causes it to be created/destroyed at every iteration, there is nothing really to worry about. These variables are not dynamically allocated at runtime; nothing is being malloc'd here, just some memory is being set aside for use inside the loop. So having the variable inside the loop is just the same, in terms of performance, as having it outside.
The real difference here is scope not performance. Whether you use a global or local variable only affects where you want this variable to be visible.
In case you're wondering about performance differences: most likely there aren't any. If there are theoretical performance differences, you'll find it hard to actually devise a test to measure them.
A decision like this should not be based on performance but semantics. Unless the semantic behavior of a global variable is required, you should always use automatic (local non-static) variables.
As others have said and surely will say, there are unlikely to be any differences in performance. If there are, the automatic variable will be faster.
The C compiler will have an easier time making optimizations on the variables declared local to the function. The global variable would require an optimizer to perform "Inter-Procedural Data Flow Analysis", which isn't that commonly done.
As an example of the difference, consider that all your declarations initialize the variable to zero. However, in the case of the global variable, the compiler cannot use that information unless it verifies that no flow of control in your program can change the global prior to using it in your example function. In the case of the locally declared ("automatic") variables, there is no way the initial value can be changed by another function (in particular, the compiler verifies that their address is never passed to a sub-function) and the compiler can perform "killed definitions" and "value liveness" analysis to determine whether the zero value can be assumed in some code paths.
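A small illustrative sketch of that point (the names are made up): the local's initial zero can be folded away, while the global's cannot without whole-program knowledge.

int g_flag = 0;                             /* some other function might store to this first */

int use_local(void)
{
    int flag = 0;                           /* the compiler can reduce this to 'return 0;' */
    return flag;
}

int use_global(void)
{
    return g_flag;                          /* must load: a value of zero cannot be assumed */
}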
Of the two local variables, as a guideline, the optimizer will always have an easier time optimizing access to the variable with the smaller (more limited) scope.
Having stated the above, I would suggest that other answers concerning a bias toward semantics over optimizer-meta-optimization is correct. Use the variable which causes the code to read best, and you will be rewarded with more time returned to you than assisting the def-use optimization calculation.
In general, avoid using a global variable, or any variable which can be accessed more broadly than absolutely necessary. Limited scoping of variables helps prevent bugs from being introduced during later program maintenance.
There are three broad classes of variables: static (global), stack (auto), and register.
Register variables are stored in CPU registers. Registers are very fast word-sized memories, which are integrated in the CPU pipeline. They are free to access, but there are a very limited number of them (typically between 8 and 32 depending on your processor and what operations you're doing).
Stack variables are stored in an area of RAM called the stack. The stack is almost always going to be in the cache, so stack variables typically take 1-4 cycles to access.
Generally, local variables can be either in registers or on the stack. It doesn't matter whether they are allocated at the top of a function or in a loop; they will only be allocated once per function call, and allocation is basically free. The compiler will put variables in registers if at all possible, but if you have more active variables than registers, they won't all fit. Also, if you take the address of a variable, it must be stored on the stack since registers don't have addresses.
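As a quick sketch of the address-taken case (read_sensor is a hypothetical function): once a variable's address escapes, it has to live in memory rather than purely in a register.

void read_sensor(int *out);

int sample(void)
{
    int raw;                                /* its address is taken, so it gets a stack slot */
    read_sensor(&raw);
    return raw * 2;
}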
Global and static variables are a different beast. Since they are not usually accessed frequently, they may not be in cache, so it could take hundreds of cycles to access them. Also, since the compiler may not know the address of a global variable ahead of time, it may need to be looked up, which is also expensive.
As others have said, don't worry too much about this stuff. It's definitely good to know, but it shouldn't affect the way you write your programs. Write code that makes sense, and let the compiler worry about optimization. If you get into compiler development, then you can start worrying about it. :)
Edit: more details on allocation:
Register variables are allocated by the compiler, so there is no runtime cost. The code will just put a value in a register as soon as the value is produced.
Stack variables are allocated by your program at runtime. Typically, when a function is called, the first thing it will do is reserve enough stack space for all of its local variables. So there is no per-variable cost.

Best Practices for PIC18 Stack/Memory Management?

The limited stack size of budget PICs is a problem area and I have adjusted my code to accommodate this reality. I currently follow a rough paradigm of grouping closely related functions into a module and declaring all their variables as module-level static (to reduce the number of variables stored in the auto psect; issues of mutability are only relevant in ISRs, which I account for). I don't do this because it is good practice, but because the reality is that you have a finite amount of space in which to allocate all the local function variables that exist in an entire project. In the embedded world of 8/16-bit chips, is this an appropriate method, provided I'm sure to take the necessary precautions? I also do things like allocate >256 bytes of RAM for Ethernet buffers (I know the standard MTU is 1500, but we have a custom situation and very limited RAM) and have to access that memory via pointers so I can avoid the semantics of memory banking. Am I doing it wrong? My app works, but I am 100% open to suggestions for improvement.
I know this was asked 4 years ago, but it still has not been properly answered. I believe what the OP is asking is whether their approach to working around a limitation of the HiTech PICC18 C compiler is valid and/or best practice. As mentioned in a later comment, the limitation (a rather bad one, and not well advertised by HiTech) is that "the Hi-Tech compiler only allows up to 256 bytes of auto variables". Actually the limitation is worse than that, as it is a total of 256 bytes for local variables and parameters. The linker warning when this is exceeded is pretty cryptic too. Provided that functions are on different branches of the call tree, the compiler can overlap the variables to reuse the space, which means that you can effectively have more than 256 bytes. But note that the interrupt handler (or handlers, if you use the priority scheme) has its own call tree that shares the 256-byte local/param block.
Locals
The two solutions to reduce the space required for locals are: make the locals global, or make them static. Making them static keeps the scope the same and, provided the function is not called from interrupts, is safe (reentrancy is not allowed by the compiler anyway). This is probably the preferred option. The drawback is that the compiler cannot reuse those variables' locations to reduce overall memory consumption. Moving the variables to global scope allows reuse, but the reuse must be managed by the programmer. Probably the best balance is to make simple variables static but to make large chunks of memory, like string buffers, global and carefully reuse them.
Be careful with initialisation.
void foo(void)
{
    int myvar = 5;
}

must change to

void foo(void)
{
    static int myvar;

    myvar = 5;
}
Parameters
If you go around passing large amounts of data down the call tree as parameters, you will quickly run into the same 256-byte limitation. Your best option here may be to pass a pointer to a globally allocated struct (or structs) of "options", as sketched below. Alternatively, you can have global settings variables that are set by the top caller and read by callees down the tree. It really depends on the design of the software which approach is better.
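A hedged sketch of the "pointer to a globally allocated options struct" idea, with made-up names:

struct ctrl_options {
    unsigned char mode;
    int setpoint;
    int limit;
};

static struct ctrl_options g_options;       /* allocated once, at module scope */

static void apply_limits(struct ctrl_options *opt)
{
    if (opt->setpoint > opt->limit)
        opt->setpoint = opt->limit;
}

void control_task(void)
{
    apply_limits(&g_options);                /* only a small pointer is passed down the tree */
}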
I've struggled with the same issues as the OP and I think the best option in the long run is to move away from the HiTech compiler. The optimisation decision the compiler writers took, allocating all locals/params in one block, is only really appropriate for PICs with very small RAM. For larger PICs you will run out of local/param space long before you hit the RAM size of the device. Then you have to start hacking your code around to fit the compiler, which is perverse.
In summary... Yes your approach is valid. But do consider simply making locals static if that is appropriate as, in general, reducing the scope makes your code safer.
Whereas the C18 compiler used some FSRs (pointers) to manage the data stack, it sounds like the new XC8 compiler from Microchip uses a compiled stack, so you should know exactly how much space is taken up by the stack at compile time. You will also know exactly where each stack variable is stored. I read all about this in the XC8 user's guide and it sounds great. That feature should make this question be moot, assuming you are using XC8.
My experience with compilers/linkers for chips with limited memory is that, as long as you don't use recursive functions and inform the compiler about that, then the compiler is very capable of determining the minimal amount of stack-space that is needed.
I have even seen compilers that give each variable with automatic storage a globally fixed address (no stack at all), where several variables got allocated to overlapping memory, as long as their lifetimes did not overlap.
The general advice when doing (speed or space) optimisations is: make measurements to prove that your optimisation actually has a positive effect.
Since you are nearly out of memory, you have to count every byte of RAM. Using local (auto) variables allows the memory to be reused where you need it (locally in the function). When you move the variables to global static address space, you give each variable a unique location. That's a waste of address space.
The Microchip compiler allows different variables to share the same address. I don't have the docs at hand, but this can be done with a pragma.
But what you need is an analysis of RAM requirements. If you see that the stack cannot hold all the variables, but auto variables would reduce global memory use, you should consider increasing the stack size via the startup code and the linker script.
Best practice is to choose hardware that fits the requirements.
There are microcontrollers around that cost only a few dollars more but save hundreds or thousands of dollars in development costs. If this is a hobby development, your effort may not count. But in the real world you can often find hardware that was designed with only hardware costs in view.
Especially the PIC18 is not the best example of compact code, which can also be a problem for the flash memory.
This might sound obvious, but try not to use 16-bit variables on 8-bit processors. 16-bit variables are fine and needed on bigger architectures, but on limited (8-bit) architectures, 16-bit arithmetic is a quick way to deplete both RAM and ROM in no time.
If you try to increment a 16-bit variable, the compiler will pull in a 16-bit increment library routine, which in most cases consumes a lot of space.
Also, try not to divide or multiply, as on some controllers these are implemented in software.
Personally, I always go for char, and when I need a divide operation, I shift right n times to divide by 2^n.
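For example (a trivial sketch, unsigned values only):

unsigned char quarter(unsigned char x)
{
    return (unsigned char)(x >> 2);          /* divide by 4 without pulling in a divide routine */
}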
hope this helps!
A bit late, but you should also have a closer look at the C18 compiler user guide (if you were using this compiler).
You could decrease stack use dramatically by statically allocating local variables (overriding the auto keyword). Even better, you can use the overlay storage qualifier, which allows variables with non-overlapping lifetimes to be placed at the same address, minimizing RAM. (The C18 compiler must operate in non-extended mode.)

In C, does using static variables in a function make it faster?

My function will be called thousands of times. If I want to make it faster, will changing the local function variables to static be of any use? My logic is that, because static variables are persistent between function calls, they are allocated only the first time, and thus every subsequent call will not allocate memory for them and will become faster, because the memory-allocation step is not done.
Also, if the above is true, would using global variables instead of parameters be a faster way to pass information to the function every time it is called? I think space for parameters is also allocated on every function call, to allow for recursion (that's why recursion uses more memory), but since my function is not recursive, and if my reasoning is correct, removing the parameters should in theory make it faster.
I know these things I want to do are horrible programming habits, but please, tell me if it is wise. I am going to try it anyway but please give me your opinion.
The overhead of local variables is zero. Each time you call a function, you are already setting up the stack for the parameters, return values, etc. Adding local variables means that you're adding a slightly bigger number to the stack pointer (a number which is computed at compile time).
Also, local variables are probably faster due to cache locality.
If you are only calling your function "thousands" of times (not millions or billions), then you should be looking at your algorithm for optimization opportunities after you have run a profiler.
Re: cache locality:
Frequently accessed global variables probably have temporal locality. They also may be copied to a register during function execution, but will be written back into memory (cache) after a function returns (otherwise they wouldn't be accessible to anything else; registers don't have addresses).
Local variables will generally have both temporal and spatial locality (they get that by virtue of being created on the stack). Additionally, they may be "allocated" directly to registers and never be written to memory.
The best way to find out is to actually run a profiler. This can be as simple as executing several timed tests using both methods and then averaging out the results and comparing, or you may consider a full-blown profiling tool which attaches itself to a process and graphs out memory use over time and execution speed.
Do not perform random micro code-tuning because you have a gut feeling it will be faster. Compilers all have slightly different implementations of things and what is true on one compiler on one environment may be false on another configuration.
To tackle that comment about fewer parameters: the process of "inlining" functions essentially removes the overhead related to calling a function. Chances are a small function will be automatically in-lined by the compiler, but you can suggest a function be inlined as well.
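For example, a tiny helper like this (C99 inline; the compiler may still decide either way) can be suggested for inlining:

static inline int scale(int x, int gain)
{
    return x * gain;
}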
In a different language, C++, the new standard coming out supports perfect forwarding, and perfect move semantics with rvalue references which removes the need for temporaries in certain cases which can reduce the cost of calling a function.
I suspect you're prematurely optimizing, however, you should not be this concerned with performance until you've discovered your real bottlenecks.
Absolutely not! The only "performance" difference is in when the variables are initialised:
int anint = 42;
vs
static int anint = 42;
In the first case, the integer will be set to 42 every time the function is called; in the second case, it will be set to 42 when the program is loaded.
However, the difference is so trivial as to be barely noticeable. It's a common misconception that storage has to be allocated for "automatic" variables on every call. This is not so: C uses space already allocated on the stack for these variables.
Static variables may actually slow you down, since some aggressive optimisations are not possible on static variables. Also, as locals are in a contiguous area of the stack, they are easier to cache efficiently.
There is no one answer to this. It will vary with the CPU, the compiler, the compiler flags, the number of local variables you have, what the CPU's been doing before you call the function, and quite possibly the phase of the moon.
Consider two extremes: if you have only one or a few local variables, they might easily be stored in registers rather than be allocated memory locations at all. If register "pressure" is sufficiently low, this may happen without executing any instructions at all.
At the opposite extreme there are a few machines (e.g., IBM mainframes) that don't have stacks at all. In this case, what we'd normally think of as stack frames are actually allocated as a linked list on the heap. As you'd probably guess, this can be quite slow.
When it comes to accessing the variables, the situation is somewhat similar: access to a machine register is pretty well guaranteed to be faster than anything allocated in memory can possibly hope for. OTOH, access to variables on the stack can be pretty slow: it normally requires something like an indexed indirect access, which (especially with older CPUs) tends to be fairly slow. OTOH, access to a global (which a static is, even though its name isn't globally visible) typically requires forming an absolute address, which some CPUs penalize to some degree as well.
Bottom line: even the advice to profile your code may be misplaced. The difference may easily be so tiny that even a profiler won't detect it dependably, and the only way to be sure is to examine the assembly language that's produced (and spend a few years learning assembly language well enough to say anything meaningful when you do look at it). The other side of this is that when you're dealing with a difference you can't even measure dependably, the chance that it will have a material effect on the speed of real code is so remote that it's probably not worth the trouble.
It looks like static vs. non-static has been completely covered, but on the topic of global variables: often these will slow down a program's execution rather than speed it up.
The reason is that tightly scoped variables make it easy for the compiler to heavily optimise; if the compiler has to look all over your application for the places where a global might be used, its optimisation won't be as good.
This is compounded when you introduce pointers, say you have the following code:
int myFunction(void)
{
    SomeStruct A, B;

    FillOutSomeStruct(&B);
    memcpy(&A, &B, sizeof(A));
    return A.result;
}
The compiler knows that A and B are distinct local objects that can never overlap, so it can optimise the copy. If A and B were globals accessed through pointers, those pointers could refer to overlapping or identical memory, which means the compiler must 'play it safe', and that is slower. The problem is generally called 'pointer aliasing' and can occur in lots of situations, not just memory copies.
http://en.wikipedia.org/wiki/Pointer_alias
Using static variables may make a function a tiny bit faster. However, this will cause problems if you ever want to make your program multi-threaded. Since static variables are shared between function invocations, invoking the function simultaneously in different threads will result in undefined behaviour. Multi-threading is the type of thing you may want to do in the future to really speed up your code.
Most of the things you mentioned are referred to as micro-optimizations. Generally, worrying about these kind of things is a bad idea. It makes your code harder to read, and harder to maintain. It's also highly likely to introduce bugs. You'll likely get more bang for your buck doing optimizations at a higher level.
As M2tM suggests, running a profiler is also a good idea. Check out gprof for one which is quite easy to use.
You can always time your application to truly determine what is fastest. Here is what I understand: (all of this depends on the architecture of your processor, btw)
C functions create a stack frame, which is where passed parameters and local variables are put, as well as the return pointer back to where the caller called the function from. There is no memory-management allocation here; it's usually a simple pointer adjustment and that's it. Accessing data off the stack is also pretty quick. Penalties usually come into play when you're dealing with pointers.
As for global or static variables, they're the same from the standpoint that they're going to be allocated in the same region of memory. Accessing these may use a different access method than local variables, depending on the compiler.
The major difference between your scenarios is memory footprint, not so much speed.
Using static variables can actually make your code significantly slower. Static variables must exist in a 'data' region of memory. In order to use that variable, the function must execute a load instruction to read from main memory, or a store instruction to write to it. If that region is not in the cache, you lose many cycles. A local variable that lives on the stack will most surely have an address that is in the cache, and might even be in a cpu register, never appearing in memory at all.
I agree with the others' comments about profiling to find out stuff like that, but generally speaking, function-static variables should be slower. If you want them, what you are really after is a global. Function statics can insert code/data to check whether the variable has already been initialized, and that check runs every time your function is called.
Profiling may not see the difference, disassembling and knowing what to look for might.
I suspect you are only going to see a variation of a few clock cycles per loop (on average, depending on the compiler, etc.). Sometimes the change will be a dramatic improvement or dramatically slower, and that won't necessarily be because the variable's home has moved to/from the stack. Let's say you save four clock cycles per function call for 10,000 calls on a 2 GHz processor. Very rough calculation: 20 microseconds saved. Is 20 microseconds a lot or a little compared to your current execution time?
You will likely get more of a performance improvement by making all of your char and short variables into ints, among other things. Micro-optimization is a good thing to know about, but it takes lots of time experimenting, disassembling, and timing the execution of your code, and you have to understand that fewer instructions does not necessarily mean faster, for example.
Take your specific program, disassemble both the function in question and the code that calls it, with and without the static. If you gain only one or two instructions and this is the only optimization you are going to do, it is probably not worth it. You may not be able to see the difference while profiling; changes in where the cache lines hit could show up in profiling before changes in the code do, for example.
