Optimization for global and static variables - c

I read some topics on optimization, and they mention that global variables cannot be stored in registers; hence, to optimize, we copy the global data into a register variable and modify that register variable instead. Does this apply to static variables too?
For auto storage, what if we store auto variables in register variables? Won't access from a register be faster than access from the stack?

Both global variables and static variables exist in the data segment, which includes the data, BSS, and heap sections. If the static variable is initialized to 0 or not initialized to anything, it goes in the BSS section. If it is given a non-zero initialization value, then it is in the "data" section. See:
http://en.wikipedia.org/wiki/Data_segment
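A minimal sketch of the placement described above, assuming a typical ELF-style toolchain. The variable and function names are made up for illustration, and the section named in each comment is the usual choice rather than something the C standard guarantees:

```c
/* Static-storage objects and where a typical toolchain places them. */
int zero_global;              /* no initializer: usually .bss */
static int zero_static = 0;   /* explicit zero: usually .bss as well */
int nonzero_global = 42;      /* non-zero initializer: usually .data */

/* Reads all three so that none of them is discarded as unused. */
int sum_all(void)
{
    return zero_global + zero_static + nonzero_global;
}
```

Running a symbol-listing tool such as nm on the resulting object file is one way to check where your own compiler actually put each variable.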
As for auto vs. register variables: register does not guarantee that the variable will be stored in a register; it is merely a hint from the programmer. See:
http://www.lix.polytechnique.fr/~liberti/public/computing/prog/c/C/CONCEPT/storage_class.html
Yes, it is (much) faster to access a register than to access the stack memory, but this optimization nowadays is left up to the compiler (the problem of register allocation) as well as the CPU architecture (which has a great many optimizations too complex to explain here).
Unless you're programming for a really simple or old architecture and/or using a really outdated compiler, you probably should not worry about this kind of optimization.

The values of global variables can be held in registers for as long as the compiler can prove there is no other access to the stored value. For values that can't themselves be held in a register, declaring a pointer with the restrict keyword declares that the value isn't accessed by any other means during that pointer's lifetime; just don't give away any copies and the compiler will take care of the rest. For scalars, declaring thistype localval=globalval; works at least as well, or even better, provided you're not changing the value or you have good control over scope exits (so that you can write the value back).
You can only use the restrict declaration if the value really won't be accessed otherwise. Optimizers these days can for instance deduce from your declaring the object won't be accessed in one function that a code path that does access it in another won't be executed, and from that deduce the content of the expression used to take that code path, and so on. "If you lie to the compiler, it will have its revenge" is more true today than ever.
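Both techniques from this answer can be sketched as follows. The global total and the functions add_to_total and scale are hypothetical names chosen for illustration; whether either change helps depends on the compiler and optimization level:

```c
#include <stddef.h>

long total;  /* hypothetical global accumulator */

/* Copy the global into a local, work on the local, write back at the one exit.
   Without the copy, a store to total might alias values[], forcing reloads. */
void add_to_total(const long *values, size_t n)
{
    long local = total;
    for (size_t i = 0; i < n; i++)
        local += values[i];
    total = local;
}

/* restrict promises that dst and src are not used to reach the same objects.
   Calling this with overlapping arrays would break that promise. */
void scale(float *restrict dst, const float *restrict src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}
```

The local copy in add_to_total is only equivalent to the direct version if values never points at total, which is exactly the kind of promise the answer warns you must actually keep.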

Related

Where does local const variable will get stored?

Where will a local const variable get stored? I have verified that everywhere in the function where the const variable is used, it gets replaced with its value (like immediate addressing mode). But if a pointer is assigned to it, then it gets stored on the stack. What I do not understand is how the processor knows the value is constant. Is there a read-only section in the stack, like the one present in the .data section?
Generally, the processor does not know that an object is declared const in C.
Systems commonly have regions of memory that are marked read-only after a program is loaded, and static const objects are stored in such memory. For these objects, the processor enforces the read-only property.
Systems generally do not have read-only memory used for stack. This would be inherently difficult—the memory would need to be read-write when a function is starting, so that its stack frame can be constructed, but read-only at other times. So the program would be frequently changing the hardware memory protection settings. This would impair performance and is generally not considered worth while.
So programs generally have only a read-write stack available. When you declare an automatic (rather than static) const object, where can the compiler put it? As you note, it is often optimized into an immediate operand in instructions. However, when you take its address, it must have an address, so it must be in memory.
One idea might be that, since it is const, it will not change, so we only need one copy, so it can be stored in the static read-only section instead of on the stack. However, the C standard says that each different object has a different address. To comply with that requirement, the compiler has to create a different instance of the object in memory each time it is created in the C code. Putting it on the stack is an easy way to do this.
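The distinct-address requirement can be observed with a small recursive function. This is only a sketch; instances_are_distinct is a hypothetical name, and the recursion exists solely to keep two instances of the same const object alive at once:

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if two simultaneously live instances of limit
   have different addresses. Call it with NULL to start. */
bool instances_are_distinct(const int *outer)
{
    const int limit = 10;   /* automatic const object; its address is taken */

    if (outer == NULL)
        return instances_are_distinct(&limit);  /* creates a second instance */

    return outer != &limit; /* compare the caller's instance with ours */
}
```

Because both instances exist at the same time, they cannot share a single copy in a read-only section, which is why a per-call location such as the stack is the natural choice.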
I think it totally depends on your tool-chain specific implementation. Variables are stored in RAM, program in Flash memory and constants either in RAM or Flash.
Correct me if I'm wrong.

How can there be so many register variables, with such a limited number of registers?

I was fooling around with C and I realized, by rights, if I declared a bunch of register variables, wouldn't the values be overwritten? From what I can tell from assembly, there aren't a ton of registers in the microprocessor, not enough to satisfy the demand I created. How does C keep all the values?
There's no requirement that all variables declared with register must be kept in CPU registers.
Here's what the C standard says:
A declaration of an identifier for an object with storage-class
specifier register suggests that access to the object be as fast as
possible. The extent to which such suggestions are effective is
implementation-defined.
Reference: ISO C11 N1570 draft, 6.7.1 paragraph 6. Note that it doesn't even mention CPU registers.
A conforming compiler could simply ignore all register keywords (aside from imposing some restrictions on taking the address of register objects).
In practice, most compilers will simply place as many register variables in CPU registers as they can.
And in fact a modern optimizing compiler is likely to be better at register allocation than most programmers are -- especially since they can recompute register mappings every time you recompile your program after modifying it.
The common wisdom these days is that the register keyword doesn't provide much benefit.
Old compilers would allocate as many registers to register variables as they could (in some cases, this number was 0) and allocate the remaining variables on the stack.
Modern compilers generally ignore the register keyword. They employ sophisticated register allocators that automatically keep as many variables in registers as possible.
The only effect of register you can rely on is that you get a diagnostic message if you try to take the address of a register variable. Otherwise, register variables behave just like automatic variables do.
register is a hint that the compiler can keep the variable in registers. You can't force the compiler to use more registers than exist on the target architecture, for the obvious reason that it's impossible.
In C, the register keyword simply means that the variable's address can't be taken. This stops you from doing anything that would prevent the compiler keeping it in a register, but doesn't require it to be in a register.
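A short sketch of the one effect you can rely on. The function sum_to is a made-up example; the commented-out line is the part a conforming C compiler must diagnose:

```c
/* Sums 1..n using register-qualified locals. */
int sum_to(int n)
{
    register int acc = 0;

    for (register int i = 1; i <= n; i++)
        acc += i;

    /* int *p = &acc; */  /* not allowed: cannot take the address of a register object */

    return acc;
}
```

Removing both register keywords should not change the behavior, and with a modern optimizing compiler it will most likely not change the generated code either.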
From https://en.cppreference.com/w/c/language/storage_duration
The register specifier is only allowed for objects declared at block scope, including function parameter lists. It indicates automatic storage duration and no linkage (which is the default for these kinds of declarations), but additionally hints the optimizer to store the value of this variable in a CPU register if possible. Regardless of whether this optimization takes place or not, variables declared register cannot be used as arguments to the address-of operator, cannot use alignas (since C11), and register arrays are not convertible to pointers.
It hasn't really done anything for years: optimizing compilers already keep variables in registers whenever possible. For variables that are global or have had their address taken, that may be for only part of a function, with the result stored back to memory if the variable can't be optimized away.
BTW, register was officially deprecated in C++, and C++17 actually removed it from the language. https://en.cppreference.com/w/cpp/language/storage_duration.
Related: GNU C has register int foo asm("eax"); (or whatever other register), but even that is only guaranteed to have an effect when used as an operand to an inline-asm statement when used for local variables. In current GCC versions, it does cause the compiler to use that register for the variable, unless it needs to spill / reload it to stack memory across function calls or whatever.
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html
But in GNU C, you can use global register variables, where a register is dedicated to a global for the entire life of your program, hurting optimization of code that isn't using that variable. It's an interesting option but not one you should use.
C was designed to allow a compiler to generate assembly code for a function while it was being parsed, rather than having to read an entire function, examine it, and then produce code afterward. A compiler that has parsed a program as far as:
int test(void)
{
    int x=0,y=0;
    int *p = &y;
    while(x < 10)
    {
        x++;
        foo();
        x++;
        *p *= 3;
        x++;
        bar();
        ...
would have no way of knowing whether the value of x could be safely kept in a register across the call to foo and/or the operation on *p or whether it might be possible for foo to alter the value of x.
The purpose of the register keyword was effectively to tell the compiler that it would be safe to keep the value of an object in a register across function calls or operations that write to pointers, even if it hasn't seen everything that code might do with the object. Such a meaning could be useful even today if passing the object's address to a nested function weren't a constraint violation, but a compiler was allowed to assume that in any context where a named-object lvalue was used, all operations would involve that named-object lvalue. If an object's address is never taken, no qualifier would be needed to invite such an assumption, but in cases where an object's address is taken but not persisted across conflicting operations involving the object, such a qualifier could give a compiler information it would otherwise not have.
Variables are normally stored on the stack. That is, a block of memory. The value of a variable is normally loaded into a register for manipulation and moved back to the stack (saved) if another variable is to be manipulated. Often the variable isn't even loaded into a register, it is manipulated on the stack.

Why C variables stored in specific memory locations?

Yesterday I had an interview where the interviewer asked me about the storage classes where variables are stored.
My answer was:
Local variables are stored on the stack.
Register variables are stored in registers.
Global and static variables are stored in the data segment.
Dynamically allocated memory is stored on the heap.
The next question he asked me was: why are they getting stored in those specific memory area? Why is the Local variable not getting stored in register (though I need an auto variable getting used very frequently in my program)? Or why global or static variables are not getting stored in stack?
Then I was clueless. Please help me.
Because the storage area determines the scope and the lifetime of the variables.
You choose a storage specification depending on your requirement, i.e:
Lifetime: The duration you expect the particular variable needs to be alive and valid.
Scope: The scope(areas) where you expect the variable to be accessible.
In short, each storage area provides a different functionality and you need various functionality hence different storage areas.
The C language does not actually define where any variables are stored. It does, however, define storage durations; the three relevant here are static, automatic, and allocated (dynamic).
Static variables are created during program initialization (prior to main()) and remain in existence until program termination. File-scope ('global') and static variables fall under the category. While these commonly are stored in the data segment, the C standard does not require this to be the case, and in some cases (eg, C interpreters) they may be stored in other locations, such as the heap.
Automatic variables are local variables declared in a function body. They are created when or before program flow reaches their declaration, and destroyed when they go out of scope; new instances of these variables are created for recursive function invocations. A stack is a convenient way to implement these variables, but again, it is not required. You could implement automatics in the heap as well, if you chose, and they're commonly placed in registers as well. In many cases, an automatic variable will move between the stack and registers during its lifetime.
Note that the register annotation for automatic variables is a hint - the compiler is not obligated to do anything with it, and indeed many modern compilers ignore it completely.
Finally, dynamic objects (there is no such thing as a dynamic variable in C) refer to values created explicitly using malloc, calloc or other similar allocation functions. They come into existence when explicitly created, and are destroyed when explicitly freed. A heap is a convenient place to put these - or rather, one defines a heap based on the ability to do this style of allocation. But again, the compiler implementation is free to do whatever it wants. If the compiler can perform static analysis to determine the lifetime of a dynamic object, it might be able to move it to the data segment or stack (however, few C compilers do this sort of 'escape analysis').
The key takeaway here is that the C language standard only defines how long a given value is in existence for. And a minimum bound for this lifetime at that - it may remain longer than is required. Exactly how to place this in memory is a subject in which the language and library implementation is given significant freedom.
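The three lifetimes described in this answer can be sketched in a few lines. All function names here are hypothetical, and nothing in the code depends on where the implementation actually places each object:

```c
#include <stdlib.h>

/* Static duration: id exists for the whole program and keeps its value. */
int next_id(void)
{
    static int id = 0;
    return ++id;
}

/* Automatic duration: n is a fresh object on every call. */
int fresh_count(void)
{
    int n = 0;
    return ++n;
}

/* Allocated duration: the object lives until the caller passes it to free(). */
int *make_counter(int start)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = start;
    return p;   /* NULL if the allocation failed */
}
```

Repeated calls to next_id return 1, 2, 3 and so on, repeated calls to fresh_count always return 1, and the object returned by make_counter outlives the call that created it.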
It is actually just an implementation detail that is convenient.
The compiler could, if it wanted to, allocate local variables on the heap.
It is just easier to create them on the stack since when leaving a function you can adjust the frame pointer with a simple add/subtract depending on the growth direction of the stack and so automatically free the used space for the next function. Creating locals on the heap however would mean more house-keeping work.
Another point is that local variables need not be created on the stack; they can be stored and used in a register alone if the compiler thinks that's more appropriate and has enough registers to do so.
Local variables are stored in registers in most cases; because registers are pushed to and popped from the stack when you make function calls, it looks like they are on the stack.
There is actually no such thing as register variables; register is just a rarely used keyword in C that asks the compiler to try to put the variable in a register. I think most compilers just ignore this keyword.
That is why he asked you more: he was not sure whether you understood the topic deeply. The fact is that register variables are virtually on the stack.
In embedded systems we have different types of memory to use: read-only non-volatile (ROM), read-write non-volatile (EEPROM, PROM, SRAM, NVRAM, flash), and volatile (RAM). We also have different requirements on our data: cannot change and persists across power cycles; can change and persists across power cycles; can change at any time. We have different sections because we have to map these data requirements onto the available memory types optimally.

C performance on a PIC board global variables vs. method local

All,
I have C functions that are called many times a second as they are part of a control loop on a PIC18 board. These functions have variables that only need method scope, but I was wondering what, if any, overhead there was to constantly allocating these variables vs. using a global or at least higher-scoped variable. (I thought of typedef'ing a struct to pass around from a higher scope to avoid global variable use if performance dictates not using method-local variables.)
There are some good threads on here that cover this topic, but I have yet to see a definitive answer as most preach best practices which I agree and would follow as long as there are not performance gains to be had as every microsecond counts.
One thread mentioned using file scoped static variables as a substitute for global variables, but I can't help wonder if even that is necessary.
What does everyone think?
Accessing a local variable requires doing something like *(SP + offset) (where SP is the stack-pointer), whereas accessing a static (which includes globals) requires something like *(address).
From what I recall, the PIC instruction set has very limited addressing modes. So it's very likely that accessing the global will be faster, at least for the first time it's accessed. Subsequent accesses may be identical if the compiler holds the computed address in a register.
As @unwind said in the comments, you should take a look at the compiler output, and profile to confirm. I would only sacrifice clarity/maintainability if you've proved that it's worthwhile in terms of the runtime of your program.
While I've not used every single PIC compiler in existence, there are two styles. The style I've used allocates all local variables statically by analyzing the program's call graph. If every possible call were in fact performed, the amount of stack memory consumed by locals would match what would be required by static allocation, with a couple of caveats (describing the behavior of HiTech's PICC-18 "standard" compiler--others may vary)
Variadic functions are handled by defining local-variable storage in the scope of the caller, and passing a two-byte pointer to that storage to the function being called.
For every different signature of indirect function pointer, the compiler generates a "pseudo-function" in the call graph; everything that calls a function of that signature calls the pseudo-function, and that pseudo-function calls every function with that signature that has its address taken.
In this style of compiler, consecutive accesses to local variables will be just as fast as consecutive accesses to globals. Other than global and static variables explicitly-declared as "near", however, which must total no more than 64-128 bytes (varies with different models of PIC), the global and static variables for each module are located separately from local variables, and bank-switching instructions are needed to access things in different banks.
Some compilers which I have not used employ the "enhanced instruction set" option. This option gobbles up 96 bytes of the "near" bank (or all of it, on PICs with less than 96 bytes) and uses it to access 96 bytes relative to the FSR2 register. This would be a wonderful concept if it used the first 16, or maybe 32, bytes as a stack frame. Using 96 bytes means giving up all of the "near" storage, which is a pretty severe limitation. Nonetheless, compilers which use this instruction set can access local variables on a stack just as fast, if not faster, than global variables (no bank-switch required). I really wish Microchip had an option to only set aside 16 bytes or so for the stack frame, leaving a useful amount of 'common bank' RAM, but nonetheless some people have good luck with that mode.
I would imagine that this depends a lot on which compiler you are using. I don't know PIC but I'm guessing some (all?) PIC compilers will optimize the code so that local variables are stored in CPU registers whenever possible. If so, then local variables will likely be equally fast as globals.
Otherwise if the local variable is allocated on the stack the global may be a bit faster to access (see Oli's answer).
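The three options discussed in this thread (function-local variables, file-scope statics, and a struct passed in from a higher scope) can be sketched side by side. The conversion being computed and every name below are invented for illustration; the only way to know which version is fastest on a given PIC compiler is to compare the generated code and measure:

```c
/* Option 1: scratch values as function-local variables. */
int convert_local(int raw)
{
    int offset = raw - 512;
    int scaled = offset * 3;
    return scaled / 4;
}

/* Option 2: scratch values as file-scope statics (invisible to other files). */
static int s_offset;
static int s_scaled;

int convert_static(int raw)
{
    s_offset = raw - 512;
    s_scaled = s_offset * 3;
    return s_scaled / 4;
}

/* Option 3: scratch values in a struct owned by the caller. */
typedef struct {
    int offset;
    int scaled;
} scratch_t;

int convert_ctx(scratch_t *s, int raw)
{
    s->offset = raw - 512;
    s->scaled = s->offset * 3;
    return s->scaled / 4;
}
```

All three compute the same result, so they can be swapped freely while inspecting the compiler output for each.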

C variable allocation time and space

If I have a test.c file with the following
#include ...
int global = 0;
int main() {
    int local1 = 0;
    while(1) {
        int local2 = 0;
        // Do some operation with one of them
    }
    return 0;
}
So if I had to use one of these variables in the while loop, which one would be preferred?
Maybe I'm being a little vague here, but I want to know if the difference in time/space allocation is actually relevant.
If you are wondering whether declaring a variable inside a loop causes it to be created and destroyed at every iteration, there is nothing to worry about. These variables are not dynamically allocated at runtime and nothing is being malloced here; some memory is simply set aside for use inside the loop. So having the variable inside the loop performs the same as having it outside.
The real difference here is scope not performance. Whether you use a global or local variable only affects where you want this variable to be visible.
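As a rough illustration of that point, here are two hypothetical functions that differ only in where the temporary is declared. They compute the same result, and an optimizing compiler would be expected (though not required) to generate equivalent code for both:

```c
/* Temporary declared inside the loop body: visible only there. */
int sum_squares_inner(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        int sq = i * i;
        total += sq;
    }
    return total;
}

/* Temporary declared once, outside the loop: visible in the whole function. */
int sum_squares_outer(int n)
{
    int total = 0;
    int sq;
    for (int i = 1; i <= n; i++) {
        sq = i * i;
        total += sq;
    }
    return total;
}
```

The inner version is usually preferable simply because sq cannot be misused outside the loop.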
In case you're wondering about performance differences: most likely there aren't any. If there are theoretical performance differences, you'll find it hard to actually devise a test to measure them.
A decision like this should not be based on performance but semantics. Unless the semantic behavior of a global variable is required, you should always use automatic (local non-static) variables.
As others have said and surely will say, there are unlikely to be any differences in performance. If there are, the automatic variable will be faster.
The C compiler will have an easier time making optimizations on the variables declared local to the function. The global variable would require an optimizer to perform "Inter-Procedural Data Flow Analysis", which isn't that commonly done.
As an example of the difference, consider that all your declarations initialize the variable to zero. However, in the case of the global variable, the compiler cannot use that information unless it verifies that no flow of control in your program can change the global prior to using it in your example function. In the case of the locally declared ("automatic") variables, there is no way the initial value can be changed by another function (in particular, the compiler verifies that their address is never passed to a sub-function) and the compiler can perform "killed definitions" and "value liveness" analysis to determine whether the zero value can be assumed in some code paths.
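The difference can be sketched with two small functions. The names g_flag, uses_global and uses_local are made up for this example, and what a particular optimizer actually does with them is an assumption to verify in its output:

```c
int g_flag = 0;   /* any other function may change this before it is read */

/* The initializer of g_flag cannot be assumed here: the value must be loaded
   unless the compiler can prove that nothing modified it. */
int uses_global(void)
{
    return (g_flag == 0) ? 1 : 2;
}

/* flag never escapes this function, so its value at the test is known to be 0
   and the whole function can be folded to "return 1". */
int uses_local(void)
{
    int flag = 0;
    return (flag == 0) ? 1 : 2;
}
```

Setting g_flag from elsewhere changes what uses_global returns, which is precisely the possibility the compiler has to account for.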
Of the two local variables, as a guideline, the optimizer will always have an easier time optimizing access to the variable with the smaller (more limited) scope.
Having stated the above, I would suggest that the other answers favoring semantics over optimizer-meta-optimization are correct. Use the variable which causes the code to read best, and you will be rewarded with more time returned to you than you would gain by assisting the def-use optimization calculation.
In general, avoid using a global variable, or any variable which can be accessed more broadly than absolutely necessary. Limited scoping of variables helps prevent bugs from being introduced during later program maintenance.
There are three broad classes of variables: static (global), stack (auto), and register.
Register variables are stored in CPU registers. Registers are very fast word-sized memories, which are integrated in the CPU pipeline. They are free to access, but there are a very limited number of them (typically between 8 and 32 depending on your processor and what operations you're doing).
Stack variables are stored in an area of RAM called the stack. The stack is almost always going to be in the cache, so stack variables typically take 1-4 cycles to access.
Generally, local variables can be either in registers or on the stack. It doesn't matter whether they are allocated at the top of a function or in a loop; they will only be allocated once per function call, and allocation is basically free. The compiler will put variables in registers if at all possible, but if you have more active variables than registers, they won't all fit. Also, if you take the address of a variable, it must be stored on the stack since registers don't have addresses.
Global and static variables are a different beast. Since they are not usually accessed frequently, they may not be in cache, so it could take hundreds of cycles to access them. Also, since the compiler may not know the address of a global variable ahead of time, it may need to be looked up, which is also expensive.
As others have said, don't worry too much about this stuff. It's definitely good to know, but it shouldn't affect the way you write your programs. Write code that makes sense, and let the compiler worry about optimization. If you get into compiler development, then you can start worrying about it. :)
Edit: more details on allocation:
Register variables are allocated by the compiler, so there is no runtime cost. The code will just put a value in a register as soon as the value is produced.
Stack variables are allocated by your program at runtime. Typically, when a function is called, the first thing it will do is reserve enough stack space for all of its local variables. So there is no per-variable cost.

Resources