About "static" in C, how is it implemented by the compiler? - c

About "static" in C, how is it implemented by the compiler ?
This is a Bloomberg interview question. Any thoughts ?

[[I'm assuming we're talking about static in the context of variables here, because static functions are simply a compile/link-time thing, with no run-time implications.]]
In short, it's implementation-specific. The compiler is free to do anything it chooses.
Typically (but by no means exclusively), statics are stored in the .bss or .data sections of the executable image at fixed locations. This has performance advantages, as they can be accessed with literal addresses, rather than pointer dereferences (as would be the case for stack-based variables). As this is part of the binary, this also means that the initial values are automatically mapped into memory when the executable is first loaded; no intialisation routines are required.

static global variables will generally be allocated a fixed address at compile time. Many OS provide memory regions that are 0-initialised, others that map part of the executable image with preinitialised data, as well as totally uninitialised areas. Depending on whether the compiler can work out the correct initial content of the static variable at compile time, it may select the most appropriate of these memory regions, calling any run-time initialisation later if required. For example:
static int x; // needs to be 0 before main() runs
// best way: 0-initialised memory area
static int y = 3; // best way: map/copy area of executable already containing "3"
static int z = time(NULL); // initial value unimportant
// best way: uninitialised memory area
// pre-main() init code
Notes: putting e.g. z in 0-initialised memory then clobbering it is not significantly wasteful - just not strictly necessary. Some OS may have separate areas for read-only/const values.
static local variables with compile-time known initial values may be created as per globals. For run-time initialised values (only legal in C++) compilers tend to (must?) initialise them when the scope is first entered. This is typically coordinated by having an implicit supporting boolean value per scope containing static local variables: each time the scope is entered the boolean is consulted to see whether the statics need to be initialised. On some compilers, this may actually be done in a less efficient but thread-safe fashion, such that static locals can be used for singleton instances.
EDIT: Given Lundin's comment (correctly) asserting all C statics must be initialised before main(), I wrote some code to explore this:
#include <stdio.h>
#include <time.h>
void f()
{
static int i = time(NULL);
printf("%d\n", i);
}
int main()
{
int i = time(NULL);
printf("%d\n", i);
sleep(2);
f();
}
With GCC's C compiler, I get a fatal compilation error about the local static i requiring initialisation at run-time. Compiled as C++ (my main language), this is perfectly legal and initialised at run-time and after entering main() - showing that that part of my explanation above is only relevant to C++.
static functions are simply marked in the generated object such that the linker won't consider them when matching unresolved calls from other objects.

Related

Memory allocation of functions and variables in C

Depended of the version of the C compiler and compiler flags it is possible to initialize variables on any place in your functions (As far as I am aware).
I'm used to put it all the variables at the top of the function, but the discussion started about the memory use of the variables if defined in any other place in the function.
Below I have written 2 short examples, and I wondered if anyone could explain me (or verify) how the memory gets allocated.
Example 1: Variable y is defined after a possible return statement, there is a possibility this variable won't be used for that reason, as far as I'm aware this doesn't matter and the code (memory allocation) would be the same if the variable was placed at the top of the function. Is this correct?
Example 2: Variable x is initialized in a loop, meaning that the scope of this variable is only within this loop, but what about the memory use of this variable? Would it be any different if placed on the top of the functions? Or just initialized on the stack at the function call?
Edit: To conclude a main question:
Does reducing the scope of the variable or change the location of the first use (so anywhere else instead of top) have any effects on the memory use?
Code example 1
static void Function(void){
uint8_t x = 0;
//code changing x
if(x == 2)
{
return;
}
uint8_t y = 0;
//more code changing y
}
Code example 2
static void LoopFunction(void){
uint8_t i = 0;
for(i =0; i < 100; i ++)
{
uint8_t x = i;
// do some calculations
uartTxLine("%d", x);
}
//more code
}
I'm used to put it all the variables at the top of the function
This used to be required in the older versions of C, but modern compilers dropped that requirement. As long as they know the type of the variable at the point of its first use, the compilers have all the information they need.
I wondered if anyone could explain me how the memory gets allocated.
The compiler decides how to allocate memory in the automatic storage area. Implementations are not limited to the approach that gives each variable you declare a separate location. They are allowed to reuse locations of variables that go out of scope, and also of variables no longer used after a certain point.
In your first example, variable y is allowed to use the space formerly occupied by variable x, because the first point of use of y is after the last point of use of x.
In your second example the space used for x inside the loop can be reused for other variables that you may declare in the // more code area.
Basically, the story goes like this. When calling a function in raw assembler, it is custom to store everything used by the function on the stack upon entering the function, and clean it up upon leaving. Certain CPUs and ABIs may have a calling convention which involves automatic stacking of parameters.
Likely because of this, C and many other old languages had the requirement that all variables must be declared at the top of the function (or on top of the scope), so that the { } reflect push/pop on the stack.
Somewhere around the 80s/90s, compilers started to optimize such code efficiently, as in they would only allocate room for a local variable at the point where it was first used, and de-allocate it when there was no further use for it. Regardless of where that variable was declared - it didn't matter for the optimizing compiler.
Around the same time, C++ lifted the variable declaration restrictions that C had, and allowed variables to be declared anywhere. However, C did not actually fix this before the year 1999 with the updated C99 standard. In modern C you can declare variables everywhere.
So there is absolutely no performance difference between your two examples, unless you are using an incredibly ancient compiler. It is however considered good programming practice to narrow the scope of a variable as much as possible - though it shouldn't be done at the expense of readability.
Although it is only a matter of style, I would personally prefer to write your function like this:
(note that you are using the wrong printf format specifier for uint8_t)
#include <inttypes.h>
static void LoopFunction (void)
{
for(uint8_t i=0; i < 100; i++)
{
uint8_t x = i;
// do some calculations
uartTxLine("%" PRIu8, x);
}
//more code
}
Old C allowed only to declare (and initialize) variables at the top of a block. You where allowed to init a new block (a pair of { and } characters) anywhere inside a block, so you had then the possibility of declaring variables next to the code using them:
... /* inside a block */
{ int x = 3;
/* use x */
} /* x is not adressabel past this point */
And you where permitted to do this in switch statements, if statements and while and do statements (everywhere where you can init a new block)
Now, you are permitted to declare a variable anywhere where a statement is allowed, and the scope of that variable goes from the point of declaration to the end of the inner nested block you have declared it into.
Compilers decide when they allocate storage for local variables so, you can all of them allocated when you create a stack frame (this is the gcc way, as it allocates local variables only once) or when you enter in the block of definition (for example, Microsoft C does this way) Allocating space at runtime is something that requires advancing the stack pointer at runtime, so if you do this only once per stack frame you are saving cpu cycles (but wasting memory locations). The important thing here is that you are not allowed to refer to a variable location outside of its scoping definition, so if you try to do, you'll get undefined behaviour. I discovered an old bug for a long time running over internet, because nobody take the time to compile that program using Microsoft-C compiler (which failed in a core dump) instead of the commmon use of compiling it with GCC. The code was using a local variable defined in an inner scope (the then part of an if statement) by reference in some other part of the code (as everything was on main function, the stack frame was present all the time) Microsoft-C just reallocated the space on exiting the if statement, but GCC waited to do it until main finished. The bug solved by just adding a static modifier to the variable declaration (making it global) and no more refactoring was neccesary.
int main()
{
struct bla_bla *pointer_to_x;
...
if (something) {
struct bla_bla x;
...
pointer_to_x = &x;
}
/* x does not exist (but it did in gcc) */
do_something_to_bla_bla(pointer_to_x); /* wrong, x doesn't exist */
} /* main */
when changed to:
int main()
{
struct bla_bla *pointer_to_x;
...
if (something) {
static struct bla_bla x; /* now global ---even if scoped */
...
pointer_to_x = &x;
}
/* x is not visible, but exists, so pointer_to_x continues to be valid */
do_something_to_bla_bla(pointer_to_x); /* correct now */
} /* main */

How to measure a functions stack usage in C?

Is there a way I can measure how much stack memory a function uses?
This question isn't specific to recursive functions; however I was interested to know how much stack memory a function called recursively would take.
I was interested to optimize the function for stack memory usage; however, without knowing what optimizations the compiler is already making, it's just guess-work if this is making real improvements or not.
To be clear, this is not a question about how to optimize for better stack usage
So is there some reliable way to find out how much stack memory a function uses in C?
Note: Assuming it's not using alloca or variable-length arrays,
it should be possible to find this at compile time.
Using warnings
This is GCC specific (tested with gcc 4.9):
Add this above the function:
#pragma GCC diagnostic error "-Wframe-larger-than="
Which reports errors such as:
error: the frame size of 272 bytes is larger than 1 bytes [-Werror=frame-larger-than=]
While a slightly odd way method, you can at least do this quickly while editing the file.
Using CFLAGS
You can add -fstack-usage to your CFLAGS, which then writes out text files along side the object files.
See: https://gcc.gnu.org/onlinedocs/gnat_ugn/Static-Stack-Usage-Analysis.html
While this works very well, its may be a little inconvenient depending on your buildsystem/configuration - to build a single file with a different CFLAG, though this can of course be automated.
– (thanks to #nos's comment)
Note,
It seems most/all of the compiler natural methods rely on guessing - which isn't 100% sure to remain accurate after optimizations, so this at least gives a definitive answer using a free compiler.
You can very easily find out how much stack space is taken by a call to a function which has just one word of local variables in the following way:
static byte* p1;
static byte* p2;
void f1()
{
byte b;
p1 = &b;
f2();
}
void f2()
{
byte b;
p2 = &b;
}
void calculate()
{
f1();
int stack_space_used = (int)(p2 - p1);
}
(Note: the function declares a local variable which is only a byte, but the compiler will generally allocate an entire machine word for it on the stack.)
So, this will tell you how much stack space is taken by a function call. The more local variables you add to a function, the more stack space it will take. Variables defined in different scopes within the function usually don't complicate things, as the compiler will generally allocate a distinct area on the stack for every local variable without any attempt to optimize based on the fact that some of these variables might never coexist.
To calculate the stack usage for the current function you can do something like this:
void MyFunc( void );
void *pFnBottom = (void *)MyFunc;
void *pFnTop;
unsigned int uiStackUsage;
void MyFunc( void )
{
__asm__ ( mov pFnTop, esp );
uiStackUsage = (unsigned int)(pFnTop - pFnBottom);
}

Is there any difference between the size of memory allocated for the following types of declaration:

i) static int a, b, c;
ii) int a; int b; int c;
I am not sure as to how will the memory be allocated for these types of declaration. And if these declarations are different then how much memory is allocated for each declaration?
static int a,b,c;
will allocate three ints (probably 32bits each, or 4 bytes) in the DATA section of your program. They will always be there as long as your program runs.
int a; int b; int c;
will allocate three ints on the STACK. They will be gone when they go out of scope.
There is no difference between the size of memory for
static int a,b,c;
int a;int b;int c;
Differences occur in the lifetime, location, scope & initialization.
Lifetime: Were these were declare globally, both a,b,c sets would exist for the lifetime of the program. Were they both in a function, the static ones would exist for the program lifetime, but the other would exist only for the duration of the function. Further, should the function be called recursively or re-entrant, multiple sets of the non-static a,b,c, would exists.
Location: A common, thought not required by C, is to have a DATA section and STACK section of memory. Global variables tend to go in DATA as well as functional static ones. The non-static version of a,b,c in a function would typically go on in STACK.
Scope: Simple view: Functionally declared variables (static or not) are scoped within the function. Global variables declared static have file scope. Global variables not declared static have the scope of the entire program.
Initialization: follows along the same track as lifetime. Globally declared a,b,c, static or not, are both initialized at program start. If a,b,c are in a function, only static ones are initialized (at program start). Functional non-static a,b,c are not initialized.
Optimization may affect location, especially for the functional non-static a,b,c which could readily be saved in registers. Optimization may also determine that the variable is not used and optimizing it out, thus taking 0 bytes.
The variables that are defined as static will allocated in data segment at compile time. The same is true for global variables even though they are not static. Non-static variables defined within a block are allocated on the stack when the block is entered at runtime and are deallocted when th block is exited.
The amount of memory allocated is implementation dependent. The standard requires that an int is large enough to hold a 16-bit (2 byte) valued, but is can be larger. Most compilers you are likely to use now adays use 32-bit ints.
If we assume that 2nd declaration is inside function, than As Bard and Nashant already said these will be allocated in different memory sections (OS and compilers dependent).
But though variable size will be of the same size, they CAN consume different amount of memory. If function (from 2nd declaration) is called recursively for example, there will be multiple instances of variables from 2nd declaration.

Is there a good reason to initialize a static variable on every call to the function where it is defined?

A colleague is doing some code review, and he is seeing many static variable declarations similar to the following:
void someFunc(){
static int foo;
static int bar;
static int baz;
foo = 0;
bar = 0;
baz = 0;
/*
rest of the function code goes here
*/
}
Our question is,
Are the programmers who wrote this code simply unclear on the concept of a static variable,
or is there some clever reason to do this on purpose?
If it makes any difference, the environment is an embedded microcontroller and the compiler is GCC.
If it were not an embedded system, you would probably be correct: I would bet that the programmers were unclear on the concept of the static, and must have meant to write this:
static int foo = 0;
static int bar = 0;
static int baz = 0;
However, in an embedded system they could have used static to avoid allocating the variables in the automatic storage (i.e. on the stack). This could save a few CPU cycles, because the address of the static variable would be "baked into" the binary code of the compiled method.
In this context the static memory is allocated only once. The problem with this code is the initialization. If it's being reset at every execution, these variables should exist on the stack.
Implementing the function as it is, undermines the static benefits. The main 2 reasons for using static are:
Having a variable that maintains it's value between calls to the same function
Avoid allocating memory on the stack
#dasblinkenlight answer's pertains to the 2nd option, however there is nobody in embedded programming who would waste unrecoverable memory in order to save 24 bytes (assuming that int is 32 bytes on your architecture) on the stack. The reason is that the compiler is going to manipulate the stack pointer coming in to the function regardless, and therefore there is nothing to be save (in terms of cycles) from not having it push the SP another 24 bytes.
Keeping that in mind, we are left with the option that the user wanted to maintain some information regarding foo, bar and baz between calls. If this is also not the case, what you are looking at is bad programming.
static benefits. are:
Having a variable that maintains it's value between calls to the same function
Avoid allocating memory on the stack

Find size of a function in C

I am learning function pointers,I understand that we can point to functions using function pointers.Then I assume that they stay in memory.Do they stay in stack or heap?Can we calculate the size of them?
The space for code is statically allocated by the linker when you build the code. In the case where your code is loaded by an operating system, the OS loader requests that memory from the OS and the code is loaded into it. Similarly static data as its name suggests is allocated at this time, as is an initial stack (though further stacks may be created if additional threads are created).
With respect to determining the size of a function, this information is known to the linker, and in most tool-chains the linker can create a map file that includes the size and location of all static memory objects (i.e. those not instantiated at run-time on the stack or heap).
There is no guaranteed way of determining the size of a function at run-time (and little reason to do so) however if you assume that the linker located functions that are adjacent in the source code sequentially in memory, then the following may give an indication of the size of a function:
int first_function()
{
...
}
void second_function( int arg )
{
...
}
int main( void )
{
int first_function_length = (int)second_function - (int)first_function ;
int second_function_length = (int)main - (int)second_function ;
}
However YMMV; I tried this in VC++ and it only gave valid results in a "Release" build; the results for a "Debug" build made no real sense. I suggest that the exercise is for interest only and has no practical use.
Another way of observing the size of your code of course is to look at the disassembly of the code in your debugger for example.
Functions are part of text segment (which may or may not be 'heap') or its equivalent for the architecture you use. There's no data past compilation regarding their size, at most you can get their entry point from symbol table (which doesn't have to be available). So you can't calculate their size in practice on most C environments you'll encounter.
They're (normally) separate from either the stack or heap.
There are ways to find their size, but none of them is even close to portable. If you think you need/want to know the size, chances are pretty good that you're doing something you probably ought to avoid.
There's an interesting way to discover the size of the function.
#define RETN_empty 0xc3
#define RETN_var 0xc2
typedef unsigned char BYTE;
size_t FunctionSize(void* Func_addr) {
BYTE* Addr = (BYTE*)Func_addr;
size_t function_sz = 0;
size_t instructions_qt = 0;
while(*Addr != (BYTE)RETN_empty && *Addr != (BYTE)RETN_var) {
size_t inst_sz = InstructionLength((BYTE*)Addr);
function_sz += inst_sz;
Addr += inst_sz;
++instructions_qt;
}
return function_sz + 1;
}
But you need a function that returns the size of the instruction. You can find a function that finds the Instruction Length here: Get size of assembly instructions.
This function basically keeps checking the instructions of the function until it finds the instruction to return (RETN)[ 0xc3, 0xc2], and returns the size of the function.
To make it simple, functions usually don't go into the stack or the heap because they are meant to be read-only data, whereas stack and heap are read-write memories.
Do you really need to know its size at runtime? If no, you can get it by a simple objdump -t -i .text a.out where a.out is the name of your binary. The .text is where the linker puts the code, and the loader could choose to make this memory read-only (or even just execute-only). If yes, as it has been replied in previous posts, there are ways to do it, but it's tricky and non-portable... Clifford gave the most straightforward solution, but the linker rarely puts function in such a sequential manner into the final binary. Another solution is to define sections in your linker script with pragmas, and reserve a storage for a global variable which will be filled by the linker with the SIZEOF(...) section containing your function. It's linker dependent and not all linkers provide this function.
As has been said above, function sizes are generated by the compiler at compile time, and all sizes are known to the linker at link time. If you absolutely have to, you can make the linker kick out a map file containing the starting address, the size, and of course the name. You can then parse this at runtime in your code. But I don't think there's a portable, reliable way to calculate them at runtime without overstepping the bounds of C.
The linux kernel makes similar use of this for run-time profiling.
C has no garbage collector. Having a pointer to something doesn't make it stay in memory.
Functions are always in memory, whether or not you use them, whether or not you keep a pointer to them.
Dynamically allocated memory can be freed, but it has nothing to do with keeping a pointer to it. You shouldn't keep pointer to memory you have freed, and you should free it before losing the pointer to it, but the language doesn't do it automatically.
If there is anything like the size of the function it should be its STACK FRAME SIZE. Or better still please try to contemplate what exactly, according to you, should be the size of a function? Do you mean its static size, that is the size of all its opcode when it is loaded into memory?If that is what you mean, then I dont see their is any language provided feature to find that out.May be you look for some hack.There can be plenty.But I haven't tried that.
#include<stdio.h>
int main(){
void demo();
int demo2();
void (*fun)();
fun = demo;
fun();
printf("\n%lu", sizeof(demo));
printf("\n%lu", sizeof(*fun));
printf("\n%lu", sizeof(fun));
printf("\n%lu", sizeof(demo2));
return 0;
}
void demo(){
printf("tired");
}
int demo2(){
printf("int type funciton\n");
return 1;
}
hope you will get your answer, all function stored somewhere
Here the output of the code

Resources