Static variables in C not initialized to zero - c

I am working on a large C project for a company.
I have realized that some times in the compiled executable, static variables used in C files are not initialized to zero and have some value in them.
But when I edit the code a bit, like adding a print statement any where in the project, the issue is resolved.
I am using a Broadcom STB mips cross-compiler toolchain for compiling the codes.
The program is run on a Broadcom 97241 chipset running Linux 3.1.3.
[EDIT] I tried a clean build also but the problem did not go away.

The C standard requires that static variables must be initialized at the start of the program. If you don't initialize them, then the compiler will initialize them to 0. So if you are using normal compiler, then all your static variables are initialized to 0 if you don't initialize them explicitly. Such problems may occur if:
1) Some of your code set the value to a static variable.
2) The compiler is not C compiler.
3) Your program damage the memory and then you can't rely on assertions, on printf(), e.t.c.
Anyway. Try to initialize static variables to 0 explicitly. (to cut the 2 point off). And hope there is a way to debug your application. Debugger is much more useful in such problems, then asking such abstract question here.

As already has been stated, the static variables are set to 0 by the compiler. If you see some different behaviour it is most likley your code corrupting it somewhere (i.e. array overrun or similar).
In order to track this problem you should set a breakpoint on main and verify that the variables are indeed 0. If not, it would be a compiler bug.
If the variable is 0 then set a memory access breakpoint on it and you should see where you are corrupting it.
Without seeing the code it's really not helpfull to ask here, as any answer is just guessing, so we can only provide generic answers.

Variables declared as static should be initialised to 0, because the bss should be initialised to 0 on startup.
Adding printf statements and having the problem go away sounds like it could be a memory corruption issue. Are you accessing an array out of bounds, overflowing the stack, etc?

Related

c heap memory layout

I'm working on a fairly involved c program that is using a linked list of structures. There is a global variable declared at the top of one of my .c files (queue_head). This variable is now getting overwritten mysteriously, which causes seg faults.
Is there some way to find out the layout of my variables in memory. My best guess as to the cause is that something is writing past the end of it's memory and overwriting the queue_head memory location.
To test that I reversed the positions of queue_head and queue_tail in my source file, and the next time I ran the program, queue_tail was getting corrupted but queue_head was fine.
Of course it could be an errant pointer, but my belief is that it's something behind it in memory getting written to past it's memory allocation.
Is there a way to see how c maps out the variables short of disassembly?
Variables are generally laid out in order of appearance in file, but if program has several files, then it's harder to predict. You can print their addresses to find possible offender:
printf("%p %p\n", &queue_head, &fishy_var);
But what's good in finding previous variable? It might well happen that broken code doesn't reference it directly.
A better approach is to use debugger. For example if you can run program in gdb, then you can use watchpoint to break at location where memory gets corrupted: watch queue_head. After this program will break every time value of queue_head changes, including legitimate uses. To isolate legitimate uses add dummy pointer variable just before queue_head and watch it:
void *fishy_var;
queue *queue_head;
And then in gdb:
break main
run
watch fishy_var
continue
This should lead right to the offending code.
Note, that you'd better put printf I showed earlier at the start of main, to 1. make sure variables are consecutive in memory and 2. make sure compiler doesn't optimize away unused fishy_var.
From most compilers (or more likely linkers), you can generate a .map file, which will list all global variables and functions.
You don't mention which compiler you use, but for gcc this will answer your question.
Basically, add this to your gcc line:
-Xlinker -Map=output.map

Initialise Stack pointer on ATtiny2313

I am programming an ATtiny2313 using avrdude and a makefile. I believe the stack pointer is not properly initialised, since when I call a function, the program appears to freeze. I found the following assembly code:
.include "tn2313def.inc"
ldi r16, low(RAMEND) ; Main program start
out SPL,r16 ;Set Stack Pointer to top of RAM
which I think might work, but I don't know how I can incorporate it into the c code that I created. ie. do I need to include a special header file or somehow denote that it is assembly and not c. I am relatively new to programming and I would appreciate any help either as to how to implement this code properly or another way of making my current c code initialise a stack pointer.
Thank you in advance.
Stephen
It really depends on how you've got your makefile configured as to whether the stack pointer will be initialised. If you're using gcc and the normal compile and link options, the linker ensures that some startup code crtX.o is also included in your executable. The linker automatically chooses the correct crtX.o file for your processor and compile options.
Amongst other things, the code in the crtX.o files will clear the bss segment to be all zeros as required by the C standard, configure your stack pointer and provide interrupt vectors in the correct location for those which have not been overridden.
Remember that the ATTiny2313 only has 128 bytes of SRAM. This area must be big enough for any initialised data you have in your program and the stack. Just the process of calling a simple function will use up quite a number of bytes of RAM to save the registers on the stack before calling the function.
So, I'd suggest to do these things:
Use the standard makefile if one is provided by your compiler, it will ensure that the standard startup code is included and that the stack/RAM is set up correctly before main() is called.
Turn on the linker map and symbol file output and verify that you actually have some space free that can be used for the stack.
The Atmel IDE has a reasonable simulator, so try running your code in the simulator. You'll be able to watch stack usage as you are calling the function and location any odd behaviour.
You may just happen to have a stack overflow (which is why you came to stackoverflow.com right?

global variable initialization optimization

Global variables are initialized to "0" by default.
How much difference does it make (if any) when I explicitly assign value "0" to it.
Is any one of them faster/better/more optimized?
I tried with a small sample .c program but I do not see any change in executable size.
Edit:0 I just want to understand the behavior. Its not a bottleneck for me in any way.
The answer to your question is very implementation specific but typically all uninitialized global and static variables end up in the .bss segment. Explicitly initialized variables are located in some other data segment. Both of these will be copied over by the program loader before the execution of main(). So, there shouldn't be any performance difference between explicitly initializing to zero, and leaving the variable uninitialized.
IMO it is good practice to explicitly initialize globals and statics to zero, as it makes it clear that a zero initial value is expected.
When you say optimized, I am assuming you mean faster in execution. If so, then there won't be any difference. And the compiler might even remove the initialization of the global variable (not sure on the compiler internals). And if you mean the space utilization of the program - there won't be a difference in that either.
Bigger question though is - is there a specific reason you are trying to look to optimize via the initialization of global variables. Can you please explain a bit more.
Static objects without an explicit initializer are initialized to zero at startup. Whether you explicitly initialize the object to 0 or not will probably make no difference in term of performance as the compiler usually initialize all the zero objects in one go before main.
// File scope
// Same code is likely to be generated for the two objects initialization
int bla1;
int bla2 = 0;
On the other hand, if you assign 0 instead of initializing, it could make a difference because the compiler could not infer what was the previous value of the object.
void init(void)
{
bla1 = 0;
bla2 = 0;
}
I doubt there is a difference, but, even if there is, I have much more doubts about the fact that your program is so optimized that the bottleneck is that.
I'd rather suggest not to care at all about all this kind of issues and write the code as you like, maybe giving way to readability rather than speed, leaving optimization only as a final problem.
Premature optimization is the root of all evil
There is none. The optimizer sees that as a no-op.
Explicit initialization is more verbose and clearer to the untrained eye. If you have juniors in your team, I'd explicitly initialize these variables.

Why are static variables auto-initialized to zero? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why global and static variables are initialized to their default values?
What is the technical reason this happens? And is it supported by the standard across all platforms? Is it possible that certain implementations may return undefined variables if static variables aren't explicitly initialized?
It is required by the standard (§6.7.8/10).
There's no technical reason it would have to be this way, but it's been that way for long enough that the standard committee made it a requirement.
Leaving out this requirement would make working with static variables somewhat more difficult in many (most?) cases. In particular, you often have some one-time initialization to do, and need a dependable starting state so you know whether a particular variable has been initialized yet or not. For example:
int foo() {
static int *ptr;
if (NULL == ptr)
// initialize it
}
If ptr could contain an arbitrary value at startup, you'd have to explicitly initialize it to NULL to be able to recognize whether you'd done your one-time initialization yet or not.
Yes, it's because it's in the standard; but really, it's because it's free. Static variables look just like global variables to the generated object code. They're allocated in .bss and initialized at load time along with all your constants and other globals. Since the section of memory where they live is just copied straight from your executable, they're initialized to a value known at compile-time for free. The value that was chosen is 0.
Of course there is no arguing that it is in the C standards. So expect a compliant compiler to behave that way.
The technical reason behind why it was done might be rooted in how the C startup code works. There are usually several memory segments the linker has to put compiler output into including a code (text) segment, a block storage segment, and an initialized variable segment.
Non-static function variables don't have physical storage until the scope of the function is created at runtime so the linker doesn't do anything with those.
Program code of course goes in the code (or text) segment but so do the values used to initialize global and static variables. Initialized variables themselves (i.e. their addresses) go in the initialized memory segment. Uninitialized global and static variables go in the block storage (bss) segment.
When the program is loaded at execution time, a small piece of code creates the C runtime environment. In ROM based systems it will copy the value of initialized variables from the code (text) segment into their respective actual addresses in RAM. RAM (i.e. disk) based systems can load the initial values directly to the final RAM addresses.
The CRT (C runtime) also zeroes out the bss which contains all the global and static variables that have no initializers. This was probably done as a precaution against uninitialized data. It is a relatively straightforward block fill operation because all the global and static variables have been crammed together into one address segment.
Of course floats and doubles may require special handling because their 0.0 value may not be all zero bits if the floating format is not IEEE 754.
Note that since autovariables don't exist at program load time they can't be initialized by the runtime startup code.
Mostly because the static variables are grouped together in one block by the linker, so it's real easy to just memset() the whole block to 0 on startup. I to not believe that is required by the C or C++ Standards.
There is discussion about this here:
First of all in ISO C (ANSI C), all static and global variables must be initialized before the program starts. If the programmer didn't do this explicitly, then the compiler must set them to zero. If the compiler doesn't do this, it doesn't follow ISO C. Exactly how the variables are initialized is however unspecified by the standard.
Take a look : here 6.2.4(3) and 6.7.8 (10)
Suppose you were writing a C compiler. You expect that some static variables are going to have initial values, so those values must appear somewhere in the executable file that your compiler is going to create. Now when the output program is run, the entire executable file is loaded into memory. Part of the initialization of the program is to create the static variables, so all those initial values must be copied to their final static variable destinations.
Or do they? Once the program starts, the initial values of the variables are not needed anymore. Can't the variables themselves be located within the executable code itself? Then there is no need to copy the values over. The static variables could live within a block that was in the original executable file, and no initialization at all has to be done for them.
If that is the case, then why would you want to make a special case for uninitialized static variables? Why not just put a bunch of zeros in the executable file to represent the uninitialized static variables? That would trade some space for a little time and a lot less complexity.
I don't know if any C compiler actually behaves in this way, but I suspect the option of doing things this way might have driven the design of the language.

Why compilers creates one variable "twice"?

I know this is more "heavy" question, but I think its interesting too. It was part of my previous questions about compiler functions, but back than I explained it very badly, and many answered just my first question, so ther it is:
So, if my knowledge is correct, modern Windows systems use paging as a way to switch tasks and secure that each task has propriate place in memory. So, every process gets its own place starting from 0.
When multitasking goes into effect, Kernel has to save all important registers to the task´s stack i believe than save the current stack pointer, change page entry to switch to another proces´s physical adress space, load new process stack pointer, pop saved registers and continue by call to poped instruction pointer adress.
Becouse of this nice feature (paging) every process thinks it has nice flat memory within reach. So, there is no far jumps, far pointers, memory segment or data segment. All is nice and linear.
But, when there is no more segmentation for the process, why does still compilers create variables on the stack, or when global directly in other memory space, than directly in program code?
Let me give an example, I have a C code:int a=10;
which gets translated into (Intel syntax):mov [position of a],#10
But than, you actually ocupy more bytes in RAM than needed. Becouse, first few bytes takes the actuall instruction, and after that instruction is done, there is new byte containing the value 10.
Why, instead of this, when there is no need to switch any segment (thus slowing the process speed) isn´t just a value of 10 coded directly into program like this:
xor eax,eax //just some instruction
10 //the value iserted to the program
call end //just some instruction
Becouse compiler know the exact position of every instruction, when operating with that variable, it would just use it´s adress.
I know, that const variables do this, but they are not really variables, when you cannot change them.
I hope I eplained my question well, but I am still learning English, so forgive my sytactical and even semantical errors.
EDIT:
I have read your answers, and it seems that based on those I can modify my question:
So, someone told here that global variable is actually that piece of values attached directly into program, I mean, when variable is global, is it atached to the end of program, or just created like the local one at the time of execution, but instead of on stack on heap directly?
If the first case - attached to the program itself, why is there even existence of local variables? I know, you will tell me becouse of recursion, but that is not the case. When you call function, you can push any memory space on stack, so there is no program there.
I hope you do understand me, there always is ineficient use of memory, when some value (even 0) is created on stack from some instruction, becouse you need space in program for that instruction and than for the actual var. Like so: push #5 //instruction that says to create local variable with integer 5
And than this instruction just makes number 5 to be on stack. Please help me, I really want to know why its this way. Thanks.
Consider:
local variables may have more than one simultaneous existence if a routine is called recursively (even indirectly in, say, a recursive decent parser) or from more than one thread, and these cases occur in the same memory context
marking the program memory non-writable and the stack+heap as non-executable is a small but useful defense against certain classes of attacks (stack smashing...) and is used by some OSs (I don't know if windows does this, however)
Your proposal doesn't allow for either of these cases.
So, there is no far jumps, far pointers, memory segment or data segment. All is nice and linear.
Yes and no. Different program segments have different purposes - despite the fact that they reside within flat virtual memory. E.g. data segment is readable and writable, but you can't execute data. Code segment is readable and executable, but you can't write into it.
why does still compilers create variables on the stack, [...] than directly in program code?
Simple.
Code segment isn't writable. For safety reasons first. Second,
most CPUs do not like to have code segment being written into as it
breaks many existing optimization used to accelerate execution.
State of the function has to be private to the function due to
things like recursion and multi-threading.
isn´t just a value of 10 coded directly into program like this
Modern CPUs prefetch instructions to allow things like parallel execution and out-of-order execution. Putting the garbage (to CPU that is the garbage) into the code segment would simply diminish (or flat out cancel) the effect of the techniques. And they are responsible for the lion share of the performance gains CPUs had showed in the past decade.
when there is no need to switch any segment
So if there is no overhead of switching segment, why then put that into the code segment? There are no problems to keep it in data segment.
Especially in case of read-only data segment, it makes sense to put all read-only data of the program into one place - since it can be shared by all instances of the running application, saving physical RAM.
Becouse compiler know the exact position of every instruction, when operating with that variable, it would just use it´s adress.
No, not really. Most of the code is relocatable or position independent. The code is patched with real memory addresses when OS loads it into the memory. Actually special techniques are used to actually avoid patching the code so that the code segment too could be shared by all running application instances.
The ABI is responsible for defining how and what compiler and linker supposed to do for program to be executable by the complying OS. I haven't seen the Windows ABI, but the ABIs used by Linux are easy to find: search for "AMD64 ABI". Even reading the Linux ABI might answer some of your questions.
What you are talking about is optimization, and that is the compiler's business. If nothing ever changes that value, and the compiler can figure that out, then the compiler is perfectly free to do just what you say (unless a is declared volatile).
Now if you are saying that you are seeing that the compiler isn't doing that, and you think it should, you'd have to talk to your compiler writer. If you are using VisualStudio, their address is One Microsoft Way, Redmond WA. Good luck knocking on doors there. :-)
Why isn´t just a value of 10 coded directly into program like this:
xor eax,eax //just some instruction
10 //the value iserted to the program
call end //just some instruction
That is how global variables are stored. However, instead of being stuck in the middle of executable code (which is messy, and not even possible nowadays), they are stored just after the program code in memory (in Windows and Linux, at least), in what's called the .data section.
When it can, the compiler will move variables to the .data section to optimize performance. However, there are several reasons it might not:
Some variables cannot be made global, including instance variables for a class, parameters passed into a function (obviously), and variables used in recursive functions.
The variable still exists in memory somewhere, and still must have code to access it. Thus, memory usage will not change. In fact, on the x86 ("Intel"), according to this page the instruction to reference a local variable:
mov eax, [esp+8]
and the instruction to reference a global variable:
mov eax, [0xb3a7135]
both take 1 (one!) clock cycle.
The only advantage, then, is that if every local variable is global, you wouldn't have to make room on the stack for local variables.
Adding a variable to the .data segment may actually increase the size of the executable, since the variable is actually contained in the file itself.
As caf mentions in the comments, stack-based variables only exist while the function is running - global variables take up memory during the entire execution of the program.
not quite sure what your confusion is?
int a = 10; means make a spot in memory, and put the value 10 at the memory address
if you want a to be 10
#define a 10
though more typically
#define TEN 10
Variables have storage space and can be modified. It makes no sense to stick them in the code segment, where they cannot be modified.
If you have code with int a=10 or even const int a=10, the compiler cannot convert code which references 'a' to use the constant 10 directly, because it has no way of knowing whether 'a' may be changed behind its back (even const variables can be changed). For example, one way 'a' can be changed without the compiler knowing is, if you have a pointer which points 'a'. Pointers are not fixed at runtime, so the compiler cannot determine at compile time whether there will be a pointer which will point to and modify 'a'.

Resources