Why variables start out with random values in C

I think this is wrong: a variable should start as NULL, not with a random value. If a pointer has a random memory address as its default value, that could be very dangerous, no?

The variables start out uninitialized because that's the fastest way - why waste the CPU cycles on initialization if you're going to write another value there anyway?
If you want a variable to be initialized after creation, just initialize it. :)
About it being dangerous: a good compiler will usually warn you (with warnings enabled) if you try to use a variable without initializing it.

No. C is a very efficient language, one that has traditionally been faster than a lot of other languages. One of the reasons for this is that it doesn't do too much on its own. The programmer controls this.
In the case of initialization, C automatic variables are not initialized to a random value. Rather, they are not initialized at all, so they contain whatever was at that memory location before.
If you wanted to initialize a variable to, say, 1 in your program, then it would be inefficient if the variable had already been initialized to zero or null. That would mean it was initialized twice.

Execution speed and overhead (or lack thereof) are the main reasons why. C is notorious for letting you walk off the proverbial cliff because it always assumes that the user knows better than it does.
Note that if you declared the variable as static it actually is guaranteed to be initialized to 0.
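A minimal sketch of the difference between static and automatic storage; the names are invented for illustration:

```c
#include <stddef.h>

static int global_table[4];   /* static storage, no initializer: all elements are 0 */

/* Returns how many times it has been called so far. */
int call_count(void)
{
    static int calls;   /* static local, no initializer: starts at 0 */
    return ++calls;
}

/* An automatic variable has to be initialized explicitly before it is read;
   with a plain 'int total;' its starting value would be indeterminate. */
int sum_first(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

int global_table_sum(void)
{
    return sum_first(global_table, 4);
}
```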

Variables start out with a random value because you are just handed a block of memory and told to deal with it yourself. It has whatever value that block of memory had beforehand. Why should the program waste time setting the value to some arbitrary default when you are likely going to set it yourself later?

The design choice is performance, and it is one of the many reasons why C isn't the preferred language for most projects.

This has nothing to do with "if C were being designed today" or with efficiency of one initialization. Instead think of something like
struct bar;  /* an incomplete type is enough for an array of pointers */

void foo(void)
{
    struct bar *ptrs[10000];
    /* do something where only a few indices end up actually getting used */
}
Any language that forces useless initialization on you is doomed to be slow as hell for algorithms that can make use of sparse arrays where you don't care about the majority of the values, and have an easy way of knowing which values you care about.
If you don't like my example with such a large object on the stack, substitute malloc instead. It has the same semantics with regard to initialization.
In either case, if you want zero-initialization, you can get it with {0} or calloc.
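A sketch of the sparse-array argument, plus the two zero-initialization routes; the contents of `struct bar` and the chosen indices are invented. Note that `calloc` sets all bits to zero, which means 0 for integers, while `= {0}` is the portable way to get null pointers.

```c
#include <stdlib.h>

struct bar { int value; };   /* invented payload type */

/* Only the slots listed in used[] are ever written or read, so the
   remaining entries of ptrs[] never need initializing. */
int sparse_sum(void)
{
    struct bar storage[3] = { {1}, {2}, {3} };
    struct bar *ptrs[10000];                   /* deliberately uninitialized */
    const size_t used[3] = { 7, 4242, 9999 };  /* the indices we care about */

    for (size_t k = 0; k < 3; k++)
        ptrs[used[k]] = &storage[k];

    int total = 0;
    for (size_t k = 0; k < 3; k++)             /* read back only what we wrote */
        total += ptrs[used[k]]->value;
    return total;                              /* 1 + 2 + 3 */
}

/* When you do want everything zeroed up front. */
int zeroed_checks(void)
{
    struct bar *ptrs[100] = {0};                /* every element is a null pointer */
    int *counts = calloc(100, sizeof *counts);  /* every int is 0 */
    if (counts == NULL)
        return -1;

    int ok = 1;
    for (size_t i = 0; i < 100; i++)
        if (ptrs[i] != NULL || counts[i] != 0)
            ok = 0;
    free(counts);
    return ok;
}
```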

It was a design choice made many years ago, probably for efficiency reasons.
Statically allocated variables (globals and statics) are initialized to 0 if there's no explicit initialization; this could be justified even taking efficiency into account, because it only occurs once. I'd guess the thinking was that for automatic variables (locals) that are allocated each time a scope is entered, implicit initialization was considered something that might cost too much and therefore should be left to the programmer's responsibility.
If C were being designed today, I wouldn't be surprised if that design decision were changed - especially since compilers are intelligent enough today to be able to optimize away an initialization that gets overwritten before any other use (or potential use).
However, there are so many C compiler toolchains that follow the spec of not initializing automatically that it would be foolish for a compiler to perform implicit initialization to a 'useful' value (like 0 or NULL). That would just encourage people targeting that toolchain to write code that doesn't work correctly on other toolchains.
That said, compilers can initialize local variables, and they often do. It's just that they initialize the locals to values that are not generally useful (in particular, they don't set a pointer to the null pointer). That kind of initialization isn't something to write your program logic against, and it's not intended for that. It's intended to cause deterministic and reproducible errors, so that if you erroneously use values that have been set by implicit initialization, you'll be able to find them easily in test/debug.
Usually this compiler behavior is turned on only for debug builds. I could see an argument being made for turning it on in release builds as well, particularly if the release build can still optimize it away when the compiler can prove that the implicitly initialized value is never used.
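The idea behind such debug-build fills can be sketched by hand; `poison`, `POISON_BYTE` and `read_unset_byte` are invented for illustration. Recent GCC and Clang releases can do this automatically for automatic variables with the `-ftrivial-auto-var-init=pattern` option; check your compiler's documentation before relying on it.

```c
#include <string.h>

#define POISON_BYTE 0xA5   /* invented marker value, deliberately not 0 */

/* Fill a buffer with a recognizable, useless byte pattern. */
static void poison(void *p, size_t n)
{
    memset(p, POISON_BYTE, n);
}

/* Simulates a bug: reads a byte the code never set. Because the buffer was
   poisoned first, the result is the same 0xA5 on every run instead of
   whatever happened to be on the stack. */
unsigned char read_unset_byte(void)
{
    unsigned char buf[16];
    poison(buf, sizeof buf);
    buf[0] = 'x';        /* the only byte set "for real" */
    return buf[1];       /* the bug: buf[1] was never given a real value */
}
```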

Related

Why C allows uninitialized local variables?

In languages such as Java and C#, using an uninitialized local variable is a compile-time error. Then why do C and C++ allow uninitialized local variables? What is the reason these languages allow this? I think many of the bad problems could not arise, or could be prevented, if these two languages forced programmers to initialize local variables, including pointers, and it would also make the languages more secure. Wouldn't it?
The C language is well known for producing very fast and efficient code.
With that in mind, it makes sense for the language not to automatically initialize all variables. With languages that automatically initialize variables, if you later initialize them in code, they actually get initialized twice, which is less efficient and serves no purpose.
You are correct, C is an advanced language and it requires more care by experienced developers to ensure problems are not introduced by forgetting to initialize variables, or forgetting to do other things that are not automatic.
#include <stdio.h>

int main(void)
{
    int i;
    // Here i is uninitialized; scanf() writes it
    scanf("%d", &i);
    return 0;
}
You don't need i to be initialized before scanf(). For such cases, C doesn't waste cycles initializing everything.
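A slightly safer variant of the same idea, using `sscanf` on a string so it can be tested (`parse_int_or` is an invented name): `i` still needs no initializer, but it is only written when the conversion succeeds, so the return value has to be checked before `i` is read.

```c
#include <stdio.h>

/* Parses an int from text, or returns fallback if there is none. */
int parse_int_or(const char *text, int fallback)
{
    int i;   /* no initializer needed: sscanf writes it on success */
    if (sscanf(text, "%d", &i) != 1)
        return fallback;   /* conversion failed: i was never written */
    return i;
}
```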
C is designed to be a tool to allow great programmers to write efficient code, rather than to prevent beginners from shooting themselves in the foot. Allowing the use of uninitialized variables is a nod in that direction.
Here is a good description from this article which discusses the advantages of undefined behavior in general.
Use of an uninitialized variable: This is commonly known as source of problems in C programs and there are many tools to catch these: from compiler warnings to static and dynamic analyzers. This improves performance by not requiring that all variables be zero initialized when they come into scope (as Java does). For most scalar variables, this would cause little overhead, but stack arrays and malloc'd memory would incur a memset of the storage, which could be quite costly, particularly since the storage is usually completely overwritten.
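A sketch of the situation the quote describes, with an invented `sum_of_squares` function: the `malloc`'d buffer is completely overwritten before it is read, so zeroing it first (with `calloc` or `memset`) would be wasted work.

```c
#include <stdlib.h>

/* Every element is written before it is read, so no zeroing pass is needed. */
long sum_of_squares(size_t n)
{
    long *v = malloc(n * sizeof *v);   /* contents indeterminate */
    if (v == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        v[i] = (long)(i * i);          /* storage completely overwritten */

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    free(v);
    return total;
}
```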
C++ allows them because C does. Where possible, C++ compilers accept valid C programs as input.
As for C: in olden times, memory was scarce and operations were slow. Initializing a variable that did not yet need initialization was wasted time and space.
Since all variables had to be declared at the start of a function definition (often before the value that would initialize them was known), uninitialized variables were a necessity. Dangerous, but efficient. There are a lot of trade-offs in the world.
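For comparison, a sketch of the same (invented) computation written in C89 style, where declarations had to come first, and in C99 style, where each variable can be declared at the point its value is known:

```c
/* C89 style: declarations at the top, before the values are known. */
int scaled_sum_c89(int a, int b)
{
    int sum;
    int scaled;

    sum = a + b;
    scaled = sum * 10;
    return scaled;
}

/* C99 and later: each variable is declared where its value is available. */
int scaled_sum_c99(int a, int b)
{
    int sum = a + b;
    int scaled = sum * 10;
    return scaled;
}
```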

Coding standards: variable initialization

Some coding standards mandate initializing each variable when you declare it, even if the value is meaningless and the variable will soon be over written.
For example:
int main(void)
{
    char rx_char;
    /* ... */
    rx_char = get_char();   /* get_char() is the poster's own input routine */
    return 0;
}
My co-worker asks me to initialize rx_char, but I don't understand why. Could anybody point out the reason?
"Initializing each variable when you declare it, even if the value is meaningless" is a textbook example of cargo-cult behavior. It is a completely meaningless requirement, which should be avoided whenever possible. In many cases a meaningless initializer is not much better than a random uninitialized value. In cases when it can actually "save the day", it does so by sweeping the problem under the carpet, i.e. by hiding it. Hidden problems are always the worst. When the code has a problem in it, it is always better to have it manifest itself as quickly as possible, even if that means the code crashes or behaves erratically.
Additionally, such requirements can impede compiler optimizations by spamming the compiler with misleading information about the effective value lifetime associated with the variable. The optimizer can treat an uninitialized variable as non-existent, thus saving valuable resources (e.g. CPU registers). In order to acquire the same kind of knowledge about a variable initialized with a meaningless value, the optimizer has to be able to figure out that it is indeed meaningless, which is generally a more complicated task.
The same problem has to be solved by a person reading the code, which impedes readability. A person unfamiliar with the code cannot say right away whether the specific initializer is meaningful (i.e. the code below relies on the initial value) or meaningless (i.e. provided thoughtlessly by a follower of the aforementioned cargo cult).
It is also worth noting that some modern compilers can detect attempts to use uninitialized variable values at both compile time and run time. Such compilers issue compile-time warnings or run-time assertions (in debugging builds) when such an attempt is detected. In my experience this feature is much more useful than it might appear at first sight. Meanwhile, thoughtless initialization of variables with meaningless values effectively defeats this compiler-provided safety feature and, again, hides the associated errors. It certainly does more harm than good from that point of view.
Fortunately, the problem is no longer as acute as it used to be. In the modern versions of the C language variables can be declared anywhere in the code, which significantly reduces the harmful effects of such coding standards. You no longer have to declare variables at the beginning of the block, which greatly increases the chances that by the time you are ready to declare the variable you are also ready to supply a meaningful initializer for it.
Initializers are good. But when you see that you cannot provide a meaningful initializer for a variable, it is always a better idea to attempt to reorganize the code in order to create an opportunity for a meaningful initializer, instead of giving up and just using a meaningless one. This is often possible even in pre-C99 code.
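A small sketch of that reorganization; `parity_name` is an invented example. The first version carries a placeholder initializer that is always overwritten, while the second restructures the logic (here with the conditional operator) so the declaration can take its real value.

```c
/* Placeholder initializer: the 0 is always overwritten, so it carries no meaning. */
const char *parity_name_dummy(int n)
{
    const char *name = 0;
    if (n % 2 == 0)
        name = "even";
    else
        name = "odd";
    return name;
}

/* Reorganized so the declaration carries the real value from the start. */
const char *parity_name(int n)
{
    const char *name = (n % 2 == 0) ? "even" : "odd";
    return name;
}
```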
With modern compilers often giving a warning about the use of uninitialised variables I'd favour not initialising towards some bogus value. Then, if someone accidentally uses the variable before the initialisation you'll get a warning, rather than having to spend X time debugging and finding out the error.
It's good practice to initialize your variables, even if it's just to set them to the type's "null" equivalent. If someone else comes along to edit that code and uses rx_char before you've initialized it, they will get non-deterministic behavior: the value of rx_char would depend on things outside of your control. Some compilers zero-initialize variables, but you should not depend on this!
By initializing it on declaration, you guarantee deterministic behavior, which is in general easier to debug than having rx_char be something completely random, as it could be if you left it uninitialized.
#include <stdio.h>

int main(void)
{
    char rx_char;
    /* some stuff */
    char your_coworkers_char = rx_char;   /* reads rx_char before it is set */
    /* some other stuff */
    rx_char = get_char();
    putchar(your_coworkers_char);         /* potentially prints randomness! */
    return 0;
}
This is perhaps optimal; unless you're working with very old compilers, you should be fine with it:
#include <stdio.h>

int main(void)
{
    /* some stuff */
    char rx_char = get_char();   /* declared where its real value is available */
    putchar(rx_char);
    return 0;
}
Anybody could point out the reason?
The first person you should ask is the co-worker that asked you to do it. Her reasons may be entirely different than what we come up with.
...
Most coding standards are as much about the future as they are about what happens today.
Today, yes, it doesn't make any difference whether you have char rx_char or char rx_char = 0. But in the future, what happens if someone deletes the rx_char = get_char() line? Then rx_char holds a random, uninitialized value. If rx_char is 0, you'll still have a bug, but it won't be random.

Economizing on variable use

I am working on an (embedded) device, and recently I started thinking about using less memory, in case the stack size isn't that big.
I have long functions (unfortunately).
Inside them I was thinking of saving space in this way.
Imagine there is code
1. void f()
2. {
3. ...
4. char someArray[300];
5. char someOtherArray[300];
6. someFunc(someArray, someOtherArray);
7. ...
8. }
Now, imagine someArray and someOtherArray are never used in function f beyond line 6.
Would following save some stack space??
1. void f()
2. {
3. ...
4. {//added
5. char someArray[300];
6. char someOtherArray[300];
7. someFunc(someArray, someOtherArray);
8. }//added
9. ...
10. }
nb: removed second part of the question
For the compiler proper, both are exactly the same and thus make no difference. The preprocessor replaces all instances of TEXT1 with the string constant.
#define TEXT1 "SomeLongStringLiteral"
someFunc(TEXT1);
someOtherFunc(TEXT1);
After the preprocessor's job is done, the above snippet becomes
someFunc("SomeLongStringLiteral");
someOtherFunc("SomeLongStringLiteral");
Thus it makes no difference performance or memory-wise.
Aside: the reason #define TEXT1 "SomeLongStringLiteral" is done is to have a single place to change all uses of TEXT1; but that's only a convenience for the programmer and has no effect on the produced output.
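A quick sketch of that equivalence; the two function names are invented. Running only the preprocessor (for example `gcc -E`) on a file like this shows both functions containing the identical literal.

```c
#include <string.h>

#define TEXT1 "SomeLongStringLiteral"

/* After preprocessing, both functions contain the same string literal. */
size_t len_via_macro(void)   { return strlen(TEXT1); }
size_t len_via_literal(void) { return strlen("SomeLongStringLiteral"); }
```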
recently I just started thinking maybe to use less memory, in case stack size isn't that big.
Never micro-optimise or prematurely optimise. If the stack size isn't that big, you'll get to know it when you benchmark/measure it. Don't make any assumptions when you optimise; 99% of the time they'd be wrong.
I am working on some device
Really? Are you? I wouldn't have thought that.
Now, imagine, someArray and someOtherArray are never used in f function beyond line 6. Would following save some stack space?
On a good compiler, it wouldn't make a difference. By the standard, it isn't specified if it saves or not, it isn't even specified if there is a stack or not.
But on a not so good compiler, the one with the additional {} may be better. It is worth a test: compile it and look at the generated assembler code.
it seems my compiler doesn't allow me to do this (this is C), so never mind...
But it should. What happens then? Maybe you are just confusing levels of {} ...
I'll ask another one here.
Would better be a separate question...
someFunc("SomeLongStringLiteral");
someOtherFunc("SomeLongStringLiteral");
vs.
someFunc(TEXT1);
someOtherFunc(TEXT1);
A #define is processed before any compilation step, so it makes absolutely no difference.
If it happens within the same compilation unit, the compiler will tie them together anyway. (At least, in this case. On an ATXmega, if you use PSTR("whatever") for having them in flash space only, each occurrence of them will be put into flash separately. But that's a completely different thing...)
Modern compilers should push stack variables before they are used, and pop them when they are no longer needed. The old thinking with { ... } marking the start and end of a stack push/pop should be rather obsolete by now.
Since 1999, C allows local variables to be declared anywhere in a block and not just immediately after a {. C++ allowed this far earlier. Today, where a local variable is declared inside the scope has little to do with when it actually starts to exist in the machine code. Similarly, the } has little to do with when it ceases to exist.
So regarding adding extra { }, don't bother. It is premature optimization and only adds pointless clutter.
Regarding the #define it absolutely makes no difference in terms of efficiency. Macros are just text replacement.
Furthermore, from the generic point-of-view, data must always be allocated somewhere. Data used by a program cannot be allocated in thin air! That's a very common misunderstanding. For example, many people incorrectly believe that
int x = func();
if(x == something)
consumes more memory than
if(func() == something)
But with optimization enabled, both examples typically compile into identical machine code. The result of func must be stored somewhere; it cannot be stored in thin air. In the first example, the result is stored in a memory segment that the programmer may refer to as x.
In the second example, it is stored in the very same memory segment, taking up the same amount of space, for the same duration of program execution. The only difference is that the memory segment is anonymous and the programmer has no name for it. As far as the machine code is concerned, that doesn't matter, since no variable names exist in machine code.
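A sketch of the two forms; `func` and `SOMETHING` are hypothetical stand-ins for the answer's `func()` and `something`. Compiling both with optimization (for example `gcc -O2 -S`) and comparing the assembly is the way to check the claim on your own compiler.

```c
/* Hypothetical stand-ins for the answer's func() and something. */
static int func(void) { return 7; }
#define SOMETHING 7

/* The result is stored in a named object, x. */
int check_named(void)
{
    int x = func();
    if (x == SOMETHING)
        return 1;
    return 0;
}

/* The result is held in an unnamed temporary (typically a register). */
int check_anonymous(void)
{
    if (func() == SOMETHING)
        return 1;
    return 0;
}
```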
And this would be why every professional C programmer needs to understand a certain amount of assembler. You cannot hope to ever do any kind of manual code optimization if you don't.
(Please don't ask two questions in one, this is really annoying since you get two types of answer for your two different questions.)
For your first question: putting {} around the use of a variable will probably not help. For automatic variables that are not VLAs (see below), the standard only says that their lifetime runs from entry into the enclosing block until that block ends; it says nothing about how or when the stack space is actually reserved and reused. So compilers may have a hard time figuring out how the use of the stack may be optimized, and may not do such an optimization at all. In your case this is quite likely, since you are exporting pointers to your data to a function that is perhaps not visible here, so the compiler cannot easily tell whether the arrays are still used later in the code.
I see two ways to "force" the compiler into optimizing that space: functions or VLAs. The first, functions, is simple: instead of putting the block around the code, put the code in a static function. Function calls are quite optimized on modern platforms, and here the compiler knows exactly how it may clear the stack at the end.
The second alternative in your case is a VLA (variable length array), if your compiler supports that C99 feature. Arrays whose size doesn't depend on a compile-time constant have a special rule for their lifetime: it ends exactly at the end of the scope where they are defined. Even a const-qualified variable could be used for that:
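A minimal sketch of the static-function approach. `someFunc` here is a hypothetical stand-in that just copies a string, and `use_buffers` is an invented helper name. One caveat: an optimizing compiler may inline a small static function back into `f`, which can merge the frames again; GCC and Clang accept `__attribute__((noinline))` if you need to prevent that, and the generated assembly is the final word.

```c
#include <string.h>

/* Hypothetical stand-in for the poster's someFunc(): fills both buffers
   and returns the length of what it wrote. */
static size_t someFunc(char *a, char *b)
{
    strcpy(a, "hello");
    strcpy(b, a);
    return strlen(b);
}

/* The two 300-byte arrays exist only in this helper's stack frame,
   which is released when it returns. */
static size_t use_buffers(void)
{
    char someArray[300];
    char someOtherArray[300];
    return someFunc(someArray, someOtherArray);
}

size_t f(void)
{
    size_t n = use_buffers();
    /* ... the rest of the long function no longer needs those 600 bytes
       (unless the compiler inlined use_buffers) ... */
    return n;
}
```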
{
size_t const len = 300;
char someArray[len];
char someOtherArray[len];
someFunc(someArray, someOtherArray);
}
At the end, on a given platform, you'd really have to inspect what assembler your compiler produces.

Using Structs in Functions

I have a function and I'm accessing a struct's members a lot of times in it.
What I was wondering about is what is the good practice to go about this?
For example:
struct s
{
    int x;
    int y;
};
and I have allocated memory for 10 objects of that struct using malloc.
So whenever I need to use only one of the objects in a function, I usually create a pointer (or receive one as an argument) and point it to the required object. (My superior told me to avoid array indexing because it adds a calculation when accessing any member of the struct.)
But is this the right way? I understand that dereferencing is not as expensive as creating a copy, but what if I'm dereferencing a number of times (like 20 to 30) in the function.
Would it be better if I created temporary variables for the struct members (only the ones I need; I certainly don't use all of them), copied the values over, and then set the actual struct's values before returning?
Also, is this unnecessary micro optimization? Please note that this is for embedded devices.
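The three approaches the question describes can be sketched as follows, assuming the question's `struct s` and an array of 10 objects from `malloc`; the function names and the `+= 1` workload are invented. With optimization enabled, many compilers emit very similar code for all three, but that is exactly the kind of thing to confirm by reading the generated assembly for your target.

```c
#include <stdlib.h>

struct s { int x; int y; };

/* Variant 1: index the array on every access. */
int sum_indexed(struct s *arr, size_t k)
{
    arr[k].x += 1;
    arr[k].y += 1;
    return arr[k].x + arr[k].y;
}

/* Variant 2: take a pointer to the element once. */
int sum_pointer(struct s *arr, size_t k)
{
    struct s *p = &arr[k];
    p->x += 1;
    p->y += 1;
    return p->x + p->y;
}

/* Variant 3: copy the members into locals, work on them, write them back. */
int sum_locals(struct s *arr, size_t k)
{
    struct s *p = &arr[k];
    int x = p->x, y = p->y;
    x += 1;
    y += 1;
    p->x = x;
    p->y = y;
    return x + y;
}

/* Test helper: allocate 10 objects and run one variant on element 3. */
int run(int (*variant)(struct s *, size_t))
{
    struct s *arr = malloc(10 * sizeof *arr);
    if (arr == NULL)
        return -1;
    for (size_t i = 0; i < 10; i++) {
        arr[i].x = (int)i;
        arr[i].y = 2 * (int)i;
    }
    int r = variant(arr, 3);   /* (3 + 1) + (6 + 1) */
    free(arr);
    return r;
}
```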
This is for an embedded system. So, I can't make any assumptions about what the compiler will do. I can't make any assumptions about word size, or the number of registers, or the cost of accessing off the stack, because you didn't tell me what the architecture is. I used to do embedded code on 8080s when they were new...
OK, so what to do?
Pick a real section of code and code it up. Code it up each of the different ways you have listed above. Compile it. Find the compiler option that forces it to print out the assembly code that is produced. Compile each piece of code with every different set of optimization options. Grab the reference manual for the processor and count the cycles used by each case.
Now you will have real data on which to base a decision. Real data is much better than the opinions of a million highly experienced expert programmers. Sit down with your lead programmer and show him the code and the data. He may well show you better ways to code it. If so, recode it his way, compile it, and count the cycles used by his code. Show him how his way worked out.
At the very worst you will have spent a weekend learning something very important about the way your compiler works. You will have examined N ways to code things times M different sets of optimization options. You will have learned a lot about the instruction set of the machine. You will have learned how good, or bad, the compiler is. You will have had a chance to get to know your lead programmer better. And, you will have real data.
Real data is the kind of data that you must have to answer this question. Without that data, nothing anyone tells you is anything but an ego-based guess. Data answers the question.
Bob Pendleton
First of all, indexing an array is not very expensive (only like one operation more expensive than a pointer dereference, or sometimes none, depending on the situation).
Secondly, most compilers will perform what is called RVO or return value optimisation when returning structs by value. This is where the caller allocates space for the return value of the function it calls, and secretly passes the address of that memory to the function for it to use, and the effect is that no copies are made. It does this automatically, so
struct mystruct blah = func();
Only constructs one object, passes it to func for it to use transparently to the programmer, and no copying need be done.
What I do not know is what happens if you assign the return value of the function to an array element, like this:
someArray[0] = func();
will the compiler pass the address of someArray[0] and do RVO that way, or will it just not do that optimisation? You'll have to get a more experienced programmer to answer that. I would guess that the compiler is smart enough to do it though, but it's just a guess.
And yes, I would call it micro optimisation. But we're C programmers. And that's how we roll.
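A sketch of returning a struct by value, using the answer's `func`, `blah` and `someArray` names with an assumed two-member `struct mystruct`. How the result travels back depends on the compiler and ABI: small structs like this one are often returned in registers, while larger ones are typically written through a hidden pointer to caller-provided storage. Whether `someArray[0]` itself is used as that storage is something only the generated assembly can tell you.

```c
struct mystruct { int a; int b; };   /* assumed layout, for illustration */

/* Returns a struct by value; no explicit copying code is needed. */
struct mystruct func(void)
{
    struct mystruct r = { 1, 2 };
    return r;
}

int use_func(void)
{
    struct mystruct blah = func();      /* initialize from the return value */
    struct mystruct someArray[2];
    someArray[0] = func();              /* assign into an array element */
    return blah.a + blah.b + someArray[0].b;   /* 1 + 2 + 2 */
}
```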
Generally, you want to make a copy of a passed struct in C when you want to manipulate the data without touching the original; that is to say, you want your changes reflected only in the return value, not in the struct itself. As for which is more expensive, it depends on a lot of things, many of which change from implementation to implementation, so I would need more specific information to be more helpful. Though I would expect that in an embedded environment your memory is at a greater premium than your processing power. Really, this reads like needless micro-optimization; your compiler should handle it.
In this case, creating temporary variables on the stack will probably be faster. But if your structure is much bigger, you might be better off dereferencing.
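A sketch contrasting the two cases, with an assumed `struct s` and invented function names: passing by value gives the function its own copy, while passing a pointer lets changes reach the caller.

```c
struct s { int x; int y; };   /* assumed layout */

/* By value: the function gets its own copy; the caller's struct is untouched. */
struct s shifted_copy(struct s v)
{
    v.x += 10;
    return v;
}

/* By pointer: the change lands in the caller's struct. */
void shift_in_place(struct s *p)
{
    p->x += 10;
}

int copy_leaves_original(void)
{
    struct s a = { 1, 2 };
    struct s b = shifted_copy(a);
    return a.x == 1 && b.x == 11;
}

int in_place_changes(void)
{
    struct s a = { 1, 2 };
    shift_in_place(&a);
    return a.x == 11;
}
```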

Efficiency of C Variable Declaration [duplicate]

How long does it take to declare a variable in C, for example int x or unsigned long long var? I am wondering if it would make my code any faster in something like this.
for (conditions) {
int var = 0;
// code
}
Would it be faster to do this, or is it easier not to?
int var;
for (conditions) {
var = 0;
// code
}
Thanks for the help.
One piece of advice: stop worrying about which language constructs are microscopically faster or slower than which others, and instead focus on which ones let you express yourself best.
Also, to find out where your code is spending time, use a profiler.
And as others have pointed out, declarations are purely compile-time things, they don't affect execution time.
It doesn't make any difference. In a traditional implementation the declaration itself (excluding initialization) generates no machine instructions. Function prologue code typically allocates space in the stack for all local variables at once, regardless of where they are declared.
However, where you declare your local variables can affect the performance of your code indirectly, in theory at least. When you declare the variables as locally as possible (your first variant), in the general case it results in a smaller stack frame reserved by the function for its local variables (since the same location in the stack can be shared by different local variables at different times). A smaller stack frame reduces overall stack memory consumption, i.e. as nested function calls are performed the stack doesn't grow as fast (especially noticeable with recursive functions). It generally improves performance, since new stack page allocations happen less often and stack memory locality becomes better.
The latter considerations are platform-dependent, of course. It might have very little or no effect on your platform and/or for your applications.
Whenever you have a question about performance, the best thing to do is wrap a loop around it (millions of iterations) and time it. But, in this case, you will likely find that it makes no difference.
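A bare-bones timing harness in that spirit, using only the standard `clock()` function; `time_it`, `inner_variant` and `sink` are invented names. The `volatile` sink is there so the optimizer cannot delete the loop body, and `clock()` measures processor time with coarse resolution, so use many iterations (for example `time_it(inner_variant, 1000000L)`, then the same for the other variant) and treat small differences with suspicion. A profiler usually tells you more.

```c
#include <time.h>

static volatile int sink;   /* writes here cannot be optimized away */

/* The code under test: a variable declared and initialized inside the loop. */
static void inner_variant(void)
{
    for (int k = 0; k < 100; k++) {
        int var = 0;
        var += k;
        sink = var;
    }
}

/* Returns the processor time, in seconds, spent on 'iterations' calls of fn. */
double time_it(void (*fn)(void), long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```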
It is more important to properly express the intentions of your code. If you need the variable outside your loop, declare it outside. If you only need the variable inside the loop, declare it inside.
You should always declare and initialize variables in narrowest scope possible.
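A sketch of both variants from the question; the loop body is an invented stand-in for `// code`. They compute the same result, and comparing their optimized assembly is how you would confirm on your compiler that the narrower scope costs nothing.

```c
/* var declared and initialized inside the loop: narrowest scope. */
int sum_inner(int n)
{
    int total = 0;
    for (int k = 0; k < n; k++) {
        int var = k * 2;
        total += var;
    }
    return total;
}

/* Same computation with var hoisted out of the loop. */
int sum_outer(int n)
{
    int total = 0;
    int var;
    for (int k = 0; k < n; k++) {
        var = k * 2;
        total += var;
    }
    return total;
}
```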
You shouldn't be worrying about those types of micro-optimizations anyway (except in the rarest, rarest of cases). If you really need to worry about potential nano-second performance improvements, measure the difference. It is very unlikely that your variable declarations will be the largest bottleneck in your application.
It takes no time at all. The memory for global variables is allocated at startup, and "declaring" variables on the stack simply involves how far "up" the stack pointer moves when the function is called.
Declarations are purely compile-time; they cost nothing at runtime¹. But the first piece of code is still better than the second, for two reasons:
you should always initialize variables when you declare them; that way they can never have uninitialized values. This goes hand in hand with
always use the narrowest possible scope for variable declarations
So your first example, while no faster than the second, is still better.
And all of the people who chimed in telling him not to prematurely or micro-optimize his code are wrong. It is never bad to know how costly various bits of code are. The very best programmers have a solid, almost unconscious, grasp of the cost of various strategies and take that into account automatically when they design. The way you become that programmer is to ask just this sort of question when you are a beginner.
¹ In fact, there is a small cost when each function allocates space for local variables, but that cost is the same regardless of how many local variables there are*.
*ok that's not really true, but the cost depends only on the total amount of space, not the number of variables.
Declaration takes no time at all.
The compiler will interpret that line as a notification that space for it will need to exist on the stack.
As others have already said, it shouldn't take any time. Therefore you need to make this decision based on other factors: what would make your code more readable and less prone to bugs. It's generally considered a good practice to declare a variable as close as possible to its usage (so you can see the declaration and usage in one go). If it's only used in the inner scope then just declare it inside that scope - forget about performance on this one.
Declaring variables does take time, as it results in machine language instructions that allocate the space for the variables on the stack. This is simply an increment of the stack pointer, which takes a tiny, but non-zero amount of time.
I believe your question is whether more time will be required (i.e. more stack increment operations) if the variable is declared inside the loop. The answer is no, since the stack is incremented once only for the loop block, not each time the loop is executed. So, there will be no difference in time either way, even if the loop executes zillions of zillions of times.
Disclaimer: Precisely what happens depends on your compiler, architecture, etc. But conceptually here's what's going on:
When you declare a variable within a function, it is allocated on the stack. Allocating something on the stack only involves bumping up the stack pointer by the size of the variable. So, for example, if SP represents the memory address of the top of the stack, declaring char x results in SP += 1 and int x results in SP += 4 (on a 32-bit machine).
When the function exits, the stack pointer is returned to where it was before your function was called. So deallocating everything is fast, too.
So, either way it's just an add, which takes the same amount of time regardless of the amount of data.
A smart compiler will combine several variable declarations into a single add.
When you declare a variable within a loop, in theory it could be changing the stack pointer on each iteration through the loop, but again, a smart compiler probably won't do that.
(A notable exception is C++, which does extra work because it needs to call constructors and destructors when the stack-allocated object is created or destroyed.)
I wouldn't care about a nanosecond here or there. Unless you need to access its value after the for loop ends, leave the variable inside the loop: it will be closer to the code that uses it (your code will be more readable), and its scope will be bounded by the loop itself (your code will be more elegant and less bug-prone).
I bet the compiled binary will be identical for both cases.
Variable declaration is turned into stack space reservation by the compiler. How this works is entirely platform-dependent. On x86 and pretty much every popular architecture, this is just a subtraction from the stack pointer and/or an indexed addressing mode to access variables relative to the top of the stack. All of these cost a simple subtraction/addition, which is really irrelevant.
Technically the first example is less efficient, because the declaration happens on every entry into the loop scope, i.e. on every loop iteration. However, there is a 99.99% chance that the stack space will be reserved only once. Even the assignment may be optimized away, although technically it should be done on every loop iteration. In C++ this can get much worse if the variable has a constructor, which will then be run on every loop iteration.
And as a bottom line, you really should not worry about any of such issues without proper profiling. And even then there are much more valuable questions to ask yourself here, like "what is the most readable way to do this, what is easier to understand and maintain, etc.".

Resources