Why does C allow uninitialized local variables?

In languages such as Java and C#, using an uninitialized local variable is a compile-time error. Why, then, do C and C++ allow uninitialized local variables? What is the reason these languages allow this? I think many bad problems could not arise, or could be prevented, if these two languages forced programmers to initialize local variables, including pointers, and it would also make the languages more secure. Wouldn't it?

The C language is well known for producing very fast and efficient code.
With that in mind, it makes sense for the language not to automatically initialize all variables. With languages that automatically initialize variables, if you later initialize them in code, they actually get initialized twice, which is less efficient and serves no purpose.
You are correct, C is an advanced language and it requires more care by experienced developers to ensure problems are not introduced by forgetting to initialize variables, or forgetting to do other things that are not automatic.

#include <stdio.h>

int main(void) {
    int i;
    // Here i is uninitialized
    scanf("%d", &i);   // scanf() writes i, so no initializer is needed
    return 0;
}
You don't need i to be initialized before scanf(). For such cases, C doesn't waste cycles initializing everything.

C is designed to be a tool to allow great programmers to write efficient code, rather than to prevent beginners from shooting themselves in the foot. Allowing the use of uninitialized variables is a nod in that direction.
Here is a good description from this article which discusses the advantages of undefined behavior in general.
Use of an uninitialized variable: This is commonly known as source of problems in C programs and there are many tools to catch these: from compiler warnings to static and dynamic analyzers. This improves performance by not requiring that all variables be zero initialized when they come into scope (as Java does). For most scalar variables, this would cause little overhead, but stack arrays and malloc'd memory would incur a memset of the storage, which could be quite costly, particularly since the storage is usually completely overwritten.
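To make the cost argument concrete, here is a minimal sketch; the function names `fill_squares` and `make_squares` are hypothetical. The buffer comes from `malloc()`, which leaves its contents indeterminate, and every element is written before anything reads it, so clearing it first (as `calloc()` or a `memset` would) would be wasted work.

```c
#include <stdlib.h>

/* Hypothetical helper: writes every element of out[0..n-1],
   so any earlier zeroing of the buffer would be wasted work. */
void fill_squares(long *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (long)(i * i);
}

/* Allocates an n-element buffer without clearing it (malloc, not calloc)
   and then fills it completely. Returns NULL on allocation failure;
   the caller must free() the result. */
long *make_squares(size_t n)
{
    long *buf = malloc(n * sizeof *buf);   /* contents are indeterminate here */
    if (buf == NULL)
        return NULL;
    fill_squares(buf, n);                  /* every element is now defined */
    return buf;
}
```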

C++ allows them because C does. Where possible, C++ compilers accept valid C programs as input.
As for C: in olden times, memory was scarce and operations were slow. Initializing a variable that did not yet need initialization was wasted time and space.
Since all variables had to be declared at the start of a block (often before the value that would initialize them was known), uninitialized variables were a necessity. Dangerous, but efficient. There are a lot of trade-offs in the world.

Related

Coding standards: variable initialization

Some coding standards mandate initializing each variable when you declare it, even if the value is meaningless and the variable will soon be over written.
For example:
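char get_char(void);   /* the asker's input routine, assumed to be defined elsewhere */

int main(void)
{
    char rx_char;
    /* ... */
    rx_char = get_char();
    return 0;
}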
My co-worker asks me to initialize rx_char, but I don't understand why. Could anybody point out the reason?
"Initializing each variable when you declare it, even if the value is meaningless" is a schoolbook example of cargo cult behavior. It is a completely meaningless requirement, which should be avoided whenever possible. In many cases a meaningless initializer is not much better than a random uninitialized value. In cases when it can actually "save the day", it does so by sweeping the problem under the carpet, i.e. by hiding it. Hidden problems are always the worst. When the code has a problem in it, it is always better to have it manifest itself as quickly as possible, even if that means having the code crash or behave erratically.
Additionally, such requirements can impede compiler optimizations by spamming the compiler with misleading information about the effective value lifetime associated with the variable. The optimizer can treat an uninitialized variable as non-existent, thus saving valuable resources (e.g. CPU registers). In order to acquire the same kind of knowledge about a variable initialized with a meaningless value, the optimizer has to be able to figure out that it is indeed meaningless, which is generally a more complicated task.
The same problem has to be solved by a person reading the code, which impedes readability. A person unfamiliar with the code cannot say right away whether the specific initializer is meaningful (i.e. the code below relies on the initial value) or meaningless (i.e. provided thoughtlessly by a follower of the aforementioned cargo cult).
It is also worth noting that some modern compilers can detect attempts to use uninitialized variable values at both compile time and run time. Such compilers issue compile-time warnings or run-time assertions (in debugging builds) when such an attempt is detected. In my experience this feature is much more useful than it might appear at first sight. Meanwhile, thoughtless initialization of variables with meaningless values effectively defeats this compiler-provided safety feature and, again, hides the associated errors. It certainly does more harm than good from that point of view.
Fortunately, the problem is no longer as acute as it used to be. In the modern versions of the C language variables can be declared anywhere in the code, which significantly reduces the harmful effects of such coding standards. You no longer have to declare variables at the beginning of the block, which greatly increases the chances that by the time you are ready to declare the variable you are also ready to supply a meaningful initializer for it.
Initializers are good. But when you see that you cannot provide a meaningful initializer for a variable, it is always a better idea to attempt to reorganize the code in order to create an opportunity for a meaningful initializer, instead of giving up and just using a meaningless one. This is often possible even in pre-C99 code.
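A minimal sketch of that kind of reorganization, with hypothetical function names and values: a conditional expression, or a small helper function, lets each variable be declared at the point where a meaningful value exists, so no placeholder initializer is needed.

```c
/* Instead of:
       int limit;                 // no meaningful value yet
       if (fast_mode) limit = 10; else limit = 1000;
   compute the value first and initialize at the declaration. */
int choose_limit(int fast_mode)
{
    const int limit = fast_mode ? 10 : 1000;   /* meaningful initializer */
    return limit;
}

/* When the logic is too complex for ?:, moving it into a helper
   function lets the caller write a single initialized declaration. */
static int parse_digit(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;                     /* explicit "not a digit" result */
}

int sum_digits(const char *s)
{
    int total = 0;                 /* meaningful: the empty sum */
    for (; *s != '\0'; s++) {
        const int d = parse_digit(*s);   /* declared where its value is known */
        if (d >= 0)
            total += d;
    }
    return total;
}
```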
With modern compilers often giving a warning about the use of uninitialised variables I'd favour not initialising towards some bogus value. Then, if someone accidentally uses the variable before the initialisation you'll get a warning, rather than having to spend X time debugging and finding out the error.
It's good practice to initialize your variables, even if it's just to set them to the type's "null" equivalent. If someone else comes along, edits that code, and uses rx_char before you've initialized it, they will get non-deterministic behavior. The value of rx_char would depend on things outside of your control. Some compilers zero-initialize variables, but you should not depend on this!
By initializing it on declaration, you're guaranteeing deterministic behavior, which is generally easier to debug than having rx_char be something completely random, as it could be if you left it uninitialized.
int main(void) {
    char rx_char;
    //Some stuff
    char your_coworkers_char = rx_char;   // reads rx_char before it is set: undefined behavior
    //Some other stuff
    rx_char = get_char();
    putchar(your_coworkers_char); //Potentially prints randomness!
    return 0;
}
This is perhaps optimal; unless you're working with a very old compiler that requires declarations at the start of a block, you should be fine with this:
int main(void) {
    //some stuff
    char rx_char = get_char();   // declared once a meaningful value is available
    putchar(rx_char);
    return 0;
}
Anybody could point out the reason?
The first person you should ask is the co-worker who asked you to do it. Her reasons may be entirely different from what we come up with.
...
Most coding standards are as much about the future as they are about what happens today.
Today, yes, it doesn't make any difference if you have char rx_char or char rx_char = 0. But in the future, what happens if someone deletes the rx_char = get_char() line? Then rx_char is a random, uninitialized value. If rx_char is 0 then you'll still have a bug, but it won't be random.

C Language: Why does malloc() return a pointer, and not the value?

From my understanding of C, it seems that you are supposed to use malloc(size) whenever you are trying to initialize, for instance, an array whose size you do not know until run time.
But I was wondering why the function malloc() returns a pointer to the location of the variable and why you even need that.
Basically, why doesn't C just hide it all from you, so that whenever you do something like this:
int n;
scanf("%d", &n);   // 'n' gets stdin'ed from the user
/* ... */
int someArray[n];
for (int i = 0; i < n; i++)
    someArray[i] = 5;
you can do it without ever having to call malloc() or some other function? Do other languages do it like this (by hiding the memory properties/location altogether)? I feel that, as a beginner, this whole process of dealing with the memory locations of the variables you use just confuses programmers (and since other languages don't use it, C seems to make a simple initialization process such as this overly complicated)...
Basically, what I'm trying to ask is why malloc() is even necessary, because why the language doesn't take care of all that for you internally without the programmer having to be concerned about or having to see memory. Thanks
*edit: Ok, maybe there are some versions of C that I'm not aware of that allows you to forgo the use of malloc() but let's try to ignore that for now...
C lets you manage every little bit of your program. You can manage when memory gets allocated; you can manage when it gets deallocated; you can manage how to grow a small allocation, etc.
If you prefer not to manage that and let the compiler do it for you, use another language.
Actually, C99 allows this (so you're not the only one thinking of it). The feature is called VLA (Variable Length Array).
It's legal to read an int and then have an array of that size:
int n;
scanf("%d", &n);   /* fscanf() would also need a stream: fscanf(stdin, "%d", &n) */
int array[n];      /* n must be greater than zero */
Of course, there are limitations: malloc() typically takes memory from the heap, while VLAs typically live on the stack, so a VLA can't be as big as a malloc'ed object.
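A rough side-by-side sketch of the two approaches (the function names are hypothetical). The VLA version needs a compiler that supports C99 VLAs; the `malloc()` version works everywhere and can report allocation failure, which the VLA cannot.

```c
#include <assert.h>
#include <stdlib.h>

/* VLA version: automatic storage (typically the stack), released when
   the function returns. The size is fixed once the array is created. */
long sum_with_vla(int n)
{
    assert(n > 0);                 /* a VLA size must be greater than zero */
    int array[n];
    long total = 0;
    for (int i = 0; i < n; i++) {
        array[i] = i + 1;
        total += array[i];
    }
    return total;
}

/* malloc() version: failure is visible, the block could be resized
   with realloc(), and it lives until free() is called. */
long sum_with_malloc(int n)
{
    if (n <= 0)
        return 0;
    int *array = malloc((size_t)n * sizeof *array);
    if (array == NULL)
        return -1;                 /* allocation failure is reportable */
    long total = 0;
    for (int i = 0; i < n; i++) {
        array[i] = i + 1;
        total += array[i];
    }
    free(array);
    return total;
}
```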
*edit: Ok, maybe there are some versions of C that I'm not aware of that allows you to forgo the use of malloc() but let's try to ignore that for now...
So we can concentrate on the flame?
Basically, what I'm trying to ask is why malloc() is even necessary, because why the language doesn't take care of all that for you internally without the programmer having to be concerned about or having to see memory.
The very point of malloc(), its raison d'être, its function, if you will, is to allocate a block of memory. The way we refer to a block of memory in C is by its starting address, which is by definition a pointer.
C is close to 40 years old, and it's not nearly as "high level" as some more modern languages. Some languages, like Java, attempt to prevent mistakes and simplify programming by hiding pointers and explicit memory management from the programmer. C is not like that. Why? Because it just isn't.
Basically, what I'm trying to ask is why malloc() is even necessary, because why the language doesn't take care of all that for you internally without the programmer having to be concerned about or having to see memory. Thanks
One of the hallmarks of C is its simplicity (C compilers are relatively easy to implement); one way of making a language simple is to force the programmer to do all his own memory management. Clearly, other languages do manage objects on the heap for you - Java and C# are modern examples, but the concept isn't new at all; Lisp implementations have been doing it for decades. But that convenience comes at a cost in both compiler complexity and runtime performance.
The Java/C# approach helps eliminate whole classes of memory-management bugs endemic to C (memory leaks, invalid pointer dereferences, etc.). By the same token, C provides a level of control over memory management that allows the programmer to achieve high levels of performance that would be difficult (though not impossible) to match in other languages.
If the only purpose of dynamic allocation were to allocate variable-length arrays, then malloc() might not be necessary. (But note that malloc() was around long before variable-length arrays were added to the language.)
But the size of a VLA is fixed (at run time) when the object is created. It can't be resized, and it's deallocated only when you leave the scope in which it's declared. (And VLAs, unlike malloc(), don't have a mechanism for reporting allocation failures.)
malloc() gives you a lot more flexibility.
Consider creating a linked list. Each node is a structure, containing some data and a pointer to the next node in the list. You might know the size of each node in advance, but you don't know how many nodes to allocate. For example, you might read lines from a text file, creating and appending a new node for each line.
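A minimal sketch of such a list, assuming the lines are already available as strings (a real program would pass each line read with `fgets()`); the names `struct node`, `append_line`, `count_nodes` and `free_list` are hypothetical. Each node is allocated separately, so the number of nodes never has to be known in advance.

```c
#include <stdlib.h>
#include <string.h>

/* One list node per line; each is allocated individually with malloc(). */
struct node {
    char *text;
    struct node *next;
};

/* Appends a copy of text to the list whose head pointer is *head.
   Returns 0 on success, -1 if an allocation fails. */
int append_line(struct node **head, const char *text)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->text = malloc(strlen(text) + 1);
    if (n->text == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->text, text);
    n->next = NULL;

    while (*head != NULL)          /* walk to the last link */
        head = &(*head)->next;
    *head = n;
    return 0;
}

size_t count_nodes(const struct node *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->next)
        count++;
    return count;
}

/* Releases every node and its text. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

Appending walks the whole list each time to keep the sketch short; keeping a separate tail pointer would make each append O(1).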
You can also use malloc() along with realloc() to create a buffer (say, an array of unsigned char) whose size can be changed after you created it.
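A minimal sketch of such a buffer, assuming a doubling growth strategy (the `struct buffer` type and its functions are hypothetical). Note that `realloc()` is called through a temporary pointer, so the original block is not lost if the reallocation fails.

```c
#include <stdlib.h>

/* A hypothetical growable byte buffer built on malloc()/realloc(). */
struct buffer {
    unsigned char *data;
    size_t len;      /* bytes in use */
    size_t cap;      /* bytes allocated */
};

/* Appends one byte, doubling the capacity when the buffer is full.
   Returns 0 on success, -1 on allocation failure (old data stays valid). */
int buffer_push(struct buffer *b, unsigned char byte)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 16;
        /* realloc(NULL, n) behaves like malloc(n) */
        unsigned char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }
    b->data[b->len++] = byte;
    return 0;
}

void buffer_free(struct buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```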
Yes, there are languages that don't expose pointers, and that handle memory management for you.
A lot of them are implemented in C.
Maybe the question should be "why do you need something like int array[n] when you can use pointers?"
After all, pointers allow you to keep an object alive beyond the scope it was created in; you can use pointers to slice and dice arrays (for example, strchr() returns a pointer into a string); and pointers are lightweight objects, so it's cheap to pass them to functions and return them from functions, etc.
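As a small illustration of slicing with pointers (the helper `split_key_value` is hypothetical), `strchr()` returns a pointer into the existing array rather than a new string, so a `key=value` line can be split in place without copying anything:

```c
#include <string.h>

/* Splits "key=value" in place. Returns a pointer to the value, which
   points into the same array just past the '=', or NULL if there is
   no '='. The key is terminated by overwriting '=' with '\0'. */
char *split_key_value(char *line)
{
    char *eq = strchr(line, '=');   /* pointer into line, not a copy */
    if (eq == NULL)
        return NULL;
    *eq = '\0';                      /* line now holds just the key */
    return eq + 1;                   /* value starts right after */
}
```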
But the real answer is "that's how it is". Other options are possible, and the proof is that there are other languages that do other things (and even C99 allows different things).
C is regarded as a highly developed low-level language. malloc is typically used for dynamic arrays, which are a key component of data structures such as stacks and queues. Languages that hide pointers from the developer are less well suited to hardware-related programming.
The short answer to your question is to ponder this question: What if you also need to control exactly when the memory is de-allocated?
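A minimal sketch of what that control looks like (the function `concat` is hypothetical): the result is allocated with `malloc()`, survives after the function returns, and is released only when the caller decides to call `free()`. A local array, including a VLA, could not be returned this way, because its lifetime ends with the function.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding s followed by suffix,
   or NULL if allocation fails. The caller owns the memory and
   chooses when to free() it. */
char *concat(const char *s, const char *suffix)
{
    size_t a = strlen(s), b = strlen(suffix);
    char *out = malloc(a + b + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, a);
    memcpy(out + a, suffix, b + 1);   /* also copies the terminating '\0' */
    return out;
}
```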
C is a compiled language, not an interpreted one. If you don't know n at compile time, how is the compiler supposed to produce a binary?

Is variable declaration within a loop bad?

I'm referring to the main static languages today (C, C++, Java, C#). I've heard some contradictory answers about this, so I wanted to know:
If I have some code such as:
loop(...) {
type x = val;
...
}
('loop' is some type of loop, e.g. for, while)
Will it cause memory allocation in each iteration of the loop, or just once? Is it different from writing this:
type x;
loop(...) {
x = val;
...
}
where memory is only allocated once for x?
The strictly correct answer is that it depends on the implementation, as both are semantically correct. No language specification would require or prohibit such implementation details.
That said, any implementation worth its salt will be able to reuse the same stack slot or even CPU register (with native compilation, especially likely in presence of a JIT). Even the bytecode will likely be completely identical.
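For C, the two forms from the question might look like this (function names hypothetical). They compute the same result; with optimization enabled, compilers typically emit the same machine code for both, which you can check by comparing the output of, for example, `gcc -O2 -S`. The one semantic difference is scope: in the first form `x` is not visible after the loop.

```c
/* Form 1: x is declared and initialized inside the loop body. */
int sum_inner(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        int x = v[i] * 2;   /* a fresh x each iteration, usually the same slot or register */
        total += x;
    }
    return total;
}

/* Form 2: x is declared once before the loop and assigned inside it. */
int sum_outer(const int *v, int n)
{
    int total = 0;
    int x;
    for (int i = 0; i < n; i++) {
        x = v[i] * 2;
        total += x;
    }
    return total;
}
```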
And finally, there's that thing with premature optimization... Unless proven otherwise, you shouldn't even bother thinking about low-level details like this (if you think knowledge and control over such issues matters, perhaps you should just program in assembler), because:
Unless you're doing a microbenchmark (or a really huge number-crunching task - but how many people freaking out about performance actually do those?), you won't even notice any difference even if it isn't optimized. If you're doing anything of interest in the loop body, it will dwarf the difference (again, if any). Especially if you're doing any I/O.
Even if there is memory allocation, it boils down to pushing and popping a few bytes on the native stack, which in turn boils down to adding an integer constant to a hardware register. All C and C++ programs use that stack for their local variables, and none of those ever complained about its performance... if you have to reserve space, you can't get faster than using the stack.
If you have to ask this kind of question, you're not someone who could do anything about it. Those people know to just (1) measure it, (2) look at the generated code and (3) look for large-scale optimizations before even thinking on this level ;)

Why variables start out with random values in C

I think this is wrong, it should start as NULL and not with a random value. In the case that you have a pointer with a random memory address as its default value it could be a very dangerous thing, no?
The variables start out uninitialized because that's the fastest way - why waste the CPU cycles on initialization if you're going to write another value there anyway?
If you want a variable to be initialized after creation, just initialize it. :)
About it being a dangerous thing: a good compiler will warn you if you try to use a variable without initializing it (for example, GCC and Clang with -Wall).
No. C is a very efficient language, one that has traditionally been faster than a lot of other languages. One of the reasons for this is that it doesn't do too much on its own. The programmer controls this.
In the case of initialization, automatic (local) C variables are not initialized to a random value. Rather, they are not initialized at all, so they contain whatever was at that memory location before.
If you wanted to initialize a variable to, say, 1 in your program, then it would be inefficient if the variable had already been initialized to zero or null. That would mean it was initialized twice.
Execution speed and overhead (or lack thereof) are the main reasons why. C is notorious for letting you walk off the proverbial cliff because it always assumes that the user knows better than it does.
Note that if you declared the variable as static it actually is guaranteed to be initialized to 0.
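A small sketch of what that guarantee covers (all names hypothetical): objects with static storage duration, both file-scope variables and `static` locals, start out as zero of the appropriate type when they have no initializer, which means 0 for arithmetic types and a null pointer for pointers.

```c
#include <stddef.h>

static int counter;      /* no initializer: starts at 0 */
static double scale;     /* starts at 0.0 */
static char *name;       /* starts as a null pointer */

/* A static local is zero-initialized once, before program start-up,
   and keeps its value between calls. */
int next_id(void)
{
    static int id;       /* starts at 0 */
    return ++id;
}

int statics_are_zero(void)
{
    return counter == 0 && scale == 0.0 && name == NULL;
}
```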
Variables start out with a random value because you are just handed a block of memory and told to deal with it yourself. It has whatever value that block of memory had beforehand. Why should the program waste time setting the value to some arbitrary default when you are likely going to set it yourself later?
The design choice is performance, and it is one of the many reasons why C isn't the preferred language for most projects.
This has nothing to do with "if C were being designed today" or with efficiency of one initialization. Instead think of something like
void foo()
{
struct bar *ptrs[10000];
/* do something where only a few indices end up actually getting used */
}
Any language that forces useless initialization on you is doomed to be slow as hell for algorithms that can make use of sparse arrays where you don't care about the majority of the values, and have an easy way of knowing which values you care about.
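A minimal sketch of that pattern, assuming a concrete `struct bar` (the original leaves it undefined) and a hypothetical function `sum_sparse`: the caller supplies the few indices that matter, only those slots are written and then read, and the remaining pointers are never touched, so initializing them would be pure overhead.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 10000

struct bar { int value; };   /* hypothetical element type */

/* Stores items[k] at slot idx[k], then sums the values reached through
   those slots. Every slot that is read was written first. */
int sum_sparse(struct bar *const items[], const size_t idx[], size_t count)
{
    struct bar *ptrs[NSLOTS];        /* deliberately left uninitialized */

    for (size_t k = 0; k < count; k++) {
        assert(idx[k] < NSLOTS);
        ptrs[idx[k]] = items[k];     /* write only the slots in use */
    }

    int total = 0;
    for (size_t k = 0; k < count; k++)
        total += ptrs[idx[k]]->value;   /* untouched slots are never read */
    return total;
}
```

With 8-byte pointers the local array takes 80,000 bytes of stack, which is usually fine on desktop systems but may be too much for small embedded stacks.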
If you don't like my example with such a large object on the stack, substitute malloc instead. It has the same semantics with regard to initialization.
In either case, if you want zero-initialization, you can get it with {0} or calloc.
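A short sketch of both options (function names hypothetical). One subtlety worth knowing: `{0}` gives every element the correct zero for its type, including null pointers, whereas `calloc()` sets all bits to zero, which the C standard guarantees is 0 for integer types but not necessarily a null pointer or 0.0 (although it is on common platforms).

```c
#include <stdlib.h>

#define N 100

/* {0} zero-initializes every element with the proper zero for its type. */
int all_zero_with_braces(void)
{
    int counts[N] = {0};
    const char *names[N] = {0};          /* all null pointers */
    for (int i = 0; i < N; i++)
        if (counts[i] != 0 || names[i] != NULL)
            return 0;
    return 1;
}

/* calloc() returns n ints with all bits zero (i.e. value 0),
   or NULL on failure; the caller must free() the result. */
int *make_zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));
}
```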
It was a design choice made many years ago, probably for efficiency reasons.
Statically allocated variables (globals and statics) are initialized to 0 if there's no explicit initialization; this could be justified even taking efficiency into account, because it only occurs once. I'd guess the thinking was that for automatic variables (locals) that are allocated each time a scope is entered, implicit initialization was considered something that might cost too much and therefore should be left to the programmer's responsibility.
If C were being designed today, I wouldn't be surprised if that design decision were changed - especially since compilers are intelligent enough today to be able to optimize away an initialization that gets overwritten before any other use (or potential use).
However, there are so many C compiler toolchains that follow the spec of not initializing automatically, it would be foolish for a compiler to perform implicit initialization to a 'useful' value (like 0 or NULL). That would just encourage people targeting that tool chain to write code that didn't work correctly on other tool chains.
However, compilers can initialize local variables, and they often do. It's just that they initialize the locals to a value that's not generally useful (in particular, one that doesn't set a pointer to the null pointer). That kind of initialization isn't something to write your program logic against, and it's not intended for that. It's intended to cause deterministic and reproducible errors, so that if you erroneously use values that have been set by implicit initialization, you'll be able to find them easily in test/debug.
Usually this compiler behavior is turned on only for debug builds; I could see an argument being made for turning it on in release builds as well, particularly if the release build can still optimize it away when the compiler can prove that the implicitly initialized value is never used.

C function: is this dynamic allocation? Initializing an array with a changing length

Suppose I have a C function:
void myFunction(..., int nObs){
int myVec[nObs] ;
...
}
Is myVec being dynamically allocated? nObs is not constant whenever myFunction is called. I ask because I am currently programming with this habit, and a friend was having errors with his program where the culprit is he didn't dynamically allocate his arrays. I want to know whether my habit of programming (initializing like in the above example) is a safe habit.
Thanks.
To answer your question, it's not considered dynamic allocation because it's on the stack. Before this was allowed, you could on some platforms simulate the same variable-length allocation on the stack with a function, alloca(), but that was not portable. This is portable, if you program for C99 (C11 later made VLAs an optional feature).
It's compiler-dependent. I know it's OK with gcc, but the C89 spec doesn't allow it; C99 added it, and C11 made it an optional feature. Best bet for portability is not to use it.
It is known as a "variable length array". It is dynamic in the sense that its size is determined at run-time and can change from call to call, but it has auto storage class like any other local variable. I'd avoid using the term "dynamic allocation" for this, since it would only serve to confuse.
The term "dynamic allocation" is normally used for memory and objects allocated from the heap whose lifetimes are determined by the programmer (by new/delete, malloc/free), rather than by the object's scope. Variable length arrays are allocated and destroyed automatically as they come in and out of scope, like any other local variable with auto storage class.
Variable length arrays are not universally supported by compilers; in particular, VC++ does not support C99 variable length arrays, and there are no plans to do so. Standard C++ does not support them either.
With respect to it being a "safe habit", apart from the portability issue, there is the obvious potential to overflow the stack should nObs be a sufficiently large value. You could to some extent protect against this by making nObs a smaller integer type, such as uint8_t or uint16_t, but that is not a very flexible solution, and it makes bold assumptions about the size of the stack and of the objects being allocated. An assert(nObs < MAX_OBS) might be advisable, but if it comes after the array declaration, the stack may already have overflowed by then (this may be OK, though, since a failed assert() causes termination in any case).
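One way to address the overflow concern, sketched under the assumption of a hypothetical limit `MAX_OBS` chosen for the target's stack size and a hypothetical function `process`: check the size before the VLA is created, and fall back to `malloc()` for large requests, since `malloc()` can report failure.

```c
#include <stdlib.h>

#define MAX_OBS 1024   /* hypothetical limit, tuned to the target's stack */

/* Sums nObs input values via a working array. The size check happens
   before any VLA exists, so an oversized request never touches the
   stack. Returns 0 on success, -1 on bad size or allocation failure. */
int process(const int *input, int nObs, long *result)
{
    if (nObs <= 0)
        return -1;                  /* a VLA size must be > 0 */

    if (nObs <= MAX_OBS) {
        int myVec[nObs];            /* small: automatic storage */
        long total = 0;
        for (int i = 0; i < nObs; i++) {
            myVec[i] = input[i];
            total += myVec[i];
        }
        *result = total;
        return 0;
    }

    int *myVec = malloc((size_t)nObs * sizeof *myVec);   /* large: heap */
    if (myVec == NULL)
        return -1;
    long total = 0;
    for (int i = 0; i < nObs; i++) {
        myVec[i] = input[i];
        total += myVec[i];
    }
    free(myVec);
    *result = total;
    return 0;
}
```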
[edit]
Using variable length arrays is probably okay if the size is not externally determined (in your example it is, since nObs is passed in by the caller).
[/edit]
On the whole, the portability and the stack safety issues would suggest that variable length arrays are best avoided IMO.
