Recently I had to modify some legacy code that was compiled with a very old version of GCC (somewhere around version 2.3). Within a function, variables had to be declared before being used. I believe this comes from the C89 standard, and the limitation was removed later.
My question is: why was this rule enforced back then? Was there a concern that it could jeopardise the integrity of the software?
Variables still have to be declared before being used -- and they've never had to be declared just at the top of a function.
The C89 requirement is that a block consists of an opening {, followed by zero or more declarations, followed by zero or more statements, followed by the closing }.
For example, this is legal C89 (and, without the void, even K&R C, going back to 1978 or earlier):
#include <stdio.h>

int foo(void) {
    int outer = 10;
    {
        int inner = 20;
        printf("outer = %d, inner = %d\n", outer, inner);
    }
    printf("outer = %d, inner is not visible\n", outer);
    return 0;
}
C99 loosened this, allowing declarations and statements to be mixed within a block:
#include <stdio.h>

int foo(void) {
    int x = 10;
    printf("x = %d\n", x);
    int y = 20;
    printf("y = %d\n", y);
    return 0;
}
As for the reason for the original restriction, I think it goes back to C's ancestor languages: B, BCPL, and even Algol. It probably did make the compiler's job a bit easier. (I was thinking that it would make parsing easier, but I don't think it does; it still has to be able to distinguish whether something is a declaration or a statement without knowing in advance from the context.)
It was mainly to make compilers easier to write. If all the declarations were at the top of the function, it would be easy for the compiler to parse all the locals and determine how much stack is needed.
Of course now, compilers are a lot more mature than they were 30 years ago. So it makes sense to get rid of this restriction as it's become a nuisance to programmers.
Related
Since I found this particular documentation on https://www.tutorialspoint.com/c_standard_library/c_function_rand.htm, I have been thinking about this particular line of code: srand((unsigned)time(&t));. Whenever I had to generate something random, I used srand(time(NULL)) so as not to generate the same thing every time I run the program, but when I came across this I started wondering: is there any difference between srand((unsigned)time(&t)) and srand(time(NULL))? To me they seem to do the same thing. Why is a time_t variable used? And why is the address operator used in the call?
#include <stdio.h>
#include <stdlib.h>

int main() {
    int i, n;
    time_t t;

    n = 5;
    srand((unsigned)time(&t));
    for (i = 0; i < n; i++) {
        printf("%d\n", rand() % 50);
    }
    return(0);
}
Yes, it will yield the same result. But the example is badly written.
I would be careful reading Tutorialspoint. It's a site known for bad C code, and many bad habits you see in questions here at SO can be traced to that site. Ok, it's anecdotal evidence, but I did once ask a user here why they cast the result of malloc, and they responded that they had learned it on Tutorialspoint. You can actually see (at least) five examples of bad practice in this short snippet:
1. They cast the result of the call to time(), which is completely unnecessary and just clutters the code.
2. For some reason they use the variable t, which is completely useless in this example. If you read the documentation for time(), you'll see that just passing NULL is perfectly adequate here.
3. Why use the variable n? For this short example a hardcoded value is perfectly ok. And when you do use variables to avoid hardcoded values, you should declare them const and give them a much more descriptive name than n. (Ok, I realize I was a bit on the edge when writing this. Omitting const isn't that big of a deal, even if it's preferable, "n" is a common name meaning "number of iterations", and using a variable instead of a hardcoded value is in general a good thing.)
4. They omitted #include <time.h>, which would only be ok if they also omitted the rest of the includes.
5. They use int main() instead of int main(void).
For 5, I'd say that in most cases this does not matter for the main function, but declaring other functions as, for example, int foo() with empty parentheses instead of int foo(void) could cause problems, because they mean different things. From the C standard:
The use of function declarators with empty parentheses (not prototype-format parameter type declarators) is an obsolescent feature.
Here is a question related to that: What are the semantics of function pointers with empty parentheses in each C standard?
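To make the difference concrete, here is a small sketch (the function names are made up, and the semantics shown are those of C17 and earlier; C23 changed empty parentheses to mean the same as (void)):

#include <stdio.h>

int old_style();       /* empty parentheses: nothing is said about the parameters */
int prototyped(void);  /* prototype: the function takes no arguments */

int main(void)
{
    /* Only the two declarations above are visible here. The empty-parentheses
       declaration gives the compiler no parameter information, so this
       mismatched call typically compiles without a diagnostic (its behaviour
       is undefined). */
    printf("%d\n", old_style(1, 2));

    /* With a prototype in scope, the equivalent mistake -- prototyped(1) --
       would be a constraint violation and must be diagnosed. */
    printf("%d\n", prototyped());
    return 0;
}

int old_style(int n) { return n; }
int prototyped(void) { return 42; }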
One could also argue about a few other things, but some people would disagree about these.
Why declare i outside the for loop? Declaring it inside has been legal since C99, which is 20 years old.
Why end the function with return 0? Omitting it has also been ok since C99. You only need a return in main if you want to return something other than 0. Personally, I generally find "it's good practice" to be a nonsense statement unless it is backed by good arguments for why it should be good practice.
These are good to remember if your goal is to maintain very old C code in environments where you don't have compilers that support C99. But how common is that?
So if I got to rewrite the example at Tutorialspoint, I'd write it like this:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    srand(time(NULL));
    for (int i = 0; i < 5; i++) {
        printf("%d\n", rand() % 50);
    }
}
Another horrible example can be found here: https://www.tutorialspoint.com/c_standard_library/c_function_gets.htm
The function gets has been removed from standard C because it is very dangerous, yet the site does not even mention that.
Also, they teach you to cast the result of malloc https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm which is completely unnecessary. Read why here: Do I cast the result of malloc?
And although they mention that malloc returns NULL on failure, they don't show in the examples how to properly error check it. Same goes for functions like scanf.
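For what it's worth, here is a minimal sketch (my own, not from the site) of the kind of error checking meant here: check the return values of malloc and scanf before using their results.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p = malloc(100 * sizeof *p);   /* no cast needed in C */
    if (p == NULL) {
        fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }

    int value;
    if (scanf("%d", &value) != 1) {     /* scanf reports how many items it matched */
        fprintf(stderr, "invalid input\n");
        free(p);
        return EXIT_FAILURE;
    }

    p[0] = value;
    printf("%d\n", p[0]);
    free(p);
    return 0;
}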
Today I was reading about pure functions and got confused about their use:
A function is said to be pure if it returns same set of values for same set of inputs and does not have any observable side effects.
e.g. strlen() is a pure function while rand() is an impure one.
#include <stdio.h>

__attribute__ ((pure)) int fun(int i)
{
    return i * i;
}

int main()
{
    int i = 10;
    printf("%d", fun(i)); // outputs 100
    return 0;
}
http://ideone.com/33XJU
The above program behaves the same way as it would without the pure declaration.
What are the benefits of declaring a function as pure [if there is no change in output]?
pure lets the compiler know that it can make certain optimisations about the function: imagine a bit of code like
for (int i = 0; i < 1000; i++)
{
    printf("%d", fun(10));
}
With a pure function, the compiler can know that it needs to evaluate fun(10) once and once only, rather than 1000 times. For a complex function, that's a big win.
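As an illustration using the attribute from the question, marking fun as pure is what licenses the compiler to treat the repeated call as a loop-invariant value:

#include <stdio.h>

/* pure: the result depends only on the argument and there are no side effects,
   so the compiler may compute fun(10) once and reuse the value. */
__attribute__((pure)) int fun(int i)
{
    return i * i;
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        printf("%d\n", fun(10));   /* candidate for being hoisted out of the loop */
    return 0;
}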
When you say a function is 'pure' you are guaranteeing that it has no externally visible side-effects (and as a comment says, if you lie, bad things can happen). Knowing that a function is 'pure' has benefits for the compiler, which can use this knowledge to do certain optimizations.
Here is what the GCC documentation says about the pure attribute:
pure
Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure. For example,
int square (int) __attribute__ ((pure));
Philip's answer already shows how knowing a function is 'pure' can help with loop optimizations.
Here is one for common sub-expression elimination (given foo is pure):
a = foo (99) * x + y;
b = foo (99) * x + z;
Can become:
_tmp = foo (99) * x;
a = _tmp + y;
b = _tmp + z;
In addition to possible run-time benefits, a pure function is much easier to reason about when reading code. Furthermore, it's much easier to test a pure function since you know that the return value only depends on the values of the parameters.
A non-pure function
int foo(int x, int y) // possible side-effects
is like an extension of a pure function
int bar(int x, int y) // guaranteed no side-effects
in which you have, besides the explicit function arguments x, y, the rest of the universe (or anything your computer can communicate with) as an implicit potential input. Likewise, besides the explicit integer return value, anything your computer can write to is implicitly part of the return value.
It should be clear why it is much easier to reason about a pure function than a non-pure one.
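As a small made-up illustration of that contrast, bar below is pure, while foo also has the hidden input and output described above:

#include <stdio.h>

static int counter;            /* stands in for "the rest of the universe" */

int bar(int x, int y)          /* pure: the result is determined by x and y alone */
{
    return x * y + 1;
}

int foo(int x, int y)          /* not pure: implicit input and output via counter */
{
    counter++;                 /* observable side effect */
    return x * y + counter;    /* the result also depends on hidden state */
}

int main(void)
{
    printf("%d %d\n", bar(2, 3), bar(2, 3));   /* always the same: 7 7 */
    printf("%d %d\n", foo(2, 3), foo(2, 3));   /* differs between calls */
    return 0;
}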
Just as an add-on, I would like to mention that C++11 codifies things somewhat using the constexpr keyword. Example:
#include <iostream>
#include <cstring>

constexpr unsigned static_strlen(const char *str, unsigned offset = 0) {
    return (*str == '\0') ? offset : static_strlen(str + 1, offset + 1);
}

constexpr const char *str = "asdfjkl;";
constexpr unsigned len = static_strlen(str); // MUST be evaluated at compile time
// so, for example, this: int arr[len]; is legal, as len is a constant

int main() {
    std::cout << len << std::endl << std::strlen(str) << std::endl;
    return 0;
}
The restrictions on the usage of constexpr make it so that the function is provably pure. This way, the compiler can more aggressively optimize (just make sure you use tail recursion, please!) and evaluate the function at compile time instead of run time.
So, to answer your question: if you're using C++ (I know you said C, but they are related), writing a pure function in the correct style allows the compiler to do all sorts of cool things with it :-)
In general, pure functions have three advantages over impure functions that the compiler can take advantage of:
Caching
Let's say that you have a pure function f that is called 100000 times. Since it is deterministic and depends only on its parameters, the compiler can calculate its value once and reuse it whenever necessary.
Parallelism
Pure functions don't read or write any shared memory, and therefore can run in separate threads without any unexpected consequences.
Passing by reference
A function f(struct t) normally gets its argument t by value; if f is declared pure, the compiler can instead pass t by reference, since it is guaranteed that f will not change t, and gain performance that way.
In addition to the possible run-time benefits, a pure function is much easier to reason about when reading code. Furthermore, it's much easier to test a pure function, since you know that the return value depends only on the values of the parameters.
No need to construct objects or mock connections to DBs / file system.
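As a small sketch of that point (clamp is a made-up example), testing a pure function needs nothing beyond calling it and checking the result:

#include <assert.h>

/* A pure function: no globals, no I/O, the result depends only on the arguments. */
static int clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

int main(void)
{
    /* No fixtures, mocks or database connections -- just calls and checks. */
    assert(clamp(5, 0, 10) == 5);
    assert(clamp(-3, 0, 10) == 0);
    assert(clamp(42, 0, 10) == 10);
    return 0;
}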
I heard (probably from a teacher) that one should declare all variables at the top of the program/function, and that declaring new ones among the statements could cause problems.
But then I was reading K&R and I came across this sentence: "Declarations of variables (including initializations) may follow the left brace that introduces any compound statement, not just the one that begins a function". He follows with an example:
if (n > 0) {
    int i;

    for (i = 0; i < n; i++)
        ...
}
I played a bit with the concept, and it works even with arrays. For example:
#include <stdio.h>

int main() {
    int x = 0;
    while (x < 10) {
        if (x > 5) {
            int y[x];
            y[0] = 10;
            printf("%d %d\n", y[0], y[4]);
        }
        x++;
    }
}
So when exactly I am not allowed to declare variables? For example, what if my variable declaration is not right after the opening brace? Like here:
#include <stdio.h>

int main() {
    int x = 10;
    x++;
    printf("%d\n", x);
    int z = 6;
    printf("%d\n", z);
}
Could this cause trouble depending on the program/machine?
I also often hear that putting variables at the top of the function is the best way to do things, but I strongly disagree. I prefer to confine variables to the smallest scope possible so they have less chance to be misused and so I have less stuff filling up my mental space in each line on the program.
While all versions of C allow lexical block scope, where you can declare the variables depends on the version of the C standard that you are targeting:
C99 onwards or C++
Modern C compilers such as gcc and clang support the C99 and C11 standards, which allow you to declare a variable anywhere a statement could go. The variable's scope starts from the point of the declaration to the end of the block (next closing brace).
if (x < 10) {
    printf("%d", 17); // z is not in scope in this line
    int z = 42;
    printf("%d", z); // z is in scope in this line
}
You can also declare variables inside for loop initializers. The variable will exist only inside the loop.
for (int i = 0; i < 10; i++) {
    printf("%d", i);
}
ANSI C (C90)
If you are targeting the older ANSI C standard, then you are limited to declaring variables immediately after an opening brace.[1]
This doesn't mean you have to declare all your variables at the top of your functions though. In C you can put a brace-delimited block anywhere a statement could go (not just after things like if or for) and you can use this to introduce new variable scopes. The following is the ANSI C version of the previous C99 examples:
if (x < 10) {
    printf("%d", 17); // z is not in scope in this line
    {
        int z = 42;
        printf("%d", z); // z is in scope in this line
    }
}

{
    int i;
    for (i = 0; i < 10; i++) {
        printf("%d", i);
    }
}
[1] Note that if you are using gcc you need to pass the -pedantic flag to make it actually enforce the C90 standard and complain that the variables are declared in the wrong place. If you just use -std=c90, gcc accepts a superset of C90 which also allows the more flexible C99 variable declarations.
missingno covers what ANSI C allows, but he doesn't address why your teachers told you to declare your variables at the top of your functions. Declaring variables in odd places can make your code harder to read, and that can cause bugs.
Take the following code as an example.
#include <stdio.h>

int main() {
    int i, j;

    i = 20;
    j = 30;
    printf("(1) i: %d, j: %d\n", i, j);

    {
        int i;
        i = 88;
        j = 99;
        printf("(2) i: %d, j: %d\n", i, j);
    }

    printf("(3) i: %d, j: %d\n", i, j);
    return 0;
}
As you can see, I've declared i twice. Well, to be more precise, I've declared two variables, both with the name i. You might think this would cause an error, but it doesn't, because the two i variables are in different scopes. You can see this more clearly when you look at the output of this function.
(1) i: 20, j: 30
(2) i: 88, j: 99
(3) i: 20, j: 99
First, we assign 20 and 30 to i and j respectively. Then, inside the curly braces, we assign 88 and 99. So, why then does the j keep its value, but i goes back to being 20 again? It's because of the two different i variables.
Between the inner set of curly braces the i variable with the value 20 is hidden and inaccessible, but since we have not declared a new j, we are still using the j from the outer scope. When we leave the inner set of curly braces, the i holding the value 88 goes away, and we again have access to the i with the value 20.
Sometimes this behavior is a good thing, other times, maybe not, but it should be clear that if you use this feature of C indiscriminately, you can really make your code confusing and hard to understand.
If your compiler allows it, then it's fine to declare variables anywhere you want. In fact, the code is more readable (IMHO) when you declare a variable where you use it instead of at the top of a function, because it makes it easier to spot errors, e.g. forgetting to initialize the variable or accidentally hiding it.
A post shows the following code:
// C99
printf("%d", 17);
int z = 42;
printf("%d", z);

// ANSI C
printf("%d", 17);
{
    int z = 42;
    printf("%d", z);
}
and I think the implication is that these are equivalent. They are not. If int z is placed at the bottom of this code snippet, it causes a redefinition error against the first z definition but not against the second.
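Here is a sketch of what that looks like (an illustration of my own, not the code from the post):

int main(void)
{
    /* C99 style: z is declared directly in main's block. */
    int z = 42;

    /* ANSI C style: this z lives in its own inner block and goes away at the }. */
    {
        int z = 43;
        (void)z;
    }

    /* A later declaration in the same block collides with the first z,
       but would not collide with the one confined to the inner braces. */
    /* int z = 44;   error: redefinition of 'z' */

    return 0;
}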
However, multiple lines of:
//C99
for(int i=0; i<10; i++){}
do work, which shows the subtlety of this C99 rule (each for loop introduces its own scope for i).
Personally, I passionately shun this C99 feature.
The argument that it narrows the scope of a variable is false, as shown by these examples. Under the new rule, you cannot safely declare a variable until you have scanned the entire block, whereas formerly you only needed to understand what was going on at the head of each block.
Internally, all variables local to a function are allocated on the stack or in CPU registers, and the generated machine code shuffles values between the registers and the stack (called register spilling) if the compiler is bad or if the CPU doesn't have enough registers to keep all the balls juggling in the air.
To allocate things on the stack, the CPU has two special registers, one called the stack pointer (SP) and another called the base pointer (BP) or frame pointer (referring to the stack frame local to the current function's scope). SP points to the current location on the stack, while BP points to the working data set (above it) and the function arguments (below it). When a function is invoked, it pushes the caller's BP onto the stack (pointed to by SP), sets the current SP as the new BP, increases SP by the number of bytes spilled from registers onto the stack, does its computation, and on return restores its parent's BP by popping it from the stack.
Generally, keeping your variables inside their own {} scope can speed up compilation and improve the generated code by reducing the size of the graph the compiler has to walk to determine which variables are used where and how. In some cases (especially when goto is involved) the compiler can miss the fact that a variable won't be used any more unless you explicitly tell it the variable's scope. Compilers may have a time/depth limit when searching the program graph.
The compiler may place variables declared near each other in the same stack area, which means loading one will preload the others into the cache. In the same way, declaring a variable register gives the compiler a hint that you want to avoid having that variable spilled onto the stack at all costs.
Strict C90 requires declarations to appear right after an opening {, while C99 (as well as C++ and older GCC extensions) allows declaring variables further into the body, which complicates goto and case statements. C99 and C++ also allow declaring variables inside a for loop initializer, where they are limited to the scope of the loop.
Last but not least, for another human being reading your code, it would be overwhelming to see the top of a function littered with half a hundred variable declarations instead of having them localized at their places of use. It also makes it easier to comment out their use.
TL;DR: using {} to explicitly state a variable's scope can help both the compiler and the human reader.
With clang and gcc, I encountered major issues with the following.
gcc version 8.2.1 20181011
clang version 6.0.1
{
    char f1[] = "This_is_part1 This_is_part2";
    char f2[64];
    char f3[64];
    sscanf(f1, "%s %s", f2, f3); // split part1 into f2, part2 into f3
}
Neither compiler liked f1, f2 or f3 being declared within the block; I had to relocate f1, f2 and f3 to the function definition area.
The compilers did not mind the definition of an integer within the block.
As per The C Programming Language by K&R:
In C, all variables must be declared before they are used, usually at the beginning of the function before any executable statements.
Note the word usually here; it says usually, not must.
Folks I think I will throw all my modest C lore away. Look at this code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char** argv, char** envp)
{
    int aa;
    srand(time(NULL));
    int Num = rand() % 20;
    int Vetor[Num];
    for (aa = 0; aa < Num; aa++)
    {
        Vetor[aa] = rand() % 40;
        printf("Vetor [%d] = %d\n", aa, Vetor[aa]);
    }
}
I would have thought this should throw an error for two reasons: first, I am declaring both Num and Vetor after executing a statement (the call to srand); second, I am declaring Vetor with a size based on Num, which I thought shouldn't be possible, because array sizes are supposed to be decided at compile time, not at run time, right?
I am really surprised that this works, and it would be great if you could explain why I can actually use stuff like this.
This is using GCC.
These are C99 features, and it seems your compiler supports them. That's all ;)
From Wikipedia:
C99 introduced several new features, many of which had already been implemented as extensions in several compilers:
inline functions
intermingled declarations and code, variable declaration no longer restricted to file scope or the start of a compound statement (block)
several new data types, including long long int, optional extended integer types, an explicit boolean data type, and a complex type to represent complex numbers
variable-length arrays
support for one-line comments beginning with //, as in BCPL or C++
new library functions, such as snprintf
etc (more)
C99 supports declarations anywhere in the code, as well as VLAs. What compiler are you using?
A coding style presentation that I attended recently at the office advocated that variables should NOT be assigned (to a default value) when they are defined. Instead, they should be assigned a default value just before their use.
So, something like
int a = 0;
should be frowned upon.
Obviously, an example of 'int' is simplistic but the same follows for other types also like pointers etc.
Further, it was also mentioned that the C99 compatible compilers now throw up a warning in the above mentioned case.
The above approach looks useful to me only for structures, i.e. you memset them just before use. This would be efficient if the structure is used (or filled) only in an error leg.
For all other cases, I find defining and assigning to a default value a prudent exercise, as I have encountered a lot of bugs because of uninitialized pointers, both while writing and maintaining code. Further, I believe C++ via constructors also advocates the same approach, i.e. define and assign.
I am wondering why (or whether) the C99 standard does not like defining and assigning. Is there any considerable merit in doing what the coding style presentation advocated?
Usually I'd recommend initialising variables when they are defined if the value they should have is known, and leave variables uninitialised if the value isn't. Either way, put them as close to their use as scoping rules allow.
Instead, they should be assigned a default value just before their use.
Usually you shouldn't use a default value at all. In C99 you can mix code and declarations, so there's no point defining the variable before you assign a value to it. If you know the value it's supposed to take, then there is no point in having a default value.
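As a small sketch of what that means in practice (a hypothetical example of my own): declare the variable at the point where its real value is known, rather than defining it earlier with a placeholder.

#include <stdio.h>

int main(void)
{
    /* Rather than:
           int written = 0;        // meaningless default
           ...
           written = printf("hello\n");
       declare it where the real value exists: */
    int written = printf("hello\n");   /* printf returns the number of characters written */

    printf("wrote %d characters\n", written);
    return 0;
}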
Further, it was also mentioned that the C99 compatible compilers now throw up a warning in the above mentioned case.
Not for the case you show - you don't get a warning for having int x = 0;. I strongly suspect that someone got this mixed up. Compilers warn if you use a variable without assigning a value to it, and if you have:
... some code ...
int x;

if ( a )
    x = 1;
else if ( b )
    x = 2;
// oops, forgot the last case: else x = 3;

return x * y;
then you will get a warning that x may be used without being initialised, at least with gcc.
You won't get a warning if you assign a value to x before the if, but it is irrelevant whether the assignment is done as an initialiser or as a separate statement.
Unless you have a particular reason to assign the value twice for two of the branches, there's no point assigning a default value to x first; doing so just stops the compiler from warning you when you haven't covered every branch.
There's no such requirement (or even guideline that I'm aware of) in C99, nor does the compiler warn you about it. It's simply a matter of style.
As far as coding style is concerned, I think you took things too literally. For example, your statement is right in the following case...
int i = 0;

for (; i < n; i++)
    do_something(i);
... or even in ...
int i = 1;

[some code follows here]

while (i < a)
    do_something(i);
... but there are other cases that, in my mind, are better handled with an early "declare and assign". Consider structures constructed on the stack or various OOP constructs, like in:
struct foo {
    int bar;
    void *private;
};

int my_callback(struct foo *foo)
{
    struct my_struct *my_struct = foo->private;

    [do something with my_struct]

    return 0;
}
Or like in (C99 struct initializers; here foo is assumed to be a struct with members a, b and c):
void do_something(int a, int b, int c)
{
    struct foo foo = {
        .a = a,
        .b = b + 1,
        .c = c / 2,
    };

    write_foo(&foo);
}
I sort of concur with the advice, even though I'm not altogether sure the standard says anything about it, and I very much doubt the bit about compiler warnings is true.
The thing is, modern compilers can and do detect the use of uninitialised variables. If you set your variables to default values at initialisation, you lose that detection. And default values can cause bugs too; certainly in the case of your example, int a = 0;. Who says 0 is an appropriate value for a?
In the 1990s, the advice would've been wrong. Nowadays, it's correct.
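As a sketch of that trade-off, in the spirit of the earlier if/else example (exact warning behaviour depends on the compiler and flags, e.g. gcc or clang with -Wall and optimization):

#include <stdio.h>

/* Leaving x uninitialised lets the compiler notice the missing branch;
   writing "int x = 0;" instead silences that warning and hides the bug. */
static int classify(int a, int b)
{
    int x;                /* change to "int x = 0;" and the warning goes away */

    if (a)
        x = 1;
    else if (b)
        x = 2;
    /* oops: forgot "else x = 3;" */

    return x;             /* compilers can warn here: 'x' may be used uninitialized */
}

int main(void)
{
    printf("%d\n", classify(0, 1));
    return 0;
}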
I find it highly useful to pre-assign some default data to variables so that I don't have to do (as many) null checks in the code.
I have seen so many bugs due to uninitialized pointers that I have always advocated declaring each pointer variable with NULL_PTR and each primitive with some invalid/default value.
Since I work on RTOS and high-performance but low-resource systems, it is possible that the compilers we use do not catch uninitialized usage. And I doubt even modern compilers can be relied on 100%.
In large projects where macros are extensively used, I have seen rare scenarios where even Klocwork/Purify have failed to find uninitialized usage.
So I say stick with it as long as you are using plain old C/C++.
Modern languages like .NET can guarantee initialized variables, or give a compiler error for uninitialized variable usage. The following link does a performance analysis and finds that there is a 10-20% performance hit for .NET. The analysis is quite detailed and well explained.
http://www.codeproject.com/KB/dotnet/DontInitializeVariables.aspx