I was taught that variables within scope are destroyed (freed/deallocated?) at the instruction generated by the "}" at the end of a scope body. I was about to teach someone the same thing, but I decided to see for myself what happens if I change a local variable through a pointer, like this:
#include <stdio.h>

int main(void)
{
    int *p = NULL;
    if (1)
    {
        int localvar = 1;
        p = &localvar;
    }
    (*p) = 345;
    printf("%i\n", *p);
    return 0;
}
Compiled with MinGW-GCC and "-Wall -g -pedantic -w -Wfatal-errors -Wextra" (note that the -w in there suppresses all warnings, which defeats -Wall and -Wextra).
It was very surprising to me that not only was no warning generated, but no runtime exception was thrown either. Instead, everything seems to work just as if I were accessing a global variable.
Can this be explained in a definitive manner to avoid any future misconceptions?
C compilers aren't required to generate an error when you do something like this. That's part of what makes C fast.
That also means that if you do something you're not supposed to do, you can trigger undefined behavior. This essentially means that no guarantees can be made regarding what the program will do. It could crash, it could output strange results, or it could appear to work properly, as in your case.
How UB manifests itself can change with a seemingly unrelated edit, such as adding an unused local variable or a call to printf for debugging.
In this particular case, the memory in question hasn't been reserved for some other use, so it is still within the program's valid address space and hasn't yet been overwritten. But again, you can't rely on that behavior.
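To illustrate, here is a minimal sketch (the clobber helper is hypothetical, added purely for illustration) of how a seemingly unrelated call may reuse the dead variable's stack slot on a typical stack-based implementation:

#include <stdio.h>

static void clobber(void)
{
    /* This helper's local will often reuse the stack slot that
       localvar occupied; nothing about this is guaranteed. */
    volatile int noise = 999;
    (void)noise;
}

int main(void)
{
    int *p = NULL;
    if (1)
    {
        int localvar = 1;
        p = &localvar;
    }
    clobber();          /* a seemingly unrelated call */
    printf("%i\n", *p); /* UB: might print 1, 999, or anything else */
    return 0;
}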
Say you have (for reasons that are not important here) the following code:
int k = 0;
... /* no change to k can happen here */
if (k) {
    do_something();
}
Using the -O2 flag, GCC will not generate any code for it, recognizing that the if test is always false.
I'm wondering whether this is pretty common behaviour across compilers, or something I should not rely on.
Does anybody know?
Dead code elimination in this case is trivial to do for any modern optimizing compiler. I would definitely rely on it, given that optimizations are turned on and you are absolutely sure that the compiler can prove that the value is zero at the moment of check.
However, you should be aware that sometimes your code has more potential side effects than you think.
The first source of problems is calls to non-inlined functions. Whenever you call a function that is not inlined (e.g. because its definition is located in another translation unit), the compiler assumes that all global variables and the entire contents of the heap may change inside that call. Local variables are the lucky exception, because the compiler knows that it is illegal to modify them indirectly... unless you save the address of a local variable somewhere. For instance, in this case the dead code won't be eliminated:
#include <cstdio>

int function_with_unpredictable_side_effects(const int &x);

void doit() {
    int k = 0;
    function_with_unpredictable_side_effects(k);
    if (k)
        printf("Never reached\n");
}
So the compiler has to do some work and may fail even for local variables. By the way, I believe the problem being solved in this case is called escape analysis.
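For contrast, here is a minimal sketch in plain C where the parameter is taken by value, so the address of k never escapes and the compiler is free to remove the branch (assuming optimizations are enabled):

#include <stdio.h>

/* Takes its argument by value, so the address of k never escapes. */
int function_with_unpredictable_side_effects(int x);

void doit(void)
{
    int k = 0;
    function_with_unpredictable_side_effects(k); /* cannot modify k */
    if (k)               /* provably false: dead code, can be removed */
        printf("Never reached\n");
}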
The second source of problems is pointer aliasing: the compiler has to take into account that all sorts of pointers and references in your code may be equal, so changing something through one pointer may change the contents behind another. Here is one example:
#include <cstdio>

struct MyArray {
    int num;
    int arr[100];
};

void doit(int idx) {
    MyArray x;
    x.num = 0;
    x.arr[idx] = 7;
    if (x.num)
        printf("Never reached\n");
}
The Visual C++ compiler does not eliminate the dead code, because it thinks that you may access x.num as x.arr[-1]. That may sound like an awful thing to do, but this compiler has been used in game development for years, and such hacks are not uncommon there, so the compiler stays on the safe side. GCC, on the other hand, removes the dead code; maybe that is related to its exploitation of the strict aliasing rule.
P.S. The const keyword is never used by the optimizer; it is only present in C/C++ for the programmer's convenience.
There is no common behaviour you can count on across compilers, but there is a way to explore how different compilers act on a specific piece of code.
Compiler Explorer will help you answer almost any question about code generation, but of course you must be familiar with assembly language.
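If you prefer checking locally rather than in Compiler Explorer, the same experiment is easy with gcc alone; a minimal sketch (check_dce.c and do_something are made-up names for illustration):

/* check_dce.c -- compile with:
 *     gcc -O2 -S check_dce.c -o -
 * and check whether any call to do_something survives in the assembly.
 */
void do_something(void);

void test(void)
{
    int k = 0;
    /* no change to k can happen here */
    if (k) {
        do_something();
    }
}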
I am not sure why strcat works in this case for me:
char *foo = "foo";
printf(strcat(foo,"bar"));
It successfully prints "foobar" for me.
However, as per an earlier topic discussed on stackoverflow here: I just can't figure out strcat
It says that the above should not work because foo is declared as a string literal. Instead, it needs to be declared as a buffer (an array of a predetermined size, so that it can accommodate another string that we are trying to concatenate).
In that case, why does the above program work for me successfully?
This code invokes Undefined Behavior (UB), meaning that you have no guarantee of what will happen (failure here).
The reason is that string literals are immutable: any attempt to modify one invokes UB.
Note what difficult logical errors can arise with UB: the code might work (today, on your system), but it is still wrong, which makes it very likely that you will miss the error and carry on as if everything were fine.
PS: In this Live Demo, I am lucky enough to get a segmentation fault. I say lucky because this segfault is what makes me investigate and debug the code.
It's worth noting that GCC issues no warning here, and the warning from Clang is also beside the point:
prog.c:7:8: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
    printf(strcat(foo,"bar"));
           ^~~~~~~~~~~~~~~~~
prog.c:7:8: note: treat the string as an argument to avoid this
    printf(strcat(foo,"bar"));
           ^
           "%s",
1 warning generated.
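For completeness, here is a minimal corrected sketch that uses a writable buffer, as the linked topic suggests, instead of a string literal:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char foo[8] = "foo";  /* writable buffer, room for "foobar" plus '\0' */
    strcat(foo, "bar");   /* fine: foo is an array, not a string literal */
    printf("%s\n", foo);  /* prints foobar */
    return 0;
}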
String literals are immutable in the sense that the compiler will operate under the assumption that you won't mutate them, not that you'll necessarily get an error if you try to modify them. In legalese, this is "undefined behavior", so anything can happen, and, as far as the standard is concerned, it's fine.
Now, on modern platforms and with modern compilers you do have extra protections: on platforms that have memory protection the string table generally gets placed in a read-only memory area, so that modifying it will get you a runtime error.
Still, you may have a compiler that doesn't provide any of these runtime-enforced checks: because you are compiling for a platform without memory protection (e.g. pre-80386 x86, so pretty much any C compiler for DOS such as Turbo C, or most microcontrollers when running from RAM rather than flash), with an older compiler that doesn't exploit this hardware capability by default to remain compatible with older code (as older VC++ did for a long time), or with a modern compiler that has such an option explicitly enabled, again for compatibility with older code (e.g. gcc with -fwritable-strings). In all these cases, it's normal that you won't get any runtime error.
Finally, there's an extra devious corner case: current-day optimizers actively exploit undefined behavior, i.e. they assume it will never happen and transform the code accordingly. It's not impossible for a particularly smart compiler to generate code that simply drops such a write, as it is legally allowed to do anything at all in such a case.
This can be seen for some simple code, such as:
int foo() {
    char *bar = "bar";
    *bar = 'a';
    if (*bar == 'b') return 1;
    return 0;
}
here, with optimizations enabled:
VC++ sees that the write is used just for the condition that immediately follows, so it simplifies the whole thing to return 0; no memory write, no segfault, it "appears to work" (https://godbolt.org/g/cKqYU1);
gcc 4.1.2 "knows" that literals don't change; the write is redundant and gets optimized away (so, no segfault), and the whole thing becomes return 1 (https://godbolt.org/g/ejbqDm);
more modern gcc versions choose a stranger route: the write is not elided (so you get a segfault with the default linker options), but if it did succeed (e.g. if you manually fiddle with memory protection) you'd get return 1 (https://godbolt.org/g/rnUDYr); so the memory is modified, yet the code that follows assumes it hasn't been. This is particularly egregious on AVR, where there is no memory protection and the write succeeds.
clang does pretty much the same as gcc.
Long story short: don't try your luck and tread carefully. Always assign string literals to const char * (not plain char *) and let the type system help you avoid this kind of problem.
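A minimal sketch of that advice; with the const qualifier, gcc and clang diagnose the call (an outright error in C++, a loud warning in C that -Werror turns into an error):

#include <string.h>

void demo(void)
{
    const char *foo = "foo";
    strcat(foo, "bar"); /* diagnosed at compile time: discards const qualifier */
}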
After reading the following question, I understand that no such thing exists (at least not portably).
However, I am staring at the following piece of code from the mono code base, which returns a pointer to the stack:
static void *
return_stack_ptr ()
{
    gpointer i;
    return &i;
}
I am surprised that the above code can even work on architectures such as PowerPC; I would have assumed it would only work on x86 (and maybe only with gcc).
Is this going to work on PowerPC?
The purpose of the stack is supporting function calls and local variables. If your system has a stack, it's going to use it, and allocate the local variable there. So it's very reasonable to assume that the address of the local variable points somewhere in the stack. This is not specific to x86 or gcc - it's a fairly general idea.
However, using a pointer to a variable that doesn't exist (i.e. after it goes out of scope) is Undefined Behavior. So this function cannot be guaranteed to do anything meaningful. In fact, a "clever" compiler could detect that your program uses undefined behavior, and replace your code by a no-op (and call it a "performance optimization").
Alternatively, a "wise" compiler could recognize that your function returns a pointer to the stack, and inline it by using a hardware stack pointer instead.
Neither option is guaranteed - this code is not portable.
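If one were to write such a helper anyway, guarding against the inlining concern might look like the sketch below. This relies on gcc/clang-specific assumptions and remains formally undefined; modern gcc even warns about it (-Wreturn-local-addr) and may substitute a null pointer for the return value:

/* A sketch only, not a portable or well-defined construct. */
static void * __attribute__((noinline))
return_stack_ptr(void)
{
    void *i;
    return &i; /* approximates the current stack position; still UB */
}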
The following file foo.c is a simplified version of a subtler bug I found in my code.
int b;

void bar(int a);

void foo(int a)
{
    bar(a);
    a = 42;
}
The line a = 42 is in fact a typo in my code: I meant b = 42. I don't expect the compiler to detect that I made a typo, but I would like to get a warning that I am assigning to a local variable (or a function parameter) that is not going to be used anymore. If I compile this file with
% gcc-4.6 -Wall -Wextra -pedantic -O3 -c foo.c
I get absolutely no warning. Inspecting the generated code shows that the assignment a = 42 is not performed, so gcc is perfectly well aware that this instruction is useless (hence potentially bogus). Commenting out the call to bar(a); does produce a warning (warning: parameter 'a' set but not used [-Wunused-but-set-parameter]), so it seems like gcc will not warn as long as a is used somewhere in the function, even if that use comes before the assignment.
My questions:
Is there a way to tell GCC or Clang to produce a warning for such case? (I could not get clang 3.0 to produce any warning, even with the call to bar removed.)
Is there a reason for the actual behavior? I.e., are there cases where it is actually desirable to assign to local variables that will be thrown away by the optimizer?
There is no gcc or clang option to my knowledge that can warn about this useless assignment.
PC-Lint on the other hand is able to warn in this situation.
Warning 438: Last value assigned to variable 'Symbol' not used -- A value had been assigned to a variable that was not subsequently used. The message is issued either at a return statement or at the end of a block when the variable goes out of scope.
The compiler will detect that this is dead code and optimise it out anyway. If you look at the assembly listing (provided you tell gcc to optimise), then you will find the assignment is non-existent.
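For example, a minimal sketch (assuming gcc) of how to confirm this from the assembly listing:

/* foo.c -- compare the two listings:
 *     gcc -O0 -S foo.c -o foo_O0.s
 *     gcc -O3 -S foo.c -o foo_O3.s
 * The store of 42 shows up at -O0 and is silently gone at -O3.
 */
int b;

void bar(int a);

void foo(int a)
{
    bar(a);
    a = 42; /* dead store: eliminated, but never warned about */
}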
e.g. I'm usually obsessive-compulsive and like to do
static int i = 0;
if (!i) i = var;
but
static int i;
if (!i) i = var;
would also work.
Why? Why can't it segfault so we can all be happy that undefined variables are evil and be concise about it?
Not even the compilers complain :(
This 'philosophy' of indecisiveness in C has led me to make errors such as this:
strcat(<uninitialized>, <proper_string>); // wrong!!
strcpy(<uninitialized>, <proper_string>); // nice
In your example, i is not an undefined variable, it is an uninitialized variable. And C has good reasons for not producing an error in these cases. For instance, a variable may be uninitialized when it is defined but assigned a value before it is used, so it is not a semantic error to lack an initialization in the definition statement.
Not all uses of uninitialized variables can be checked at compile-time. You could suggest that the program check every access to every variable by performing a runtime check, but that requires incurring a runtime overhead for something that is not necessary if the programmer wrote the code correctly. That's against the philosophy of C. A similar argument applies to why automatically-allocated variables aren't initialized by default.
However, in cases where the use of a variable before being initialized can be detected at compile-time, most modern compilers will emit a warning about it, if you have your warning level turned up high enough (which you always should). So even though the standard does not require it, it's easy to get a helpful diagnostic about this sort of thing.
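For instance, a minimal sketch that most modern compilers will flag with -Wall (older gcc versions may need optimization enabled for -Wuninitialized to fire):

#include <stdio.h>

int main(void)
{
    int i;             /* uninitialized automatic variable */
    printf("%d\n", i); /* gcc/clang -Wall: 'i' is used uninitialized */
    return 0;
}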
Edit: Your edit makes the question moot: if i is declared static then it is initialized, to zero.
This comes from C's "lightweight" and "concise" roots. Default initializing to zero bytes was free (for global variables). And why specify anything in source text when you know what the compiler is going to do?
Uninitialized auto variables can contain random data, and in that case your "if" statements are not only odd but don't reliably do what you desire.
It seems you don't understand something about C.
int i;
actually DOES define the variable in addition to declaring it. There is memory storage. There is just no initialization when in function scope.
int i=0;
declares, defines, and initializes the storage to 0.
if (!i)
is completely unnecessary before assigning a value to i. All it does is test the value of integer i (which may or may not be initialized to a specific value depending on which statement above you used).
It would only be useful if you did:
int *i = malloc(sizeof(int));
because then i would be a pointer you are checking for validity.
You said:
Why? Why can't it segfault so we can all be happy that undefined variables are evil and be concise about it?
A "segfault" or segmentation fault, is a term that is a throwback to segmented memory OSes. Segmentation was used to get around the fact that the size of the machine word was inadequate to address all of available memory. As such, it is a runtime error, not a compile time one.
C is really not that many steps up from assembly language. It just does what you tell it to do. When you define your int, a machine word's worth of memory is allocated. Period. That memory is in a particular state at runtime, whether you initialize it specifically or leave it to randomness.
It's to squeeze every last cycle out of your CPU. On a modern CPU of course not initializing a variable until the last millisecond is a totally trivial thing, but when C was designed, that was not necessarily the case.
That behavior is undefined. Stack variables are uninitialized, so your second example may work in your compiler on your platform that one time you ran it, but it probably won't in most cases.
Getting to your broader question: compile with -Wall and -pedantic and it may make you happier. Also, if you're going to be OCD about it, you may as well write if (i == 0) i = var;
p.s. Don't be OCD about it. Trust that variable initialization works, or don't use it. With C99 you can declare your variables right before you use them.
C simply gives you space, but it doesn't promise to know what is in that space. It is garbage data. An automatically added check in the compiler is possible, but that extends compile times. C is a powerful language, and as such you have the power to do anything and fall on your face at the same time. If you want something, you have to explicitly ask for it. Thus is the C philosophy.
Where is var defined?
The codepad compiler for C gives me the following error:
In function 'main':
Line 4: error: 'var' undeclared (first use in this function)
Line 4: error: (Each undeclared identifier is reported only once
Line 4: error: for each function it appears in.)
for the code:
int main(void)
{
    static int i;
    if (!i) i = var;
    return 0;
}
If I define var as an int then the program compiles fine.
I am not really sure where your problem is. The program seems to be working fine. A segfault is not there to crash your program just because you coded something that may be undefined in the language. The variable i is uninitialized, not undefined. You defined it as static int. Had you simply done:
int main(void)
{
    i = var;
    return 0;
}
Then it would most definitely be undefined.
Your compiler should throw a warning because i isn't initialized, to catch these sorts of gotchas. It seems your if statement sort of acts as a catch for that warning, even if the compiler does not report it.
Static variables (in function or file scope) and global variables are always initialized to zero.
Stack variables have no dependable value. To answer your question, uninitialized stack variables are often set to non-zero, so they often evaluate as true. That cannot be depended on.
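A minimal sketch of the difference:

#include <stdio.h>

int global_i;            /* static storage duration: guaranteed zero */

int main(void)
{
    static int static_i; /* static storage duration: guaranteed zero */
    int stack_i;         /* automatic storage: indeterminate value */

    printf("%d %d\n", global_i, static_i); /* always prints "0 0" */
    /* Reading stack_i here would be undefined behavior. */
    (void)stack_i;       /* suppress the unused-variable warning */
    return 0;
}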
When I was running Gentoo Linux I once found a bug in some open source Unicode handling code that checked an uninitialized variable against -1 in a while loop. On 32-bit x86 with GCC this code always ran fine, because the variable was never -1. On AMD-64 with its extra registers, the variable ended up always set to -1 and the processing loop never ran.
So, always use the compiler's high warning levels when building so you can find those bugs.
That's C, you cannot do much about it. The basic purpose of C is to be fast - a default initialization of the variable takes a few more CPU cycles and therefore you have to explicitly specify that you want to spend them. Because of this (and many other pitfalls) C is not considered good for those who don't know what they are doing :)
The C standard says the behavior of uninitialized auto variables is undefined. A particular C compiler may initialize all variables to zero (or all pointers to null), but since this is undefined behavior you can't rely on it being the case for every compiler, or even for every version of a particular compiler. In other words, always be explicit; undefined means just that: the behavior is not defined and may vary from implementation to implementation.
-- Edit --
As pointed out, the particular question was about static variables, which have defined initialization behavior according to the standard. Although it's still good practice to always explicitly initialize variables, this answer is only relevant to auto variables, which do not have defined behavior.