C code :
int a;
printf("\n\t %d",a); // It'll print some garbage value;
So how does these garbage values are assigned to uninitialized variables behind the curtains in C?
Does it mean C first allocates memory to variable 'a' and then what ever there is at that memory location becomes value of 'a'? or something else?
Does it mean C first allocates memory to variable 'a' and then what ever there is at that memory location becomes value of 'a'?
Exactly!
Basically, C doesn't do anything you don't tell it to. That's both its strength and its weakness.
Does it mean C first allocates memory to variable 'a' and then what
ever there is at that memory location becomes value of 'a'? or
something else?
Correct. It is worth mentioning that the "allocation" of automatic variables such as int a is virtually nonexistent, since those variables are stored on the stack or in a CPU register. For variables stored on the stack, "allocation" is performed when the function is called, and boils down to an instruction that moves the stack pointer by a fixed offset calculated at compile time (the combined storage of all local variables used by the function, rounded to proper alignment).
The initial value of variables assigned to CPU registers is the previous contents of the register. Because of this difference (register vs. memory) it sometimes happens that programs that worked correctly when compiled without optimization start breaking when compiled with optimization turned on. The uninitialized variables, previously pointing to the location that happened to be zero-initialized, now contain values from previous uses of the same register.
Initially memory is having some values, those are unknown values, also called garbage values,
when ever we declare a variable some memory was reserved for the variable according to datatype we specified while declaring, so the memory initial value is unknown value, if we initialize some other value then our value will be in that memory location.
int a;
While declaring a variable, the memory is allocated. But this variable is not assigned which means the variable a is not initialized. If this variable a is only declared but no longer used in the program is called garbage value.
For example:
int a, b;
b=10;
printf("%d",b);
return 0;
Here it's only declared but no longer assigned or initialized. So this is called garbage value.
Does it mean C first allocates memory to variable 'a' and then what ever there is at that memory location becomes value of 'a'?
No, it does not mean that.
When an object is not initialized, the C standard does not provide any plan for how its value is determined. Not only that, the program is not required to behave as if the object has any fixed value. It can vary as if the memory reserved for it were not held in any fixed state, just fluctuating.
Here are the specific C 2018 rules about that:
The so-called “value” of an uninitialized object is indeterminate, per 6.2.4 6 for objects with automatic storage duration without variable length array type, 6.2.4 7 for those with variable length array type, 7.22.3.4 2 for objects allocated with malloc, 7.22.3.5 2 for additional space allocated by realloc, and 7.22.3.1 2 for aligned_alloc. Other objects, such as those with static storage duration, are initialized.
Per 3.19.2, an indeterminate value is “either an unspecified value or a trap representation”.
Per 3.19.3, an unspecified value is a “valid value of the relevant type where this document imposes no requirements on which value is chosen in any instance”.
This means that in each instance an object is used, the C standard does not impose any requirements on which value is used for it. It is not required to be the same as a previous use. The program may behave as if it does not hold any fixed value. When it is used multiple times, the program may act as if it has a different value each time. For example, the C standard would allow printf("%d %d %d\n", a, a, a); to print “34 -10200773 2147483204”.
A way this can happen is that, while attempting to compile the code int a; printf("%d %d %d\n", a, a, a);, the compiler has nowhere to get a from, because it has never been given any fixed value. So, instead of generating useless instructions to move data from uninitialized memory to where the arguments are passed, the compiler generates nothing. Then printf gets called, and the registers or stack locations where the arguments are passed contain whatever data they had from earlier. And that may well be three different values, which printf prints. So it looks to an observer of the output as if a had three different values in printf("%d %d %d\n", a, a, a);.
(In addition, using the value of an uninitialized object with automatic storage duration that has not had its address taken is explicitly undefined behavior, because 6.3.2.1 2, about converting an object to its value, says “If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.” So, when there is an object meeting these criteria, using its value can break the program completely; it might not only have different values at different times, but the program might abort, go down a branch different from what you expected, not call printf when there is a printf in the source code, and so on.)
Which I have investigated years back is this:
The answer is that garbage value is the leftover of previous program.
So, when you run any program, that uses variables to store values and when it ends OS only release the memory and make it available for other programs but OS does not flush the data in those locations automatically.
So, when you declare a uninitialized variable OS assigns any available memory to that variable, and because you have not assigned any value, the leftover value on that location is not over written and becomes the GARBAGE VALUE for that variable because it is not relevant.
And this can also be proven with a small program by accessing the location of variable in previous program using pointers.
Related
[https://i.stack.imgur.com/kfU6n.png][1]
#include<stdio.h>
int main()
{
int x = 23;
int y = 24;
int z = 25;
int asar= 26;
int a[5];
int p = 15;
int q = 16;
int r = 17;
a[11]=56;
a[10]=57;
a[9]=58;
a[8]=59;
a[7]=60;
a[6]=61;
a[5]=62;
printf("\t x=%d, y=%d, z=%d,asar=%d, p=%d, q=%d, r=%d \n", x, y, z, asar, p, q, r);
return 0;
}
I have tried to cross the bound of array and which causes undefined behavior but here I found that all the values out of the bound of array are copied in a sequence that
value of highest index
11 got copied in x(first declaration)
10 got copied in y(second declaration)
9 got copied in z(third declaration)
8 got copied in asar(fourth declaration)
7 got copied in p(fifth declaration)
6 got copied in q(sixth declaration)
5 got copied in r(seventh declaration)
There are altogether 7 other variables other than a and I have crossed limit of a exactly by 7 as such (4+7=11) and I got the output as:
x=56, y=57, z=58,asar=59, p=60, q=61, r=62 is there any logic behind this or?
Don't be amazed why I considered the memory allocation in stack because there are 7 variables excluding a and exceeded 7 values are copied one after the other. At least it is true in every case for me when number of extra variables is equal to bound exceed.
Is there any logical explanation regarding this or the question is worthless?
While auto variables are almost always stored on a stack, this isn't enforced by the C standard.
For such an object [with automatic storage duration] that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
[6.2.4 Storage durations of objects, C11]
The data structure which best fit the description is indeed a stack but it is not defined in the standard, and there are no specific order to store the variables. CPU usually require data to be aligned a certain way to access them quickly, and the compiler may (and will) reorder them to reduce unused addresses. Adding a char somewhere in the code may change the internal order of the variables.
Moreover, trying to access objects from other objects addresses is undefined behaviour. Even if you get "lucky" and get it to work the way you want, it may break later for some seemingly unrelated reasons. For example, by changing the optimization level, the compiler may delete or merge certain variables, or place them in registers that you cannot access. Exploiting undefined behaviour is never a good idea.
If you need to ensure a specific order for your data, you can use a struct, whose order is guaranteed.
As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap.
[6.7.2.1 Structure and union specifiers, C99]
Accessing members through the struct object pointer is valid, but the compiler can insert padding in-between members. The offsetof() macro is necessary to correctly access them. (Related : http://www.catb.org/esr/structure-packing/)
This question already has answers here:
Can a local variable's memory be accessed outside its scope?
(20 answers)
Closed 12 months ago.
In an Microprocessor it is said that the local variables are stored in stack. In my case if func1() is called by main function then the local variable (int a = 12;)will be created in stack. Once the Called Function is executed the and return back to main function the stack memory will be deleted. So the pointer address still holds (*b) the value 12. At stack if this 'a = 12' is deleted then 'b' should be a dangling pointer no?? Can anyone explain this ? If you have detailed explanation on what happens in memory when this code is being executed it would be helpful.
#include <stdio.h>
int* func1(void);
int main()
{
int* b = func1();
printf("%d\n",*b);
}
int* func1(void)
{
int a = 12;
int* b = &a;
return b;
}
The pointer is dangling. The memory may still hold the previous value, but dereferencing the pointer invokes undefined behaviour.
GCC will give you a warning about this, if you pass -Wall option.
From the C standard (6.2.4):
The lifetime of an object is the portion of program execution during
which storage is guaranteed to be reserved for it. An object exists,
has a constant address,25) and retains its last-stored value
throughout its lifetime.26) If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes
indeterminate when the object it points to reaches the end of its
lifetime.
There are multiple layers here.
First, is the C programming language. It is a language. You say stuff in it and it has meaning. There are sentences that have meaning, but you can also construct grammatically valid sentences that are gibberish.
The code you posted, grammatically valid, is gibberish. The object a inside func1 stops existing when the function returns. *b tries to access an object that does not exists anymore. It is not defined what should happen when you access an object after its lifetime ended. You can read about undefined behavior.
Memory exists. It's not like it is disintegrated when a function returns. It's not like RAM chips are falling out of your PC. They are still there.
So your compiler will produce some machine instructions. These machine instructions will execute. Depending solely on your compiler decisions (the code is undefined behavior, compiler can do what it wants) the actual code can behavior in any way the compiler decides to. But, most probably, *b will generate machine instructions that will read the memory region where object a resided. That memory region may still hold the value 12 or it may have been overwritten somewhere between func1 returning and calling printf, resulting in reading some other value.
At stack if this 'a = 12' is deleted then 'b' should be a dangling pointer no?
Yes.
what happens in memory when this code
This depends on the actual machine instructions generated by the compiler. Compile your code and inspect the assembly.
Strictly speaking, the behaviour is undefined, but repeat the printf or otherwise inspect (in your debugger for example) *b after the printf. It is likely that it will no longer be 12.
It is not accurate to say that the stack memory is "deleted". Rather the stack pointer is restored to the address it had before the call was invoked. The memory is otherwise untouched but becomes available for use by subsequent calls (or for passing arguments to such calls), so after calling printf it is likely to have been reused and modified.
So yes, the pointer is "dangling" since it points to memory that is no longer "valid" in the sense that it does not belong to the context in which the pointer exists.
This question already has answers here:
Is un-initialized integer always default to 0 in c?
(4 answers)
Closed 1 year ago.
In C, we know that without initializing a variable, it holds a garbage value. Yet in online compilers and also, in an IDE, when I tried this program it got compiled and there was a perfect output. When I tried to print the same without the while loop, it returned a garbage value. So, is not initializing fine?
#include <stdio.h>
int main() {
int j;
while(j<=10){
printf("\n %d",j);
j=j+1;
}
}
Try disassembling your code, for example "gcc -S test.c", so you can see that there is no instruction dedicated to initializing the integer "j" with or without loop. Greetings.
… it holds a garbage value.
When anybody says an object has a garbage value, they are being imprecise with language.
There is no value that is a garbage value. For 32-bit two’s complement integers, there are the values −2,147,483,648 to +2,147,483,647. Each of them is a valid value. None of them is a garbage value.
What it actually means to say something has a garbage value, if the speaker understands C semantics, is that the value of the object is uncontrolled. It has not been set to any specific value, and therefore whatever value you get from using it is a happenstance of circumstances. It may be some value that was in the memory of the object before it was reserved to be the memory for that object.
However, it might be other things. When an object is uninitialized, the C standard not only has no requirement that the memory of the object have any particular value, it has no requirement that the object behave as if it had any fixed value at all. This frees the compiler for purposes that are useful for optimization in other situations. But it means that, if you have int x; printf("%d\n", x); int y = x+3; printf("%d\n", y);, the compiler does not have to load x from memory each time it is used. Because x is uninitialized, the compiler is not required to do any work to load it from memory. For the printf("%d\n", x);, the compiler might let x be whatever value is in the register that would be used to pass the second argument to printf. For the int y = x+3;, the compiler might let x be whatever is in some other register that it would use to hold the value of x if x were defined. This could be a different register. So printf("%d\n", x); might print “47” while int y = x+3; printf("%d\n", y); prints “−372”, not “50”.
Sometimes an uninitialized object might behave as if it started with the value zero. This is not uncommon in short programs such as the one in the question, where nothing has used the stack much yet, so the part of it reserved for j is still in the initial state the program loader put it in, filled with zeros. But that is happenstance. When you change the program, the compiler might use a different part of the stack for j, and that part might not have zeros in it, because it was used by some of the initial program start-up code that runs before main starts.
Always avoid use a uninitialized variables, because it cause undefined behavior.
Uninitialized variables
Unlike some programming languages, C/C++ does not initialize most variables to a given value (such as zero) automatically. Thus when a variable is assigned a memory location by the compiler, the default value of that variable is whatever (garbage) value happens to already be in that memory location! A variable that has not been given a known value (usually through initialization or assignment) is called an uninitialized variable.
For your reference:
https://wiki.sei.cmu.edu/confluence/display/c/EXP33-C.+Do+not+read+uninitialized+memory
https://www.learncpp.com/cpp-tutorial/uninitialized-variables-and-undefined-behavior/
int *p;
{
int x = 0;
p = &x;
}
// p is no longer valid
{
int x = 0;
if (&x == p) {
*p = 2; // Is this valid?
}
}
Accessing a pointer after the thing it points to has been freed is undefined behavior, but what happens if some later allocation happens in the same area, and you explicitly compare the old pointer to a pointer to the new thing? Would it have mattered if I cast &x and p to uintptr_t before comparing them?
(I know it's not guaranteed that the two x variables occupy the same spot. I have no reason to do this, but I can imagine, say, an algorithm where you intersect a set of pointers that might have been freed with a set of definitely valid pointers, removing the invalid pointers in the process. If a previously-invalidated pointer is equal to a known good pointer, I'm curious what would happen.)
By my understanding of the standard (6.2.4. (2))
The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
you have undefined behaviour when you compare
if (&x == p) {
as that meets these points listed in Annex J.2:
— The value of a pointer to an object whose lifetime has ended is used (6.2.4).
— The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8).
Okay, this seems to be interpreted as a two- make that three part question by some people.
First, there were concerns if using the pointer for a comparison is defined at all.
As is pointed out in the comments, the mere use of the pointer is UB, since $J.2: says use of pointer to object whose lifetime has ended is UB.
However, if that obstacle is passed (which is well in the range of UB, it can work after all and will on many platforms), here is what I found about the other concerns:
Given the pointers do compare equal, the code is valid:
C Standard, §6.5.3.2,4:
[...] If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
Although a footnote at that location explicitly says. that the address of an object after the end of its lifetime is an invalid pointer value, this does not apply here, since the if makes sure the pointer's value is the address of x and thus is valid.
C++ Standard, §3.9.2,3:
If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained. [ Note: For instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address.
Emphasis is mine.
It will probably work with most of the compilers but it still is undefined behavior. For the C language these x are two different objects, one has ended its lifetime, so you have UB.
More seriously, some compilers may decide to fool you in a different way than you expect.
The C standard says
Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately follow
the first array object in the address space.
Note in particular the phrase "both are pointers to the same object". In the sense of the standard the two "x"s are not the same object. They may happen to be realized in the same memory location, but this is to the discretion of the compiler. Since they are clearly two distinct objects, declared in different scopes the comparison should in fact never be true. So an optimizer might well cut away that branch completely.
Another aspect that has not yet been discussed of all that is that the validity of this depends on the "lifetime" of the objects and not the scope. If you'd add a possible jump into that scope
{
int x = 0;
p = &x;
BLURB: ;
}
...
if (...)
...
if (something) goto BLURB;
the lifetime would extend as long as the scope of the first x is reachable. Then everything is valid behavior, but still your test would always be false, and optimized out by a decent compiler.
From all that you see that you better leave it at argument for UB, and don't play such games in real code.
It would work, if by work you use a very liberal definition, roughly equivalent to that it would not crash.
However, it is a bad idea. I cannot imagine a single reason why it is easier to cross your fingers and hope that the two local variables are stored in the same memory address than it is to write p=&x again. If this is just an academic question, then yes it's valid C - but whether the if statement is true or not is not guaranteed to be consistent across platforms or even different programs.
Edit: To be clear, the undefined behavior is whether &x == p in the second block. The value of p will not change, it's still a pointer to that address, that address just doesn't belong to you anymore. Now the compiler might (probably will) put the second x at that same address (assuming there isn't any other intervening code). If that happens to be true, it's perfectly legal to dereference p just as you would &x, as long as it's type is a pointer to an int or something smaller. Just like it's legal to say p = 0x00000042; if (p == &x) {*p = whatever;}.
The behaviour is undefined. However, your question reminds me of another case where a somewhat similar concept was being employed. In the case alluded, there were these threads which would get different amounts of cpu times because of their priorities. So, thread 1 would get a little more time because thread 2 was waiting for I/O or something. Once its job was done, thread 1 would write values to the memory for the thread two to consume. This is not "sharing" the memory in a controlled way. It would write to the calling stack itself. Where variables in thread 2 would be allocated memory. Now, when thread 2 eventually got round to execution,all its declared variables would never have to be assigned values because the locations they were occupying had valid values. I don't know what they did if something went wrong in the process but this is one of the most hellish optimizations in C code I have ever witnessed.
Winner #2 in this undefined behavior contest is rather similar to your code:
#include <stdio.h>
#include <stdlib.h>
int main() {
int *p = (int*)malloc(sizeof(int));
int *q = (int*)realloc(p, sizeof(int));
*p = 1;
*q = 2;
if (p == q)
printf("%d %d\n", *p, *q);
}
According to the post:
Using a recent version of Clang (r160635 for x86-64 on Linux):
$ clang -O realloc.c ; ./a.out
1 2
This can only be explained if the Clang developers consider that this example, and yours, exhibit undefined behavior.
Put aside the fact if it is valid (and I'm convinced now that it's not, see Arne Mertz's answer) I still think that it's academic.
The algorithm you are thinking of would not produce very useful results, as you could only compare two pointers, but you have no chance to determine if these pointers point to the same kind of object or to something completely different. A pointer to a struct could now be the address of a single char for example.
As we know, local variables have local scope and lifetime. Consider the following code:
int* abc()
{
int m;
return(&m);
}
void main()
{
int* p=abc();
*p=32;
}
This gives me a warning that a function returns the address of a local variable.
I see this as justification:
Local veriable m is deallocated once abc() completes. So we are dereferencing an invalid memory location in the main function.
However, consider the following code:
int* abc()
{
int m;
return(&m);
int p=9;
}
void main()
{
int* p=abc();
*p=32;
}
Here I am getting the same warning. But I guess that m will still retain its lifetime when returning. What is happening? Please explain the error. Is my justification wrong?
First, notice that int p=9; will never be reached, so your two versions are functionally identical. The program will allocate memory for m and return the address of that memory; any code below the return statement is unreacheable.
Second, the local variable m is not actually de-allocated after the function returns. Rather, the program considers the memory free space. That space might be used for another purpose, or it might stay unused and forever hold its old value. Because you have no guarantee about what happens to the memory once the abc() function exits, you should not attempt to access or modify it in any way.
As soon as return keyword is encountered, control passes back to the caller and the called function goes out of scope. Hence, all local variables are popped off the stack. So the last statement in your second example is inconsequential and the warning is justified
Logically, m no longer exists when you return from the function, and any reference to it is invalid once the function exits.
Physically, the picture is a bit more complicated. The memory cells that m occupied are certainly still there, and if you access those cells before anything else has a chance to write to them, they'll contain the value that was written to them in the function, so under the right circumstances it's possible for you to read what was stored in m through p after abc has returned. Do not rely on this behavior being repeatable; it is a coding error.
From the language standard (C99):
6.2.4 Storage durations of objects
...
2 The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address,25) and retains
its last-stored value throughout its lifetime.26) If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime.
25) The term ‘‘constant address’’ means that two pointers to the object constructed at possibly different
times will compare equal. The address may be different during two different executions of the same
program.
26) In the case of a volatile object, the last store need not be explicit in the program.
Emphasis mine. Basically, you're doing something that the language definition explicitly calls out as undefined behavior, meaning the compiler is free to handle that situation any way it wants to. It can issue a diagnostic (which your compiler is doing), it can translate the code without issuing a diagnostic, it can halt translation at that point, etc.
The only way you can make m still valid memory (keeping the maximum resemblance with your code) when you exit the function, is to prepend it with the static keyword
int* abc()
{
static int m;
m = 42;
return &m;
}
Anything after a return is a "dead branch" that won't be ever executed.
int m should be locally visible. You should create it as int* m and return it directly.