I am not clear for how long a variable is guaranteed to be allocated in C.
For example, if I have:
void foo(void) {
int x;
int* y = &x;
...
}
Is the space allocated on the stack for x guaranteed to be reserved for this variable exclusively for the entire duration of foo()? Said differently, is y guaranteed to point to a location that will be preserved for the entire duration of foo, or could the compiler decide that since x isn't being used, the stack space can be used for another use within foo and therefore *y may change without accessing y (or x) directly?
When you ask questions like this, you should be clear whether you are asking about C semantics or about program implementation.
C semantics are described using a model of an abstract computer in which all operations are performed as the C standard describes them. When a compiler compiles a program, it can change how the program is implemented as long as it gets the same results. (The results that must be correct are the observable behavior of the program: its output, including data written to files, its input/output interactions, and its accesses to volatile objects.)
In the abstract computer, memory for x is reserved from the time an execution of foo starts until that execution of foo ends.1, 2
So, in the abstract computer, it does not matter if x is used or not; memory is reserved for it until foo returns or its execution is ended in some other way (such as a longjmp or program termination).
When the compiler implements this program, it is allowed optimize away x completely (if it and its address are not used in any way that requires the memory to be reserved) or to use the same memory for x that it uses for other things, as long as the uses do not conflict in ways that change the observable behavior. For example, if we have this code:
int x;
int *y = &x;
x = 3;
printf("%d\n", x);
int b = 4;
printf("%d\n", b);
then the compiler may use the same memory for b that it uses for x.
On the other hand, if we have this code:
int x;
int *y = x;
printf("%p\n", (void *) y);
int b = 4;
printf("%p\n", (void *) &b);
then the program must print different values for the two printf statements. This is because different objects that both exist at the same moment in the abstract computer model must have different addresses. The abstract computer would print different addresses for these, so the compiler must generate a program that is faithful to that model.
Footnotes
1 There can be multiple executions of a function live at one time, due to nested function calls.
2 Sometimes people say the lifetime of x is the scope of the function, but this is incorrect. The function could call another routine and pass it y, which has the address of x. Then the other routine can access x using this address. The memory is still reserved for x even though it is not in the scope of the other routine’s source code. During the subroutine call, the execution of foo is temporarily suspended, but it is not ended, so the lifetime of x has not ended.
The lifetime of an automatic variable is the entire duration of the scope in which it is declared; in your case, that scope is the whole of the foo function.
Compilers are allowed to make optimizations (including removing variables completely) that can have no possible observable effect; however, once you assign the address of x to y, then any use of *y will be using x, so the memory allocated for x cannot then be used for something else, all the time there is a possibility of accessing or modifying *y.
x is being used, y is being passed the address of it! In short the answer is "yes" as long as the compiler author(s) is(are) sensible. Most compilers ( visual studio at least ) wouldn't compile this or at least warn that x is uninitialized so this isn't a very realistic example.
Y most definitely cannot change by changing another variable than x or y. that's 100%. When you go into a function the parameters then the local variables are pushed onto the stack and then when you come out of a function they are popped off. There is no scope for shared memory (unless you are using a union).
Whats the reason behind this question? If you really want to know how c is defined you should read "The C Programming Language" by Kernighan and Ritchie"
Related
This question already has answers here:
Is un-initialized integer always default to 0 in c?
(4 answers)
Closed 1 year ago.
In C, we know that without initializing a variable, it holds a garbage value. Yet in online compilers and also, in an IDE, when I tried this program it got compiled and there was a perfect output. When I tried to print the same without the while loop, it returned a garbage value. So, is not initializing fine?
#include <stdio.h>
int main() {
int j;
while(j<=10){
printf("\n %d",j);
j=j+1;
}
}
Try disassembling your code, for example "gcc -S test.c", so you can see that there is no instruction dedicated to initializing the integer "j" with or without loop. Greetings.
… it holds a garbage value.
When anybody says an object has a garbage value, they are being imprecise with language.
There is no value that is a garbage value. For 32-bit two’s complement integers, there are the values −2,147,483,648 to +2,147,483,647. Each of them is a valid value. None of them is a garbage value.
What it actually means to say something has a garbage value, if the speaker understands C semantics, is that the value of the object is uncontrolled. It has not been set to any specific value, and therefore whatever value you get from using it is a happenstance of circumstances. It may be some value that was in the memory of the object before it was reserved to be the memory for that object.
However, it might be other things. When an object is uninitialized, the C standard not only has no requirement that the memory of the object have any particular value, it has no requirement that the object behave as if it had any fixed value at all. This frees the compiler for purposes that are useful for optimization in other situations. But it means that, if you have int x; printf("%d\n", x); int y = x+3; printf("%d\n", y);, the compiler does not have to load x from memory each time it is used. Because x is uninitialized, the compiler is not required to do any work to load it from memory. For the printf("%d\n", x);, the compiler might let x be whatever value is in the register that would be used to pass the second argument to printf. For the int y = x+3;, the compiler might let x be whatever is in some other register that it would use to hold the value of x if x were defined. This could be a different register. So printf("%d\n", x); might print “47” while int y = x+3; printf("%d\n", y); prints “−372”, not “50”.
Sometimes an uninitialized object might behave as if it started with the value zero. This is not uncommon in short programs such as the one in the question, where nothing has used the stack much yet, so the part of it reserved for j is still in the initial state the program loader put it in, filled with zeros. But that is happenstance. When you change the program, the compiler might use a different part of the stack for j, and that part might not have zeros in it, because it was used by some of the initial program start-up code that runs before main starts.
Always avoid use a uninitialized variables, because it cause undefined behavior.
Uninitialized variables
Unlike some programming languages, C/C++ does not initialize most variables to a given value (such as zero) automatically. Thus when a variable is assigned a memory location by the compiler, the default value of that variable is whatever (garbage) value happens to already be in that memory location! A variable that has not been given a known value (usually through initialization or assignment) is called an uninitialized variable.
For your reference:
https://wiki.sei.cmu.edu/confluence/display/c/EXP33-C.+Do+not+read+uninitialized+memory
https://www.learncpp.com/cpp-tutorial/uninitialized-variables-and-undefined-behavior/
Well, I know how the structure works in C, but I don't know how it works internally, because I'm still learning assembly, I'm at the beginning, well, my question is, in the code below I have a structure called P and create two variables from From this structure called A and B, after assigning A to B, thus being B = A, I can get the data from A, even without using a pointer, how is this copy of the data from A to B made?
#include <stdio.h>
struct P{
int x;
int y;
}A, B;
int main(void) {
printf("%p\n%p\n\n", &A, &B);
printf("Member x of a: %p\nMember y of a: %p\n", &A.x, &A.y);
printf("Member x of b: %p\nMember y of b: %p\n", &B.x, &B.y);
A.x = 10;
A.y = 15;
B = A; // 10
printf("%d\n%d\n", B.x, B.y);
return 0;
}
The interesting thing in your sample code, I think, is the line
B = A;
Typically, the compiler implements this in one of two ways.
(1) It copies the members individually, giving more or less exactly the same effect as if you had said
B.x = A.x;
B.y = A.y;
(2) It emits a low-level byte-copy loop (or machine instruction), giving the effect of
memcpy(&B, &A, sizeof(struct P));
(except that typically this is done in-line, with no actual function call involved).
The compiler will choose one or the other of these based on which one is smaller (less emitted code), or which one is more efficient, or whatever the compiler is trying to optimize for.
Your example limits what the compiler can do, basically mandating that the struct exist in memory. First, you're instructing the compiler to create A & B as globals, and, second, you are taking the address of the struct (and its fields) for your printf statement. Due to either of these, the compiler will choose memory as the placement for these structs.
However, since they are each only two int's in size, copy between them would take only two mov instructions (some architectures) or two loads and two stores (other architectures).
Yet, if you were working with these structs as local variables and/or parameters as is commonly done with these kind of small structs — and provided you did not take their addresses — these would frequently be optimized by the compiler to place the entire struct into the cpu registers. For example, A.x might get a cpu register, and A.y also its own register. Now, a copy or pass as of A as parameter (which is like an assignment) is just a pair of register movs (if even that is required, as the compiler might choose the proper registers in the first place). In other words, unless the user program forces the struct to memory, the compiler has the freedom to treat the struct as a pair of rather separate int's. So, by contrast, potentially rather different and more efficient.
The compiler can also do other kinds of optimizations, one involving remembering the constant values that were assigned (as so do constant assigns again with B instead of copies from A's memory), and another involving eliminating A and the assignments to A and doing assignments directly to B, as A is merely copied into B and not used later. Among other things, to reiterate from above, having the structs be local variables helps some of these optimizations as does not taking their addresses.
this is the code :
#include <stdio.h>
#include <stdlib.h>
int main() {
int a = 10;
int b = 20;
//printf("\n&a value %p", &a);
int* x = &b;
x = x + 1;
*x = 5;
printf("\nb value %d", b);
printf("\na value %d", a);
}
I want override a with b adress for test the c overflow but when I comment the line 5(printf fuction) I can't write five in a. While if I print the a adress I can write five in a.
Why?
Sorry for my english and thank you.
The reason this occurred is that all normal compilers store objects with automatic storage duration (objects declared inside a block that are not static or extern) on a stack. Your compiler “pushed” a onto the stack, which means it wrote a to the memory location where the stack pointer was pointing and then decremented the pointer. (Decrementing the pointer adds to the stack, because the stack grows in the direction of decreasing memory addresses. Stacks can be oriented in the other direction, but the behavior you observed strongly suggests your system uses the common direction of growing downward.) Then your compiler pushed b onto the stack. So b ended up at a memory address just below a.
When you took the address of b and added one, that produced the memory address where a is. When you used that address to assign 5, that value was written to where a is.
None of this behavior is defined by the C standard. It is a consequence of the particular compiler you used and the switches you compiled with.
You probably compiled with little or no optimization. With optimization turned on, many compilers would simplify the code by removing unnecessary steps (essentially replacing them with shortcuts), so that 20 and 10 are not actually stored on the stack. A possible result with optimization is that “20” and “10” are printed, and your assignment to *x has no effect. However, the C standard does not say what the behavior must be when you use *x in this way, so the results are determined only by the particular compiler you are using, along with the input switches you give it.
After x = x + 1;, x contains an address that you do not own. And by doing *x = 5; you are trying to write to some location that might not be accessible to you. Thus causing UB. Nothing more can be reasoned about.
Depended of the version of the C compiler and compiler flags it is possible to initialize variables on any place in your functions (As far as I am aware).
I'm used to put it all the variables at the top of the function, but the discussion started about the memory use of the variables if defined in any other place in the function.
Below I have written 2 short examples, and I wondered if anyone could explain me (or verify) how the memory gets allocated.
Example 1: Variable y is defined after a possible return statement, there is a possibility this variable won't be used for that reason, as far as I'm aware this doesn't matter and the code (memory allocation) would be the same if the variable was placed at the top of the function. Is this correct?
Example 2: Variable x is initialized in a loop, meaning that the scope of this variable is only within this loop, but what about the memory use of this variable? Would it be any different if placed on the top of the functions? Or just initialized on the stack at the function call?
Edit: To conclude a main question:
Does reducing the scope of the variable or change the location of the first use (so anywhere else instead of top) have any effects on the memory use?
Code example 1
static void Function(void){
uint8_t x = 0;
//code changing x
if(x == 2)
{
return;
}
uint8_t y = 0;
//more code changing y
}
Code example 2
static void LoopFunction(void){
uint8_t i = 0;
for(i =0; i < 100; i ++)
{
uint8_t x = i;
// do some calculations
uartTxLine("%d", x);
}
//more code
}
I'm used to put it all the variables at the top of the function
This used to be required in the older versions of C, but modern compilers dropped that requirement. As long as they know the type of the variable at the point of its first use, the compilers have all the information they need.
I wondered if anyone could explain me how the memory gets allocated.
The compiler decides how to allocate memory in the automatic storage area. Implementations are not limited to the approach that gives each variable you declare a separate location. They are allowed to reuse locations of variables that go out of scope, and also of variables no longer used after a certain point.
In your first example, variable y is allowed to use the space formerly occupied by variable x, because the first point of use of y is after the last point of use of x.
In your second example the space used for x inside the loop can be reused for other variables that you may declare in the // more code area.
Basically, the story goes like this. When calling a function in raw assembler, it is custom to store everything used by the function on the stack upon entering the function, and clean it up upon leaving. Certain CPUs and ABIs may have a calling convention which involves automatic stacking of parameters.
Likely because of this, C and many other old languages had the requirement that all variables must be declared at the top of the function (or on top of the scope), so that the { } reflect push/pop on the stack.
Somewhere around the 80s/90s, compilers started to optimize such code efficiently, as in they would only allocate room for a local variable at the point where it was first used, and de-allocate it when there was no further use for it. Regardless of where that variable was declared - it didn't matter for the optimizing compiler.
Around the same time, C++ lifted the variable declaration restrictions that C had, and allowed variables to be declared anywhere. However, C did not actually fix this before the year 1999 with the updated C99 standard. In modern C you can declare variables everywhere.
So there is absolutely no performance difference between your two examples, unless you are using an incredibly ancient compiler. It is however considered good programming practice to narrow the scope of a variable as much as possible - though it shouldn't be done at the expense of readability.
Although it is only a matter of style, I would personally prefer to write your function like this:
(note that you are using the wrong printf format specifier for uint8_t)
#include <inttypes.h>
static void LoopFunction (void)
{
for(uint8_t i=0; i < 100; i++)
{
uint8_t x = i;
// do some calculations
uartTxLine("%" PRIu8, x);
}
//more code
}
Old C allowed only to declare (and initialize) variables at the top of a block. You where allowed to init a new block (a pair of { and } characters) anywhere inside a block, so you had then the possibility of declaring variables next to the code using them:
... /* inside a block */
{ int x = 3;
/* use x */
} /* x is not adressabel past this point */
And you where permitted to do this in switch statements, if statements and while and do statements (everywhere where you can init a new block)
Now, you are permitted to declare a variable anywhere where a statement is allowed, and the scope of that variable goes from the point of declaration to the end of the inner nested block you have declared it into.
Compilers decide when they allocate storage for local variables so, you can all of them allocated when you create a stack frame (this is the gcc way, as it allocates local variables only once) or when you enter in the block of definition (for example, Microsoft C does this way) Allocating space at runtime is something that requires advancing the stack pointer at runtime, so if you do this only once per stack frame you are saving cpu cycles (but wasting memory locations). The important thing here is that you are not allowed to refer to a variable location outside of its scoping definition, so if you try to do, you'll get undefined behaviour. I discovered an old bug for a long time running over internet, because nobody take the time to compile that program using Microsoft-C compiler (which failed in a core dump) instead of the commmon use of compiling it with GCC. The code was using a local variable defined in an inner scope (the then part of an if statement) by reference in some other part of the code (as everything was on main function, the stack frame was present all the time) Microsoft-C just reallocated the space on exiting the if statement, but GCC waited to do it until main finished. The bug solved by just adding a static modifier to the variable declaration (making it global) and no more refactoring was neccesary.
int main()
{
struct bla_bla *pointer_to_x;
...
if (something) {
struct bla_bla x;
...
pointer_to_x = &x;
}
/* x does not exist (but it did in gcc) */
do_something_to_bla_bla(pointer_to_x); /* wrong, x doesn't exist */
} /* main */
when changed to:
int main()
{
struct bla_bla *pointer_to_x;
...
if (something) {
static struct bla_bla x; /* now global ---even if scoped */
...
pointer_to_x = &x;
}
/* x is not visible, but exists, so pointer_to_x continues to be valid */
do_something_to_bla_bla(pointer_to_x); /* correct now */
} /* main */
can someone explain this to me
main()
{
int *x,y;
*x = 1;
y = *x;
printf("%d",y);
}
when I compile it in gcc how come running this in main function is ok, while running it in different function wont work like the function below?
test()
{
int *x,y;
*x = 1;
y = *x;
printf("%d",y);
}
int *x,y;
*x = 1;
Undefined Behavior. x doesn't point to anything meaningful.
This will be correct:
int *x, y, z;
x = &z;
*x = 1;
y = *x;
or
int *x, y;
x = malloc(sizeof(int));
*x = 1;
y = *x;
//print
free(x);
Undefined behavior is, well, undefined. You can't know how it will behave. It can seem to work, crash, print unpredictable results and anything else. Or it can behave differently on different runs. Don't rely on undefined behavior
Technically, in standardese, you invoke what is called undefined behavior due to using an uninitialized value (the value of the pointer x).
What's going on under the hood is very likely this: your compiler allocates local variables on the stack. Calling functions likely changes the stack pointer, so different function's local variables are at different places on the stack. This in turn makes the value of the uninitialized x be whatever happens to be at that place in the current stack frame. This value can be different, depending on the depth of the chain of functions you called. The actual value can depend on a lot of things, e.g. back to the whole history of processes called before your program started. There's no point in speculating what the actual value might be and what kind of erroneous behavior might possibly ensue. In the C community we refer to undefined behaviour as even having the possibility to make demons fly out of your nose. It might even start WW3 (assuming appropriate hardware is installed).
Seriously, a C programmer worth her money will take extreme care not to invoke undefined behavior.
since x is a pointer, its not containing the int itself, it points to another memory location which holds that value.
I think you assume that declaring a pointer to a value also reserves memory for it... not in C.
If you made the above error in your code, maybe it would be good if I gave you a little bit more graphic representation of what is actually going on in the code... this is a common novice error. The explanation below might seem a bit verbose and basic, but it might help your brain "see" what is actually going on.
Let's begin... if [xxxx] is a value being stored in a few bits in the RAM, and [????] is an unknown value (in physical ram) you can say that that for X to be properly used it should be:
x == [xxxx] -> [xxxx]
x == address of a value (int)
when you write: *x=1 above, you are changing the value of an unknown area of RAM, so you are in fact doing:
x == [????] -> [0001] // address [????] is completely undefined !
In fact, we don't even know IF address [????] is allocated or accessible by your application (this is the undefined part), its possible the address points to anything. Function code, dll address, file handle structure... it all depends on the compiler/OS/application state, and can never be relied on.
so to be able to use a pointer to an int, we must first allocate memory for it, and assign the address of that memory to x, ex:
int y; // allocate on the stack
x = &y; // & operator means, *address* of"
or
x = malloc(sizeof(int)); // in 'heap' memory (often called dynamic memory allocation)
// here malloc() returns the *address* of a memory space which is at least large enough
// to store an int, and is known to be reserved for your application.
at this point, we know that x holds a proper memory address so we'll just say it's currently set to [3948] (and contains an unknown value).
x == [3948] -> [????]
Using the * operator, you dereference the pointer address (i.e. look it up), to store a value AT that address.
*x = 1;
means:
x == [3948] -> [0001]
I hope this helps