I want to ask how variables are stored in C.
To be more clear consider the following code:
int main() {
int a = 1, b;
b = a + 2;
return 0;
}
For example, in which memory does C store the names of the variables?
E.g., if &a = 0x12A7 (say) and &b = 0x123B1, how and where does C store the variable names? In which memory is the name a stored?
Variable names need not be stored at all! The compiler can get rid of them entirely. Imagine, if the compiler is quite clever, it can reduce your entire program to this:
int main(){
return 0;
}
Note that the effect of this program is exactly the same as your original, and now there are no variables at all! No need to name them now, is there?
Even if the variables in your code were actually used, their names are purely a convenient notation when you write the program, but aren't needed by the processor when it executes your code. As far as a microprocessor is concerned, a function like this:
int foo(int x, int y) {
int z = x + y;
return z * 2;
}
Might result in compiled code that does this, in some hypothetical simple instruction set architecture (ISA):
ADD # consumes top two values on stack (x and y), pushes result (z)
PUSH 2 # pushes 2 on stack
MULT # consumes top two values on stack (z and 2), pushes result
RET
The longer story is that variable names are sometimes stored for debugging purposes. For example if you're using GCC you can pass the -g option to emit a "symbol table" which contains things like variable names for debugging. But it isn't needed simply to run a program, and it isn't covered by the language standard--it's an implementation feature which differs by platform.
C doesn't store the names of variables. It's the compiler that stores them, in the compiler's symbol table.
This data structure is created and maintained by compiler.
An example of a symbol table for the snippet
// Declare an external function
extern double bar(double x);
// Define a public function
double foo(int count)
{
double sum = 0.0;
// Sum all the values bar(1) to bar(count)
for (int i = 1; i <= count; i++)
sum += bar((double) i);
return sum;
}
may contain at least the following symbol:
OK, first off, if you are just getting your head on straight with C, this is where to start:
http://condor.cc.ku.edu/~grobe/intro-to-C.shtml
But that is more practical than your question. To answer that, we first ask why variables have addresses. The why here is the stack. For a program to operate, return calls must be directed to the appropriate buffer so that the pieces all fit together as designed. Now to what I believe was the original question, that is, how the actual address is decided: for the answer to that you would have to understand how the processor is implementing the heap.
https://en.wikipedia.org/wiki/Memory_management
"Since the precise location of the allocation is not known in advance, the memory is accessed indirectly, usually through a pointer reference. The specific algorithm used to organize the memory area and allocate and deallocate chunks is interlinked with the kernel..."
Which brings us back to the practical side of things with the abstraction to pointers:
https://en.wikipedia.org/wiki/C_dynamic_memory_allocation
Hope this gives you a little clearer picture of what's under the hood : )
Happy coding.
#include <stdio.h>
int main()
{
for(int i=0;i<100;i++)
{
int count=0;
printf("%d ",++count);
}
return 0;
}
output of the above program is: 1 1 1 1 1 1..........1
Please take a look at the code above. I declared variable "int count=0" inside the for loop.
With my knowledge, the scope of the variable is within the block, so count variable will be alive up to for loop execution.
"int count=0" is executing 100 times, then it has to create the variable 100 times else it has to give the error (re-declaration of the count variable), but it's not happening like that — what may be the reason?
According to output the variable is initializing with zero every time.
Please help me to find the reason.
Such simple code can be visualised on http://www.pythontutor.com/c.html for easy understanding.
To answer your question, count gets destroyed when it goes outside its scope, that is the closing } of the loop. On next iteration, a variable of the same name is created and initialised to 0, which is used by the printf.
And if counting is your goal, print i instead of count.
The C standard describes the C language using an abstract model of a computer. In this model, count is created each time the body of the loop is executed, and it is destroyed when execution of the body ends. By “created” and “destroyed,” we mean that memory is reserved for it and is released, and that the initialization is performed with the reservation.
The C standard does not require compilers to implement this model slavishly. Most compilers will allocate a fixed amount of stack space when the routine starts, with space for count included in this fixed amount, and then count will use that same space in each iteration. Then, if we look at the assembly code generated, we will not see any reservation or release of memory; the stack will be grown and shrunk only once for the whole routine, not grown and shrunk in each loop iteration.
Thus, the answer is twofold:
In C’s abstract model of computing, a new lifetime of count begins and ends in each loop iteration.
In most actual implementations, memory is reserved just once for count, although implementations may also allocate and release memory in each iteration.
However, even if you know your C implementation allocates stack space just once per routine when it can, you should generally think about programs in the C model in this regard. Consider this:
for (int i = 0; i < 100; ++i)
{
int count = 0;
// Do some things with count.
float x = 0;
// Do some things with x.
}
In this code, the compiler might allocate four bytes of stack space to use for both count and x, to be used for one of them at a time. The routine would grow the stack once, when it starts, including four bytes to use for count and x. In each iteration of the loop, it would use the memory first for count and then for x. This lets us see that the memory is first reserved for count, then released, then reserved for x, then released, and then that repeats in each iteration. The reservations and releases occur conceptually even though there are no instructions to grow and shrink the stack.
Another illuminating example is:
for (int i = 0; i < 100; ++i)
{
extern int baz(void);
int a[baz()], b[baz()];
extern void bar(void *, void *);
bar(a, b);
}
In this case, the compiler cannot reserve memory for a and b when the routine starts because it does not know how much memory it will need. In each iteration, it must call baz to find how much memory is needed for a and how much for b, and then it must allocate stack space (or other memory) for them. Further, since the sizes may vary from iteration to iteration, it is not possible for both a and b to start in the same place in each iteration—one of them must move to make way for the other. So this code lets us see that a new a and a new b must be created in each iteration.
int count=0 is executing 100 times, then it has to create the variable 100 times
No, it defines the variable count once, then assigns it the value 0 100 times.
Defining a variable in C does not involve any particular step or code to "create" it (unlike for example in C++, where simply defining a variable may default-construct it). Variable definitions just associate the name with an "entity" that represents the variable internally, and definitions are tied to the scope where they appear.
Assigning a variable is a statement which gets executed during the normal program flow. It usually has "observable effects", otherwise the compiler is allowed to optimize it out entirely.
OP's example can be rewritten in a completely equivalent form as follows.
for(int i=0;i<100;i++)
{
int count; // definition of variable count - defined once in this {} scope
count=0; // assignment of value 0 to count - executed once per iteration, 100 times total
printf("%d ",++count);
}
Eric has it correct. In much shorter form:
Typically compilers determine at compile time how much memory is needed by a function and the offsets in the stack to those variables. The actual memory allocations occur on each function call and memory release on the function return.
Further, when you have variables nested within {curly braces} once execution leaves that brace set the compiler is free to reuse that memory for other variables in the function. There are two reasons I intentionally do this:
The variables are large but only needed for a short time so why make stacks larger than needed? Especially if you need several large temporary structures or arrays at different times. The smaller the scope the less chance of bugs.
If a variable only has a sane value for a limited amount of time, and would be dangerous or buggy to use out of that scope, add extra curly braces to limit the scope of access so improper use generates immediate compiler errors. Using unique names for each variable, even if the compiler doesn't insist on it, can keep the debugger, and your mind, less confused.
Example:
void your_function(int a)
{
{ // limit scope of stack_1
int stack_1 = 0;
for ( int ii = 0; ii < a; ++ii ) { // really limit scope of ii
stack_1 += some_calculation(ii, a);
}
printf("ii=%d\n", ii); // scope error
printf("stack_1=%d\n", stack_1); // good
} // done with stack_1
{
int limited_scope_1[10000];
do_something(a,limited_scope_1);
}
{
float limited_scope_2[10000];
do_something_else(a,limited_scope_2);
}
}
A compiler given code like:
void doSomething(int, int const*);
...
for (int i=0; i<100; i++)
{
int const j=(i & 1);
doSomething(i, &j);
}
could legitimately replace it with:
void doSomething(int, int const*);
...
int const __compiler_generated_0 = 0;
int const __compiler_generated_1 = 1;
for (int i=0; i<100; i+=2)
{
doSomething(i, &__compiler_generated_0);
doSomething(i+1, &__compiler_generated_1);
}
Although a compiler would typically allocate space on the stack once for j, when the function was entered, and then not reuse the storage during the loop (or even the function), meaning that j would have the same address on every iteration of the loop, there is no requirement that the address remain constant. While there typically wouldn't be an advantage to having the address vary on different iterations, compilers are allowed to exploit such situations should they arise.
I know how to use structures in C, but I don't know how they work internally, because I'm still at the beginning of learning assembly. In the code below I have a structure called P and create two variables of that type, A and B. After assigning A to B (B = A), I can read A's data through B, even without using a pointer. How is this copy of the data from A to B made?
#include <stdio.h>
struct P{
int x;
int y;
}A, B;
int main(void) {
printf("%p\n%p\n\n", (void *)&A, (void *)&B);
printf("Member x of a: %p\nMember y of a: %p\n", (void *)&A.x, (void *)&A.y);
printf("Member x of b: %p\nMember y of b: %p\n", (void *)&B.x, (void *)&B.y);
A.x = 10;
A.y = 15;
B = A; // 10
printf("%d\n%d\n", B.x, B.y);
return 0;
}
The interesting thing in your sample code, I think, is the line
B = A;
Typically, the compiler implements this in one of two ways.
(1) It copies the members individually, giving more or less exactly the same effect as if you had said
B.x = A.x;
B.y = A.y;
(2) It emits a low-level byte-copy loop (or machine instruction), giving the effect of
memcpy(&B, &A, sizeof(struct P));
(except that typically this is done in-line, with no actual function call involved).
The compiler will choose one or the other of these based on which one is smaller (less emitted code), or which one is more efficient, or whatever the compiler is trying to optimize for.
Your example limits what the compiler can do, basically mandating that the struct exist in memory. First, you're instructing the compiler to create A & B as globals, and, second, you are taking the address of the struct (and its fields) for your printf statement. Due to either of these, the compiler will choose memory as the placement for these structs.
However, since they are each only two int's in size, copy between them would take only two mov instructions (some architectures) or two loads and two stores (other architectures).
Yet, if you were working with these structs as local variables and/or parameters, as is commonly done with this kind of small struct, and provided you did not take their addresses, these would frequently be optimized by the compiler to place the entire struct into CPU registers. For example, A.x might get a CPU register, and A.y its own register. Now, a copy of A, or passing A as a parameter (which is like an assignment), is just a pair of register movs (if even that is required, as the compiler might choose the proper registers in the first place). In other words, unless the user program forces the struct to memory, the compiler has the freedom to treat the struct as a pair of rather separate ints. So, by contrast, the result is potentially rather different and more efficient.
The compiler can also do other kinds of optimizations. One involves remembering the constant values that were assigned (and so doing the constant assignments again for B instead of copying from A's memory). Another involves eliminating A and the assignments to A and assigning directly to B, as A is merely copied into B and not used later. To reiterate from above, having the structs be local variables helps some of these optimizations, as does not taking their addresses.
I know that the global variable is stored in memory, and the local variable is saved in the stack, and the stack will be destroyed after each use.
But I do not know how an array is saved in memory.
I tried to declare a global array:
int tab[5]={10,9,12,34,30};
After the code has executed, I read the contents of the memory (I'm working on a microcontroller, and I know where the data is saved). When I declare a global variable, for example a = 10;, and read the contents of the memory, I find the value 10 there, but I do not find the contents of the table, that is 10, 9, 12, 34, 30.
I want to understand where the array contents are saved in memory.
I work on an Infineon Aurix and use Hightec as the compiler. I execute my code directly on the Aurix, and I read the memory like this:
const volatile unsigned char * mem_start = 0xd0000000;
#define size ((ptrdiff_t) 0xfff)
unsigned char bufferf [size];
code ();
main(){
...
for (int e = 0; e < sizeof (bufferf); e ++)
bufferf [e] = * (mem_start + e); // read memory
}
The answer to the question of where arrays, or indeed any variables, are stored depends on whether you are considering the abstract machine or the real hardware. The C standard specifies how things work on the abstract machine, but a conforming compiler is free to do something else on the real hardware (e.g., due to optimization) if the observable effects are the same.
(On the abstract machine, arrays are generally stored in the same place as other variables declared in the same scope would be.)
For example, a variable might be placed in a register instead of the stack, or it might be optimized away completely. As a programmer you generally shouldn't care about this, since you can also just consider the abstract machine. (Admittedly there may be some cases where you do care about it, and on microcontrollers with very limited RAM one reason might be that you have to be very frugal about using the stack, etc.)
As for your code for reading the memory: it cannot possibly work. If size is the size of the memory available to variables, you cannot fit the array bufferf[size] in that memory together with everything else.
Fortunately, copying the contents of the memory to a separate buffer is not needed. Consider your line bufferf[e] = *(mem_start + e) – since you can already read an arbitrary index e from memory at mem_start, you can use *(mem_start + e) (or, better, mem_start[e], which is exactly equivalent) directly everywhere you would use bufferf[e]! Just treat mem_start as pointing to the first element of an array of size bytes.
Also note that if you are programmatically searching for the contents of the array tab, its elements are ints, so they are more than one byte each – you won't simply find five adjacent bytes with those values.
(Then again, you can also just take the address of tab[0] and find out where it is stored, i.e., ((unsigned char *) tab) - mem_start is the index of tab in mem_start. However, here observing it may change things due to the aforementioned optimization.)
global variable is stored in memory, and the local variable is saved in the stack
This is false.
Global variables are sometimes kept in registers, not only in static memory.
If the function is recursive, the compiler may choose to use the stack or not. If the function is tail recursive, there is no need to use the stack, as there is no continuation after the function returns and one can use the current frame for the next call.
There are mechanical methods to convert the code into continuation-passing style, and in this equivalent form it can be evaluated without a stack.
There are lots of mathematical models of computation and not all use the stack. The code can be converted from one model to other keeping the same result after evaluation.
You got this information from books written in the 70s and 80s; in the meantime the process of evaluating code has improved considerably (using methods that were theoretical in the 30s but nowadays are implemented in real systems).
This program may get you closer to what you want:
int tab[] = {10, 9, 12, 34, 30};
// Used to get the number of elements in an array
#define ARRAY_SIZE(array) \
(sizeof(array) / sizeof(array[0]))
// Used to suppress compiler warnings about unused variables
#define UNUSED(x) \
(void)(x)
int main(void) {
int *tab_ptr;
int tab_copy[ARRAY_SIZE(tab)];
tab_ptr = tab;
UNUSED(tab_ptr);
for (int index = 0; index < ARRAY_SIZE(tab_copy); index++) {
tab_copy[index] = tab[index];
}
return 0;
}
I don't have the Hightec environment to test this, nor do I have an Aurix to run it on. However, this may give you a starting point that helps you obtain the information you desire.
It looks like the Infineon Aurix is a 32-bit little-endian machine. This means that an int will be 32 bits wide and the first byte in memory will hold the lowest 8 bits of the value. For example, the value 10 would be stored like this:
0a 00 00 00
If you try to read this memory as char, you will get 0x0a, 0x00, 0x00, and 0x00.
Depending on the version of the C compiler and the compiler flags, it is possible to declare and initialize variables anywhere in your functions (as far as I am aware).
I'm used to putting all the variables at the top of the function, but a discussion started about the memory use of variables that are defined elsewhere in the function.
Below I have written two short examples, and I wonder if anyone could explain to me (or verify) how the memory gets allocated.
Example 1: Variable y is defined after a possible return statement, there is a possibility this variable won't be used for that reason, as far as I'm aware this doesn't matter and the code (memory allocation) would be the same if the variable was placed at the top of the function. Is this correct?
Example 2: Variable x is initialized in a loop, meaning that the scope of this variable is only within this loop, but what about the memory use of this variable? Would it be any different if placed on the top of the functions? Or just initialized on the stack at the function call?
Edit: To conclude a main question:
Does reducing the scope of a variable, or changing the location of its first use (so anywhere else instead of the top), have any effect on memory use?
Code example 1
static void Function(void){
uint8_t x = 0;
//code changing x
if(x == 2)
{
return;
}
uint8_t y = 0;
//more code changing y
}
Code example 2
static void LoopFunction(void){
uint8_t i = 0;
for(i =0; i < 100; i ++)
{
uint8_t x = i;
// do some calculations
uartTxLine("%d", x);
}
//more code
}
I'm used to putting all the variables at the top of the function
This used to be required in the older versions of C, but modern compilers dropped that requirement. As long as they know the type of the variable at the point of its first use, the compilers have all the information they need.
I wonder if anyone could explain to me how the memory gets allocated.
The compiler decides how to allocate memory in the automatic storage area. Implementations are not limited to the approach that gives each variable you declare a separate location. They are allowed to reuse locations of variables that go out of scope, and also of variables no longer used after a certain point.
In your first example, variable y is allowed to use the space formerly occupied by variable x, because the first point of use of y is after the last point of use of x.
In your second example the space used for x inside the loop can be reused for other variables that you may declare in the // more code area.
Basically, the story goes like this. When calling a function in raw assembler, it is customary to store everything used by the function on the stack upon entering the function, and clean it up upon leaving. Certain CPUs and ABIs may have a calling convention which involves automatic stacking of parameters.
Likely because of this, C and many other old languages had the requirement that all variables must be declared at the top of the function (or on top of the scope), so that the { } reflect push/pop on the stack.
Somewhere around the 80s/90s, compilers started to optimize such code efficiently, as in they would only allocate room for a local variable at the point where it was first used, and de-allocate it when there was no further use for it. Regardless of where that variable was declared - it didn't matter for the optimizing compiler.
Around the same time, C++ lifted the variable declaration restrictions that C had, and allowed variables to be declared anywhere. However, C did not actually fix this before the year 1999 with the updated C99 standard. In modern C you can declare variables everywhere.
So there is absolutely no performance difference between your two examples, unless you are using an incredibly ancient compiler. It is however considered good programming practice to narrow the scope of a variable as much as possible - though it shouldn't be done at the expense of readability.
Although it is only a matter of style, I would personally prefer to write your function like this:
(note that you are using the wrong printf format specifier for uint8_t)
#include <inttypes.h>
static void LoopFunction (void)
{
for(uint8_t i=0; i < 100; i++)
{
uint8_t x = i;
// do some calculations
uartTxLine("%" PRIu8, x);
}
//more code
}
Old C only allowed variables to be declared (and initialized) at the top of a block. You were allowed to open a new block (a pair of { and } characters) anywhere inside a block, so you had the possibility of declaring variables next to the code using them:
... /* inside a block */
{ int x = 3;
/* use x */
} /* x is not addressable past this point */
And you were permitted to do this in switch statements, if statements, and while and do statements (everywhere you can open a new block).
Now, you are permitted to declare a variable anywhere a statement is allowed, and the scope of that variable goes from the point of declaration to the end of the innermost block in which it is declared.
Compilers decide when to allocate storage for local variables. They can allocate all of them when the stack frame is created (this is the GCC way, as it allocates local variables only once) or on entry to the block that defines them (Microsoft C, for example, does it this way). Allocating space at block entry requires advancing the stack pointer at run time, so doing it only once per stack frame saves CPU cycles (but wastes memory locations).
The important thing here is that you are not allowed to refer to a variable's location after leaving the block that defines it; if you try, you get undefined behaviour. I once found a bug that had been circulating on the internet for a long time, because nobody had taken the time to compile the program with the Microsoft C compiler (where it failed with a core dump) instead of, as was common, with GCC. The code used a local variable defined in an inner scope (the then part of an if statement) by reference in some other part of the code (as everything was in the main function, the stack frame was present all the time). Microsoft C just reallocated the space on exiting the if statement, but GCC waited to do so until main finished. The bug was solved by just adding a static modifier to the variable declaration (giving the variable static storage duration, so it lives for the whole program like a global), and no more refactoring was necessary.
int main()
{
struct bla_bla *pointer_to_x;
...
if (something) {
struct bla_bla x;
...
pointer_to_x = &x;
}
/* x does not exist (but it did in gcc) */
do_something_to_bla_bla(pointer_to_x); /* wrong, x doesn't exist */
} /* main */
when changed to:
int main()
{
struct bla_bla *pointer_to_x;
...
if (something) {
static struct bla_bla x; /* now has static storage duration, even though still block-scoped */
...
pointer_to_x = &x;
}
/* x is not visible, but exists, so pointer_to_x continues to be valid */
do_something_to_bla_bla(pointer_to_x); /* correct now */
} /* main */
I've been trying to get into the habit of defining trivial variables at the point they're needed. I've been cautious about writing code like this:
while (n < 10000) {
int x = foo();
[...]
}
I know that the standard is absolutely clear that x exists only inside the loop, but does this technically mean that the integer will be allocated and deallocated on the stack with every iteration? I realise that an optimising compiler isn't likely to do this, but is that guaranteed?
For example, is it ever better to write:
int x;
while (n < 10000) {
x = foo();
[...]
}
I don't mean with this code specifically, but in any kind of loop like this.
I did a quick test with gcc 4.7.2 for a simple loop differing in this way and the same assembly was produced, but my question is really are these two, according to the standard, identical?
Note that "allocating" automatic variables like this is pretty much free; on most machines it's either a single-instruction stack pointer adjustment, or the compiler uses registers in which case nothing needs to be done.
Also, since the variable remains in scope until the loop exits, there's absolutely no reason to "delete" it (= readjust the stack pointer) until the loop exits. I certainly wouldn't expect there to be any per-iteration overhead for code like this.
Also, of course the compiler is free to "move" the allocation out of the loop altogether if it feels like it, making the code equivalent to your second example with the int x; before the while. The important thing is that the first version is easier to read and more tightly localized, i.e. better for humans.
Yes, the variable x inside the loop is technically defined on each iteration, and initialized via the call to foo() on each iteration. If foo() produces a different answer each time, this is fine; if it produces the same answer each time, it is an optimization opportunity – move the initialization out of the loop. For a simple variable like this, the compiler typically just reserves sizeof(int) bytes on the stack — if it can't keep x in a register — that it uses for x when x is in scope, and may reuse that space for other variables elsewhere in the same function. If the variable was a VLA — variable length array — then the allocation is more complex.
The two fragments in isolation are equivalent, but the difference is the scope of x. In the example with x declared outside the loop, the value persists after the loop exits. With x declared inside the loop, it is inaccessible once the loop exits. If you wrote:
{
int x;
while (n < 10000)
{
x = foo();
...other stuff...
}
}
then the two fragments are near enough equivalent. At the assembler level, you'll be hard pressed to spot the difference in either case.
My personal point of view is that once you start worrying about such micro-optimisations, you're doomed to failure. The gain is:
a) Likely to be very small
b) Non-portable
I'd stick with code that makes your intention clear (i.e. declare x inside the loop) and let the compiler care about efficiency.
There is nothing in the C standard that says how the compiler should generate code in either case. It could adjust the stack pointer on every iteration of the loop if it fancies.
That being said, unless you start doing something crazy with VLAs like this:
void bar(char *, char *);
void
foo(int x)
{
int i;
for (i = 0; i < x; i++) {
char a[i], b[x - i];
bar(a, b);
}
}
the compiler will most likely just allocate one big stack frame at the beginning of the function. It's harder to generate code for creating and destroying variables in blocks instead of just allocating all you need at the beginning of the function.