When do load, store, and alloca get used in LLVM?

I am looking at LLVM output to see how it uses load, store, and alloca. In the first slide below, none of them appear. In the second, alloca is used.
I am not familiar with C, so I will have to bring myself up to speed to run an example and figure this out myself, but I wanted to ask if anyone knew already. I'm not sure what kind of example C code to write in order to produce LLVM output that uses load, store, and alloca.
The question is: when does LLVM use load, store, and alloca?
I'm also wondering whether load/store are necessary, or whether LLVM can do without them.
Figure 1 ↓
Figure 2 ↓

Without optimizations, clang will produce LLVM code where there's one alloca for each local variable, one read for each use of that variable as an r-value and one store for each assignment to that variable (including its initialization).
With optimizations, clang will try to minimize the number of reads and stores, and will often eliminate the alloca completely if possible (using only registers).
One way to ensure that the variable is stored in memory, even with optimizations, would be to take its address (since registers don't have an address).
I'm also wondering whether load/store are necessary, or whether LLVM can do without them.
You need store/load whenever you write to or read from memory locations. So the question becomes whether you can do without memory, storing everything in registers. Since LLVM (unlike real machines) supports an unlimited number of registers, that's a valid question.
However, as I mentioned, registers don't have addresses. So any code that takes the address of a variable needs to use memory. So does any code that performs arithmetic on addresses, such as code that indexes arrays.
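As a rough sketch of that point (the function names are made up, and the exact IR depends on your clang version and flags):
// At -O0, clang typically emits one alloca for 'y', a store for the
// assignment and a load for the return; at -O2 the alloca is usually
// promoted away and only register arithmetic remains.
int add_one(int x) {
    int y = x + 1;
    return y;
}

// Taking the address forces 'y' to live in memory, since registers have
// no address. 'record' is a hypothetical external function; because the
// address escapes through it, the alloca generally survives optimization.
void record(int *p);

int add_one_and_record(int x) {
    int y = x + 1;
    record(&y);
    return y;
}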

alloca allocates memory in the function's local frame. It is necessary to create a variable whose address is taken, like in this example:
#include <stdio.h>

void foo(int* ptr) {
    *ptr = 4;
}

int main() {
    int value = 0;
    foo(&value);
    printf("%i\n", value); // 4
    return 0;
}
If it doesn't inline foo, then LLVM will need an alloca instruction in main to create the memory that backs the value variable. foo needs to use store to put 4 at the address that ptr points to, and main then needs to use load to load the contents of value after it's been modified by foo.
Compilers for C-family languages typically prefer to start off using alloca for every variable in the function's frame, and then let LLVM optimize the allocas into SSA values. In many cases, the compiler is able to promote allocated variables to SSA values, as the ssa2 function shows. The SSA form is capable of representing variables that meet the following two conditions:
their addresses aren't taken
their size is fixed
"Taking the address" of a variable is an operation that doesn't exist in Javascript/Ruby, so you may need to get up to speed on C to understand what it means. It is extremely common in C and C++.
"Fixed size" means that the compiler knows ahead of time how much memory it needs for a specific data structure. It always knows for simple integers, for instance, but arrays often have a variable size. Arrays of a size that isn't known at runtime can be allocated with alloca or malloc, and then you need to access their contents with load and store.
Finally, note that your second example is broken: it reads from an uninitialized value, and if you compile it at higher optimization levels, you'll just get ret i32 undef.

Related

Where is the address of the first element of an array stored?

I was playing with C, and I just discovered that a and &a yield the same result, which is the address of the first element of the array. By browsing related topics here, I learned that they are only formatted differently. So my question is: where is this address stored?
This is an interesting question! The answer will depend on the specifics of the hardware you're working with and what C compiler you have.
From the perspective of the C language, each object has an address, but there's no specific prescribed mechanism that accounts for how that address would actually be stored or accessed. That's left up to the compiler to decide.
Let's imagine that you've declared your array as a local variable, and then write something like array[137], which accesses the element at index 137 of the array. How does the generated program know how to find your array? On most systems, the CPU has a dedicated register called the stack pointer that keeps track of the position of the memory used for all the local variables of the current function. As the compiler translates your C code into an actual executable file, it maintains an internal table mapping each local variable to some offset away from where the stack pointer points. For example, it might say something like "because 64 bytes are already used up for other local variables in this function, I'm going to place array 64 bytes past where the stack pointer points." Then, whenever you reference array, the compiler generates machine instructions of the form "look 64 bytes past the stack pointer to find the array."
Now, imagine you write code like this:
printf("%p\n", array); // Print address of array
How does the compiler generate code for this? Well, internally, it knows that array is 64 bytes past the stack pointer, so it might generate code of the form "add 64 to the stack pointer, then pass that as an argument to printf."
So in that sense, the answer to your question could be something like "the hardware stores a single pointer called the stack pointer, and the generated code is written in a way that takes that stack pointer and then adds some value to it to get to the point in memory where the array lives."
Of course, there are a bunch of caveats here. For example, some systems have both a stack pointer and a frame pointer. Interpreters use a totally different strategy and maintain internal data structures tracking where everything is. And if the array is stored at global scope, there's a different mechanism used altogether.
Hope this helps!
It isn't stored anywhere - it's computed as necessary.
Unless it is the operand of the sizeof, _Alignof, or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" is converted ("decays") to an expression of type "pointer to T", and the value of the expression is the address of the first element of the array.
When you declare an array like
T a[N]; // for any non-function type T
what you get in memory is
+---+
| | a[0]
+---+
| | a[1]
+---+
...
+---+
| | a[N-1]
+---+
That's it. No storage is materialized for any pointer. Instead, whenever you use a in any expression, the compiler will compute the address of a[0] and use that instead.
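A tiny illustration of that (the array name is arbitrary); all three lines print the same address even though no pointer object for a is stored anywhere:
#include <stdio.h>

int main(void) {
    int a[4] = {1, 2, 3, 4};
    // 'a' decays to &a[0] (type int*); '&a' is the address of the whole
    // array (type int(*)[4]). Both refer to the same byte; the address is
    // simply computed wherever it is needed.
    printf("%p\n", (void *)a);
    printf("%p\n", (void *)&a[0]);
    printf("%p\n", (void *)&a);
    return 0;
}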
Consider this C code:
#include <stdlib.h>

int x;

void foo(void)
{
    int y;
    int *p = malloc(sizeof *p);  // heap-allocated int referenced below; the pointer name is illustrative
    ...
}
When implementing this program, a C compiler will need to generate instructions that access the int objects named x and y and the int object allocated by the malloc. How does it tell those instructions where the objects are?
Each processor architecture has some way of referring to data in memory. This includes:
The machine instruction includes some bits that identify a processor register. The address in memory is in that processor register.
The machine instruction includes some bits that specify an address.
The machine instruction includes some bits that specify a processor register and some bits that specify an offset or displacement.
So, the compiler has a way of giving an address to the processor. It still needs to know that address. How does it do that?
One way is the compiler could decide exactly where everything in memory is going to go. It could decide it is going to put all the program’s instructions at addresses 0 to 10,000, and it is going to put data at 10,000 and on, and that x will go at address 12300. Then it could write an instruction to fetch x from address 12300. This is called absolute addressing, and it is rarely used anymore because it is inflexible.
Another option is that the compiler can let the program loader decide where to put the data. When the software that loads the program into memory is running, it will read the executable, see how much space is needed for instructions, how much is needed for data that is initialized to zero, how much space is needed for data with initial values listed in the executable file, how much space is needed for data that does not need to be initialized, how much space is requested for the stack, and so on. Then the loader will decide where to put all of these things. As it does so, it will set some processor registers, or some tables in memory, to contain the addresses where things go.
In this case, the compiler may know that x goes at displacement 2300 from the start of the “zero-initialized data” section, and that the loader sets register r12 to contain the base address of that section. Then, when the compiler wants to access x, it will generate an instruction that says “Use register r12 plus the displacement 2300.” This is largely the method used today, although there are many embellishments involving linking multiple object modules together, leaving a placeholder in the object module for the name x that the linker or loader fills in with the actual displacement as they do their work, and other features.
In the case of y, we have another problem. There can be two or more instances of y existing at once. The function foo might call itself, which causes there to be a y for the first call and a different y for the second call. Or foo might call another function that calls foo. To deal with this, most C implementations use a stack. One register in the processor is chosen to be a stack pointer. The loader allocates a large amount of space and sets the stack pointer register to point to the “top” of the space (usually the high-address end, but this is arbitrary). When a function is called, the stack pointer is adjusted according to how much space the new function needs for its local data. When the function executes, it puts all of its local data in memory locations determined by the value of the stack pointer when the function started executing.
In this model, the compiler knows that the y for the current function call is at a particular offset relative to the current stack pointer, so it can access y using instructions with addresses such as “the contents of the stack pointer plus 84 bytes.” (This can be done with a stack pointer alone, but often we also have a frame pointer, which is a copy of the stack pointer at the moment the function was called. This provides a firmer base address for working with local data, one that might not change as much as the stack pointer does.)
In either of these models, the compiler deals with the address of an array the same way it deals with the address of a single int: It knows where the object is stored, relative to some base address for its data segment or stack frame, and it generates the same sorts of instruction addressing forms.
Beyond that, when you access an array, such as a[i], or possibly a multidimensional array, a[i][j][k], the compiler has to do more calculations. To do this, the compiler takes the starting address of the array and does the arithmetic necessary to add the offsets for each of the subscripts. Many processors have instructions that help with these calculations; a processor may have an addressing form that says "Take a base address from one register, add a fixed offset, and add the contents of another register multiplied by a fixed size." This will help access arrays of one dimension. For multiple dimensions, the compiler has to write extra instructions to do some of the calculations.
If, instead of using an array element, like a[i], you take its address, as with &a[i], the compiler handles it similarly. It will get a base address from some register (the base address for the data segment or the current stack pointer or frame pointer), add the offset to where a is in that segment, and then add the offset required for i elements. All of the knowledge of where a[i] is is built into the instructions the compiler writes, plus the registers that help manage the program’s memory layout.
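To make that arithmetic concrete, here is a small sketch (the names are arbitrary) that redoes the compiler's computation by hand; on a typical flat-address machine both lines print the same address:
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int a[8];
    int i = 3;
    // The compiler computes &a[i] as: address-of-a + i * sizeof(int).
    uintptr_t base     = (uintptr_t)&a[0];
    uintptr_t computed = base + (uintptr_t)i * sizeof a[0];
    printf("&a[i]    = %p\n", (void *)&a[i]);
    printf("computed = %p\n", (void *)computed);
    return 0;
}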
Yet one more point of view, a TL;DR answer if you will: when the compiler produces the binary, it stores the address everywhere it is needed in the generated machine code.
The address may be just a plain number in the machine code, or it may be a calculation of some sort, such as "stack frame base address register + a fixed offset", but in either case it is duplicated everywhere in the machine code where it is needed.
In other words, it is not stored in any one location. Talking more technically, &some_array is not an lvalue, and trying to take the address of it, &(&some_array), will produce a compiler error.
This actually applies to all variables; an array is not special in any way here. The address of a variable can be used in the machine code directly (and if the compiler actually generates code which does store the address somewhere, you have no way to know that from the C code; you have to look at the assembly code).
The one thing special about arrays, which seems to be the source of your confusion, is that some_array is basically a more convenient syntax for &(some_array[0]), while &some_array means something else entirely.
Another way to look at it:
The address of the first element doesn't have to be stored anywhere.
An array is a chunk of memory. It has an address simply because it exists somewhere in memory. That address may or may not have to be stored somewhere depending on a lot of things that others have already mentioned.
Asking where the address of the array has to be stored is like asking where reality stores the location of your car. The location doesn't have to be stored - your car is located where your car happens to be - it's a property of existing. Sure, you can make a note that you parked your car in row 97, spot 114 of some huge lot, but you don't have to. And your car will be wherever it is regardless of your note-taking.

Why does allowing for recursion make C slower/inefficient on 8-bit CPUs

Answers to this question about compiler efficiency for 8-bit CPUs seem to imply that allowing for recursion makes the C language inefficient on these architectures. I don't understand how recursive function calling (of the same function) is different from just repeated function calling of various functions.
I would like to understand why this is so (or why seemingly learned people think it is). I could guess that maybe these architectures just don't have the stack-space, or perhaps the push/pop is inefficient - but these are just guesses.
Because to efficiently implement the C stack, you need the ability to efficiently load and store to arbitrary offsets within the current frame. For example, the 8086 processor provided the indexed and based addressing modes, which allowed loading a stack variable in a single instruction. With the 6502, you can only do this with the X or Y register, and since those are the only general-purpose registers, reserving one for a data stack pointer is extremely costly. The Z80 can do this with its IX or IY registers, but not the stack pointer register. However, indexed load instructions on the Z80 take a long time to execute, so it is still costly, along with the fact that you either reserve a second register for the stack pointer or have to load it from the SP register any time you want to access variables.
By comparison, if recursive calls are not supported, then a second instance of the function cannot start while an existing one is still in progress. This means only a single set of variables is needed at a time, and you can just allocate each function its own static piece of memory for its variables. Since that memory has a fixed location, you can then use fixed-address loads. Some implementations of Fortran used this approach.
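A small sketch of the two strategies (the names are invented). With per-function static storage the compiler can use fixed absolute addresses, which is cheap on chips like the 6502, but the function is no longer reentrant:
// Static allocation: one fixed memory cell shared by every call.
int countdown_static(int n) {
    static int local;
    local = n;
    if (n > 0)
        countdown_static(n - 1);   // the recursive call overwrites 'local'
    return local;                  // returns 0 for any n > 0, not this call's n
}

// Stack allocation: each call gets a fresh slot, addressed relative to the
// stack pointer, so recursion works, but every access needs an indexed or
// offset load/store.
int countdown_stack(int n) {
    int local = n;
    if (n > 0)
        countdown_stack(n - 1);
    return local;                  // correctly returns this call's n
}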

Can I modify the return data type from int to int8_t or int16_t?

I am using MIPS32 and coding in C.
Currently, many functions in my code return the 'int' data type.
Since my development is on resource-constrained hardware (even bytes matter) and the return values are just error codes (they don't exceed 255), I am planning to shrink the return type to either int8_t or int16_t.
What I am trying to achieve is to reduce the stack/memory usage of caller.
Before I attempt,
Will this result in stack/memory usage reduction in the caller? or
Since I have heard of memory alignment (mostly as 4 bytes) but don't know much about it, will that play spoilsport here?
Example
#include <stdint.h>

int8_t callee(void);

int caller(void) {
    int8_t status;
    status = callee();
    return status;
}

int8_t callee(void) {
    ...
    return -1;
}
In the example above, does declaring status as int8_t, int16_t, or int matter on MIPS32?
This will create absolutely no change when it comes to the call stack. A summary of the MIPS calling conventions can be found here: https://courses.cs.washington.edu/courses/cse410/09sp/examples/MIPSCallingConventionsSummary.pdf
$31 ($ra): the Return Address in a subroutine call.
Below that is an image where you will see the return address, which is a full register; in your case, on a 32-bit machine, the register is 32 bits wide and there is no changing that.
I do have to ask, though: what are you doing that requires MIPS? Generally speaking it is an architecture used mostly for teaching purposes and doesn't have much in the way of real-world practical use, since it has many flaws. As an example, this concept of a return-address register does not exist on modern architectures like x86, where the return address is pushed onto the stack instead.
EDIT:
As pointed out by people below, I have been a bit unfair. Technically these registers also exist:
$2-$3 ($v0-$v1): these registers contain the Returned Value of a subroutine; if the value is 1 word, only $v0 is significant.
Again, though, they have a set size, and from the perspective of the call stack they use one full register. Theoretically I believe MIPS has ways to store 4 bytes inside one register, but I am unsure of this. More importantly, with the way MIPS works these return registers can ONLY be used if the call is one function deep. If you call a function within a function, this concept falls apart and the return address becomes required, hence why I showed that one originally.
First of all, "don't exceed 255" means you should be using an unsigned type, not int.
When manually optimizing code for size, you should be using the uint_leastn_t types. These types allow the compiler to pick the smallest possible type necessary for the code to work, which is at least n bits wide.
In your case this would be uint_least8_t. Though of course if the compiler always picks a 32 bit type, because that is what is required for aligned access, then the only thing you have gained by replacing int is better portability.
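A minimal sketch of that suggestion (err_t and get_error are invented names):
#include <stdint.h>

// uint_least8_t is the smallest unsigned type that is at least 8 bits wide,
// so error codes 0..255 always fit; the compiler is still free to use a
// wider (e.g. word-sized, aligned) representation if that is cheaper.
typedef uint_least8_t err_t;

err_t get_error(void);   // hypothetical function returning a 0..255 code

int had_error(void) {
    err_t status = get_error();
    return status != 0;
}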
On MIPS32 the first four function parameters (integers or pointers; for simplicity I'm not considering 64-bit ints, floats or structs) arrive in registers a0 through a3. The rest goes on the stack, with each machine word of stack memory holding just one parameter. So, in terms of passing the error codes there will be no difference.
If you have to store error codes in local (automatic) variables, a lot will depend on the code. However, MIPS has plenty of registers and chances are there will be a register available for an error code and hence no stack space for it will be needed.
If you have global variables holding error codes, then definitely there will be a difference between using differently sized types.
Going back to the stack, you should note that there are several other things at play...
First, the stack must be aligned. This is worsened by the fact that modern compilers tend to align the stack pointer not on a multiple of the machine word, but on a multiple of two machine words. So, if you're considering just one error code, it's quite likely that any gains will be undone by the compiler padding the local variables on the stack to make their cumulative size a multiple of two machine words.
Second, the stack pointer is typically decremented by the size of the local and temporary variables just once at the entry of the function (and the reverse is done just once on exit). This means that in some places in the function there may be some unused stack space, which is reserved only to be used in other places of the function. So, calls (especially deep recursive calls) from some places of the function will be unduly wasting stack space.
Third, those four parameters that arrive in a0 through a3 are required by the ABI to have stack memory associated with them, so they can be stored there and addressed by pointers in functions like printf (recall stdarg.h's va_list, va_start(), va_arg(), etc). So, many calls may be wasting those 16 bytes of stack space as well.
Another thing you might want to consider is that when a function returns 8-bit or 16-bit integer types, the caller will need to sign-extend (or zero-extend) those 8/16 bits to the full machine word size (32 bits), meaning that the caller will have to use additional instructions like seb, seh and andi. So, these may affect code size negatively.
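To illustrate (get_status is a made-up callee), the extra work at the call site might look roughly like this:
#include <stdint.h>

int8_t get_status(void);   // hypothetical function with a narrow return type

int status_is_error(void) {
    // The value comes back in the full 32-bit $v0 register, so the caller may
    // need an extra instruction (e.g. seb for int8_t, andi for uint8_t) before
    // it can treat the value as an int; returning plain int avoids that.
    int s = get_status();
    return s < 0;
}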
Ultimately, it depends a lot on your code and on your compiler. You can measure the stack usage using both types and using different optimization options of the compiler and choose the best. You can also experiment with restructuring your code to avoid calls or to make it easier for the compiler to optimize it (e.g. static functions help as the compiler may deviate from the ABI when calling them and more effectively optimize them and passing and returning values to and from them). And this is really what you should do, try different things and choose what you like the best.
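As one concrete experiment along those lines (the names are invented): marking an internal helper static keeps it local to the translation unit, so the compiler is free to inline it or ignore the standard calling convention:
// Not visible outside this file, so the compiler may inline it or pass
// arguments and results however it likes instead of following the ABI.
static int helper(int x) {
    return 3 * x + 1;
}

int api_entry(int x) {
    return helper(x);   // often compiled with no call and no stack traffic at -O2
}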

Are programming languages and methods inefficient? (assembler and C knowledge needed)

For a long time I have been thinking about and studying the output of the C compiler in assembly form, as well as CPU architecture. I know this may seem silly to you, but it seems to me that something is very ineffective. Please don't be angry if I am wrong; there may be some reason I do not see for all these principles. I will be very glad if you tell me why it is designed this way. I actually truly believe I am wrong; I know the genius minds of the people who put PCs together had a reason to do so. What exactly, you ask? I'll tell you right away, using C as an example:
1: Stack local scope memory allocation:
So, typical local memory allocation uses the stack: just copy esp to ebp and then allocate all the memory via ebp. OK, I would understand this if you explicitly needed to allocate RAM by default stack values, but if I understand it correctly, modern OSes use paging as a translation layer between the application and physical RAM, where the address you request is translated again before reaching an actual RAM byte. So why not just say 0x00000000 is int a, 0x00000004 is int b, and so on? And access them just by mov 0x00000000, #10? Because you won't actually access memory blocks 0x00000000 and 0x00000004, but the ones your OS set in the page tables. Actually, since memory allocation via ebp and esp uses indirect addressing, "my" way would be even faster.
2: Variable allocation duplication:
When you run an application, the loader loads its code into RAM. When you create a variable or a string, the compiler generates code that pushes these values onto the top of the stack when they are created in main. So there is an actual instruction to do so, and that actual number in memory. So there are two entries of the same value in RAM: one in the form of an instruction, the second in the form of the actual bytes in RAM. But why? Why not, when declaring the variable, just work out which memory block it would occupy, and then, when it is used, just insert that memory location?
How would you implement recursive functions? What you are describing is equivalent to using global variables everywhere.
That's just one problem. How can you link to a precompiled object file and be sure it won't corrupt the memory of your procedures?
Because C (and most other languages) support recursion, so a function can call itself, and each call of the function needs separate copies of any local variables. Also, on most current processors, your way would actually be slower -- indirect addressing is so common that processors are optimized for it.
You seem to want the behavior that C has (or at least allows) for string literals. There are good and bad points to this, such as the fact that even though you've defined a "variable", you can't actually modify its contents (without affecting other variables that are pointing at the same location).
The answers to your questions are mostly wrapped up in the different semantics of different storage classes:
Google "data segment"
Think about the difference in behavior between global and local variables.
Think about how constant and non-constant variables have different requirements when functions are called repeatedly (or as Mehrdad says, recursively)
Think about the difference between static and non-static automatic variables, again in the context of multiple or recursive calls.
Since you are comparing assembler and C (which are very close together from an architectural standpoint), I'm inclined to say that you're describing micro-optimization, which is meaningless unless you profile the code to see if it performs better.
In general, programming languages are evolving towards a more declarative style (i.e. telling the computer what you want done, rather than how you want it done). When you program in an imperative language (like assembly or C), you specify in extreme detail how you want the problem solved. This gives the compiler little room to make optimization decisions on your behalf.
However, as the languages become more declarative, the compilers are getting smarter, because we are giving them the room they need to make more intelligent performance optimizations.
If every function put its first variable at offset 0 and so on, then you would have to change the memory mapping each time you entered a function (you could not allocate all variables at unique addresses if you want recursion). This is doable, but with current hardware it's very slow. Furthermore, the address translation performed by virtual memory is not free either; it's actually quite complicated to implement efficiently.
Addressing off ebp (or any other register) costs having a mux (to select the register) and an adder (to add the offset to the register). The time taken for this can often be overlapped with other operations.
If you want to be able to modify the static value you have to copy it to the stack. If you don't (saying it's 'const'), then a good C compiler will not copy it to the stack.
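A small illustration of that distinction (buf and msg are arbitrary names):
#include <stdio.h>

void demo(void) {
    // 'buf' must be writable, so the initializer bytes kept in the program
    // image are copied into the stack array at run time.
    char buf[] = "hello";
    buf[0] = 'H';

    // 'msg' merely points at the read-only string literal; nothing is copied.
    const char *msg = "hello";

    printf("%s %s\n", buf, msg);
}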

Why do compilers create one variable "twice"?

I know this is a more "heavy" question, but I think it's interesting too. It was part of my previous questions about compiler functions, but back then I explained it very badly and many answered just my first question, so here it is:
So, if my knowledge is correct, modern Windows systems use paging as a way to switch tasks and ensure that each task has an appropriate place in memory. So every process gets its own address space starting from 0.
When multitasking goes into effect, the kernel has to save all important registers to the task's stack, I believe, then save the current stack pointer, change the page table entries to switch to another process's physical address space, load the new process's stack pointer, pop the saved registers, and continue by jumping to the popped instruction pointer address.
Because of this nice feature (paging), every process thinks it has a nice flat memory within reach. So there are no far jumps, far pointers, memory segments or data segments. Everything is nice and linear.
But when there is no more segmentation for the process, why do compilers still create variables on the stack (or, when global, in a separate memory area) rather than directly in the program code?
Let me give an example. I have this C code:
int a = 10;
which gets translated into (Intel syntax):
mov [position of a], #10
But then you actually occupy more bytes in RAM than needed, because the first few bytes are taken by the actual instruction, and after that instruction is done, there is a new byte containing the value 10.
Why, instead of this, when there is no need to switch any segment (thus slowing the process down), isn't the value of 10 just coded directly into the program like this:
xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction
Because the compiler knows the exact position of every instruction, when operating with that variable it would just use its address.
I know that const variables do this, but they are not really variables when you cannot change them.
I hope I explained my question well, but I am still learning English, so forgive my syntactical and even semantic errors.
EDIT:
I have read your answers, and it seems that based on those I can modify my question:
So, someone here said that a global variable is actually a piece of data attached directly to the program. I mean, when a variable is global, is it attached to the end of the program, or is it just created like a local one at execution time, but directly on the heap instead of on the stack?
If it's the first case, attached to the program itself, why do local variables even exist? I know you will tell me because of recursion, but that is not the case. When you call a function, you can push any memory space onto the stack, so there is no program code there.
I hope you understand me: there is always inefficient use of memory when some value (even 0) is created on the stack by some instruction, because you need space in the program for that instruction and then for the actual variable. Like so:
push #5 //instruction that says to create a local variable with integer 5
And then this instruction just makes the number 5 be on the stack. Please help me, I really want to know why it's this way. Thanks.
Consider:
local variables may have more than one simultaneous existence if a routine is called recursively (even indirectly in, say, a recursive descent parser) or from more than one thread, and these cases occur in the same memory context
marking the program memory non-writable and the stack+heap as non-executable is a small but useful defense against certain classes of attacks (stack smashing...) and is used by some OSs (I don't know if windows does this, however)
Your proposal doesn't allow for either of these cases.
So there are no far jumps, far pointers, memory segments or data segments. Everything is nice and linear.
Yes and no. Different program segments have different purposes - despite the fact that they reside within flat virtual memory. E.g. data segment is readable and writable, but you can't execute data. Code segment is readable and executable, but you can't write into it.
why do compilers still create variables on the stack, [...] rather than directly in program code?
Simple.
Code segment isn't writable. For safety reasons, first. Second, most CPUs do not like having the code segment written into, as it breaks many existing optimizations used to accelerate execution.
The state of the function has to be private to the function, due to things like recursion and multi-threading.
isn't the value of 10 just coded directly into the program like this
Modern CPUs prefetch instructions to allow things like parallel and out-of-order execution. Putting garbage (to the CPU, inline data is garbage) into the code segment would simply diminish (or flat out cancel) the effect of those techniques, and they are responsible for the lion's share of the performance gains CPUs have shown in the past decade.
when there is no need to switch any segment
So if there is no overhead from switching segments, why then put that into the code segment? There is no problem keeping it in the data segment.
Especially in the case of a read-only data segment, it makes sense to put all read-only data of the program into one place, since it can be shared by all instances of the running application, saving physical RAM.
Because the compiler knows the exact position of every instruction, when operating with that variable it would just use its address.
No, not really. Most code is relocatable or position-independent. The code is patched with real memory addresses when the OS loads it into memory. Actually, special techniques are used to avoid patching the code, so that the code segment, too, can be shared by all running application instances.
The ABI is responsible for defining how and what the compiler and linker are supposed to do for a program to be executable by the complying OS. I haven't seen the Windows ABI, but the ABIs used by Linux are easy to find: search for "AMD64 ABI". Even reading the Linux ABI might answer some of your questions.
What you are talking about is optimization, and that is the compiler's business. If nothing ever changes that value, and the compiler can figure that out, then the compiler is perfectly free to do just what you say (unless a is declared volatile).
Now if you are saying that you are seeing that the compiler isn't doing that, and you think it should, you'd have to talk to your compiler writer. If you are using VisualStudio, their address is One Microsoft Way, Redmond WA. Good luck knocking on doors there. :-)
Why isn't the value of 10 just coded directly into the program like this:
xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction
That is how global variables are stored. However, instead of being stuck in the middle of executable code (which is messy, and not even possible nowadays), they are stored just after the program code in memory (in Windows and Linux, at least), in what's called the .data section.
When it can, the compiler will move variables to the .data section to optimize performance. However, there are several reasons it might not:
Some variables cannot be made global, including instance variables for a class, parameters passed into a function (obviously), and variables used in recursive functions.
The variable still exists in memory somewhere, and still must have code to access it. Thus, memory usage will not change. In fact, on the x86 ("Intel"), according to this page the instruction to reference a local variable:
mov eax, [esp+8]
and the instruction to reference a global variable:
mov eax, [0xb3a7135]
both take 1 (one!) clock cycle.
The only advantage, then, is that if every local variable is global, you wouldn't have to make room on the stack for local variables.
Adding a variable to the .data segment may actually increase the size of the executable, since the variable is actually contained in the file itself.
As caf mentions in the comments, stack-based variables only exist while the function is running - global variables take up memory during the entire execution of the program.
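Putting those points together, a rough sketch of the two placements (the names are invented):
// 'g' is emitted as initialized data in the .data section: the value 10
// exists once in the file/image, not as part of any instruction, and its
// storage lives for the whole run of the program.
int g = 10;

int f(int n) {
    // 'l' typically gets a stack slot (or just a register): it is materialized
    // by an instruction at run time, exists only while f() is executing, and
    // each recursive or concurrent call gets its own copy.
    int l = 10;
    return g + l + n;
}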
Not quite sure what your confusion is?
int a = 10; means make a spot in memory and put the value 10 at that memory address.
If you just want a to be 10:
#define a 10
though more typically:
#define TEN 10
Variables have storage space and can be modified. It makes no sense to stick them in the code segment, where they cannot be modified.
If you have code with int a=10 or even const int a=10, the compiler cannot convert code which references 'a' to use the constant 10 directly, because it has no way of knowing whether 'a' may be changed behind its back (even const variables can be changed). For example, one way 'a' can be changed without the compiler knowing is if you have a pointer which points to 'a'. Pointers are not fixed at runtime, so the compiler cannot determine at compile time whether there will be a pointer which will point to and modify 'a'.
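A short sketch of that situation (modify is a hypothetical external function):
void modify(int *p);   // defined elsewhere; the compiler can't see what it does

int demo(void) {
    int a = 10;
    modify(&a);        // 'a' may be changed behind the compiler's back
    return a;          // must be re-read from a's storage, not folded to 10
}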
