Edit: I want to test the system by inserting a breakpoint and comparing memory before and after the breakpoint.
I used static analysis to get a list of C source code locations, and the debugging information (i.e., DWARF) provides a mapping between C source lines and machine instructions in the executable.
But the problem is that many machine instructions map to one line of C source code, and I would need to test all of them.
The only instructions worth testing are those that modify the memory state.
So I want to reduce the number of instructions by eliminating those that don't modify memory.
For example, I have the following source code test.c and I have the line number 5.
2 int var1 = 10;
3 void foo() {
4 int *var2 = (int*)malloc(sizeof(int));
5 for(*var2=var1;;) {
6 /* ... */
7 }
8 }
To be clear, line number 5 accesses the global memory var1 and the heap memory *var2.
I compiled the above program with the command gcc -g test.c and the result is
(a.out)
00000000004004d6 <foo>:
4004d6: 55 push %rbp
4004d7: 48 89 e5 mov %rsp,%rbp
4004da: 48 83 ec 10 sub $0x10,%rsp
4004de: bf 04 00 00 00 mov $0x4,%edi
4004e3: e8 d8 fe ff ff callq 4003c0 <malloc@plt>
4004e8: 48 89 45 f8 mov %rax,-0x8(%rbp)
4004ec: 8b 15 1e 04 20 00 mov 0x20041e(%rip),%edx # 600910 <var1>
4004f2: 48 8b 45 f8 mov -0x8(%rbp),%rax
4004f6: 89 10 mov %edx,(%rax)
4004f8: eb fe jmp 4004f8 <foo+0x22>
and dwarfdump -l a.out gives me the following result.
0x004004d6 [ 3, 0] NS uri: "/home/workspace/test.c"
0x004004de [ 4, 0] NS
0x004004ec [ 5, 0] NS
0x004004f8 [ 5, 0] DI=0x1
Now I know that, in a.out, the locations 0x4004ec, 0x4004f2, 0x4004f6 and 0x4004f8 are mapped to line number 5 in the C source code.
But I want to exclude the 0x4004f8 (jmp) which doesn't access the (heap, global or local) memory.
Does anyone know how to get only instructions that access memory?
This is only answering the question about finding asm instructions with explicit memory operands. The part about associating them with C statements is pretty bogus outside of -O0 compiler output (where each statement is compiled to a separate block of instructions to support GDB's jump to another line in the same function, or modifying variables in memory while stopped at breakpoint). See Basile's answer which tries to make some sense of the C statement stuff in the question.
Intel-syntax disassembly might be handy, because all explicit memory operands will have ptr in them, like mov rax, qword ptr [rbp - 0x8], so you can text search.
In asm source, the <size> ptr syntax isn't required when a register operand implies the operand size, but disassemblers like objdump -drwC -Mintel always put it in.
In AT&T syntax, you could also just look for () or a bare symbol name as an operand.
Don't forget to filter out lea instructions. lea is like the & operator in C. It's a shift-and-add instruction that uses memory-operand syntax and machine encoding.
Also don't forget to filter out various long-nop instructions that use addressing modes to get the right amount of padding in one instruction. For example:
66 2e 0f 1f 84 00 00 00 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
So if the mnemonic is lea or nop, ignore the instruction. (32-bit code sometimes uses other instructions as NOPs, but usually it's actually an lea that sets a register to itself in machine code generated by gas / ld from compiler .p2align directives.)
objdump disassembles rep stos with explicit operands, like rep stos QWORD PTR es:[rdi],rax. So you will actually get rep movs and rep stos operands. (Note that rep movs and rep cmps have two memory operands, unlike normal instructions. They're implicit in the machine code, but objdump makes them explicit.) This will also miss implicit memory operands like the stack for push / pop and call / ret.
A given C statement is compiled into several machine instructions, and several of them may access memory. Think of something like ptr->fld = arr[i++] * arr[j]--; .... BTW, in some cases, arr[j] might have been used earlier and could already sit in some register, so it might not need another memory load (but only a store, which could be deferred until later).
I want to know the location, in executable, of the machine instruction that accesses (heap, global or local) memory generated by the given code
So your question might not make sense in general. Several machine instructions (or none of them) might access memory in relation to a single C statement in your source code. And register allocation and register spilling may happen, so a given machine instruction might be related to a C variable quite far from the "current" C statement (a notion that makes little sense in optimized code).
An optimizing compiler is allowed to mix several C statements and might output intermixed machine code. Read also about sequence points. There is no obvious mapping between machine code instructions and C statements (notably with optimizations enabled), which is why you often debug with fewer optimizations enabled (so gcc -g is best used with -O0 or -Og, not higher levels).
With GCC compile your src.c source file using
gcc -O -S -Wall -fverbose-asm src.c
and you'll get a slightly more readable src.s assembler file. You could use some editor or pager to look into that generated file.
Does anyone know how to get only instructions that access memory?
That does not make much sense. An optimizing compiler would sometimes share some common machine code related to several different C statements.
BTW, you might also ask GCC to dump various internal representations, for example using gcc -O -fdump-tree-all ; then you get hundreds of (textual) internal dump files (partially dumping various internal representations). Remember that GCC has hundreds of optimization passes.
Notice you might be more interested in working on GCC internal representations (e.g. GENERIC or GIMPLE or even RTL) by adding your own GCC plugin (or GCC MELT extensions). That could require months of work (notably to understand details of GCC internal architecture and representations).
Without understanding your high-level goals and motivations, we cannot help you more.
You should read much more about semantics and about undefined behavior, which is (indirectly) more relevant to your question than what you believe.
Notice that C statements do not map one-to-many onto machine instructions. An optimizing compiler doesn't compile C statements one by one; it compiles an entire translation unit at once (and may for example do inline expansions, loop unrolling, constant folding, register allocation and spilling, interprocedural optimizations and dead code elimination). This is why C compilers are such complex beasts of many millions of source code lines. BTW, most C compilers (e.g. GCC or Clang) are free software, so you can spend several months or years studying their source code.
Read also some good book on compilers (e.g. the latest Dragon Book), some books on semantics, and on programming languages pragmatics.
If you are interested in GCC internals specifically, my documentation page (also available here) of GCC MELT contains lots of slides and references.
If you only care about machine instructions, you might entirely forget about C and work, with the help of some disassembler library like libopcodes (see this), only on machine code in object files.
Look also into other static source code analyzers, including Coccinelle & Frama-C and libclang.
If you are interested only by GCC emitted code and can afford recompiling your C source code, you might instead work inside the GCC compiler (thru your GCC plugin or GCC MELT extension) at the GIMPLE level and detect (and perhaps transform) those GIMPLE instructions accessing memory. Detecting (and perhaps transforming) GIMPLE statements modifying memory could be simpler and might be enough.
I want to test the system by inserting a breakpoint and comparing memory before and after the breakpoint.
This is a bit similar to e.g. address sanitizers and other instrumentation features of GCC. You could spend several years working on something similar (and transforming some GIMPLE), then you probably want to add several additional passes in GCC (and you might need some extra runtime support).
Notice however that recent GDB is scriptable (in Guile or Python) and has watchpoints. If you just want to debug one particular program, that might be enough (and you might not need to dive into compiler internals, which would take many months or years of work). You should also use valgrind and address sanitizers.
Why can you find jmp esp only in big applications?
In this little program you can't find jmp esp. But why?
This is the source code:
#include <stdio.h>
int main(int argc, char **argv)
{
char buffer[64];
printf("Type in something: ");
gets(buffer);
return 0;
}
AT&T jmp *%esp / Intel jmp esp has machine code ff e4. You should be looking for that byte sequence at any offset.
(I assembled a .s with that instruction and used objdump -d to get the machine code.)
There is a lot of discussion in comments from people who thought you were talking about jmp *(%esp) as a ret without pop. For future readers, see Why JMP ESP instead of directly jumping into the stack on security.SE for more about this ret2reg technique to defeat stack ASLR when trying to return to your executable payload. (But not defeating non-executable stacks, so this is rarely useful on its own in modern systems.) It's a special case of a ROP gadget.
Compilers are never going to use that instruction intentionally, so you'll only ever find it as part of the bytes for another instruction, or in a non-code section. Or not at all if no data happens to include it.
Also, your search method could miss it if it did occur.
objdump | grep 'jmp.*esp' is not good here. That will miss ff e4 as part of mov eax, 0x1234e4ff for example. And disassembly of data sections similarly will only "check" bytes where objdump decides that an instruction starts. (It doesn't do overlapping disassembly starting from every possible byte address; it gets to the end of one instruction and assumes the next instruction starts there.)
But even so, I compiled your code with gcc8.2 with optimization disabled (gcc -m32 foo.c) and searched for e4 bytes in the output of hexdump -C. None of them were preceded by an ff byte. (I tried again with gcc -m32 -no-pie -fno-pie foo.c, still no ff e4)
There's no reason to expect that to appear in a tiny executable.
You could introduce one with a global const unsigned char jmp_esp[] = { 0xff, 0xe4 };
But note that modern toolchains (like late 2018 / 2019) put even the .rodata section in a non-executable segment. So you'd need to compile with -zexecstack for byte sequences in non-code sections to be useful as gadgets.
But you probably need -z execstack or something else to make the stack itself executable, for your payload itself to be in an executable page, not just a jmp esp in a const array.
If you disabled library ASLR, then you could use an ff e4 at a known address somewhere in libc. But with normal randomization of library mapping addresses, it's probably just as easy to try to guess the stack address of your buffer directly, +- some bytes you fill with a NOP slide. (Unless you can get the program you're attacking to leak a library address, defeating ASLR).
In C programming language, a variable can have a memory address and a value.
And as I understood every function as well have an address and also data which allocated at that address. My question is what is the meaning of the data which these functions point to?
You already got (good) answers, but I think some (obscure?) fact about C should be pointed out, regarding your question:
In C programming language, a variable can have a memory address and a value.
Actually the defining property of a variable is that it always has a value – if it's uninitialized, semantically it still has a value, only that this value is indeterminate, and reading an indeterminate value can invoke undefined behaviour.
But, and this is important, not every variable in C has an address! There is this little storage-class specifier register, whose exact meaning most people do not fully comprehend. The most widespread – and wrong – interpretation is that register means that the variable is to be placed in registers only. The problem is: there are instruction set architectures in which registers do not exist, but C has been designed to be still viable for them.
The true meaning of the register specifier is that you cannot take the address of a variable that is register, which means you cannot create pointers to it.
The upshot of this is that, for a variable that is register, the only important thing is its value. And it is perfectly legal for the C compiler to generate code that completely discards the "place" (be it register, memory location or something entirely different) where its value came to be, as long as it is able to faithfully recreate the value in a way that is semantically conforming to the program text. This also implies that it is perfectly legal to perform a whole re-computation of whatever had to be executed to obtain the final value. Which is why applying the register storage-class specifier to a variable may result in a sudden increase of code size and drop of performance.
As such, the register storage-class specifier is not a mechanism for optimizing code, but should be treated as a special-purpose tool for writing code that's neither time nor size critical but has to operate under very specific, tight constraints. One example would be bootloaders or system initialization code, whose task is to initialize memory access in the first place and which have to operate with just a few bytes – or even none – of usable memory storage, but can re-compute values required for each step.
The C programming language is (like every programming language) a specification (in some report). It is not a piece of software. You probably should read the n1570 (draft specification of C11) report.
Conceptually, a function does not have any data in C (but its code may refer to static addresses, contain literal constants - including pointers- etc...). It has some behavior, practically implemented by some code. What is code is not defined by the C standard.
Practically speaking, and this depends upon the particular implementation (look into the difference between Harvard machine & computer architectures and Von Neumann ones), a function pointer is some address of machine code (often, the target of the CALL machine instruction translating the C calls to it).
On desktops & laptops & tablets with some usual operating system (like Linux, Windows, MacOSX, iOS, Android...) -all are Von Neumann architectures: x86-64 or ARM-, your process has a single virtual address space containing code segments and data segments and heap data. Then function pointers and data pointers are of the same kind, and it is practically meaningful to cast between them. A canonical example is the usage of POSIX dlsym: you often cast its result to some function pointer (e.g. inside some plugin which is dynamically loaded with dlopen). The address of a function is practically speaking the address of its first machine code instruction (sitting in some code segment in the common address space). Read this & that for creative examples. Another useful example is JIT compilation libraries like asmjit, GNU lightning, libgccjit, LLVM: they enable you to generate machine code at runtime, and to get a (fresh) function pointer from these.
Neither dlsym nor JIT libraries are stricto sensu conforming to the C standard, because in a purely standard conforming C program the set of functions is statically known and any function pointer should point to some existing function of the same signature (read about calling conventions & ABIs), otherwise it is undefined behavior.
On some embedded computers with a Harvard architecture (e.g. some Arduino), code and data sit in different spaces, and a code address might not have the same number of bits as a data address. On such systems, a cast between function and data pointers is meaningless (unless you dive into deep implementation details). The C standard was specified to be general enough to take such weird computers into account.
Read also a lot more about closures and continuations. The C standard doesn't have them (hence callbacks conventionally take some client data argument). You probably will learn a lot by reading SICP. Read also about homoiconicity.
Read also about Operating Systems: If you use Linux (which I recommend, because it is mostly made of free software whose source code you can study), read Advanced Linux Programming. Read also Operating Systems: Three Easy Pieces.
In other words: your question (on function pointers and addresses) has different approaches. A dogmatic programming language lawyer approach (and the issue is to understand deeply the semantics of function pointers in the C standards; look also into CompCert & Frama-C); a pragmatic operating system and implementation specific approach (and then it depends upon your computer, its instruction set, and its OS and even your particular C compiler -and version- and optimization flags; and you may even have some "magic mechanisms" -like dlsym & dlopen or JIT compilation libraries- to create functions at runtime; which is magic because the C standards don't think of that).
You can find your answer here.
The C language supports two kinds of memory allocation through the variables in C programs:
Static allocation: is what happens when you declare a static or global variable. Each static or global variable defines one block of space, of a fixed size. The space is allocated once, when your program is started (part of the exec operation), and is never freed.
Automatic allocation: happens when you declare an automatic variable, such as a function argument or a local variable. The space for an automatic variable is allocated when the compound statement containing the declaration is entered, and is freed when that compound statement is exited.
In GNU C, the size of the automatic storage can be an expression that varies. In other C implementations, it must be a constant.
Function pointers point to blocks of machine instructions that get executed when you call the function.
Say you have this:
#include <stdio.h>
int plus_42(int x)
{
int res=x+42;
printf("%d + 42 = %d\n", x,res);
return res;
}
int main()
{
return plus_42(1);
}
If you compile it, link it, and run objdump -d on the result:
gcc plus_42.c && objdump -d a.out
you'll get (depending on your architecture, something like):
0000000000400536 <plus_42>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 48 83 ec 20 sub $0x20,%rsp
40053e: 89 7d ec mov %edi,-0x14(%rbp)
400541: 8b 45 ec mov -0x14(%rbp),%eax
400544: 83 c0 2a add $0x2a,%eax
400547: 89 45 fc mov %eax,-0x4(%rbp)
40054a: 8b 55 fc mov -0x4(%rbp),%edx
40054d: 8b 45 ec mov -0x14(%rbp),%eax
400550: 89 c6 mov %eax,%esi
400552: bf 04 06 40 00 mov $0x400604,%edi
400557: b8 00 00 00 00 mov $0x0,%eax
40055c: e8 af fe ff ff callq 400410 <printf@plt>
400561: 8b 45 fc mov -0x4(%rbp),%eax
400564: c9 leaveq
400565: c3 retq
0000000000400566 <main>:
400566: 55 push %rbp
400567: 48 89 e5 mov %rsp,%rbp
40056a: bf 01 00 00 00 mov $0x1,%edi
40056f: e8 c2 ff ff ff callq 400536 <plus_42>
400574: 5d pop %rbp
400575: c3 retq
400576: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40057d: 00 00 00
plus some boilerplate.
Here, 0000000000400536 and 0000000000400566 are the addresses of plus_42 and main respectively (= the values that the function pointers plus_42 and main hold), and the hex numbers you see in the 2nd column are the data, which is decoded in the 3rd column into human-readable names of the machine instructions that the data represents.
If I compile this program:
#include <stdio.h>
int main(int argc, char** argv) {
printf("hello world!\n");
return 0;
}
for x86-64, the asm output uses movl $.LC0, %edi / call puts. (See full asm output / compile options on godbolt.)
My question is: How can GCC know that the string's address can fit in a 32bit immediate operand? Why doesn't it need to use movabs $.LC0, %rdi (i.e. a mov r64, imm64, not a zero or sign-extended imm32).
AFAIK, there's nothing saying the loader has to decide to load the data section at any particular address. If the string is stored at some address above 1ULL << 32 then the higher bits will be ignored by the movl. I get similar behavior with clang, so I don't think this is unique to GCC.
The reason I care is I want to create my own data segment that lives in memory at any arbitrary address I choose (above 2^32 potentially).
In GCC manual:
https://gcc.gnu.org/onlinedocs/gcc-4.5.3/gcc/i386-and-x86_002d64-Options.html
3.17.15 Intel 386 and AMD x86-64 Options
-mcmodel=small
Generate code for the small code model: the program and its symbols
must be linked in the lower 2 GB of the address space. Pointers are 64
bits. Programs can be statically or dynamically linked. This is the
default code model.
-mcmodel=kernel
Generate code for the kernel code model. The kernel runs in the
negative 2 GB of the address space. This model has to be used for
Linux kernel code.
-mcmodel=medium
Generate code for the medium model: The program is linked in the lower
2 GB of the address space. Small symbols are also placed there.
Symbols with sizes larger than -mlarge-data-threshold are put into
large data or bss sections and can be located above 2GB. Programs can
be statically or dynamically linked.
-mcmodel=large
Generate code for the large model: This model makes no assumptions
about addresses and sizes of sections.
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
3.18.1 AArch64 Options
-mcmodel=tiny
Generate code for the tiny code model. The program and its statically defined symbols must be within 1MB of each other. Pointers
are 64 bits. Programs can be statically or dynamically linked. This
model is not fully implemented and mostly treated as ‘small’.
-mcmodel=small
Generate code for the small code model. The program and its statically defined symbols must be within 4GB of each other. Pointers
are 64 bits. Programs can be statically or dynamically linked. This is
the default code model.
-mcmodel=large
Generate code for the large code model. This makes no assumptions about addresses and sizes of sections. Pointers are 64 bits. Programs
can be statically linked only.
I can confirm that this happens on 64-bit compilation:
gcc -O1 foo.c
Then objdump -d a.out (notice also that printf("%s\n") can be optimized into puts!):
0000000000400536 <main>:
400536: 48 83 ec 08 sub $0x8,%rsp
40053a: bf d4 05 40 00 mov $0x4005d4,%edi
40053f: e8 cc fe ff ff callq 400410 <puts@plt>
400544: b8 00 00 00 00 mov $0x0,%eax
400549: 48 83 c4 08 add $0x8,%rsp
40054d: c3 retq
40054e: 66 90 xchg %ax,%ax
The reason is that GCC defaults to -mcmodel=small where the static data is linked in the bottom 2G of address space.
Notice that string constants do not go to the data segment, but they're within the code segment instead, unless -fwritable-strings. Also if you want to relocate the object code freely in memory, you'd probably want to compile with -fpic to make the code RIP relative instead of putting 64-bit addresses everywhere.
#include <stdio.h>
void demo()
{
printf("demo");
}
int main()
{
printf("%p",(void*)demo);
return 0;
}
The above code prints the address of function demo.
So if we can print the address of a function, that means that this function is present in the memory and is occupying some space in it.
So how much space it is occupying in the memory?
You can see for yourself using objdump -r -d:
0000000000000000 <demo>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
5: R_X86_64_32 .rodata
9: b8 00 00 00 00 mov $0x0,%eax
e: e8 00 00 00 00 callq 13 <demo+0x13>
f: R_X86_64_PC32 printf-0x4
13: 5d pop %rbp
14: c3 retq
0000000000000015 <main>:
EDIT
I took your code and compiled (but not linked!) it. Using objdump you can see the actual way the compiler lays out the code to be run. At the end of the day there is no such thing as a function: for the CPU it's just a jump to some location (that in this listing happens to be labeled). So the size of the "function" is the size of the code that comprises it.
There seems to be some confusion that this is somehow not "real code". Here is what GDB says:
Dump of assembler code for function demo:
0x000000000040052d <+0>: push %rbp
0x000000000040052e <+1>: mov %rsp,%rbp
0x0000000000400531 <+4>: mov $0x400614,%edi
0x0000000000400536 <+9>: mov $0x0,%eax
0x000000000040053b <+14>: callq 0x400410 <printf@plt>
0x0000000000400540 <+19>: pop %rbp
0x0000000000400541 <+20>: retq
This is exactly the same code, with exactly the same size, patched by the linker to use real addresses. gdb prints offsets in decimal while objdump uses the more favourable hex. As you can see, in both cases the size is 21 bytes.
So if we can print the address of a function, that means that this
function is present in the memory and is occupying some space in it.
Yes, the functions you write are compiled into code that's stored in memory. (In the case of an interpreted language, the code itself is kept in memory and executed by an interpreter.)
So how much space it is occupying in the memory?
The amount of memory depends entirely on the function. You can write a very long function or a very short one. The long one will require more memory. Space used for code generally isn't something you need to worry about, though, unless you're working in an environment with severe memory constraints, such as on a very small embedded system. On a desktop computer (or even a mobile device) with a modern operating system, the virtual memory system will take care of moving pages of code into or out of physical memory as they're needed, so there's very little chance that your code will consume too much memory.
Of course it's occupying space in memory, the entire program is loaded in memory once you execute it. Typically, the program instructions are stored in the lowest bytes of the memory space, known as the text section. You can read more about that here: http://www.geeksforgeeks.org/memory-layout-of-c-program/
Yes, all functions that you use in your code do occupy memory space. However, the memory space does not necessarily belong exclusively to your function. For example, an inline function would occupy space inside each function from where it is called.
The standard does not provide a way to tell how much space a function occupies in memory, as pointer arithmetic, the trick that lets you compute sizes of contiguous memory regions in the data memory, is not defined for function pointers. Moreover, ISO C forbids conversion of function pointer to object pointer type, so you cannot get around this restriction by casting your function pointer to, say, a char*.
printf("%p",demo);
The above code prints the address of function demo().
That is undefined behavior: %p expects a void*, while you are passing it a void (*)(). You should see a compiler warning, telling that what you are doing is not valid (demo).
As for determining the amount of memory it is occupying, this is not possible at run-time. However, there are other ways you can determine it:
How to get the length of a function in bytes?
The functions are compiled into machine code that will run only on a specific ISA (x86, probably ARM if it's going to run on your phone, etc.) Since different processors may need more or fewer instructions to run the same function, and the length of instructions can also vary, there is no way to know in advance exactly how big the function will be until you compile it.
Even if you know what processor and operating system it will be compiled for, different compilers will create different, equivalent representations of the function depending on which instructions they use and how they optimize the code.
Also, keep in mind a function occupies memory in different ways. I think you are talking about the code itself, which is its own section. During execution, the function can also occupy space on the stack - every time the function is called, more memory is taken up in the form of a stack frame. The amount depends on the number and type of local variables and arguments declared by the function.
Yes; however, you can declare it inline, so that the compiler may take the function body and insert it wherever you call that function. Or you can use preprocessor macros. Keep in mind that inlining generally produces larger code that may execute faster, and the compiler can decide to ignore your inline request if it feels the code would become too large.
#include <stdio.h>
int main()
{
int i = 10;
return 0;
}
In the above program, where exactly the value 10 is stored ?
I understand the variable i is stored on the stack, and the stack is populated at run time. But where exactly does the 10 come from?
10 is a constant, so the compiler will use the number 10 directly in the executable part of your program as part of the CPU instructions.
Here's the assembly produced on my system with gcc:
movl $10, -4(%rbp)
(The 4 is because an int is 4 bytes long)
Note that all of these things are part of the implementation, but the above happens in practice. The language itself doesn't specify these details.
10 is a "literal", which will be generated by the compiler during compilation. And then it will be assigned to your variable on the stack. Like
mov eax, 10;
mov [0x12345678], eax;
This is pseudo-code, though, but it will assign your variable i (whose address is 0x12345678 here) the value 10, previously stored in eax.
The "stack" is an area of the memory set aside by the operating system for your program, separate from the "heap" and the global variables and the executable code.
When a function is called, there is code to push the arguments onto the stack, then space is set aside for the local variables. When the function returns this space and all arguments are "popped" off the stack, so the memory can be reused by the next function.
This is a very basic and rough description, and a lot of the details differs between systems and compilers.
There will be an explicit machine code instruction that sets i.
Something like:
MOV AL, 10
After compiling and linking, your executable contains multiple segments. Two types of these segments are:
text segments - containing the actual code
data segments - containing static data.
(there are other types as well)
The value 10 is either stored in the text segment (as an instruction to set 10 to a specific address or register), or stored as data in the data segment (which is retrieved by the code and stored at the specific address/register).
The compiler decides what is best (most efficient for the given compilation flags). But I suppose it is 'stored' in a text segment as the value 10 is quite simple to 'create in code' (as shown by some of the other answers).
More complex data (structs, strings, etc.) are typically stored in a data segment.
The value 10 is stored in a physical source file which contains your source code. During execution, that value is transferred to a variable named i, which has automatic storage duration, and is of the type int. That is all that matters. Any further questions aren't productive in the realm of general purpose programming languages such as C.
Notice how most answers mention compilers? What if an interpreter is used to translate C source code directly into behaviour? Are these answers still valid?
I'd rather not "give a man a fish", if you will... You can actually check it out yourself:
Take your code and create an object file out of it:
> gcc -c file.c -o file.o
Then do an object dump on the generated file:
> objdump -d file.o
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: c7 45 fc 0a 00 00 00 movl $0xa,-0x4(%rbp) // Right here you can see
// the raw value 0xa (10)
// being set, so it's in the
// .text section