Where are static local variables stored in memory? Local variables can be accessed only inside the function in which they are declared.
Global static variables go into the .data segment.
If both the name of the static global and static local variable are same, how does the compiler distinguish them?
Static variables go into the same segment as global variables. The only thing that's different between the two is that the compiler "hides" all static variables from the linker: only the names of extern (global) variables get exposed. That is how compilers allow static variables with the same name to exist in different translation units. Names of static variables remain known during the compilation phase, but then their data is placed into the .data segment anonymously.
Static variable is almost similar to global variable and hence the uninitialized static variable is in BSS and the initialized static variable is in data segment.
As mentioned by dasblinken, GCC 4.8 puts local statics on the same place as globals.
More precisely:
static int i = 0 goes on .bss
static int i = 1 goes on .data
Let's analyze one Linux x86-64 ELF example to see it ourselves:
#include <stdio.h>
int f() {
static int i = 1;
i++;
return i;
}
int main() {
printf("%d\n", f());
printf("%d\n", f());
return 0;
}
To reach conclusions, we need to understand the relocation information. If you've never touched that, consider reading this post first.
Compile it:
gcc -ggdb -c main.c
Decompile the code with:
objdump -S main.o
f contains:
int f() {
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
static int i = 1;
i++;
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
a: 83 c0 01 add $0x1,%eax
d: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 13 <f+0x13>
return i;
13: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 19 <f+0x19>
}
19: 5d pop %rbp
1a: c3 retq
Which does 3 accesses to i:
4 moves to the eax to prepare for the increment
d moves the incremented value back to memory
13 moves i to the eax for the return value. It is obviously unnecessary since eax already contains it, and -O3 is able to remove that.
So let's focus just on 4:
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
Let's look at the relocation data:
readelf -r main.o
which says how the text section addresses will be modified by the linker when it is making the executable.
It contains:
Relocation section '.rela.text' at offset 0x660 contains 9 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000006 000300000002 R_X86_64_PC32 0000000000000000 .data - 4
We look at .rela.text and not the others because we are interested in relocations of .text.
Offset 6 falls right into the instruction that starts at byte 4:
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
^^
This is offset 6
From our knowledge of x86-64 instruction encoding:
8b 05 is the mov part
00 00 00 00 is the address part, which starts at byte 6
AMD64 System V ABI Update tells us that R_X86_64_PC32 acts on 4 bytes (00 00 00 00) and calculates the address as:
S + A - P
which means:
S: the segment pointed to: .data
A: the Added: -4
P: the address of byte 6 when loaded
-P is needed because GCC used RIP relative addressing, so we must discount the position in .text
-4 is needed because RIP points to the following instruction at byte 0xA but P is byte 0x6, so we need to discount 4.
Conclusion: after linking it will point to the first byte of the .data segment.
Related
I am learning about linking and found a small question that I could not understand.
Consider the following files:
main.c
#include "other.h"
extern int i;
int main() {
++i;
inci();
return 0;
}
other.c
int i = 0;
void inci() {
++i;
}
Then I compile these two files:
gcc -c main.c
gcc -shared -fpic other.c -o libother.so
gcc -o main main.o ./libother.so
Here is part of the dissasemble of main.o:
f: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 15 <main+0x15>
15: 83 c0 01 add $0x1,%eax
18: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 1e <main+0x1e>
1e: b8 00 00 00 00 mov $0x0,%eax
23: e8 00 00 00 00 call 28 <main+0x28>
Here is part of the disassemble of main:
1148: 8b 05 ca 2e 00 00 mov 0x2eca(%rip),%eax # 4018 <i##Base>
114e: 83 c0 01 add $0x1,%eax
1151: 89 05 c1 2e 00 00 mov %eax,0x2ec1(%rip) # 4018 <i##Base>
1157: b8 00 00 00 00 mov $0x0,%eax
115c: e8 cf fe ff ff call 1030 <inci#plt>
They both correspond to C code:
++i;
According to the assembly, it seems that the linker has already decided the run-time address of i, because it is using a PC-relative address to reference it directly, rather than using GOT. However, as far as I know, the shared library is only loaded into memory when the program uses it loads. Thus, the executable main should have no knowledge about the address of i at link time. Then, how does the linker determine that i is located at 0x4020?
Also what does the comment i##Base mean?
According to the assembly, it seems that the linker has already decided the run-time address of i, because it is using a PC-relative address to reference it directly, rather than using GOT.
Correct.
However, as far as I know, the shared library is only loaded into memory when the program uses it loads.
Correct, except the i variable in the shared library is never used, and so its address doesn't matter.
What happens here is described pretty well in Solaris documentation:
Suppose the link-editor is used to create a dynamic executable, and a reference to a data item is found to reside in one of the dependent shared objects. Space is allocated in the dynamic executable's .bss, equivalent in size to the data item found in the shared object. This space is also assigned the same symbolic name as defined in the shared object. Along with this data allocation, the link-editor generates a special copy relocation record that instructs the runtime linker to copy the data from the shared object to the allocated space within the dynamic executable.
Because the symbol assigned to this space is global, it is used to satisfy any references from any shared objects. The dynamic executable inherits the data item. Any other objects within the process that make reference to this item are bound to this copy. The original data from which the copy is made effectively becomes unused.
You can observe this using readelf -Ws main:
Symbol table '.dynsym' contains 5 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
...
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND inci
4: 0000000000404024 4 OBJECT GLOBAL DEFAULT 25 i
Note that the inci() is undefined (it's defined in libother.so), but i is defined in the main as a global symbol, and readelf -Wr main:
Relocation section '.rela.dyn' at offset 0x4d8 contains 3 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
...
0000000000404024 0000000400000005 R_X86_64_COPY 0000000000404024 i + 0
Relocation section '.rela.plt' at offset 0x520 contains 1 entry:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000404018 0000000200000007 R_X86_64_JUMP_SLOT 0000000000000000 inci + 0
I am trying to understand the relocations for the object file generated by this simple program:
int answer = 42;
int compute()
{
return answer;
}
int main()
{
return compute();
}
I compile it with simply gcc -c main.cpp.
Then, examining objdump -DCrw -M intel main.o, we see among other things:
Disassembly of section .text:
0000000000000000 <compute()>:
0: f3 0f 1e fa endbr64
4: 55 push rbp
5: 48 89 e5 mov rbp,rsp
8: 8b 05 00 00 00 00 mov eax,DWORD PTR [rip+0x0] # e <compute()+0xe> a: R_X86_64_PC32 answer-0x4
e: 5d pop rbp
f: c3 ret
(...)
Disassembly of section .data:
0000000000000000 <answer>:
0: 2a 00 sub al,BYTE PTR [rax]
...
We can also look at the relocations as reported by readelf -rW main.o:
Relocation section '.rela.text' at offset 0x250 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
000000000000000a 0000000900000002 R_X86_64_PC32 0000000000000000 answer - 4
(...)
Let's look at the relocation for the use of answer in compute():
To return answer from the function, we have this instruction to put the value of answer into eax:
8: 8b 05 00 00 00 00 mov eax,DWORD PTR [rip+0x0]
I.e. "copy whatever is at rip + an offset we don't know yet so let's just put 0s for it for now into eax".
The relocation for that instruction is
Offset Info Type Symbol's Value Symbol's Name + Addend
000000000000000a 0000000900000002 R_X86_64_PC32 0000000000000000 answer - 4
The offset value 0xa means that there's a relocation to be done at 0xa, or the third byte of the mov instruction above which starts on 0x8. This is the placeholder 00 00 00 00 we saw above. The relocation type is R_X86_64_PC32, which according to the System V Application Binary Interface means "S+A-P", where:
S = the value of the symbol whose index resides in the relocation entry
A = the addend used to compute the value of the relocatable field
P = the place (section offset or address) of the storage unit being relocated (computed using r_offset).
The value of the symbol (the value at 0xa) is 0. A is the addend -4 on that line.
I am not certain what P is, is it the address of the .data section?
And where does the -4 come from?
This calculation seems to end up as "4 bytes less the address of the .data section". How does that represent the distance from RIP (0xe) to wherever answer ends up in the final executable?
If I look at the final executable though (gcc -o main main.o, objdump -DCrw -M intel main), I see our mov instruction has been relocated:
1131: 8b 05 d9 2e 00 00 mov eax,DWORD PTR [rip+0x2ed9]
The 00 00 00 00 placeholder has been replaced with d9 2e 00 00. rip is now at 0x1137, so rip+0x2ed9 is 0x4010. And sure enough, at 0x4010 we find the answer:
Disassembly of section .data:
(...)
0000000000004010 <answer>:
4010: 2a 00 sub al,BYTE PTR [rax]
So everything works out fine in the end, I just don't understand exactly how it happens.
i am write a mini os. And when i write this code to show time clock, its goes wrong
7 void timer_callback(pt_regs *regs)
8 {
9 static uint32_t tick = 0;
10 printf("Tick: %dtimes\n", tick);
11 tick++;
12 }
tick is initialise not with 0, but 1818389861. but if tick init with 0x01 or anything else zero, it's ok!!!
so i wirte a simple c file then objdump:
staic.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
extern void printf(char *, int);
int main(){
0: 8d 4c 24 04 lea 0x4(%esp),%ecx
4: 83 e4 f0 and $0xfffffff0,%esp
7: ff 71 fc pushl -0x4(%ecx)
a: 55 push %ebp
b: 89 e5 mov %esp,%ebp
d: 51 push %ecx
e: 83 ec 04 sub $0x4,%esp
static int a = 1;
printf("%d\n", a);
11: a1 00 00 00 00 mov 0x0,%eax
16: 83 ec 08 sub $0x8,%esp
19: 50 push %eax
1a: 68 00 00 00 00 push $0x0
1f: e8 fc ff ff ff call 20 <main+0x20>
24: 83 c4 10 add $0x10,%esp
return 0;
27: b8 00 00 00 00 mov $0x0,%eax
}
2c: 8b 4d fc mov -0x4(%ebp),%ecx
2f: c9 leave
30: 8d 61 fc lea -0x4(%ecx),%esp
33: c3 ret
so strange, no memory used!!!
Update: let me say it clearly
the second static.c is an experiment, it was thought it show no memory used, but i was wrong, mov 0x0 %eab is. i confuse 0x0 and $0x0 /..\
my origin problem is why tick not succeed init with 0.(but can init with 1 or anyelsenumber).
i look up it again use gdb, ok, it do use memory like mov
eax,ds:0x106010,but the real strong thing is the memory x 0x106010 is not 0,but it should be, just as i said, if i let tick = 1 or anythingelse, memory do init as i want, that is the strange thing!
the tool: gdb ,objdump return different asm(different means,not formate),because, just learn os,not good at c, so i let it go,ignore it....
Memory is used, be sure of that; however, you won't find that memory in the .text section. Memory for static variables is allocated in either .bss (when zero-initialized; or, in case of C++, dynamically initialized) or .data (when non-zero initialized) section.
When dumping object files with objdump using the -d (disassembly) option, it is important to also use the -r (relocations) option. Without that, the disassembly you get is deceiving and makes little sense.
In your case, the instruction at addresses 11 and 1f must have relocations, at address 11, to the variable a and at address 1f, to the function printf. The instruction at address 11 loads the value from your variable a, without proper relocations it looks as if it loaded a value from address 0.
As to your original question, the value you get, 1818389861, or 0x6C626D65, is quite remarkable. I would bet that somewhere in your program you have a buffer overrun involving a string containing the subsequence embl.
As a side note, I would like to call your attention to the use of correct type specifications in printf calls. The type specification %d corresponds to the type int; on all modern mainstream architectures, int and int32_t are of the same size. However, that is not guaranteed to always be so. There are special type specifications for use with explicitly-sized types, for example, for an int32_t you use "PRId32":
uint32_t x;
printf("%"PRId32, x);
For the following code:
#include <stdio.h>
int main() {
printf("Hello World");
printf("Hello World1");
return 0;
}
the generated assembly for calling printf is as follows (64 bits):
400474: be 24 06 40 00 mov esi,0x400624
400479: bf 01 00 00 00 mov edi,0x1
40047e: 31 c0 xor eax,eax
400480: e8 db ff ff ff call 400460 <__printf_chk#plt>
400485: be 30 06 40 00 mov esi,0x400630
40048a: bf 01 00 00 00 mov edi,0x1
40048f: 31 c0 xor eax,eax
400491: e8 ca ff ff ff call 400460 <__printf_chk#plt>
And the .rodata section is as follows:
Contents of section .rodata:
400620 01000200 48656c6c 6f20576f 726c6400 ....Hello World.
400630 48656c6c 6f20576f 726c6431 00 Hello World1.
Based on the assembly code, the first call for printf has the argument with address 400624 which has a 4 byte offset from the start of .rodata. I know it skips the first 4 bytes for these 4 dots prefix here. But my question is why GCC/linker produce this prefix for string in .rodata ? I am using 4.8.4 GCC on Ubuntu 14.04. The compilation cmd is just: gcc -Ofast my-source.c -o my-program.
For starters, those are not four dots, the dot just means unprintable character. You can see in the hex dump that those bytes are 01 00 02 00.
The final program contains other object files added by the linker, which are part of the C runtime library. This data is used by code there.
You can see the address is 0x400620. You can then try to find a matching symbol, for example you can load it into gdb and use the info symbol command:
(gdb) info symbol 0x4005f8
_IO_stdin_used in section .rodata of /tmp/a.out
(Note I had a different address.)
Taking it further, you can actually find the source for this in glibc:
/* This records which stdio is linked against in the application. */
const int _IO_stdin_used = _G_IO_IO_FILE_VERSION;
and
#define _G_IO_IO_FILE_VERSION 0x20001
Which corresponds to the value you see if you account for little-endian storage.
It does not prefix the data. The .rodata can contain anything. The first four bytes are [seemingly] a string, but it just happens to link there (i.e. it's for something else). It is unrelated to your "Hello World"
I am using GCC 4.8.1 and it doesn't seem to store const variable local to main in DATA segment. Below is code and memory map for 3 such programs:
Code 1:
int main(void)
{ //char a[10]="HELLO"; //1 //const char a[10] = "HELLO"; //2
return 0;
}
MEMORY MAP FOR ABOVE:
text data bss dec hex filename
7264 1688 1040 9992 2708 a.exe
CODE 2:
int main(void)
{
char a[10]="HELLO";
//const char a[10] = "HELLO";
return 0;
}
MEMORY MAP FOR 2:
text data bss dec hex filename
7280 1688 1040 10008 2718 a.exe
CODE 3:
int main(void)
{
//char a[10]="HELLO";
const char a[10] = "HELLO";
return 0;
}
MEMORY MAP FOR 3 :
text data bss dec hex filename
7280 1688 1040 10008 2718 a.exe
I do not see any difference in data segment between 3 codes. Can someone please explain this result to me.
Thanks in anticipation!
If your array is not used by your program so the compiler is allowed to simply optimize out the object.
From the C Standard:
(C99, 5.1.2.3p1) "The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant"
and
(C99, 5.1.2.3p3) "In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)."
If you compile your program #3 with optimizations disabled (-O0) or depending on your compiler the object can still be allocated. In your case it does not appear in the data and rodata section but in text section thus the text section increase.
For example in your third example, in my compiler the resulting code with -O0 is (dumped using objdump -d):
00000000004004d0 <main>:
4004d0: 55 push %rbp
4004d1: 48 89 e5 mov %rsp,%rbp
4004d4: 48 b8 48 45 4c 4c 4f mov $0x4f4c4c4548,%rax
4004db: 00 00 00
4004de: 48 89 45 f0 mov %rax,-0x10(%rbp)
4004e2: 66 c7 45 f8 00 00 movw $0x0,-0x8(%rbp)
4004e8: b8 00 00 00 00 mov $0x0,%eax
4004ed: 5d pop %rbp
4004ee: c3 retq
4004ef: 90 nop
0x4f4c4c4548 is the ASCII characters of your string moved in a register and then pushed in the stack.
If I compile the same program with -O3, the output is simply:
00000000004004d0 <main>:
4004d0: 31 c0 xor %eax,%eax
4004d2: c3 retq
4004d3: 90 nop
and the string does not appear in data or rodata, it is simply optimized out.
This is what should happen:
Code 1: nothing is stored anywhere.
Code 2: a is stored on the stack. It is not stored in .data.
Code 3 a is either stored on the stack or in .rodata, depending on whether it is initialized with a constant expression or not. The optimizer might also decide to store it in .text (together with the code).
I do not see any difference in data segment between 3 codes.
That's because there should be no difference. .data is used for non-constant variables with static storage duration that are initialized to a value other than zero.