In a piece of code viewed in gdb such as the following:
0x8e4e <boot1main+1>: mov %esp,%ebp
0x8e50 <boot1main+3>: push %esi
0x8e51 <boot1main+4>: mov 0xc(%ebp),%esi
0x8e54 <boot1main+7>: push %ebx
0x8e55 <boot1main+8>: mov 0x10(%ebp),%ebx
0x8e58 <boot1main+11>: sub $0xc,%esp
0x8e5b <boot1main+14>: push $0x3
0x8e5d <boot1main+16>: call 0x8bb6 <roll>
0x8e62 <boot1main+21>: movl $0x8f84,(%esp)
0x8e69 <boot1main+28>: call 0x8b77 <putline>
What is the precise meaning of the addresses on the left? Are these where the assembly instructions are located on the machine? (i.e. 0x8e4e, 0x8e50).
The above piece of code was generated without having set up virtual memory yet. In fact, it represents some boot-loader code I am tracing through.
For a normal C program, are those addresses on the left virtual addresses?
Yes, that's the address of the code you're looking at. And yes, for a normal C program on a "proper" OS, the addresses would be virtual.
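To see the same kind of number from inside a normal C program, here is a minimal sketch, assuming a hosted build with gcc or clang. The names marker, code_address and print_code_address are made up for illustration. On an OS with paging, the value printed is a virtual address, and it is the same sort of address gdb shows in the left column of a disassembly.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical function whose code address we want to inspect. */
void marker(void) {}

/* Return the address of marker() as an integer.
   Converting a function pointer to an integer is implementation-defined
   in ISO C; gcc and clang on x86 give the plain code address. */
uintptr_t code_address(void) {
    return (uintptr_t)&marker;
}

/* Print the address in hex, the way gdb writes it. */
void print_code_address(void) {
    printf("marker is at 0x%llx\n", (unsigned long long)code_address());
}
```

Calling print_code_address() from main and comparing the result with `disassemble marker` in gdb should show the same starting address.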
I am looking at some old code from a school project, and in trying to compile it on my laptop I ran into some problems. It was originally written for an old 32 bit version of gcc. Anyway I was trying to convert some of the assembly over to 64 bit compatible code and hit a few snags.
Here is the original code:
pusha
pushl %ds
pushl %es
pushl %fs
pushl %gs
pushl %ss
pusha is not valid in 64 bit mode. So what would be the proper way to do this in x86_64 assembly while in 64 bit mode?
There has got to be a reason why pusha is not valid in 64 bit mode, so I have a feeling manually pushing all the registers may not be a good idea.
AMD needed some room to add new opcodes for REX prefixes and some other new instructions when they developed the 64-bit x86 extensions. They changed the meaning of some of the opcodes to those new instructions.
Several of the instructions were simply short forms of existing instructions or were otherwise not necessary. PUSHA was one of the victims. It's not clear why they banned PUSHA, though; it doesn't seem to overlap any new instruction opcodes. Perhaps they reserved the PUSHA and POPA opcodes for future use, since the instructions are completely redundant, wouldn't be any faster, and wouldn't occur frequently enough in code to matter.
The order of PUSHA was the order of the instruction encoding: eax, ecx, edx, ebx, esp, ebp, esi, edi. Note that it redundantly pushed esp! You need to know esp to find the data it pushed!
If you are converting code to 64-bit, the PUSHA code is no good anyway: you need to update it to push the new registers r8 thru r15. You also need to save and restore a much larger SSE state, xmm8 thru xmm15, assuming you are going to clobber them.
If the interrupt handler code is simply a stub that forwards to C code, you don't need to save all of the registers. You can assume that the C compiler will generate code that will be preserving rbx, rbp, rsi, rdi, and r12 thru r15. You should only need to save and restore rax, rcx, rdx, and r8 thru r11. (Note: on Linux or other System V ABI platforms, the compiler will be preserving rbx, rbp, r12-r15, you can expect rsi and rdi clobbered).
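The register lists in the paragraph above can be written down as a small lookup. This is only a sketch: the enum abi and the function stub_must_save are hypothetical names, it covers the integer registers only (not xmm state), and it encodes nothing beyond the two lists given above.

```c
#include <string.h>

/* Which calling convention the C code behind the stub was compiled for. */
enum abi { ABI_WIN64, ABI_SYSV };

/* Return 1 if an interrupt stub that forwards to C must save `reg` itself,
   because the compiled C code is allowed to clobber it; 0 otherwise. */
int stub_must_save(const char *reg, enum abi abi) {
    /* Clobberable under both conventions. */
    static const char *const common[] = {
        "rax", "rcx", "rdx", "r8", "r9", "r10", "r11"
    };
    for (size_t i = 0; i < sizeof common / sizeof common[0]; ++i) {
        if (strcmp(reg, common[i]) == 0) {
            return 1;
        }
    }
    /* System V additionally treats rsi and rdi as clobberable. */
    if (abi == ABI_SYSV &&
        (strcmp(reg, "rsi") == 0 || strcmp(reg, "rdi") == 0)) {
        return 1;
    }
    return 0;
}
```

For example, stub_must_save("rsi", ABI_SYSV) is 1 while stub_must_save("rsi", ABI_WIN64) is 0, which is exactly the difference called out in the note.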
The segment registers hold no value in long mode (if the interrupted thread is running in 32-bit compatibility mode you must preserve the segment registers, thanks ughoavgfhw). Actually, they got rid of most of the segmentation in long mode, but FS is still reserved for operating systems to use as a base address for thread local data. The register value itself doesn't matter, the base of FS and GS are set through MSRs 0xC0000100 and 0xC0000101. Assuming you won't be using FS you don't need to worry about it, just remember that any thread local data accessed by the C code could be using any random thread's TLS. Be careful of that because C runtime libraries use TLS for some functionality (example: strtok typically uses TLS).
Loading a value into FS or GS (even in user mode) will overwrite the FSBASE or GSBASE MSR. Since some operating systems use GS as "processor local" storage (they need a way to have a pointer to a structure for each CPU), they need to keep it somewhere that won't get clobbered by loading GS in user mode. To solve this problem, there are two MSRs reserved for the GSBASE register: one active one and one hidden one. In kernel mode, the kernel's GSBASE is held in the usual GSBASE MSR and the user mode base is in the other (hidden) GSBASE MSR. When context switching from kernel mode to a user mode context, and when saving a user mode context and entering kernel mode, the context switch code must execute the SWAPGS instruction, which swaps the values of the visible and hidden GSBASE MSR. Since the kernel's GSBASE is safely hidden in the other MSR in user mode, the user mode code can't clobber the kernel's GSBASE by loading a value into GS. When the CPU reenters kernel mode, the context save code will execute SWAPGS and restore the kernel's GSBASE.
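The SWAPGS dance is easier to follow as a toy model. The sketch below is plain C, not kernel code: struct cpu_model and the model_* functions are hypothetical, and the two fields just stand in for the visible and hidden GSBASE values described above.

```c
#include <stdint.h>

/* Toy model of the two GS base values kept by the CPU. */
struct cpu_model {
    uint64_t gs_base;        /* visible base, used by %gs-relative accesses */
    uint64_t kernel_gs_base; /* hidden base, only reachable via SWAPGS */
};

/* SWAPGS: exchange the visible and hidden bases. */
void model_swapgs(struct cpu_model *cpu) {
    uint64_t tmp = cpu->gs_base;
    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/* User code loading GS: only the visible base is overwritten. */
void model_user_loads_gs(struct cpu_model *cpu, uint64_t new_base) {
    cpu->gs_base = new_base;
}
```

Starting in kernel mode with the kernel's per-CPU pointer in gs_base, the sequence swapgs (return to user), user load of GS, swapgs (re-enter kernel) leaves gs_base holding the kernel pointer again, whatever the user code wrote.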
Learn from existing code that does this kind of thing. For example:
Linux (search for SAVE_ARGS_IRQ): entry_64.S
OpenSolaris (search for INTR_PUSH): privregs.h
FreeBSD (search for IDT_VEC): exception.S (similar is vector.S in NetBSD)
In fact, "manually pushing" the regs is the only way on AMD64 since PUSHA doesn't exist there. AMD64 isn't unique in this aspect - most non-x86 CPUs do require register-by-register saves/restores as well at some point.
But if you inspect the referenced source code closely, you'll find that not all interrupt handlers need to save/restore the entire register set, so there is room for optimization.
pusha is not valid in 64-bit mode because it is redundant. Pushing each register individually is exactly the thing to do.
This might not be the correct way to do it, but one can create macros like
.macro pushaq
push %rax
push %rcx
push %rdx
push %rbx
push %rbp
push %rsi
push %rdi
.endm # pushaq
and
.macro popaq
pop %rdi
pop %rsi
pop %rbp
pop %rbx
pop %rdx
pop %rcx
pop %rax
.endm # popaq
and eventually add the other r8-15 registers if one needs to
I found x86 lea instructions in an executable file made using clang and gcc.
The lea instructions are after the ret instruction as shown below.
0x???????? <func>
...
pop %ebx
pop %ebp
ret
lea 0x0(%esi,%eiz,1),%esi
lea 0x0(%edi,%eiz,1),%edi
0x???????? <next_func>
...
What are these lea instructions used for? There is no jmp instruction to the lea instructions.
My environment is Ubuntu 12.04 32-bit and gcc 4.6.3.
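It's probably not anything. Each of those lea instructions just computes the register plus zero and stores it back into the same register, so it changes nothing: it is padding that lets the next function start at an address that's probably a multiple of at least 8 (and quite possibly 16).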
Depending on the rest of the code, it's possible that it's actually a table. Some implementations of a switch statement, for example, use a constant table that's often stored in the code segment (even though, strictly speaking, it's more like data than code).
The first is a lot more likely though. As an aside, such space is often filled with 0xCC instead. This is the single-byte debug-break instruction (int3), so if some undefined behavior results in attempting to execute that code, it immediately stops execution and breaks to the debugger (if available).
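How many padding bytes end up between two functions is simple arithmetic. Here is a small sketch; align_up and padding_bytes are hypothetical helper names and the addresses you feed them are up to you, since the question elides the real ones.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align.
   align must be a power of two (8 or 16 in the cases discussed here). */
uint32_t align_up(uint32_t addr, uint32_t align) {
    return (addr + align - 1u) & ~(align - 1u);
}

/* Number of filler bytes needed after a function that ends at `end`
   (the first byte past its ret) so the next one starts aligned. */
uint32_t padding_bytes(uint32_t end, uint32_t align) {
    return align_up(end, align) - end;
}
```

For instance, a function whose ret ends at a made-up address 0x8048412 needs 14 bytes of filler to reach the next 16-byte boundary at 0x8048420, and none at all if it already ends on a boundary.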
I have some C code that I compiled with gcc:
int main() {
int x = 1;
printf("%d\n",x);
return 0;
}
I've run it through gdb 7.9.1 and come up with this assembler code for main:
0x0000000100000f40 <+0>: push %rbp # save original frame pointer
0x0000000100000f41 <+1>: mov %rsp,%rbp # stack pointer is new frame pointer
0x0000000100000f44 <+4>: sub $0x10,%rsp # make room for vars
0x0000000100000f48 <+8>: lea 0x47(%rip),%rdi # 0x100000f96
0x0000000100000f4f <+15>: movl $0x0,-0x4(%rbp) # put 0 on the stack
0x0000000100000f56 <+22>: movl $0x1,-0x8(%rbp) # put 1 on the stack
0x0000000100000f5d <+29>: mov -0x8(%rbp),%esi
0x0000000100000f60 <+32>: mov $0x0,%al
0x0000000100000f62 <+34>: callq 0x100000f74
0x0000000100000f67 <+39>: xor %esi,%esi # set %esi to 0
0x0000000100000f69 <+41>: mov %eax,-0xc(%rbp)
0x0000000100000f6c <+44>: mov %esi,%eax
0x0000000100000f6e <+46>: add $0x10,%rsp # move stack pointer to original location
0x0000000100000f72 <+50>: pop %rbp # reclaim original frame pointer
0x0000000100000f73 <+51>: retq
As I understand it, push %rbp pushes the frame pointer onto the stack, so we can retrieve it later with pop %rbp. Then, sub $0x10,%rsp reserves 0x10 (16) bytes of room on the stack so we can put stuff on it.
Later interactions with the stack move variables directly into the stack via memory addressing, rather than pushing them onto the stack:
movl $0x0, -0x4(%rbp)
movl $0x1, -0x8(%rbp)
Why does the compiler use movl rather than push to get this information onto the stack?
Does referencing the register after the memory address also put that value into that register?
It is very common for modern compilers to move the stack pointer once at the beginning of a function, and move it back at the end. This allows for more efficient indexing because it can treat the memory space as a memory mapped region rather than a simple stack. For example, values which are suddenly found to be of no use (perhaps due to an optimized shortcutted operator) can be ignored, rather than forcing one to pop them off the stack.
Perhaps in simpler days, there was a performance reason to use push. With modern processors, there is no advantage, so there's no reason to make special cases in the compiler to use push/pop when possible. It's not like compiler-written assembly code is readable!
While Cort is correct, there is another important reason for this practice of apparently allocating space on the stack. According to the ABI, function calls must find the stack 16 byte aligned. Rather than fiddling with the stack every single time a call needs to be made from a function, it is generally easier and more efficient to adjust the stack for proper alignment first and then modify the values that might otherwise have been pushed onto it.
So, the stack is absolutely adjusted for local variable space, but it is also adjusted to provide correct stack alignment for calls into the standard library.
I'm not an authority on assemblers or compilers but I've played around with MASM back in the day and did spend a whole bunch of time with WinDbg while debugging production C++ issues.
I think the answer to your question is because it's easier.
push/pop instructions write to and read from the stack, but they also modify the stack pointer as they are processed. A C/C++ compiler uses the stack for all its local variables. It does so by shifting the stack pointer by the number of bytes needed to hold all local variables, right when you enter the function.
After that reading and writing all those variables can be done from anywhere in the function and also as many times as you want by simply using the mov instructions. If you look at pure assembly, you might question why create a hole in the stack just to copy two values into that space using mov when you could have done two push instructions.
But look at it from compiler author perspective. The process of entering a function, and allocating stack for local variables is coded separately and is completely decoupled from the process of reading/writing those variables.
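The "reserve once, then mov at fixed offsets" idea from the answers above can be mimicked with a byte array. This is a toy model only: struct frame_model and the frame_* functions are invented for illustration, and array indices stand in for real addresses.

```c
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };

/* A tiny pretend stack; rsp and rbp are indices into mem. */
struct frame_model {
    unsigned char mem[STACK_BYTES];
    size_t rsp;
    size_t rbp;
};

/* Model of: push %rbp; mov %rsp,%rbp; sub $locals,%rsp */
void frame_enter(struct frame_model *f, size_t locals) {
    f->rsp = STACK_BYTES - 8; /* slot for the saved frame pointer */
    f->rbp = f->rsp;          /* mov %rsp,%rbp */
    f->rsp -= locals;         /* one adjustment for all locals */
}

/* Model of: movl $value,-off(%rbp). The stack pointer is not touched. */
void frame_store32(struct frame_model *f, size_t off, int32_t value) {
    memcpy(&f->mem[f->rbp - off], &value, sizeof value);
}

/* Model of: mov -off(%rbp),%reg */
int32_t frame_load32(const struct frame_model *f, size_t off) {
    int32_t value;
    memcpy(&value, &f->mem[f->rbp - off], sizeof value);
    return value;
}
```

After frame_enter(&f, 0x10), the two stores from the question become frame_store32(&f, 4, 0) and frame_store32(&f, 8, 1). Either slot can then be read or rewritten any number of times, in any order, without rsp moving, which is the flexibility push and pop do not give you.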
I'm trying to understand some assembly code with AT&T syntax.
Here is a snippet:
"mov %eax, %ebx; "\
"mov %eax, %ecx;"\
"fxch %st(1);"\
This is what I understood from it.
The mov copies (am I correct, or does it move?) the data from the source register to the destination register.
In line one, we copy the data from register eax to ebx.
Similarly, we copy the data from register eax to ecx.
However, what I failed to understand is the following.
How does fxch work? Here is a link that gives an example.
fxch st(2)
fsqrt
fxch st(2)
It says that this above code takes the sqrt of st(2).
Correct me if I am wrong.
It swaps the top of the stack with st(2) and then takes the sqrt of what?
I don't understand that clearly.
Can you please help me out? How does that work in my case and in the above case?
mov instructions indeed copy a value and fsqrt takes the square root of the top of the stack and replaces the top of the stack with its result. So the given code sequence effectively takes the square root of st(2) and puts it back at the same place.
In answer to your question below. The two mov instructions copy the value in register %eax to %ebx and %ecx. So if you add another mov %eax,%edx, then this value (from %eax) is also copied to %edx.
Note that this holds for AT&T assembly. In Intel assembly the values are copied the other way around. In that case %eax was, quite uselessly, changed repeatedly to contain the value of the other registers.
The fxch st(1) exchanges the top of the stack, which is st(0) with the element just below the top st(1). Similarly st(2) is just below st(1). Contrary to the integer registers, the floating point registers on the x86 are organized in a stack, reducing the instruction length of operations on those floating point registers as they always work on the top element(s) of the stack. This comes with the overhead of having to use fxch instructions to put the right values on the top of the stack.
The integer registers %eax, %ebx etc. are distinct from the floating point stack/registers st(0), st(1) etc. So the mov instructions are not related to the fxch instructions. The order of these instructions could be changed without affecting the result.
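To make the fxch / fsqrt / fxch sequence concrete, here is a toy model of the register stack in C. Everything in it is an assumption made for illustration: x87_model and the model_* functions are hypothetical names, and approx_sqrt uses Newton's iteration as a stand-in for the hardware square root so that the sketch needs no math library.

```c
#include <stddef.h>

/* Toy x87 register stack: st[0] is the top of the stack. */
typedef struct {
    double st[8];
} x87_model;

/* fxch st(i): exchange st(0) with st(i). */
void model_fxch(x87_model *f, size_t i) {
    double tmp = f->st[0];
    f->st[0] = f->st[i];
    f->st[i] = tmp;
}

/* Stand-in for the hardware square root (Newton's iteration). */
double approx_sqrt(double a) {
    if (a <= 0.0) {
        return 0.0;
    }
    double x = a > 1.0 ? a : 1.0;
    for (int k = 0; k < 60; ++k) {
        x = 0.5 * (x + a / x);
    }
    return x;
}

/* fsqrt: replace st(0) with its square root. */
void model_fsqrt(x87_model *f) {
    f->st[0] = approx_sqrt(f->st[0]);
}

/* The sequence from the question: fxch st(2); fsqrt; fxch st(2). */
void sqrt_of_st2(x87_model *f) {
    model_fxch(f, 2); /* bring st(2) to the top */
    model_fsqrt(f);   /* operate on the top */
    model_fxch(f, 2); /* put the result back where it came from */
}
```

With the stack holding 1.0, 2.0, 9.0 in st(0), st(1), st(2), running sqrt_of_st2 leaves st(0) and st(1) as they were and st(2) at approximately 3.0.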
For this simple code:
void main(){
char buf[12];
buf[11]='\xff';
puts(buf);
}
I use gdb to debug this code and get its stack info like this:
0xbffff480: 0x40158ff4 0x40158ff4 0xff0494dc 0x40158ff4
0xbffff490: 0x00000000 0x40016ca0 0xbffff4f8 0x40045de3
0xbffff480 is where "buf" starts, and the last two words are EBP and RET, but what the hell is that between buf and EBP? obviously I don't have any other local vars. And I also tried if I allocate 8 bytes for buf on stack, it just continues from EBP, but if I allocate 9 or more bytes, it always has something in between. Could somebody explain this to me please? Thanks a lot! I am on linux 2.6.9
disassembly for main:
0x080483c4 <main+0>: push %ebp
0x080483c5 <main+1>: mov %esp,%ebp
0x080483c7 <main+3>: sub $0x28,%esp
0x080483ca <main+6>: and $0xfffffff0,%esp
0x080483cd <main+9>: mov $0x0,%eax
0x080483d2 <main+14>: add $0xf,%eax
0x080483d5 <main+17>: add $0xf,%eax
0x080483d8 <main+20>: shr $0x4,%eax
0x080483db <main+23>: shl $0x4,%eax
0x080483de <main+26>: sub %eax,%esp
0x080483e0 <main+28>: movb $0xff,0xfffffff3(%ebp)
0x080483e4 <main+32>: lea 0xffffffe8(%ebp),%eax
0x080483e7 <main+35>: mov %eax,(%esp)
0x080483ea <main+38>: call 0x80482e4
0x080483ef <main+43>: leave
0x080483f0 <main+44>: ret
That would be padding. Your compiler is probably aligning EBP on an 8-byte boundary, because aligned memory is almost always easier and faster to work with (from the processor's point of view). Some data types even require proper alignment to work.
You don't see any padding when you allocate only 8 bytes in your buffer because, in that case, EBP is already properly aligned.
Normally gcc keeps the stack aligned to a multiple of 16 for the sake of being able to use SSE instructions. Reading a disassembly of your main would be instructive.
GCC is known to be somewhat trigger happy with stack usage. It often reserves a bit more stack space than strictly needed. Also, it will try to achieve 16-byte stack alignment even when this is not necessary. This seems to be what happens with the first instructions (40 bytes reserved, then %esp alignment to a multiple of 16).
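The size of the gap can be read straight off the disassembly in the question, where buf is addressed as 0xffffffe8(%ebp) and buf[11] as 0xfffffff3(%ebp). The sketch below decodes those displacements; as_signed_disp and gap_after_buf are hypothetical helper names.

```c
#include <stdint.h>

/* Interpret a 32-bit displacement printed by gdb as a signed offset,
   e.g. 0xffffffe8 -> -24, without relying on implementation-defined casts. */
int32_t as_signed_disp(uint32_t raw) {
    return (raw & 0x80000000u) ? -(int32_t)(~raw) - 1 : (int32_t)raw;
}

/* Bytes between the end of a buffer and the saved EBP, given the raw
   displacement of the buffer's first byte relative to EBP. */
int32_t gap_after_buf(uint32_t raw_buf_disp, int32_t buf_size) {
    return -as_signed_disp(raw_buf_disp) - buf_size;
}
```

Here buf starts at ebp-24 and is 12 bytes long, so gap_after_buf(0xffffffe8u, 12) gives 12: the three extra words visible in the stack dump between the end of buf and the saved EBP.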
The code you show, however, contains some strange things, especially the sequence from offsets 9 to 27: this is a long, slow, convoluted way of subtracting 16 from %esp, something which could have been done in a single opcode. Subtracting some bytes from %esp at that point is logical in preparation for calling an external function (puts()), and the count (16) respects alignment, but why do so in such a weird way?
It might be possible that this sequence is meant to be patched up in some way (e.g. at link time) to support either stack smashing detection code, or some sort of profiling code. I cannot reproduce this on my own systems. You should specify the version of gcc and libc you are using, the exact compilation flags, and the Linux distribution (because distributors may activate some options by default). The "2.6.9" figure is the kernel version, and it has no bearing whatsoever on the problem at hand (it just tells us that the system is quite old).
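For anyone who wants to check the arithmetic of that odd sequence, here it is replayed step by step in C. This is just a transcription of the instructions in the question's disassembly; the function names are made up.

```c
#include <stdint.h>

/* Replays main+9 .. main+23: the value later subtracted from %esp. */
uint32_t extra_adjustment(void) {
    uint32_t eax = 0x0; /* mov $0x0,%eax */
    eax += 0xf;         /* add $0xf,%eax */
    eax += 0xf;         /* add $0xf,%eax */
    eax >>= 4;          /* shr $0x4,%eax */
    eax <<= 4;          /* shl $0x4,%eax */
    return eax;
}

/* Replays main+3, main+6 and main+26 on a given %esp value
   (the value %esp has right after mov %esp,%ebp). */
uint32_t esp_after_prologue(uint32_t esp) {
    esp -= 0x28;               /* sub $0x28,%esp */
    esp &= 0xfffffff0u;        /* and $0xfffffff0,%esp */
    esp -= extra_adjustment(); /* sub %eax,%esp */
    return esp;
}
```

extra_adjustment() comes out as 16, since (0 + 15 + 15) shifted right and then left by 4 bits rounds 30 down to a multiple of 16. Whatever value goes into esp_after_prologue, the result is a multiple of 16.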
You should really include an assembly dump rather than a pure hex dump, but it's more than likely one of two things:
A stack frame being restored
A stack check to ensure there was no corruption
the start may also contain a stack alignment, forcing one or both of the above