The GCC compiler on CSLab translates the following C function:
int func(int x) {
return 13 + x;
}
Into the following Assembly code:
func:
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
movl -4(%rbp), %eax
addl $13, %eax
popq %rbp
ret
I have completed this code and was then asked the following question:
In the Assembly code for func shown in the previous question, suppose %rsp has the value
0x7fffffffe3e0
What is the address corresponding to the parameter (local variable) x? Include the 0x prefix.
(Note that the address has 12 significant hex digits, or 6 bytes. > The value for the top two hex digits is 0. Omit the 0s to the left just as shown above.)
I answered 0xd and it was incorrect.
Taking the given value for %rsp as 0x7fffffffe3e0, we have
movq %rsp, %rbp
which copies the value of %rsp to %rbp, then we have
movl -4(%rbp), %eax
addl $13, %eax
We copy something to %eax and add 13 to it, so that something must be x. That something is -4(%rbp), which translates to "an object 4 bytes below the address value stored in %rbp".
Thus, the address of x must be 0x7fffffffe3e0 - 4, or 0x7fffffffe3dc.
Read up on your x86 assembly addressing modes.
Related
This question already has answers here:
Why does the compiler reserve a little stack space but not the whole array size?
(2 answers)
Stack allocation, padding, and alignment
(6 answers)
Closed 2 years ago.
How does gcc decides how much memory allocate for stack and why does it not decrement %rsp anymore when I remove printf() (or any function call) from my main?
1. I noticed when I played around with a code sample: https://godbolt.org/z/fQqkNE that the 6th line in gcc assembly viewer subq $48, %rsp gets removed if I remove printf() from my C code on line 22. It looks like when I don’t make any function calls from within my main, then the %rsp does not get decremented, but data still gets allocated based on %rbp and offsets. I thought %rsp changes only when stack grows. My theory is that since it won’t make any other function calls, it knows that it won’t need to keep stack for other nonexistent functions. But shouldn’t %rsp still grow as data is getting saved?
2. When adding variables to my rect struct, I also noticed that it sometimes allocates memory in steps greater than what the added data type size was. What is the convention it follows when deciding how much memory to allocate to stack?
3. Is there an online tool that would take assembly code as input, and then draw an image of stack and tell me state of every register at any point of execution? Godbolt.org is a very good tool, I just wish it had these 2 extra features.
I'll paste the code below in case the link to godbolt stops working in the future:
#include <stdio.h>
#include <stdint.h>
struct rect {
int a;
int b;
int* c;
int d[2];
uint8_t f;
};
int main() {
int arr[2] = {2, 3};
struct rect Rect;
Rect.a = 10;
Rect.b = 20;
Rect.c = arr;
Rect.d[0] = Rect.a;
Rect.d[1] = Rect.b;
Rect.f =255;
printf("%d and %d", Rect.a, Rect.b);
return 0;
}
.LC0:
.string "%d and %d"
main:
pushq %rbp
movq %rsp, %rbp
subq $48, %rsp
movl $2, -8(%rbp)
movl $3, -4(%rbp)
movl $10, -48(%rbp)
movl $20, -44(%rbp)
leaq -8(%rbp), %rax
movq %rax, -40(%rbp)
movl -48(%rbp), %eax
movl %eax, -32(%rbp)
movl -44(%rbp), %eax
movl %eax, -28(%rbp)
movb $-1, -24(%rbp)
movl -44(%rbp), %edx
movl -48(%rbp), %eax
movl %eax, %esi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
P.S.: The book I follow uses AT&T syntax for teaching x86. Which is weird because it makes finding online tutorials much harder.
So I do have some assembly code which I wrote on my linux VM (Manjaro, x86_64). It looks like this:
.section .rodata
.LC0:
.string "The value of a is: %d, of b: %d"
.text
.globl main
.type main, #function
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl $15, -4(%rbp)
movl $20, -8(%rbp)
movl -8(%rbp), %edx
movl -4(%rbp), %eax
movl %eax, %esi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
Basically I want to insert 2 values in registers, then somehow print them (formated like in .LC0). Well, I got stuck, so I just wrote C program, and used gcc -S to see how it looks. It gave me something similair to the code above. I don't understand two things:
If I store 20 in %edx and 15 in %eax, then why passing only %eax to %esi causes printf to print the values both from %eax and %edx?
Why do I have to put a zero constant everytime before and after printf (as gcc does?)
Why do I have to put a zero constant everytime before and after printf
These are two different issues.
Zero before printf conforms to x86-64 a.k.a. AMD64 SysV ABI to specify count of variable arguments in vector (XMMn, YMMn...) registers.
Zero after printf is this function return value (likely, return 0 at its end).
why passing only %eax to %esi causes printf to print the values both from %eax and %edx?
It does not.
The same ABI specifies: the first argument (printf format string pointer) in %rdi; the second argument (first variable argument) in %rsi, and so on. Additional move of arguments seems to be artifact of non-optimized (-O0) gcc output code. If you add any optimization (even -Og), youʼll see these senseless moves wiped out.
What is the difference between these two segments of code:
1
scanf("%d%d", p1, p2);
2
scanf("%d", p1);
scanf("%d", p2);
Since you're not checking the return values, there is no difference in the behavior. If you were checking the return values, the second option potentially gives you a little more detail (two return values to check) for a little more work. If the input is a single number followed by an EOF, the second will return 1,EOF. If the input is a single number followed by a non-number, it will return 1,0. The first option will return 1 in either of the above cases, so you can't tell the difference without another call (if you care).
I am posting this because it will helpful for you to use scanf in different way::
if you input like::
below all input taking in different variable and array
8 5
2 3 1 2 3 2 3 3
0 2
0 1
6 7
3 5
0 7
then how to use scanf
if you want to take input from console as "space separated input" ex:: 1 2 3 4 5 5 6
then you can use scanf() as below
for(i=0; i < no_you_want;i++)
{
// single space before %d in below scanf function
scanf(" %d",&a[i]);
}
Firstly you need to add '&' in your inputs ;-) Like Chris mentioned there is no difference in working unless you handle the return value. I tried to compare the assembly code, of-course there are couple of extra instructions as below.
scanf("%d%d", &p1, &p2);
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $.LC0, %eax
leaq -4(%rbp), %rdx
leaq -8(%rbp), %rcx
movq %rcx, %rsi
movq %rax, %rdi
movl $0, %eax
call __isoc99_scanf
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
scanf("%d", &p1);
scanf("%d", &p2);
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $.LC0, %eax
leaq -8(%rbp), %rdx
movq %rdx, %rsi
movq %rax, %rdi
movl $0, %eax
call __isoc99_scanf
movl $.LC0, %eax
leaq -4(%rbp), %rdx
movq %rdx, %rsi
movq %rax, %rdi
movl $0, %eax
call __isoc99_scanf
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
An expected difference between scanf("%d%d", p1, p2); and scanf("%d", p1);
scanf("%d", p2); occurs when the input text is "-+2\n".
With scanf("%d%d", p1, p2);, scanf() read "-+" and seeing that it is not a valid int sequence, ungets the + for subsequent IO. *p1 and *p2 remain unchanged.
With scanf("%d", p1);, scanf() read "-+" and seeing that it is not a valid int sequence, (*p1 remains unchanged) ungets the + for subsequent IO which is scanf("%d", p2); which scans "+3\n", save 3 into *p2 and ungets the \n for subsequent IO.
C11dr §7.21.6.2 9 footnote about fscanf(), which applies to scanf(): "fscanf pushes back at most one input character onto the input stream. Therefore, some sequences that are acceptable to strtod, strtol, etc., are unacceptable to fscanf.
But I can not make it differ on my machine.
My cygwin Kepler gcc unexpectedly pushes back both "-+" causing both codes to operate the same. Your mileage may vary. My gcc appears to not comply with the C spec.
According to some textbooks, the compiler will use sub* to allocate memory for local variables.
For example, I write a Hello World program:
int main()
{
puts("hello world");
return 0;
}
I guess this will be compiled to some assembly code on the 64 bit OS:
subq $8, %rsp
movq $.LC0, (%rsp)
calq puts
addq $8, %rsp
The subq allocates 8 byte memory (size of a point) for the argument and the addq deallocates it.
But when I input gcc -S hello.c (I use the llvm-gcc on Mac OS X 10.8), I get some assembly code.
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
subq $16, %rsp
Ltmp2:
xorb %al, %al
leaq L_.str(%rip), %rcx
movq %rcx, %rdi
callq _puts
movl $0, -8(%rbp)
movl -8(%rbp), %eax
movl %eax, -4(%rbp)
movl -4(%rbp), %eax
addq $16, %rsp
popq %rbp
ret
.......
L_.str:
.asciz "hello world!"
Around this callq without any addq and subq. Why? And what is the function of addq $16, %rsp?
Thanks for any input.
You don't have any local variables in your main(). All you may have in it is a pseudo-variable for the parameter passed to puts(), the address of the "hello world" string.
According to your last disassembly, the calling conventions appear to be such that the first parameter to puts() is passed in the rdi register and not on the stack, which is why there isn't any stack space allocated for this parameter.
However, since you're compiling your program with optimization disabled, you may encounter some unnecessary stack space allocations and reads and writes to and from that space.
This code illustrates it:
subq $16, %rsp ; allocate some space
...
movl $0, -8(%rbp) ; write to it
movl -8(%rbp), %eax ; read back from it
movl %eax, -4(%rbp) ; write to it
movl -4(%rbp), %eax ; read back from it
addq $16, %rsp
Those four mov instructions are equivalent to just one simple movl $0, %eax, no memory is needed to do that.
If you add an optimization switch like -O2 in your compile command, you'll see more meaningful code in the disassembly.
Also note that some space allocations may be needed solely for the purpose of keeping the stack pointer aligned, which improves performance or avoids issues with misaligned memory accesses (you could get the #AC exception on misaligned accesses if it's enabled).
The above code shows it too. See, those four mov instructions only use 8 bytes of memory, while the add and sub instructions grow and shrink the stack by 16.
I am currently writing a simple C compiler, that takes a .c file as input and generates assembly code (X86, AT&T syntax). I am having a hard time passing array-type parameters and generating the correct assembly code for it. Here's my input:
int getIndexOne(int tab[]){
return tab[1];
}
int main_test(void){
int t[3];
t[0] = 0;
t[1] = 1;
t[2] = 2;
return getIndexOne(t);
}
A fairly simple test. Here is my output:
getIndexOne:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 16
movl %esp, %ebp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
movl %edi, -24(%ebp)
movl $1, %eax
movl -24(%ebp, %eax, 8), %eax #trouble over here
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size getIndexOne, .-getIndexOne
falsemain:
.LFB1:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 16
movl %esp, %ebp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
pushl %ebx
subl $120, %esp
movl $2, -32(%ebp)
movl $0, %eax
movl $0, -24(%ebp, %eax, 8)
movl $1, %eax
movl $1, -24(%ebp, %eax, 8)
movl $2, %eax
movl $2, -24(%ebp, %eax, 8)
leal -24(%ebp, %eax, 8), %eax
movl %eax, %edi
call getIndexOne
addl $120, %esp
popl %ebx
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main_test, .-main_test
I'm unable to access the content of the passed address (leal instruction). Any help would be much appreciated.
PS: Don't worry about the size of my ints, they are set to 8 instead of 4 bytes for other reasons.
There are two problems, the first being when you setup the stack for the call to getIndexOne:
movl $2, %eax
movl $2, -24(%ebp, %eax, 8)
leal -24(%ebp, %eax, 8), %eax ##<== EAX still holds value of 2!
movl %eax, %edi ##<== wrong address for start of array
You're not clearing the contents of the EAX register after the MOV command, therefore the address that you're putting into the EDI register for the function call is not pointing to the start of the array, but last element of the array (i.e., the second index which has an element value of 2).
The second problem comes here in your getIndexOne function:
movl %edi, -24(%ebp)
movl $1, %eax
movl -24(%ebp, %eax, 8), %eax
You've stored the address on the stack. That's fine, but it also means when you retreive the value from the stack, that you're getting a pointer back which must then be dereferenced a second time. What you're doing right now is you're just reading back a value offset from the frame-pointer on the stack ... that's not the value in the array since you're not dereferencing the pointer that you stored on the stack. In other words you should change it to the following if you must store the pointer on the stack (I think this isn't the most efficient way since the value is already in EDI, but whatever):
movl %edi, -24(%ebp) ##<== store pointer to the array on the stack
movl $1, %eax
movl -24(%ebp), %ecx ##<== get the pointer back from the stack
movl (%ecx, %eax, 8), %eax ##<== dereference the pointer
As a side note, while I'm not sure how your compiler works, I do think it's a little scary that you are using the value that you are loading into the array elements to also index into the array ... if the values being loaded don't match the array index, that's going to create quite a bit of havok ... I'm guessing this is some type of optimization you're attempting to-do when the two values match?