Why does this assembly code segfaults upon printf [duplicate]

What happens if I say 'call' instead of 'jmp'? Since there is no return statement written, does control just pass over to the next line below, or is it still returned to the line after the call?
start:
    mov $0, %eax
    jmp two
one:
    mov $1, %eax
two:
    cmp $1, %eax
    call one
    mov $10, %eax

The CPU always executes the next instruction in memory, unless a branch instruction sends execution somewhere else.
Labels don't have a width, or any effect on execution. They just allow you to make reference to this address from other places. Execution simply falls through labels, even off the end of your code if you don't avoid that.
If you're familiar with C or other languages that have goto, asm labels work exactly like the labels you use to mark goto targets, and jmp / jcc work exactly like goto or if(EFLAGS_condition) goto. But asm doesn't have special syntax for functions; you have to implement that high-level concept yourself.
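The goto analogy above can be sketched in C. This is only an illustration: the function name fall_through_trace and its parameters are made up for this example, and the labels one and two simply mirror the ones in the question. It records which labelled block runs, in order, to show that reaching a label does nothing by itself and that execution falls through from one into two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: records which labelled blocks run, in order,
   into trace[] (at most cap entries). Returns the number recorded. */
size_t fall_through_trace(char *trace, size_t cap)
{
    size_t n = 0;
    int passes = 0;

    assert(trace != NULL);
    goto two;                   /* like: jmp two */

one:
    if (n < cap) trace[n++] = '1';
    /* no goto or return here, so control falls through into two */
two:
    if (n < cap) trace[n++] = '2';
    if (++passes < 2)
        goto one;               /* like a conditional branch: jcc one */

    return n;
}
```

With a buffer of at least three entries, the recorded order is two, one, two: the second visit to two happens purely by falling through the label.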
If you leave out the ret at the end of a block of code, execution keeps going and decodes whatever comes next as instructions. (Maybe zero padding, if that was the last function in an asm source file (see "What would happen if a system executes a part of the file that is zero-padded?"), or maybe execution falls into some CRT startup function that eventually returns.)
(In which case you could say that the block you're talking about isn't a function, just part of one, unless it's a bug and a ret or jmp was intended.)
You can (and maybe should) try this yourself in a debugger. Single-step through that code and watch RSP and RIP change. The nice thing about asm is that the total state of the CPU (excluding memory contents) is not very big, so it's possible to watch the entire architectural state in a debugger window. (Well, at least the interesting part that's relevant for user-space integer code, so excluding model-specific registers that only the OS can tweak, and excluding the FPU and vector registers.)
call and ret aren't "special" (i.e. the CPU doesn't "remember" that it's inside a "function").
They just do exactly what the manual says they do, and it's up to you to use them correctly to implement function calls and returns. (e.g. make sure the stack pointer is pointing at a return address when ret runs.) It's also up to you to get the calling convention correct, and all that stuff. (See the x86 tag wiki.)
There's also nothing special about a label that you jmp to vs. a label that you call. An assembler just assembles bytes into the output file, and remembers where you put label markers. It doesn't truly "know" about functions the way a C compiler does. You can put labels wherever you want, and it doesn't affect the machine code bytes.
Using the .globl one directive would tell the assembler to put an entry in the symbol table so the linker could see it. That would let you define a label that's usable from other files, or even callable from C. But that's just meta-data in the object file and still doesn't put anything between instructions.
Labels are just part of the machinery that you can use in asm to implement the high-level concept of a "function", aka procedure or subroutine: A label for callers to call to, and code that will eventually jump back to a return address the caller passed, one way or another. But not every label is the start of a function. Some are just the tops of loops, or other targets of conditional branches within a function.
Your code would run exactly the same way if you emulated call with an equivalent push of the return address and then a jmp.
one:
    mov $1, %eax
    # missing ret so we fall through
two:
    cmp $1, %eax
    # call one                # emulate it instead with push+jmp
    pushl $.Lreturn_address
    jmp one
.Lreturn_address:
    mov $10, %eax
    # fall off into whatever comes next, if it ever reaches here.
Note that this sequence only works in non-PIC code, because the absolute return address is encoded into the push imm32 instruction. In 64-bit code with a spare register available, you can use a RIP-relative lea to get the return address into a register and push that before jumping.
Also note that while architecturally the CPU doesn't "remember" past CALL instructions, real implementations run faster by assuming that call/ret pairs will be matched, and use a return-address predictor to avoid mispredicts on the ret.
Why is RET hard to predict? Because it's an indirect jump to an address stored in memory! It's equivalent to pop %internal_tmp / jmp *%internal_tmp, so you can emulate it that way if you have a spare register to clobber (e.g. rcx is not call-preserved in most calling conventions, and not used for return values). Or if you have a red-zone so values below the stack-pointer are still safe from being asynchronously clobbered (by signal handlers or whatever), you could add $8, %rsp / jmp *-8(%rsp).
Obviously for real use you should just use ret, because it's the most efficient way to do that. I just wanted to point out what it does using multiple simpler instructions. Nothing more, nothing less.
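To make the "call is push + jmp, ret is pop + jmp" point concrete, here is a minimal sketch of a toy machine in C. It is not x86: the opcodes, the struct insn layout, and the run function are all hypothetical, and "addresses" are just indexes into the program array. The only thing it is meant to show is that OP_CALL and OP_RET keep no hidden state beyond the value on the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy machine, not real x86: one instruction per slot,
   and an "address" is an index into the program array. */
enum op { OP_SET, OP_PUSH, OP_JMP, OP_CALL, OP_RET, OP_HALT };

struct insn { enum op op; int arg; };

/* Runs prog until OP_HALT, the end of the program, or max_steps
   instructions, and returns the accumulator. */
int run(const struct insn *prog, size_t len, int max_steps)
{
    int stack[16];
    size_t sp = 0;              /* number of values on the stack */
    size_t pc = 0;              /* index of the next instruction */
    int acc = 0;

    assert(prog != NULL);
    while (max_steps-- > 0 && pc < len) {
        struct insn i = prog[pc++];     /* pc now points past this insn */
        switch (i.op) {
        case OP_SET:
            acc = i.arg;
            break;
        case OP_PUSH:
            assert(sp < 16);
            stack[sp++] = i.arg;
            break;
        case OP_JMP:
            pc = (size_t)i.arg;
            break;
        case OP_CALL:           /* call = push return address, then jump */
            assert(sp < 16);
            stack[sp++] = (int)pc;
            pc = (size_t)i.arg;
            break;
        case OP_RET:            /* ret = pop an address, then jump to it */
            assert(sp > 0);
            pc = (size_t)stack[--sp];
            break;
        case OP_HALT:
            return acc;
        }
    }
    return acc;
}
```

A program that uses OP_CALL 2 and one that uses OP_PUSH with the return index followed by OP_JMP reach the same OP_RET with the same stack contents, so the toy machine cannot tell them apart, which is the same property the push+jmp emulation above relies on.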
Note that functions can end with a tail-call instead of a ret:
(see this on Godbolt)
int ext_func(int a); // something that the optimizer can't inline
int foo(int a) {
return ext_func(a+a);
}
# asm output from clang:
foo:
add edi, edi
jmp ext_func # TAILCALL
The ret at the end of ext_func will return to foo's caller. foo can use this optimization because it doesn't need to make any modifications to the return value or do any other cleanup.
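As a contrast, here is a small sketch of a function that can end in a tail-call next to one that cannot. The name bar is made up for this example, and ext_func is given a throwaway body only so the snippet compiles on its own (a compiler that can see the body may inline it, so keep the definition in a separate file if you want to inspect the asm).

```c
/* Stand-in for a function defined in another file; the body is an
   arbitrary placeholder so this sketch is self-contained. */
int ext_func(int a) { return a * 3; }

/* The call is the last thing foo does and its result is returned
   unchanged, so a compiler may emit `jmp ext_func`. */
int foo(int a) { return ext_func(a + a); }

/* Work remains after the call (the + 1), so this needs a real call,
   then an add, then its own ret. */
int bar(int a) { return ext_func(a + a) + 1; }
```

In bar, control has to come back so the addition can happen, which is exactly the "modification to the return value" that rules out the jmp.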
In the SystemV x86-64 calling convention, the first integer arg is in edi. So this function replaces that with a+a, then jumps to the start of ext_func. On entry to ext_func, everything is in the correct state just like it would be if something had run call ext_func. The stack pointer is pointing to the return address, and the args are where they're supposed to be.
Tail-call optimizations can be done more often in a register-args calling convention than in a 32-bit calling convention that passes args on the stack. You often run into situations where you have a problem because the function you want to tail-call takes more args than the current function, so there isn't room to rewrite our own args into args for the function. (And compilers don't tend to create code that modifies its own args, even though the ABI is very clear that functions own the stack space holding their args and can clobber it if they want.)
In a calling convention where the callee cleans the stack (with ret 8 or something to pop another 8 bytes after the return address), you can only tail-call a function that takes exactly the same number of arg bytes.

Your intuition is correct: control just passes to the next line below. A call only comes back to the line after it if the called code eventually executes a ret.
In your case, call one jumps to mov $1, %eax, and since there is no ret, execution continues down through the cmp and reaches call one again, so you end up in an infinite loop.
Beyond just an infinite loop, your code will eventually exceed its stack limit, because each call instruction pushes a return address (the address of the instruction after the call) onto the stack. Eventually, you'll overflow the stack.
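For a rough sense of "eventually", the arithmetic can be sketched as below. The function name and both parameters are assumptions supplied by the reader, not values taken from the question: the stack size depends on the OS and its limits, and the return address is 4 bytes in 32-bit code and 8 bytes in 64-bit code.

```c
/* Rough upper bound on how many nested calls fit on the stack when
   each call pushes only a return address and nothing ever pops it.
   Both arguments are assumptions chosen by the caller. */
unsigned long calls_before_overflow(unsigned long stack_bytes,
                                    unsigned long return_addr_bytes)
{
    if (return_addr_bytes == 0)
        return 0;               /* avoid dividing by zero */
    return stack_bytes / return_addr_bytes;
}
```

For example, assuming an 8 MiB stack and 8-byte return addresses, calls_before_overflow(8UL * 1024 * 1024, 8) gives 1048576, so the loop in the question would run on the order of a million iterations before faulting.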

Related

Dynamic allocation of structure array inside another structure [duplicate]

I wrote a simple code on a 64 bit machine
#include <stdio.h>

int main() {
    printf("%d", 2.443);  /* mismatch: %d with a double argument */
}
So, this is how the compiler will behave. It will identify the second argument as a double, so it will push 8 bytes on the stack or possibly just use registers across calls to access the variables. %d expects a 4-byte integer value, so it prints some garbage value.
What is interesting is that the value printed changes every time I execute this program. So what is happening? I expected it to print the same garbage value every time, not a different one each time.

It's undefined behaviour, of course, to pass arguments not corresponding to the format, so the language cannot tell us why the output changes. We must look at the implementation, what code it produces, and possibly the operating system too.
My setup is different from yours,
Linux 3.1.10-1.16-desktop x86_64 GNU/Linux (openSuSE 12.1)
with gcc-4.6.2. But it's similar enough that it's reasonable to suspect the same mechanisms.
Looking at the generated assembly (-O3, out of habit), the relevant part (main) is
.cfi_startproc
subq $8, %rsp # adjust stack pointer
.cfi_def_cfa_offset 16
movl $.LC1, %edi # move format string to edi
movl $1, %eax # move 1 to eax, seems to be the number of double arguments
movsd .LC0(%rip), %xmm0 # move the double to the floating point register
call printf
xorl %eax, %eax # clear eax (return 0)
addq $8, %rsp # adjust stack pointer
.cfi_def_cfa_offset 8
ret # return
If instead of the double I pass an int, not much changes, but what does change is significant:
movl $47, %esi # move int to esi
movl $.LC0, %edi # format string
xorl %eax, %eax # clear eax
call printf
I have looked at the generated code for many variations of types and count of arguments passed to printf, and consistently, the first double (or promoted float) arguments are passed in xmmN, N = 0, 1, 2, and the integer (int, char, long, regardless of signedness) are passed in esi, edx, ecx, r8d, r9d and then the stack.
So I venture the guess that printf looks for the announced int in esi, and prints whatever happens to be there.
Whether the contents of esi are in any way predictable when nothing is moved there in main, and what they might signify, I have no idea.
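For comparison, here is a minimal sketch of the two well-defined ways to print that value: convert it to int before handing it to %d, or use a floating-point conversion so printf looks in the place where the double was actually passed. The helper name format_both is made up for this example, and it uses snprintf only so the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: formats x once as an int (explicit conversion,
   so the argument really is an int) and once as a double.
   Returns what snprintf returns. */
int format_both(char *buf, size_t cap, double x)
{
    return snprintf(buf, cap, "%d %.3f", (int)x, x);
}
```

With x = 2.443 the buffer ends up holding "2 2.443": the cast truncates toward zero, and %.3f reads the double itself.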

This answer attempts to address some of the sources of variation. It is a follow-up to Daniel Fischer’s answer and some comments to it.
As I do not work with Linux, I cannot give a definitive answer. For a printf later in a large application, there would be a myriad of sources of potential variation. This early in a small application, there should be only a few.
Address space layout randomization (ASLR) is one: the operating system deliberately rearranges some memory randomly to prevent malware from knowing what addresses to use. I do not know if Linux 3.4.4-2 has this.
Another is environment variables. Your shell environment variables are copied into processes it spawns (and accessible through the getenv routine). A few of those might change automatically, so they would have slightly different values. This is unlikely to directly affect what printf sees when it attempts to use a missing integer argument, but there could be cascading effects.
There may be a shared-library loader that runs either before main is called or before printf is called. For example, if printf is in a shared library, rather than built into your executable file, then a call to printf likely actually results in a call to a stub routine that calls the loader. The loader looks up the shared library, finds the module containing printf, loads that module into your process’ address space, changes the stub so that it calls the newly loaded printf directly in the future (instead of calling the loader), and calls printf. As you can imagine, that can be a fairly extensive process and involves, among other things, finding and reading files on disk (all the directories to get to the shared library and the shared library). It is conceivable that some caching or file operations on your system result in slightly different behavior in the loader.
So far, I favor ASLR as the most likely candidate of the ones above. The latter two are likely to be fairly stable; the values involved would usually change occasionally, not frequently. ASLR would change each time, and simply leaving an address in a register would suffice to explain the printf behavior.
Here is an experiment: After the initial printf, insert another printf with this code:
printf("%d\n", 2.443);
int a;
printf("%p\n", (void *) &a);
The second printf prints the address of a, which is likely on the stack. Run the program two or three times and calculate the difference between the value printed by the first printf and the value printed by the second printf. (The second printf is likely to print in hexadecimal, so it might be convenient to change the first to "%x" to make it hexadecimal too.) If the value printed by the second printf varies from run to run, then your program is experiencing ASLR. If the values change from run to run but the difference between them remains constant, then the value that printf has happened upon in the first printf is some address in your process that was left lying around after program initialization.
If the address of a changes but the difference does not remain constant, you might try changing int a; to static int a; to see if comparing the first value to a different part of your address space yields a better result.
Naturally, none of this is useful for writing reliable programs; it is just educational with regard to how program loading and initialization works.
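The comparison step of that experiment can be sketched as a small helper. Everything here is an assumption for illustration: the function name, and the idea that you have copied the values from each run into two arrays by hand. Because %d or %x prints only 32 bits, the sketch compares against the low 32 bits of the address of a.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* garbage[i]   : value printed by the mismatched printf in run i
   addr_low[i]  : low 32 bits of the address of `a` printed in run i
   Returns 1 if the difference is the same in every run, else 0. */
int offset_is_constant(const uint32_t *garbage, const uint32_t *addr_low,
                       size_t runs)
{
    assert(garbage != NULL && addr_low != NULL);
    if (runs < 2)
        return 1;               /* a single run cannot show variation */

    /* unsigned subtraction wraps modulo 2^32, which is what we want */
    uint32_t first = (uint32_t)(garbage[0] - addr_low[0]);
    for (size_t i = 1; i < runs; i++) {
        if ((uint32_t)(garbage[i] - addr_low[i]) != first)
            return 0;
    }
    return 1;
}
```

A result of 1 over several runs is consistent with the garbage value being an address at a fixed distance from a; it does not prove which address it is.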

How to find all the reachable labels in assembly files?

I'm working on a tool that aims to separate assembly code into different sections and labels. I'm trying to add a recursive mode.
If I print the code under one specific label, and that code references other labels, recursive mode should print the code under those referenced labels as well.
For Example:
.file sample.s
...
A:
...
call B
...
B:
...
C:
...
For the code above, if I print the code under label A in recursive mode, the code under labels A and B should both be printed.
To do this, I have to find every label reference on each line.
Some instructions are clearly relevant, like call, lea, and jmp. But it's not easy to list all the cases.
Any ideas? Thanks for your help!

So you want to print all code reachable from a given label, except by returning further up the call tree? (i.e. all other basic blocks of this function, all child functions, and tail-call siblings).
The normal / simplest way for execution to get from one label to another is to simply fall through. Like
mov ecx, 123
looptop: ; do {
...
dec ecx
jnz looptop ; }while(--ecx)
Unless the last instruction before the next label is an unconditional jump (like jmp or ret, but not call which can eventually return), you should also be following execution into that next block. A ret should end processing, jmp could be followed if you want, jnz might fall through.
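The rule above could be sketched as a tiny classifier that the tool applies to the last instruction before each label. The function name is hypothetical, and the sketch assumes lower-case mnemonics with optional AT&T size suffixes; it treats only jmp and ret variants as block-ending, so things like hlt or ud2 are not handled.

```c
#include <assert.h>
#include <string.h>

/* Returns 0 if execution never continues to the next instruction after
   this mnemonic, 1 if it can. Simplification: only jmp* and ret* are
   treated as ending a block. */
int can_fall_through(const char *mnemonic)
{
    assert(mnemonic != NULL);
    if (strncmp(mnemonic, "jmp", 3) == 0)   /* jmp, jmpq, jmpl */
        return 0;
    if (strncmp(mnemonic, "ret", 3) == 0)   /* ret, retq, retl */
        return 0;
    return 1;   /* call, jnz, mov, ... can all reach the next line */
}
```

When this returns 1 for the last instruction before a label, the tool should add a fall-through edge from the current block to the block that starts at that label.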
For conditional branches, you presumably need to follow both sides.
Trying to trace through indirect jumps after code loads a function-pointer into a register with a RIP-relative LEA or a MOV is probably too hard. Do you really want to be able to trace foo(callback_func, 123) and be able to print the code for foo and the code it might call at callback_func?
If the arg is passed in a register (like x86-64 calling conventions) and it doesn't store it to the stack and reload it, then it's fairly easy to match that up with a jmp rdi after seeing there have been no intervening writes to RDI in between. But if it is more complex, like a debug build storing RDI to the stack and reloading somewhere else, you basically need an x86-64 simulator to trace the values.
I think it might be better to not even attempt tracing through indirect jumps, rather than having something that sometimes works (simple cases), sometimes doesn't. So probably you should forget about lea, unless you're thinking about dumping data declarations for static data referenced with LEA or MOV.
Some int 0x80 or syscall are noreturn (e.g. _exit, or sigreturn), but most aren't. The behaviour depends on the RAX/EAX value (and on the OS). Usually EAX gets set pretty soon before a system call, so you might want to special case the noreturn ones, otherwise you'll fall through past an exit into other code that shouldn't necessarily execute.
Same applies for library function calls like call exit.
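Once the direct edges between labels are known (fall-through, jmp / jcc targets, call targets), the recursive mode is a plain graph traversal. Here is a minimal sketch with a worklist; the names mark_reachable and MAX_LABELS, and the adjacency-matrix representation, are assumptions made for this example rather than anything from the question's tool.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LABELS 64

/* edges[i][j] != 0 means label i can reach label j directly
   (fall-through, jmp/jcc target, or call target).
   Sets seen[k] = 1 for every label reachable from start (including
   start itself) and returns how many labels were marked. */
size_t mark_reachable(unsigned char edges[][MAX_LABELS], size_t n,
                      size_t start, unsigned char seen[])
{
    size_t work[MAX_LABELS];    /* labels marked but not yet expanded */
    size_t top = 0, count = 0;

    assert(n <= MAX_LABELS && start < n);
    for (size_t i = 0; i < n; i++)
        seen[i] = 0;

    seen[start] = 1;
    work[top++] = start;
    while (top > 0) {
        size_t cur = work[--top];
        count++;
        for (size_t j = 0; j < n; j++) {
            if (edges[cur][j] && !seen[j]) {
                seen[j] = 1;    /* mark when pushed, so each label is pushed once */
                work[top++] = j;
            }
        }
    }
    return count;
}
```

For the A / B / C example in the question, a call edge from A to B marks A and B; if B does not end in ret or jmp, adding the fall-through edge from B to C marks C as well.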

How many machine instructions are needed for a function call in C?

I'd like to know how many instructions are needed for a function call in a C program compiled with gcc for x86 platforms from start to finish.

Write some code.
Compile it.
Look at the disassembly.
Count the instructions.
The answer will vary as you vary the number and type of parameters, calling conventions etc.
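As a starting point for those steps, here is a small test case you could compile. The function names are made up for this example, and noinline is a GCC/Clang attribute used here on the assumption that you want the call to survive optimization so it is visible in the output.

```c
/* noinline (GCC/Clang attribute) is intended to keep the call from
   being inlined, so it still shows up in the generated asm. */
__attribute__((noinline))
int add3(int a, int b, int c)
{
    return a + b + c;
}

int caller(void)
{
    return add3(1, 2, 3);   /* look for the argument setup and call here */
}
```

Compiling with something like gcc -m32 -O0 -S shows the stack-args version with a frame pointer; dropping -m32 or raising the optimization level changes the count, which is the point of the answer above.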

That is a tricky question, and the answer varies.
First of all, the caller has to pass the parameters. How it does that depends on their types and on the calling convention; in most 32-bit cases you will have a push instruction for each parameter.
Then, in the called procedure, the first instructions set up the stack frame and allocate space for local variables. This is usually done in 3 operations:
PUSH EBP
MOV EBP, ESP
SUB ESP, xxx
You will have the assembly code of the function after that.
Following the code but before the return, the ebp and esp will be restored:
MOV ESP, EBP
POP EBP
Lastly, you will have a ret instruction that, depending on the calling convention, either deallocates the parameters from the stack or leaves that to the caller. You can tell which by whether the RET has a nonzero operand or no operand, respectively. In the latter case the caller cleans up after the CALL instruction, typically with ADD ESP, n or with POP instructions.

I would expect at least one
CALL Function
unless it is inlined, of course.

If you use -mno-accumulate-outgoing-args and -Os (or -mpreferred-stack-boundary=2, or 3 on 64-bit), then the overhead is exactly one push per word-sized argument, one call, and one add to adjust the stack pointer after return.
Without -mno-accumulate-outgoing-args and with default 16-byte stack alignment, gcc generates code that's roughly the same speed but roughly five times larger for function calls, for no good reason.
