How do I use GDB Debugger to look at __asm__ content? - c

I'm trying to understand what is happening in this code, specifically within __asm__. How do I step through the assembly code, so I can print each variable and what not?
Specifically, I am trying to step through this to figure out what does the 8() mean and to see how does it know that it is going into array at index 2.
/* a[2] = 99 in assembly */
__asm__("\n\
movl $_a, %eax\n\
movl $99, 8(%eax)\n\
");

The stepi command steps through the assembly one instruction at a time. There is also nexti for stepping over function calls. These commands do not follow the 'typing only a unique prefix of a command is enough' rule that works for most commands, partly because the next and step commands are themselves prefixes of these commands, and partly because they are not used very often and, when they are, it is typically by someone who knows that they really want them. (They do have the short aliases si and ni.)
info registers displays a lot of the register contents.
You'll also want to view the disassembly with the disassemble command.
More info on all of these commands is available with the help command, for instance:
(gdb) help info registers
tells you that info registers displays the integer registers and their contents, but it also tells you that if you supply a register name it will limit output to that register's value:
(gdb) info registers rax
rax 0x0 0
(rax is the x86_64 version of eax)
The first column is the register name, the second is the hex value, and the third is the integer value.
There is useful help for the disassemble command as well.
Remember that gdb has tab completion for many commands, and this can be used for more than just simple commands, though many times it offers you bad suggestions -- it's sometimes helpful, though.
Including a label within your inline assembly will allow you to easily make a break point at the beginning of it.

I was never any good at AT&T syntax, but I'm pretty sure the 8(%eax) part means "the address 8 bytes after the address stored in EAX", that is, it's the offset relative to the address stored in the register.
Approximate equivalent in Intel syntax would be something like this (off the top of my head, so it's entirely possible that there is some minor mistake here...)
mov eax, a
mov DWORD PTR [eax+8], 99

movl $_a, %eax // load the memory address of a into %eax
movl $99, 8(%eax) // jump 8 bytes and store number 99 (which is a[2])
It seems to me that a is an int array (an int is 4 bytes on most platforms). So by increasing the offset by 4 bytes you access the next item of the array. Other examples of assigning values to this array would be:
movl $10, (%eax) // store number 10 in the first position: a[0]
movl $20, 4(%eax) // jump 4 bytes from the address loaded in %eax
// and store number 20 on the next position (a[1])

Dynamic allocation of structure array inside another structure [duplicate]

I wrote a simple code on a 64 bit machine
int main() {
printf("%d", 2.443);
}
So, this is how the compiler will behave. It will identify the second argument to be a double hence it will push 8 bytes on the stack or possibly just use registers across calls to access the variables. %d expects a 4 byte integer value, hence it prints some garbage value.
What is interesting is that the value printed changes every time I execute this program. So what is happening? I expected it to print the same garbage value every time, not a different one each time.
It's undefined behaviour, of course, to pass arguments not corresponding to the format, so the language cannot tell us why the output changes. We must look at the implementation, what code it produces, and possibly the operating system too.
My setup is different from yours,
Linux 3.1.10-1.16-desktop x86_64 GNU/Linux (openSuSE 12.1)
with gcc-4.6.2. But it's similar enough that it's reasonable to suspect the same mechanisms.
Looking at the generated assembly (-O3, out of habit), the relevant part (main) is
.cfi_startproc
subq $8, %rsp # adjust stack pointer
.cfi_def_cfa_offset 16
movl $.LC1, %edi # move format string to edi
movl $1, %eax # move 1 to eax, seems to be the number of double arguments
movsd .LC0(%rip), %xmm0 # move the double to the floating point register
call printf
xorl %eax, %eax # clear eax (return 0)
addq $8, %rsp # adjust stack pointer
.cfi_def_cfa_offset 8
ret # return
If instead of the double I pass an int, not much changes, but what does change is significant:
movl $47, %esi # move int to esi
movl $.LC0, %edi # format string
xorl %eax, %eax # clear eax
call printf
I have looked at the generated code for many variations of types and count of arguments passed to printf, and consistently, the first double (or promoted float) arguments are passed in xmmN, N = 0, 1, 2, and the integer (int, char, long, regardless of signedness) are passed in esi, edx, ecx, r8d, r9d and then the stack.
So I venture the guess that printf looks for the announced int in esi, and prints whatever happens to be there.
Whether the contents of esi are in any way predictable when nothing is moved there in main, and what they might signify, I have no idea.
This answer attempts to address some of the sources of variation. It is a follow-up to Daniel Fischer’s answer and some comments to it.
As I do not work with Linux, I cannot give a definitive answer. For a printf later in a large application, there would be a myriad of sources of potential variation. This early in a small application, there should be only a few.
Address space layout randomization (ASLR) is one: The operating system deliberately rearranges some memory randomly to prevent malware from knowing what addresses to use. I do not know if Linux 3.4.4-2 has this.
Another is environment variables. Your shell environment variables are copied into processes it spawns (and accessible through the getenv routine). A few of those might change automatically, so they would have slightly different values. This is unlikely to directly affect what printf sees when it attempts to use a missing integer argument, but there could be cascading effects.
There may be a shared-library loader that runs either before main is called or before printf is called. For example, if printf is in a shared library, rather than built into your executable file, then a call to printf likely actually results in a call to a stub routine that calls the loader. The loader looks up the shared library, finds the module containing printf, loads that module into your process’ address space, changes the stub so that it calls the newly loaded printf directly in the future (instead of calling the loader), and calls printf. As you can imagine, that can be a fairly extensive process and involves, among other things, finding and reading files on disk (all the directories to get to the shared library and the shared library). It is conceivable that some caching or file operations on your system result in slightly different behavior in the loader.
So far, I favor ASLR as the most likely candidate of the ones above. The latter two are likely to be fairly stable; the values involved would usually change occasionally, not frequently. ASLR would change each time, and simply leaving an address in a register would suffice to explain the printf behavior.
Here is an experiment: After the initial printf, insert another printf with this code:
printf("%d\n", 2.443);
int a;
printf("%p\n", (void *) &a);
The second printf prints the address of a, which is likely on the stack. Run the program two or three times and calculate the difference between the value printed by the first printf and the value printed by the second printf. (The second printf is likely to print in hexadecimal, so it might be convenient to change the first to "%x" to make it hexadecimal too.) If the value printed by the second printf varies from run to run, then your program is experiencing ASLR. If the values change from run to run but the difference between them remains constant, then the value that printf has happened upon in the first printf is some address in your process that was left lying around after program initialization.
If the address of a changes but the difference does not remain constant, you might try changing int a; to static int a; to see if comparing the first value to different part of your address space yields a better result.
Naturally, none of this is useful for writing reliable programs; it is just educational with regard to how program loading and initialization works.

Assembly mov instruction output

I want to understand how my data string ends up in rdx. In my mind the mov instruction puts the data found at an address into the target. So the content from rbp-0x28 is put into rdx. I checked what's in rbp-0x28 and it is not the data string ('AAAAAAA'). If, however, I let the command execute with ni, then rdx contains the string. I don't know how the string ends up in rdx, as it is not contained in rbp-0x28 beforehand. I know that my data is contained in 0x7fffffffe58f but I'm not sure how or when it's loaded into rdx. Any help is greatly appreciated!
This depends a lot on the compiler and debugger you're using, as well as the architecture and calling convention. I ran your code with Apple's Clang compiler and lldb and got the expected results. There are minor variations between my output and yours, but it's relatively easy to follow. Since you only posted partial debug output of your function, starting at offset +0x12, I'll assume that before that point whichever register held the first argument to the function call (RDI in my case) was stored into [rbp-0x28].
This was my output.
mov rsi, qword ptr [rbp-0x30] is the equivalent of your mov rdx,[rbp-0x28]. I think you're under Microsoft's x64 ABI calling convention, so your first argument is passed in rcx. Prior to that instruction there is mov [rbp-0x30], rdi, which I believe in your case will be mov [rbp-0x28],rcx.
At the next instruction, mov rdi,rcx, I set another breakpoint. Here I read the contents of rsi, which in your case would be rdx. It printed rsi = 0x00007ffeefbff94a
At that memory address I found the string 'AAAAAAA'. Next I read the register rbp, which printed rbp = 0x00007ffeefbff740. Then I read the memory at 0x00007ffeefbff740-0x30 (in your case it would be -0x28), which is 0x7ffeefbff710, and it held the same address that was stored in rsi,
0x7ffeefbff94a (stored little-endian), which we know points to the string 'AAAAAAA'. So I'm going to assume that what you're expecting at RBP-0x28 is the string itself. Instead, it is the location that holds a pointer to the string. Also make sure to compute your offsets correctly. Follow these steps:
Breakpoint at lea rax,[rbp-0x20]
Check the value of rdx, view the memory at that address and it should give you the string.
Then check the value of rbp. Subtract 0x28 from it. View the memory at the offset.
This should give you the value of rdx. Which should in turn point to the string you're looking for.

Multiplication of corresponding values in an array

I want to write an x86 program that multiplies corresponding elements of 2 arrays (array1[0]*array2[0] and so on till 5 elements) and stores the results in a third array. I don't even know where to start. Any help is greatly appreciated.
The first thing you'll want is an assembler. I'm personally a big fan of NASM: in my opinion it has a very clean and concise syntax, and it's also what I started on, so that's what I'll use for this answer.
Other than NASM you have:
GAS
This is the GNU assembler, unlike NASM there are versions for many architectures so the directives and way of working will be about the same other than the instructions if you switch architectures. GAS does however have the unfortunate downside of being somewhat unfriendly for people who want to use the Intel syntax.
FASM
This is the Flat Assembler, it is an assembler written in Assembly. Like NASM it's unfriendly to people who want to use AT&T syntax. It has a few rough edges but some people seem to prefer it for DOS applications (especially because there's a DOS port of it) and bare metal work.
Now you might be reading 'AT&T syntax' and 'Intel syntax' and wondering what's meant by that. These are dialects of x86 assembly. They both assemble to the same machine code but reflect slightly different ways of thinking about each instruction. AT&T syntax tends to be more verbose whereas Intel syntax tends to be more minimal; however, certain parts of AT&T syntax have nicer operand orderings than Intel syntax. A good demonstration of the difference is the mov instruction:
AT&T syntax:
movl (0x10), %eax
This means: get the long value (1 dword, i.e. 4 bytes) stored at address 0x10 and put it in the register eax. Take note of the fact that:
The mov is suffixed with the operand length.
The memory address is surrounded by parentheses (you can think of them like a pointer dereference in C)
The register is prefixed with %
The instruction moves the left operand into the right operand
Intel Syntax:
mov eax, [0x10]
Take note of the fact that:
We do not need to suffix the instruction with the operand size, the assembler infers it, there are situations where it can't, in which case we specify the size next to the address.
The register is not prefixed
Square brackets are used to address memory
The second operand is moved into the first operand
I will be using Intel syntax for this answer.
Once you've installed NASM on your machine you'll want a simple build script (when you start writing bigger programs use a Makefile or some other proper build system, but for now this will do):
nasm -f elf arrays.asm
ld -o arrays arrays.o -melf_i386
rm arrays.o
echo
echo " Done building, the file 'arrays' is your executable"
Remember to chmod +x the script or you won't be able to execute it.
Now for the code along with some comments explaining what everything means:
global _start ; The linker will be looking for this entrypoint, so we need to make it public
section .data ; We're going on to describe our data here
array_length equ 5 ; This is effectively a macro and isn't actually being stored in memory
array1 dd 1,4,1,5,9 ; dd means declare dwords
array2 dd 2,6,5,3,5
sys_exit equ 1
section .bss ; Data that isn't initialised with any particular value
array3 resd 5 ; Leave us 5 dword sized spaces
section .text
_start:
xor ecx,ecx ; index = 0 to start
; In a Linux static executable, registers are initialized to 0 so you could leave this out if you're never going to link this as a dynamic executable.
_multiply_loop:
mov eax, [array1+ecx*4] ; move the value at the given memory address into eax
; We calculate the address we need by first taking ecx (which tells us which
; item we want) multiplying it by 4 (i.e: 4 bytes/1 dword) and then adding it
; to our array's start address to determine the address of the given item
imul eax, dword [array2+ecx*4] ; This performs a 32-bit integer multiply
mov dword [array3+ecx*4], eax ; Move our result to array3
inc ecx ; Increment ecx
; While ecx is a general purpose register the convention is to use it for
; counting hence the 'c'
cmp ecx, array_length ; Compare the value in ecx with our array_length
jb _multiply_loop ; Loop again while ecx is still below the array length
; If the loop has concluded the instruction pointer will continue
_exit:
mov eax, sys_exit ; The system call we want
; ebx is already equal to 0, ebx contains the exit status
mov ebp, esp ; Prepare the stack before jumping into the system
sysenter ; Call the Linux kernel and tell it that our program has concluded
If you wanted the full 64-bit result of the 32-bit multiply, use one-operand mul. But normally you only want a result that's the same width as the inputs, in which case imul is most efficient and easiest to use. See links in the x86 tag wiki for docs and tutorials.
You'll notice that this program has no output. I'm not going to cover writing the algorithm to print numbers because we'd be here all day, that's an exercise for the reader (or see this Q&A)
However in the meantime we can run our program in gdbtui and inspect the data, use your build script to build then open your program with the command gdbtui arrays. You'll want to enter these commands:
layout asm
break _exit
run
print (int[5])array3
And GDB will display the results.

ARMv8 illegal instruction [duplicate]

What happens if i say 'call ' instead of jump? Since there is no return statement written, does control just pass over to the next line below, or is it still returned to the line after the call?
start:
mov $0, %eax
jmp two
one:
mov $1, %eax
two:
cmp $1, %eax
call one
mov $10, %eax
The CPU always executes the next instruction in memory, unless a branch instruction sends execution somewhere else.
Labels don't have a width, or any effect on execution. They just allow you to make reference to this address from other places. Execution simply falls through labels, even off the end of your code if you don't avoid that.
If you're familiar with C or other languages that have goto (example), the labels you use to mark places you can goto to work exactly the same as asm labels, and jmp / jcc work exactly like goto or if(EFLAGS_condition) goto. But asm doesn't have special syntax for functions; you have to implement that high-level concept yourself.
If you leave out the ret at the end of a block of code, execution keeps going and decodes whatever comes next as instructions. (Maybe zero padding, as covered in "What would happen if a system executes a part of the file that is zero-padded?", if that was the last function in an asm source file, or maybe execution falls into some CRT startup function that eventually returns.)
(In which case you could say that the block you're talking about isn't a function, just part of one, unless it's a bug and a ret or jmp was intended.)
You can (and maybe should) try this yourself in a debugger. Single-step through that code and watch RSP and RIP change. The nice thing about asm is that the total state of the CPU (excluding memory contents) is not very big, so it's possible to watch the entire architectural state in a debugger window. (Well, at least the interesting part that's relevant for user-space integer code, so excluding model-specific registers that only the OS can tweak, and excluding the FPU and vector registers.)
call and ret aren't "special" (i.e. the CPU doesn't "remember" that it's inside a "function").
They just do exactly what the manual says they do, and it's up to you to use them correctly to implement function calls and returns. (e.g. make sure the stack pointer is pointing at a return address when ret runs.) It's also up to you to get the calling convention correct, and all that stuff. (See the x86 tag wiki.)
There's also nothing special about a label that you jmp to vs. a label that you call. An assembler just assembles bytes into the output file, and remembers where you put label markers. It doesn't truly "know" about functions the way a C compiler does. You can put labels wherever you want, and it doesn't affect the machine code bytes.
Using the .globl one directive would tell the assembler to put an entry in the symbol table so the linker could see it. That would let you define a label that's usable from other files, or even callable from C. But that's just meta-data in the object file and still doesn't put anything between instructions.
Labels are just part of the machinery that you can use in asm to implement the high-level concept of a "function", aka procedure or subroutine: A label for callers to call to, and code that will eventually jump back to a return address the caller passed, one way or another. But not every label is the start of a function. Some are just the tops of loops, or other targets of conditional branches within a function.
Your code would run exactly the same way if you emulated call with an equivalent push of the return address and then a jmp.
one:
mov $1, %eax
# missing ret so we fall through
two:
cmp $1, %eax
# call one # emulate it instead with push+jmp
pushl $.Lreturn_address
jmp one
.Lreturn_address:
mov $10, %eax
# fall off into whatever comes next, if it ever reaches here.
Note that this sequence only works in non-PIC code, because the absolute return address is encoded into the push imm32 instruction. In 64-bit code with a spare register available, you can use a RIP-relative lea to get the return address into a register and push that before jumping.
Also note that while architecturally the CPU doesn't "remember" past CALL instructions, real implementations run faster by assuming that call/ret pairs will be matched, and use a return-address predictor to avoid mispredicts on the ret.
Why is RET hard to predict? Because it's an indirect jump to an address stored in memory! It's equivalent to pop %internal_tmp / jmp *%internal_tmp, so you can emulate it that way if you have a spare register to clobber (e.g. rcx is not call-preserved in most calling conventions, and not used for return values). Or if you have a red-zone so values below the stack-pointer are still safe from being asynchronously clobbered (by signal handlers or whatever), you could add $8, %rsp / jmp *-8(%rsp).
Obviously for real use you should just use ret, because it's the most efficient way to do that. I just wanted to point out what it does using multiple simpler instructions. Nothing more, nothing less.
Note that functions can end with a tail-call instead of a ret:
(see this on Godbolt)
int ext_func(int a); // something that the optimizer can't inline
int foo(int a) {
return ext_func(a+a);
}
# asm output from clang:
foo:
add edi, edi
jmp ext_func # TAILCALL
The ret at the end of ext_func will return to foo's caller. foo can use this optimization because it doesn't need to make any modifications to the return value or do any other cleanup.
In the SystemV x86-64 calling convention, the first integer arg is in edi. So this function replaces that with a+a, then jumps to the start of ext_func. On entry to ext_func, everything is in the correct state just like it would be if something had run call ext_func. The stack pointer is pointing to the return address, and the args are where they're supposed to be.
Tail-call optimizations can be done more often in a register-args calling convention than in a 32-bit calling convention that passes args on the stack. You often run into situations where you have a problem because the function you want to tail-call takes more args than the current function, so there isn't room to rewrite our own args into args for the function. (And compilers don't tend to create code that modifies its own args, even though the ABI is very clear that functions own the stack space holding their args and can clobber it if they want.)
In a calling convention where the callee cleans the stack (with ret 8 or something to pop another 8 bytes after the return address), you can only tail-call a function that takes exactly the same number of arg bytes.
Your intuition is correct: the control just passes to the next line below after the function returns.
In your case, after call one, execution will jump to mov $1, %eax and then continue down to the cmp and end up in an infinite loop, as you will call one again.
Beyond just an infinite loop, your program will eventually run out of stack space, since each call instruction pushes a return address (the address of the instruction after the call) onto the stack and nothing ever pops it. Eventually, you'll overflow the stack.
