So I'm new to Linux and just got Ubuntu 16.04.2 running on a VM. I've installed gcc/g++ on here in the terminal, but when I run my program in GDB, as soon as I get to a strcmp function, this pops up for many lines.
strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:24
24 ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: No such file or directory.
And when I go further down:
strlen () at ../sysdeps/x86_64/strlen.S:66
66 ../sysdeps/x86_64/strlen.S: No such file or directory.
So I'm guessing it just doesn't recognize my C library..
I realize I can step through this after a couple of tries, but this comes up for all my c functions and when I use GDB on my school server, I don't run into this issue. Any help would be appreciated.
I get to a strcmp function, this pops up for many lines.
When you run s (step) or si (step one instruction), what you see for string and memory functions such as strcmp, memcpy, memcmp and strlen is correct, and GDB does recognize your C library (Ubuntu 16.04.2 amd64 installed from the ISO in a VM already has the libc6-dbg debugging package preinstalled for libc, the C library).
strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:24
24 ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: No such file or directory.
strlen () at ../sysdeps/x86_64/strlen.S:66
66 ../sysdeps/x86_64/strlen.S: No such file or directory.
What we see here is that GDB found debugging information for both strcmp and strlen and could report line numbers, but these standard C library functions are not C functions! They are written in assembler (one is optimized with SSE2), as the .S suffix of their source file names shows. You can run s or si a few times after entering them to watch the source line number increment.
it just doesn't recognize
GDB did all it could: it found the debugging info for your system C library (not trivial, since the debug info lives in a separate file with a different name somewhere under /usr/lib/debug/lib/x86_64-linux-gnu/) and worked out which instruction comes from which source line. What it can't do is open the source file, because that file is part of neither the preinstalled Ubuntu image nor any Ubuntu (Debian) binary package.
What can you do if you want to look inside this system library function:
1) Check the disassembly of the function with the GDB command disassemble (by default it prints the current function). It will be very close to the source, because the function was originally written in assembler; all you lose are the comments and the macro structure:
Dump of assembler code for function strlen:
0x000address70 <+0>: pxor %xmm0, %xmm0
=> 0x000address74 <+4>: pxor %xmm1, %xmm1
0x000address78 <+8>: pxor %xmm2, %xmm2
0x000address7c <+12>: pxor %xmm3, %xmm3
...
2) Or watch the instructions as they execute with the display command, e.g. display/i $pc or disp/2i $pc ($pc is GDB's generic name for the program counter, i.e. EIP or RIP; the first form prints the instruction at the current PC, the second prints the current and the next instruction).
3) Or create the path GDB is looking for and copy the original source into it: mkdir -p ../sysdeps/x86_64/ and save the assembler source for your library version in that directory. The glibc-2.23 version of strlen.S is available here (GitHub mirror of the authors' Git repository): https://github.com/bminor/glibc/blob/glibc-2.23/sysdeps/x86_64/strlen.S#L66
4) Or download the Ubuntu source for libc with apt source libc6 (into a stable location such as ~/src, after mkdir ~/src) and point GDB at it with directory ~/src/glibc-2.23/sysdeps (the extra real subdirectory accounts for the ../ relative part of the path used in Ubuntu's libc build).
this comes up for all my c functions
No, for your own C functions you get a different kind of output (not ... something.S: No such file or directory). And you should enable debugging symbols when you build your program by adding the -g argument to gcc (or your compiler's equivalent).
I am trying to open a C executable in lldb to make myself more comfortable with assembly.
My problem is that when I open the executable with GDB, the disassembly output looks like this:
call 0x1030 <puts@plt>
And that is ok.
But when I use LLDB, the output looks like this:
call 0x1030 ; stub for __lldb_unnamed_symbol67
Why does LLDB not detect the symbol name for a function stub on the PLT, but GDB does?
I'm compiling C code with GCC and assembling some x86 code with NASM on Windows.
Now, GCC by default (and I haven't been able to find an option to change this) prepends an underscore _ to all external symbol names (and expected names).
I need this assembly code to work with GCC on both Windows and Linux and would like to avoid hacks as much as possible (and code duplication; I had separate .s files for Windows/Linux at first).
I found out about (and used) the --prefix flag in NASM. Now for some symbols I'd like NASM to treat them as without the leading underscore (exact situation right now is that I need to reference the entry point in a linker script without the leading underscore). Hence the question here on how to override, per symbol, the --prefix/--postfix flags of NASM.
Feel free to treat this as an XY problem. If there's a way to set the mangling scheme of GCC for C that'd be great, for example.
I stumbled upon the same problem. I've created an include file with a lot of defines like
%define printf _printf
%define puts _puts
%define scanf _scanf
and some other stuff.
That file (libc_win32.inc) is included by a "master" include file (libc.inc):
%ifndef LIBC_INC
%define LIBC_INC
%ifdef win32
%include 'libc_win32.inc'
%elifdef win64
%include 'libc_win64.inc'
%elifdef elf32
%include 'libc_elf32.inc'
%elifdef elf64
%include 'libc_elf64.inc'
%else
; %error "libc.inc"
%endif
%endif
I set the symbols and include the files at the command line:
nasm -fwin32 -dwin32 -plibc.inc ...
or
nasm -felf32 -delf32 -plibc.inc ...
There is a predefined macro called __OUTPUT_FORMAT__, but it works only inside of a macro, not at program start.
This is about reverse engineering on Linux: if I have a .c file and compile it for use with gdb, everything is fine. But how can I get the same result starting from an executable file?
I tried objdump -M intel -D file to disassemble it, but then I would like to assemble it again so I can open it with gdb (if I open the executable directly with gdb, I can't do things like set breakpoints and view registers). I tried nasm and gcc, but both reported syntax errors.
If the symbol table has been stripped off, you cannot get it back.
Anyway, you can set breakpoints in GDB on a specific code address with:
break *address
If you have a hex address, you must precede it with 0x e.g.:
break *0x400506
And to print the current register values, you can use info registers as also answered in How to print register values in gdb?
info registers
NASM and the GNU assembler use different syntax, which is why you cannot easily disassemble with one toolchain and assemble with the other. NASM uses a variant of the Intel syntax. The GNU assembler prefers AT&T syntax.
I am currently compiling a purchased data stack written in C. I use the vendor's own tool to compile it, which uses gcc in the background. I can pass flags and parameters to gcc as I see fit. I want to know which file main() comes from, that is, which file in the project is the starting point. Is there any way to tell gcc to generate a list of files or similar, given that I don't know which file main() is taken from? Thank you.
You can disassemble the final executable to find the starting point. You have not provided enough detail for a more specific answer, so I'll use sample code to demonstrate the process.
#include <stdio.h>
int main() {
printf("hello world\n");
return 0;
}
Now the object file main.o has the following symbols:
[root@s1 sf]# gcc -c main.c
[root@s1 sf]# nm main.o
0000000000000000 T main
U puts
You can see that main has address 0 here, because its final address is only assigned at the linking stage. Now, after linking:
$gcc main.o
$nm a.out
U __libc_start_main@@GLIBC_2.2.5
0000000000600874 A _edata
0000000000600888 A _end
00000000004005b8 T _fini
0000000000400390 T _init
00000000004003e0 T _start
000000000040040c t call_gmon_start
0000000000600878 b completed.6347
0000000000600870 W data_start
0000000000600880 b dtor_idx.6349
00000000004004a0 t frame_dummy
00000000004004c4 T main
You see that main has an address now. But that is still not the whole story, because main is called by the C runtime at run time. You can see which library provides the undefined symbol __libc_start_main@@GLIBC_2.2.5:
[root@s1 sf]# ldd a.out
linux-vdso.so.1 => (0x00007fff61de1000) /* the Linux system call interface */
libc.so.6 => /lib64/libc.so.6 (0x0000003c96000000) /* libc runtime, this will invoke your main */
/lib64/ld-linux-x86-64.so.2 (0x0000003c95c00000) /* dynamic loader */
Now you can verify this by viewing the disassembly :
00000000004003e0 <_start>:
..........
4003fd: 48 c7 c7 c4 04 40 00 mov rdi,0x4004c4 /* address of start of main */
400404: e8 bf ff ff ff call 4003c8 <__libc_start_main@plt> /* this will set up the environment for main, like pushing argc and argv to stack */
...........
If you don't have the source, you can search the executable for references to __libc_start_main, main or _start to see how it is initialized and how main gets started.
All of this holds when linking uses the default linker script. Many big projects use their own linker script. If your project has a custom linker script, finding the start point depends on that script. There are also projects that do not use glibc's runtime. In that case it is still possible to find the start point by digging through the object files, library archives and so on.
If your binary has been stripped of symbols, you have to rely on your assembly skills to find where it starts.
I've assumed that you don't have the source, that is, that the stack is distributed as some libraries and header definitions only (a common practice among commercial software vendors).
But if you do have the source, it's trivial: just grep your way through it. Some answers have already pointed that out.
Where main() is called from is implementation-dependent. With GCC it will most likely be called from a stub object file in /usr/lib named crt0.o or crt1.o. (This file contains the OS-dependent symbol which is automatically invoked by the kernel when your app is loaded into memory. On Linux and Mac OS X, this is called start.)
You can use objdump -t to list symbols from object files. So assuming you are on Linux, and also assuming that the object files are still around somewhere, you can do this:
find -name '*.o' -print0 \
| xargs -0 objdump -t \
| awk '/\.o:/{f=$1} /\.text\.main/{print f, $6}'
This will print a list of object files and the references to main they contain. Usually there should be a simple map from object files to source files. If there are multiple object files containing that symbol, then it depends on which one of those actually got linked into the binary you're looking at, as there can be no more than one main per executable binary (except perhaps for some really exotic black magic).
After the application is linked and debugging symbols are stripped, there usually is no indication from which source file a specific function came. The exception to this are files which include the function names as string literals, e.g. using the __FILE__ macro. Before stripping debugging symbols, you might use the debugger to obtain that information. If debugging symbols are included, that is.
As the title suggests, is there any way to read the machine code instructions as/after they have been executed? For example, if I had an arbitrary block of C code and I wanted to know what instructions were compiled and executed when that block was entered then would there be a way to do that? Thank you in advance for any pointers on the subject.
Edit: Some motivation as to what I'm trying to do: I want to have a program that roughly figures out how it has been compiled or what instructions it is currently running without actually needing to know how the machine code is made. I.e. I want to use the hard work that some compiler previously did in compiling a program so that I can copy and later use the machine code being executed.
Little-known fact: GDB has a curses interface built in.
Use gdbtui, or start gdb and press Ctrl+X, Ctrl+A, to enter it; then press Ctrl+X, 2 to show assembly and source together. The current instruction is highlighted, and you can navigate using the arrow keys.
Almost every debugger can do this.
For gdb, a useful trick to remember is: display/i $pc
Do that once, then set a breakpoint on a function and step through the function with stepi and nexti.
The instruction at the PC will be automatically displayed each time.
Ross-Harveys-MacBook-Pro:so ross$ cat > deb.c
int main(void) { return (long)main + 0x123; }
Ross-Harveys-MacBook-Pro:so ross$ cc -O deb.c
Ross-Harveys-MacBook-Pro:so ross$ gdb -q a.out
Reading symbols for shared libraries .. done
(gdb) break main
Breakpoint 1 at 0x100000f30
(gdb) display/i $pc
(gdb) r
Starting program: /Users/ross/so/a.out
Reading symbols for shared libraries +. done
Breakpoint 1, 0x0000000100000f30 in main ()
1: x/i $pc 0x100000f30 <main+4>: lea -0xb(%rip),%rax # 0x100000f2c <main>
(gdb) stepi
0x0000000100000f37 in main ()
1: x/i $pc 0x100000f37 <main+11>: add $0x123,%eax
(gdb) stepi
0x0000000100000f3c in main ()
1: x/i $pc 0x100000f3c <main+16>: leaveq
I can't tell if you're asking about doing this at runtime, or if you want to see a textfile containing the assembly code of your compiled C code.
If the former, just use a debugger (use disassemble in gdb with gcc, or the integrated debugger in Microsoft Visual Studio).
If the latter, you'll have to look up the specific commands for your compiler. With Visual Studio, for example, just use the flag /FAs; this will output the assembly code with your source code. For gcc:
gcc -c -g -Wa,-a,-ad foo.c > foo.lst
Most debuggers have options to view the disassembly of the code you are executing.
Ex: in gdb use the disassemble command.
If you want to know what the execution path was for a particular function, perhaps some processors have such a feature, but generally no. Now what you can do is run in an emulator and modify the emulator to print out the addresses of the fetches or reads or whatever.
If this is just a disassembly question, use objdump from the gcc/binutils tools (objdump -D filename > out.list) and don't bother executing the program or using a debugger.