It is about reverse engineering in Linux: if I have a .c file, compile it with debugging information and open it in gdb, everything works fine. But how can I obtain the same result starting from an executable file?
I tried objdump -M intel -D file to disassemble it, but then I would like to assemble it again so that I can open it with gdb (if I open the executable directly with gdb, I can't do things like setting breakpoints and viewing registers). I tried with nasm and gcc, but they reported syntax errors.
If the symbol table has been stripped off, you cannot get it back.
Anyway, you can set breakpoints in GDB on a specific code address with:
break *address
If you have a hex address, you must prefix it with 0x, e.g.:
break *0x400506
And to print the current register values, you can use info registers as also answered in How to print register values in gdb?
info registers
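NASM and the GNU assembler use different syntaxes; that's why you cannot easily disassemble with one toolchain and reassemble with the other. NASM uses a variant of the Intel syntax, while the GNU assembler prefers AT&T syntax. On top of that, objdump's output includes addresses and raw instruction bytes, so it is not valid assembler input as-is.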
I am looking for help with GDB to reverse engineer a shared library written in C that is preloaded via /etc/ld.so.preload.
The library hooks the accept() call: if the source port is correct, it returns a reverse shell to the user.
The strings command doesn't reveal the source port, so my goal is to find it with GDB.
The program consists of two files: headers.h, where I have my definitions, including #define SECRET_PORT 11111, and
source.c, which contains the accept hook with the reverse shell.
My problem is that I cannot figure out how to retrieve the port with GDB. I can load mylib.so in gdb and run info functions to see what's inside; I can see the accept function, but when I try disass accept I only get instructions that I can barely understand.
Another problem: when I run mylib, it gives a SIGSEGV (maybe that's the reason I cannot see variables). There is no main function to set a breakpoint on, and if I set one on the accept function, it still gives a SIGSEGV.
I tested with starti instead of run, and then I got Program stopped 0xSOMEADRESGOESHERE in deregister_tm_clones(). I don't even know if this is the correct way to test a .so file; maybe there are some other switches.
I'm thinking I need to find a way to set a breakpoint in the HTONS() check, where the if statement compares the source port, and extract the value from there, but so far no luck.
P.S. When mylib is loaded in gdb, there is a message No debugging symbols found, so I cannot run list accept or anything like that to view the source.
Compilation command: gcc -Wall -shared -fPIC mylib.c -o mylib.so -ldl
I'm thinking I need to find a way to set a breakpoint in the HTONS() check, where the if statement compares the source port, and extract the value from there
You don't need to do that: the instructions will be the same whether you run the application or disassemble the function without running it.
Compilation code ...
So you are trying to reverse-engineer the library for which you have a source?
That makes it very easy to find the constant you are looking for.
Start by setting the constant to an easily recognizable value, e.g. 0x12131415. Compile the library and disassemble it. Look for your constant.
If you don't see it, save the disassembled output, and rebuild the library with a different value, e.g. 0xA1B1C1D1. Disassemble it again and compare to previous disassembled output. It should be easy to spot the difference.
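If reading the disassembly is tedious, a complementary trick is to search the raw bytes of the compiled library for the marker value directly. The C sketch below assumes a little-endian target such as x86-64, where an immediate like 0x12131415 is stored least significant byte first (the bytes 15 14 13 12); the function name find_le_constant is made up for this example. If the constant gets narrowed (for example to a 16-bit port), only its low bytes may survive, so it is worth searching with a smaller width too.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first occurrence of `value`, encoded as a
 * little-endian integer `width` bytes wide (1..8), inside buf[0..len),
 * or -1 if it does not occur. */
long find_le_constant(const unsigned char *buf, size_t len,
                      uint64_t value, size_t width)
{
    unsigned char pat[8];

    if (width == 0 || width > 8 || len < width)
        return -1;

    /* Build the byte pattern, least significant byte first. */
    for (size_t k = 0; k < width; k++)
        pat[k] = (unsigned char)(value >> (8 * k));

    /* Naive scan: compare the pattern at every position. */
    for (size_t i = 0; i + width <= len; i++) {
        size_t k = 0;
        while (k < width && buf[i + k] == pat[k])
            k++;
        if (k == width)
            return (long)i;
    }
    return -1;
}
```

To use it, read mylib.so into memory (for example with fopen/fread) and call find_le_constant(buf, len, 0x12131415, 4); a non-negative result is a file offset, which you can relate to the disassembly via the section offsets that readelf -S prints.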
P.S. If you really want to debug this library with a live process, do this:
gdb ./myprog
(gdb) set env LD_PRELOAD /path/to/mylib.so
(gdb) run
At this point, you should be able to set breakpoints and observe your library "in action".
OK, I managed to solve this with some help.
When running GDB on the shared library, you have to look for the hex value of 11111, which is 0x2B67; in the registers this shows up as something like 0x2b67, and it is passed to htons() as the check for the source port.
So even if I didn't have the source code, I could still run: gdb -q *.so
then run info functions and use disass functionNameGoesHere to see where the accept / htons calls are made. The correct value should be found right above the htons line.
Then convert the hex value to decimal, and that's how you can find it.
This took a while to figure out, as I couldn't set breakpoints.
Thanks again for the input from the community! Cheers
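To make the arithmetic above concrete: 11111 in decimal is 0x2B67, and on a little-endian host htons() swaps the two bytes, so depending on the optimization level the binary may contain either 0x2b67 (passed to a real htons() call) or the already-swapped 0x672b. The C sketch below is an illustration under those assumptions: is_secret_port is a hypothetical reconstruction of the check (the question doesn't show its code), and imm_to_long and swap16 are made-up helper names for turning an immediate copied from the disassembly into decimal.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>

#define SECRET_PORT 11111   /* value from the question's headers.h */

/* Hypothetical version of the check the hook might perform: compare the
 * peer's port (stored in network byte order) with htons(SECRET_PORT). */
int is_secret_port(const struct sockaddr_in *peer)
{
    return peer->sin_port == htons(SECRET_PORT);
}

/* Convert an immediate as printed by objdump/gdb ("0x2b67", "$0x2b67"
 * or "2b67") to its numeric value. Returns -1 if it isn't a hex number. */
long imm_to_long(const char *s)
{
    char *end;
    long v;

    if (*s == '$')               /* AT&T syntax prefixes immediates with '$' */
        s++;
    v = strtol(s, &end, 16);     /* base 16 also accepts an optional 0x prefix */
    if (end == s || *end != '\0' || v < 0)
        return -1;
    return v;
}

/* Swap the two bytes of a 16-bit value; on a little-endian host such as
 * x86 this is the same transformation htons() performs. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```

For example, imm_to_long("$0x2b67") gives 11111, and if you only find 0x672b in the disassembly, swap16(0x672b) also gives 11111.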
I am trying to print all the undefined function calls from a shared object file along with the file name.
I tried the nm command. It prints all the undefined function calls, but I could not get the file name.
Example:
bash$ nm -u my_test.so
:
U _ZNSs4_Rep20_S_empty_rep_storageE##GLIBCXX_3.4
:
Environment: Ubuntu 18.04, x86 architecture (Intel processor)
Study the specification of the DWARF format (the format used for debugging information on Linux) in detail. You could then extract the information (but it is not exactly simple) by parsing the DWARF inside your ELF binary.
Consider looking inside the source code of Ian Taylor's libbacktrace. It does this extraction of file names from DWARF inside ELF.
Perhaps your real problem is getting precise backtrace information, and then that libbacktrace is exactly what you need!
You might also use gdb : it is extensible and scriptable in Python (or Guile) and you could write your own specialized script.
Perhaps your real problem is better solved with a GCC plugin that runs when you compile your code.
Read How to write shared libraries by Drepper and read more about ELF.
You could, for example, collect all the undefined symbols in your shared library using nm (or readelf). Then a second script would find the occurrences of these in your source code. It could be a simple awk script (or a shell for loop using grep), or something as sophisticated as a GCC plugin.
Your example shows (probably) a mangled C++ name. You could use nm -C to get it demangled, and later write a GCC plugin to find all the GIMPLE_CALL statements that use it.
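The first step of that pipeline can also be done without nm. As a minimal sketch, assuming a 64-bit ELF file in the host's byte order, the C function below walks the section headers, finds the .dynsym table and prints every symbol whose section index is SHN_UNDEF, which is roughly what nm -D -u reports. It relies only on the types from <elf.h>; the function names are made up for this example, and it does not recover file names, which still require the DWARF data mentioned above.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'd buffer. Returns NULL on failure. */
static unsigned char *read_whole_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long size = 0;

    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0 &&
        fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)size)) != NULL) {
        if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);
            buf = NULL;
        } else {
            *len = (size_t)size;
        }
    }
    fclose(f);
    return buf;
}

/* Print the undefined symbols of .dynsym, one per line, to `out`.
 * Assumes ELF64 in host byte order. Returns the count, or -1 on error. */
int list_undefined_dynsyms(const char *path, FILE *out)
{
    size_t len = 0;
    unsigned char *buf = read_whole_file(path, &len);
    int count = -1;

    if (buf == NULL)
        return -1;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if (len >= sizeof *eh && memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 && eh->e_shoff != 0 &&
        eh->e_shoff + (Elf64_Off)eh->e_shnum * sizeof(Elf64_Shdr) <= len) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
        count = 0;
        for (size_t i = 0; i < eh->e_shnum; i++) {
            if (sh[i].sh_type != SHT_DYNSYM || sh[i].sh_link >= eh->e_shnum)
                continue;
            /* sh_link of a symbol table points at its string table. */
            const Elf64_Shdr *str = &sh[sh[i].sh_link];
            if (sh[i].sh_offset + sh[i].sh_size > len ||
                str->sh_offset + str->sh_size > len)
                continue;
            const Elf64_Sym *syms = (const Elf64_Sym *)(buf + sh[i].sh_offset);
            const char *names = (const char *)(buf + str->sh_offset);
            size_t nsyms = sh[i].sh_size / sizeof(Elf64_Sym);
            /* Entry 0 is the reserved null symbol, so start at 1. */
            for (size_t j = 1; j < nsyms; j++) {
                if (syms[j].st_shndx == SHN_UNDEF &&
                    syms[j].st_name < str->sh_size) {
                    fprintf(out, "%s\n", names + syms[j].st_name);
                    count++;
                }
            }
        }
    }
    free(buf);
    return count;
}
```

Calling list_undefined_dynsyms("my_test.so", stdout) prints one mangled name per line; piping the output through c++filt demangles it before you grep your sources.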
Writing a GCC plugin may take some time, in particular if you are not familiar with GCC internals.
I'm trying to link a simple C program on an ARM Debian machine (a Raspberry Pi), and when linking the object file the linker gives me the error in the title.
My program is as simple as:
simple.c:
int main(){
int a = 2;
int b = 3;
int c = a+b;
}
I compile it with
$>gcc -o simple.obj simple.c
and then link it with
$>ld -o simple.elf simple.obj
ld: simple.obj: access beyond end of merged section (33872)
I can't understand why.
If I try to read the ELF file with objdump -d, it doesn't manage to disassemble the .text section (it only prints the address, the value, .word and the value again prefixed with 0x), but the binary data is the same as what I get from disassembling simple.obj.
The only difference is in the load start address (and consequently all following addresses) of the binary data: the ELF file starts at 0x8280, the object file at 0x82a0.
What does all this mean?
EDIT:
This is the dump for the obj file: http://pastebin.com/YZ94kRk4
And this is the dump for the ELF file: http://pastebin.com/3C3sWqrC
I tried compiling with the -c option, which makes gcc stop after assembling (before, it had already done the linking), but now I have a different problem: it says that there is no _start section in my object file.
The new dumps are:
simple.obj: http://pastebin.com/t0TqmgPa
simple.elf: http://pastebin.com/qD35cnqw
You are misunderstanding the effect of the commands you ran. If you run:
$ gcc -o simple.obj simple.c
it already creates the program you want to run; it's already linked. You don't need to link it again, especially not by running ld directly, unless you know what you are doing. Even though its extension is .obj, that doesn't matter: it's just the name of the file, and the content of the file is already a complete Linux program. So if you run:
$ ./simple.obj
it will execute your code.
You usually don't call ld directly; instead you use gcc as a front end to compile and link. This is because gcc also links in important pieces that you were not linking yourself, such as the C library and the startup code (which defines the _start entry point), and that's why your second attempt complained about a missing _start.
Could you print the output of the objdump -d command?
By the way, notice that 33872 == 0x8450.
I am not familiar with the Raspberry Pi's memory map, so if you're following any tutorials about this or have some other resource to help me help you out, it would be great :)
Consider the following Linux kernel stack dump (for example, you can trigger a panic from kernel source code by calling panic("debugging a Linux kernel panic");):
[<001360ac>] (unwind_backtrace+0x0/0xf8) from [<00147b7c>] (warn_slowpath_common+0x50/0x60)
[<00147b7c>] (warn_slowpath_common+0x50/0x60) from [<00147c40>] (warn_slowpath_null+0x1c/0x24)
[<00147c40>] (warn_slowpath_null+0x1c/0x24) from [<0014de44>] (local_bh_enable_ip+0xa0/0xac)
[<0014de44>] (local_bh_enable_ip+0xa0/0xac) from [<0019594c>] (bdi_register+0xec/0x150)
In unwind_backtrace+0x0/0xf8 what does +0x0/0xf8 stand for?
How can I see the C code of unwind_backtrace+0x0/0xf8?
How to interpret the panic's content?
It's just an ordinary backtrace: the functions are listed in reverse call order (each one was called by the one listed after it):
unwind_backtrace+0x0/0xf8
warn_slowpath_common+0x50/0x60
warn_slowpath_null+0x1c/0x24
local_bh_enable_ip+0xa0/0xac
bdi_register+0xec/0x150
bdi_register+0xec/0x150 is the symbol plus offset/length. There's more information about this in Understanding a Kernel Oops and in how you can debug a kernel oops. There's also this excellent tutorial on Debugging the Kernel.
Note: as suggested below by Eugene, you may want to try addr2line first; it still needs an image with debugging symbols, though. For example:
addr2line -e vmlinux_with_debug_info 0019594c(+offset)
Here are two alternatives for addr2line. Assuming you have the proper target's toolchain, you can do one of the following:
Use objdump:
Locate your vmlinux or .ko file under the kernel root directory, then disassemble the object file:
objdump -dS vmlinux > /tmp/kernel.s
Open the generated assembly file, /tmp/kernel.s, with a text editor such as vim, and go to unwind_backtrace+0x0/0xf8, i.e. search for the address of unwind_backtrace plus the offset. There you have located the problematic part of your source code.
Use gdb:
IMO, an even more elegant option is to use the one and only gdb. Assuming you have the suitable toolchain on your host machine:
Run gdb <path-to-vmlinux>.
Execute in gdb's prompt: list *(unwind_backtrace+0x10).
For additional information, you may check out the following resources:
Kernel Debugging Tricks.
Debugging The Linux Kernel Using Gdb
In unwind_backtrace+0x0/0xf8, what does +0x0/0xf8 stand for?
The first number (+0x0) is the offset from the beginning of the function (unwind_backtrace in this case). The second number (0xf8) is the total length of the function. Given these two pieces of information, if you already have a hunch about where the fault occurred this might be enough to confirm your suspicion (you can tell (roughly) how far along in the function you were).
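Splitting such a token is simple string handling. The C sketch below (parse_sym_off_len is a made-up name) extracts the symbol, the offset and the function length from a token like bdi_register+0xec/0x150; sscanf's %lx conversion accepts the 0x prefix.

```c
#include <stdio.h>
#include <string.h>

/* Split a kernel backtrace token such as "bdi_register+0xec/0x150" into
 * the symbol name, the offset into the function and the function length.
 * Returns 0 on success, -1 if the token does not have that shape. */
int parse_sym_off_len(const char *tok, char *name, size_t name_size,
                      unsigned long *off, unsigned long *len)
{
    const char *plus = strrchr(tok, '+');   /* symbol names contain no '+' */
    size_t n;

    if (plus == NULL || plus == tok)
        return -1;
    n = (size_t)(plus - tok);
    if (n >= name_size)
        return -1;
    /* "%lx/%lx" reads the hex offset, a literal '/', then the hex length. */
    if (sscanf(plus + 1, "%lx/%lx", off, len) != 2)
        return -1;
    memcpy(name, tok, n);
    name[n] = '\0';
    return 0;
}
```

For bdi_register+0xec/0x150 it yields offset 0xec (236) in a 0x150-byte (336-byte) function, i.e. about 70% of the way through; for unwind_backtrace+0x0/0xf8 the offset is 0, the very first instruction.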
To get the exact source line of the corresponding instruction (generally better than hunches), use addr2line or the other methods in other answers.
As the title suggests, is there any way to read the machine code instructions as/after they have been executed? For example, if I had an arbitrary block of C code and I wanted to know what instructions were compiled and executed when that block was entered then would there be a way to do that? Thank you in advance for any pointers on the subject.
Edit: Some motivation as to what I'm trying to do: I want to have a program that roughly figures out how it has been compiled or what instructions it is currently running without actually needing to know how the machine code is made. I.e. I want to use the hard work that some compiler previously did in compiling a program so that I can copy and later use the machine code being executed.
Little-known fact: GDB has a curses interface built in.
Use gdbtui, or start gdb and press Ctrl+X, Ctrl+A to enter it; then press Ctrl+X, 2 to show assembly and source together. The current instruction is highlighted, and you can navigate using the arrow keys.
Almost every debugger can do this.
For gdb, a useful trick to remember is: display/i $pc
Do that once, then set a breakpoint on a function and step through it with stepi and nexti.
The instruction at the PC will be automatically displayed each time.
Ross-Harveys-MacBook-Pro:so ross$ cat > deb.c
int main(void) { return (long)main + 0x123; }
Ross-Harveys-MacBook-Pro:so ross$ cc -O deb.c
Ross-Harveys-MacBook-Pro:so ross$ gdb -q a.out
Reading symbols for shared libraries .. done
(gdb) break main
Breakpoint 1 at 0x100000f30
(gdb) display/i $pc
(gdb) r
Starting program: /Users/ross/so/a.out
Reading symbols for shared libraries +. done
Breakpoint 1, 0x0000000100000f30 in main ()
1: x/i $pc 0x100000f30 <main+4>: lea -0xb(%rip),%rax # 0x100000f2c <main>
(gdb) stepi
0x0000000100000f37 in main ()
1: x/i $pc 0x100000f37 <main+11>: add $0x123,%eax
(gdb) stepi
0x0000000100000f3c in main ()
1: x/i $pc 0x100000f3c <main+16>: leaveq
I can't tell if you're asking about doing this at runtime, or if you want to see a textfile containing the assembly code of your compiled C code.
If the former, just use a debugger (use disassemble in gdb with gcc, or the integrated debugger in Microsoft Visual Studio).
If the latter, you'll have to look up the specific commands for your compiler. With Visual Studio, for example, just use the flag /FAs; this will output the assembly code with your source code. For gcc:
gcc -c -g -Wa,-a,-ad foo.c > foo.lst
Most debuggers have options to view the disassembly of the code you are executing.
Ex: in gdb use the disassemble command.
If you want to know what the execution path through a particular function was, some processors may have such a feature, but generally no. What you can do instead is run the program in an emulator and modify the emulator to print out the addresses of the instruction fetches, reads, or whatever you need.
If this is just a disassembly question, with the gcc/binutils tools you can run objdump -D filename > out.list and not bother executing the program or using a debugger.