In the following code segment:
int func()
{
int a=7;
return a;
}
Where in the executable is the value 7 stored? Is it in the data segment or in the code segment? Does the answer depend on the operating system or the compiler?
Each executable format has some sections. One of them is text, which contains the compiled machine code. At run time the process also has a heap, where malloc-ed data is found, and a stack, where local variables are stored. There are several others, but they don't matter here. These three are common everywhere.
Now, local data like your a resides on the stack. In the executable file, the value is stored in the text section.
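To make the distinction concrete, here is a minimal sketch of where a typical toolchain (assuming GCC on an ELF platform such as Linux) tends to place different kinds of objects. The section names in the comments are the usual ones, not a guarantee, and every name other than func is made up for illustration.

```c
#include <stdlib.h>

int global_initialized = 7;          /* usually .data: the 7 is stored in the file */
int global_zeroed;                   /* usually .bss: no bytes stored in the file */
const char *greeting = "Hello";      /* the characters usually live in .rodata */

/* Automatic local: a stack slot or just a register at run time.
   The 7 itself is an immediate inside the instruction, i.e. in .text. */
int func(void)
{
    int a = 7;
    return a;
}

/* Static local: one object for the whole program, so its address never changes. */
int *static_local_address(void)
{
    static int counter = 0;
    counter++;
    return &counter;
}

/* Heap object: exists only at run time; the caller must free() it. */
int *make_heap_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}
```

Compiling this with gcc -c and running objdump -h or nm on the object file should show which section each symbol landed in.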
I've added a main to your code (returning 0), compiled with -g, then ran objdump -CDgS a.out and searched for 0x42424242 (I've replaced your 7 with a value that is less likely to occur in the code by chance).
00000000004004ec <func>:
int func()
{
4004ec: 55 push %rbp
4004ed: 48 89 e5 mov %rsp,%rbp
int a=0x42424242;
4004f0: c7 45 fc 42 42 42 42 movl $0x42424242,-0x4(%rbp)
return a;
4004f7: 8b 45 fc mov -0x4(%rbp),%eax
}
4004fa: 5d pop %rbp
4004fb: c3 retq
As you see, c7 45 fc 42 42 42 42 means that the value is stored in the generated file. Indeed, this is the case when looking at the binary via xxd:
$ xxd a.out | grep 4242
00004f0: c745 fc42 4242 428b 45fc 5dc3 5548 89e5 .E.BBBB.E.].UH..
You can recognize the above assembly line in the xxd snippet.
Since a is implicitly auto (i.e. is not extern or static) it is stored in the call stack frame.
In fact, the compiler may optimize that: probably, in your case, when optimizing, it will stay in a register (or be constant propagated and constant folded), so there is no need to allocate a call stack slot for your a.
This is of course compiler, target platform, and operating system dependent. For the GCC compiler, understand the Gimple internal representation (through -fdump-tree-all, or using the MELT probe) and look at the generated assembler code (use -fverbose-asm -S -O).
See also this answer which gives a lot of references.
GCC 4.8 on Linux/x86-64 compiles (with gcc -S -fverbose-asm -O) your function into:
.globl func
.type func, @function
func:
.LFB0:
.cfi_startproc
movl $7, %eax #,
ret
.cfi_endproc
.LFE0:
.size func, .-func
So you see that in your particular case no additional space is used for 7: it is stored directly in %eax, which is the register (defined in the ABI conventions) that holds the returned result.
The value 7 is stored in the machine code, inside the movl machine instruction. When func is executed, that 7 is loaded into register %eax containing the returned result of func.
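If you want to see the stack-slot version even with optimization on, one option is to mark the local volatile, which obliges the compiler to perform the store and the load. This is a sketch assuming GCC or clang; func_plain and func_volatile are hypothetical names, and the exact instructions will vary with compiler version and flags.

```c
/* Plain local: with -O the compiler is free to reduce this to
   something like "movl $7, %eax; ret", with no stack slot at all. */
int func_plain(void)
{
    int a = 7;
    return a;
}

/* volatile local: the write and the read are observable accesses,
   so the compiler keeps a memory slot for a even when optimizing. */
int func_volatile(void)
{
    volatile int a = 7;
    return a;
}
```

Comparing the two functions in the output of gcc -S -fverbose-asm -O should show the immediate 7 in both, but a store and reload only in func_volatile.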
Depending on the example code, the variable "a" goes on the call stack, the place that stores local variables along with function call information such as the saved program counter (the return address).
I have a simple function in C language, in separate file string.c:
void var_init(){
char *hello = "Hello";
}
compiled with:
gcc -ffreestanding -c string.c -o string.o
And then I use command
objdump -d string.o
to see disassemble listing. What I got is:
string.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <var_init>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # b <var_init+0xb>
b: 48 89 45 f8 mov %rax,-0x8(%rbp)
f: 90 nop
10: 5d pop %rbp
11: c3 retq
I am lost trying to understand this listing. The book "Writing OS from scratch" says something about old disassembly and slightly uncovers the mystery, but their listing is completely different, and I don't even see the data interpreted as code in mine, as the author describes.
In addition to the explanation from @VladfromMoscow, I thought it might be helpful for the poster to see what happens when you compile to assembly rather than using objdump, as the data can be seen more plainly then (IMO) and the RIP-relative addressing may make a bit more sense.
gcc -S x.c
Yields
.file "x.c"
.text
.section .rodata
.LC0:
.string "Hello"
.text
.globl var_init
.type var_init, @function
var_init:
.LFB0:
pushq %rbp
movq %rsp, %rbp
leaq .LC0(%rip), %rax
movq %rax, -8(%rbp)
nop
popq %rbp
ret
.LFE0:
.size var_init, .-var_init
.ident "GCC: (Alpine 8.3.0) 8.3.0"
.section .note.GNU-stack,"",@progbits
This command
lea 0x0(%rip),%rax
stores the address of the string literal in the register rax.
And this command
mov %rax,-0x8(%rbp)
copies the address from the register rax into the allocated stack memory. The address occupies 8 bytes as it is seen from the offset in the stack -0x8.
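The split between the pointer and the characters it points to can also be checked from C. This is a minimal sketch: get_hello and pointer_size are hypothetical helpers, and the claim that the literal sits in .rodata is the usual GCC/ELF behaviour rather than something the C standard promises.

```c
#include <stddef.h>

/* hello is an automatic variable (stack slot or register) that holds
   only an address. The bytes 'H','e','l','l','o','\0' have static
   storage duration, so the returned pointer stays valid after return. */
const char *get_hello(void)
{
    const char *hello = "Hello";
    return hello;
}

/* Size of the pointer itself: 8 on x86-64, which matches the
   8-byte slot at -0x8(%rbp) in the listing. */
size_t pointer_size(void)
{
    return sizeof(const char *);
}
```

Every call to get_hello returns the same address, because only the local pointer is recreated on each call, not the string.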
This store only happens at all because you compiled in debug mode; it would normally be optimized away. The next thing that happens is that the local vars (in the red-zone below the stack pointer) are effectively discarded as the function tears down its stack frame and returns.
The material you're looking at probably included a sub $16, %rsp or similar to allocate space for locals below RBP, then deallocated that space later; the x86-64 System V ABI doesn't need that in leaf functions (that don't call any other functions); they can just use the red zone. (See also Where exactly is the red zone on x86-64?). Or compile with gcc -mno-red-zone, which you probably want anyway for freestanding code: Why can't kernel code use a Red Zone
Then it restores the saved value of the caller's RBP (which was earlier set up as a frame pointer; notice that space for locals was addressed relative to RBP).
pop %rbp
and exits, effectively popping the return address into RIP
retq
I wrote a C program that just reads/writes a large array. I compiled the program with the command gcc -O0 program.c -o program. Out of curiosity, I disassembled the program with the objdump -S command.
The code and assembly of the read_array and write_array functions are attached at the end of this question.
I'm trying to interpret how gcc compiles the function. I used // to add my comments and questions
Take one piece of the beginning of the assembly code of the write_array() function
4008c1: 48 89 7d e8 mov %rdi,-0x18(%rbp) // this is the first parameter of the function
4008c5: 48 89 75 e0 mov %rsi,-0x20(%rbp) // this is the second parameter of the function
4008c9: c6 45 ff 01 movb $0x1,-0x1(%rbp) // comparing with the source code, I think this is the `char tmp` variable
4008cd: c7 45 f8 00 00 00 00 movl $0x0,-0x8(%rbp) // this should be the `int i` variable.
What I don't understand is:
1) char tmp is obviously defined after int i in the write_array function. Why does gcc reorder the memory locations of these two local variables?
2) From the offsets, int i is at -0x8(%rbp) and char tmp is at -0x1(%rbp), which seems to indicate that int i takes 7 bytes. This is quite weird because int i should be 4 bytes on x86-64 machine. Isn't it? My speculation is that gcc tries to do some alignment.
3) I find gcc's optimization choices quite interesting. Are there any good documents or books that explain how gcc works? (The third question may be off-topic, and if you think so, please just ignore it. I am just trying to see if there is a shortcut to learning the underlying mechanisms gcc uses for compilation. :-) )
Below is the piece of function code:
#define CACHE_LINE_SIZE 64
static inline void
read_array(char* array, long size)
{
int i;
char tmp;
for ( i = 0; i < size; i+= CACHE_LINE_SIZE )
{
tmp = array[i];
}
return;
}
static inline void
write_array(char* array, long size)
{
int i;
char tmp = 1;
for ( i = 0; i < size; i+= CACHE_LINE_SIZE )
{
array[i] = tmp;
}
return;
}
Below is the piece of disassembled code for write_array, from gcc -O0:
00000000004008bd <write_array>:
4008bd: 55 push %rbp
4008be: 48 89 e5 mov %rsp,%rbp
4008c1: 48 89 7d e8 mov %rdi,-0x18(%rbp)
4008c5: 48 89 75 e0 mov %rsi,-0x20(%rbp)
4008c9: c6 45 ff 01 movb $0x1,-0x1(%rbp)
4008cd: c7 45 f8 00 00 00 00 movl $0x0,-0x8(%rbp)
4008d4: eb 13 jmp 4008e9 <write_array+0x2c>
4008d6: 8b 45 f8 mov -0x8(%rbp),%eax
4008d9: 48 98 cltq
4008db: 48 03 45 e8 add -0x18(%rbp),%rax
4008df: 0f b6 55 ff movzbl -0x1(%rbp),%edx
4008e3: 88 10 mov %dl,(%rax)
4008e5: 83 45 f8 40 addl $0x40,-0x8(%rbp)
4008e9: 8b 45 f8 mov -0x8(%rbp),%eax
4008ec: 48 98 cltq
4008ee: 48 3b 45 e0 cmp -0x20(%rbp),%rax
4008f2: 7c e2 jl 4008d6 <write_array+0x19>
4008f4: 5d pop %rbp
4008f5: c3 retq
Even at -O0, gcc doesn't emit definitions for static inline functions unless there's a caller. In that case, it doesn't actually inline: instead it emits a stand-alone definition. So I guess your disassembly is from that.
Are you using a really old gcc version? gcc 4.6.4 puts the vars in that order on the stack, but 4.7.3 and later use the other order:
movb $1, -5(%rbp) #, tmp
movl $0, -4(%rbp) #, i
In your asm, they're stored in order of initialization rather than declaration, but I think that's just by chance, since the order changed with gcc 4.7. Also, tacking on an initializer like int i=1; doesn't change the allocation order, so that completely torpedoes that theory.
Remember that gcc is designed around a series of transformations from source to asm, so -O0 doesn't mean "no optimization". You should think of -O0 as leaving out some things that -O3 normally does. There is no option that tries to make a literal-as-possible translation from source to asm.
Once gcc does decide which order to allocate space for them:
the char at rbp-1: That's the first available location that can hold a char. If there was another char that needed storing, it could go at rbp-2.
the int at rbp-8: Since the 4 bytes from rbp-1 to rbp-4 aren't all free, the next available naturally-aligned location is rbp-8.
Or with gcc 4.7 and newer, -4 is the first available spot for an int, and -5 is the next byte below that.
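The same natural-alignment rule is easier to observe in a struct, where the compiler must keep members in declaration order. This is only an analogy to the stack layout above (the compiler may reorder locals on the stack, but not struct members), and struct char_then_int and the helper functions are hypothetical names for illustration. It needs C11 for _Alignof.

```c
#include <stddef.h>

struct char_then_int {
    char tmp;   /* offset 0, any address will do for a char */
    int  i;     /* next offset that is a multiple of _Alignof(int) */
};

/* Offset at which the int ends up inside the struct. */
size_t offset_of_i(void)
{
    return offsetof(struct char_then_int, i);
}

/* Alignment the compiler requires for int. */
size_t int_alignment(void)
{
    return _Alignof(int);
}

/* Unused bytes between the char and the int. */
size_t padding_bytes(void)
{
    return offsetof(struct char_then_int, i) - sizeof(char);
}
```

On x86-64 with GCC one would expect the int at offset 4 with 3 bytes of padding after the char: the gap is alignment padding, not part of the int.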
RE: space saving:
It's true that putting the char at -5 makes the lowest touched address %rsp-5, instead of %rsp-8, but this doesn't save anything.
The stack pointer is 16B-aligned in the AMD64 SysV ABI. (Technically, %rsp+8 (the start of stack args) is aligned on function entry, before you push anything.) The only way for %rbp-8 to touch a new page or cache-line that %rbp-5 wouldn't is for the stack to be less than 4B-aligned. This is extremely unlikely, even in 32bit code.
As far as how much stack is "allocated" or "owned" by the function: In the AMD64 SysV ABI, the function "owns" the red zone of 128B below %rsp (That size was chosen because a one-byte displacement can go up to -128). Signal handlers and any other asynchronous users of the user-space stack will avoid clobbering the red zone, which is why the function can write to memory below %rsp without decrementing %rsp. So from that perspective, it doesn't matter how much of the red-zone we use; the chances of a signal handler running out of stack is unaffected.
In 32bit code, where there's no redzone, for either order gcc reserves space on the stack with sub $16, %esp. (try with -m32 on godbolt). So again, it doesn't matter whether we use 5 or 8 bytes, because we reserve in units of 16.
When there are many char and int variables, gcc packs the chars into 4B groups, instead of losing space to fragmentation, even when the declarations are mixed together:
void many_vars(void) {
char tmp = 1; int i=1;
char t2 = 2; int i2 = 2;
char t3 = 3; int i3 = 3;
char t4 = 4;
}
with gcc 4.6.4 -O0 -fverbose-asm, which helpfully labels which store is for which variable, which is why compiler asm output is preferable to disassembly:
pushq %rbp #
movq %rsp, %rbp #,
movb $1, -4(%rbp) #, tmp
movl $1, -16(%rbp) #, i
movb $2, -3(%rbp) #, t2
movl $2, -12(%rbp) #, i2
movb $3, -2(%rbp) #, t3
movl $3, -8(%rbp) #, i3
movb $4, -1(%rbp) #, t4
popq %rbp #
ret
I think variables go in either forward or reverse order of declaration, depending on gcc version, at -O0.
I made a version of your read_array function that works with optimization on:
// assumes that size is non-zero. Use a while() instead of do{}while() if you want extra code to check for that case.
void read_array_good(const char* array, size_t size) {
const volatile char *vp = array;
do {
(void) *vp; // this counts as accessing the volatile memory, with gcc/clang at least
vp += CACHE_LINE_SIZE/sizeof(vp[0]);
} while (vp < array+size);
}
Compiles to the following, with gcc 5.3 -O3 -march=haswell:
addq %rdi, %rsi # array, D.2434
.L11:
movzbl (%rdi), %eax # MEM[(const char *)array_1], D.2433
addq $64, %rdi #, array
cmpq %rsi, %rdi # D.2434, array
jb .L11 #,
ret
Casting an expression to void is the canonical way to tell the compiler that a value is used. e.g. to suppress unused-variable warnings, you can write (void)my_unused_var;.
For gcc and clang, doing that with a volatile pointer dereference does generate a memory access, with no need for a tmp variable. The C standard is very non-specific about what constitutes access to something that's volatile, so this probably isn't perfectly portable. Another way is to xor the values you read into an accumulator, and then store that to a global. As long as you don't use whole-program optimization, the compiler doesn't know that nothing reads the global, so it can't optimize away the calculation.
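A minimal sketch of the accumulator approach, assuming no whole-program optimization; read_array_xor and read_sink are hypothetical names, and this version uses a local accumulator that is stored to the global once at the end:

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64

/* Global sink: the compiler cannot prove that no other translation
   unit reads it, so the value stored here has to be computed. */
unsigned char read_sink;

/* Read one byte per cache line and fold the bytes together with xor. */
void read_array_xor(const char *array, size_t size)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < size; i += CACHE_LINE_SIZE) {
        acc ^= (unsigned char)array[i];
    }
    read_sink = acc;   /* a single store after the loop */
}
```

Because the result depends on every byte read, the loads cannot be dropped, and unlike the volatile version nothing relies on how the compiler interprets a volatile access.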
See the vmtouch source code for an example of this second technique. (It actually uses a global variable for the accumulator, which makes clunky code. Of course, that hardly matters since it's touching pages, not just cache lines, so it very quickly bottlenecks on TLB misses and page faults, even with a memory read-modify-write in the loop-carried dependency chain.)
I tried and failed to write something that gcc or clang would compile to a function with no prologue (which assumes that size is initially non-zero). GCC always wants to add rsi,rdi for a cmp/jcc loop condition, even with -march=haswell where sub rsi,64/jae can macro-fuse just as well as cmp/jcc. But in general on AMD, what GCC does has fewer uops inside the loop.
read_array_handtuned_haswell:
.L0
movzx eax, byte [rdi] ; overwrite the full RAX to avoid any partial-register false deps from writing AL
add rdi, 64
sub rsi, 64
jae .L0 ; or ja, depending on what semantics you want
ret
Godbolt Compiler Explorer link with all my attempts and trial versions
I can get something similar if the loop-termination condition is je, in a loop like do { ... } while( size -= CL_SIZE ); But I can't seem to convince gcc to catch unsigned borrow when subtracting. It wants to subtract and then cmp -64/jb to detect underflow. It's not that hard to get compilers to check the carry flag after an add to detect carry :/
It's also easy to get compilers to make a 4-insn loop, but not without prologue. e.g. calculate an end pointer (array+size) and increment a pointer until it's greater or equal.
Fortunately this is not a big deal; the loop we do get is good.
For local variables saved on the stack, the address order depends on the stack growth direction. You can refer to Does stack grow upward or downward? for more information.
This is quite weird because int i should be 4 bytes on x86-64 machine. Isn't it?
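With GCC on an x86-64 machine the size of int is 4 bytes; it is long and pointers that are 8. You can confirm it by writing a test application to print sizeof(int). The gap between -0x8(%rbp) and -0x1(%rbp) is alignment padding, not part of i.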
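A minimal sketch of such a test, with print_sizes as a hypothetical helper name. Under the LP64 model used by x86-64 Linux one would expect int to be 4 bytes and long and pointers to be 8, but the printed values are what settle it for a given compiler and platform.

```c
#include <stdio.h>

/* Prints the sizes the compiler actually uses and returns the
   number of characters written (negative on output error). */
int print_sizes(void)
{
    return printf("sizeof(int)=%zu sizeof(long)=%zu sizeof(void *)=%zu\n",
                  sizeof(int), sizeof(long), sizeof(void *));
}
```

Call print_sizes() from main and build with the same compiler and flags as the program being inspected, since the answer is a property of the target ABI and not of the CPU alone.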
This is a very strange problem which occurs only when the program is compiled with -fPIC option.
Using gdb I'm able to print thread local variables but stepping over them leads to crash.
thread.c
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#define MAX_NUMBER_OF_THREADS 2
struct mystruct {
int x;
int y;
};
__thread struct mystruct obj;
void* threadMain(void *args) {
obj.x = 1;
obj.y = 2;
printf("obj.x = %d\n", obj.x);
printf("obj.y = %d\n", obj.y);
return NULL;
}
int main(int argc, char *arg[]) {
pthread_t tid[MAX_NUMBER_OF_THREADS];
int i = 0;
for(i = 0; i < MAX_NUMBER_OF_THREADS; i++) {
pthread_create(&tid[i], NULL, threadMain, NULL);
}
for(i = 0; i < MAX_NUMBER_OF_THREADS; i++) {
pthread_join(tid[i], NULL);
}
return 0;
}
Compile it using the following: gcc -g -lpthread thread.c -o thread -fPIC
Then while debugging it: gdb ./thread
(gdb) b threadMain
Breakpoint 1 at 0x4006a5: file thread.c, line 15.
(gdb) r
Starting program: /junk/test/thread
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7fc7700 (LWP 31297)]
[Switching to Thread 0x7ffff7fc7700 (LWP 31297)]
Breakpoint 1, threadMain (args=0x0) at thread.c:15
15 obj.x = 1;
(gdb) p obj.x
$1 = 0
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
threadMain (args=0x0) at thread.c:15
15 obj.x = 1;
Although, if I compile it without -fPIC then this problem doesn't occur.
Before anybody asks me why am I using -fPIC, this is just a reduced test case. We have a huge component which compiles into a so file which then plugs into another component. Therefore, fPIC is necessary.
There is no functional impact because of it, only that debugging is near impossible.
Platform Information: Linux 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux, Red Hat Enterprise Linux Server release 6.5 (Santiago)
Reproducible on the following as well
Linux 3.13.0-66-generic #108-Ubuntu SMP Wed Oct 7 15:20:27
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
The problem lies deep in the bowels of GAS, the GNU assembler, and how it generates DWARF debug information.
The compiler, GCC, has the responsibility of generating a specific sequence of instructions for a position-independent thread-local access, which is documented in the document ELF Handling for Thread-Local Storage, page 22, section 4.1.6: x86-64 General Dynamic TLS Model. This sequence is:
0x00 .byte 0x66
0x01 leaq x@tlsgd(%rip),%rdi
0x08 .word 0x6666
0x0a rex64
0x0b call __tls_get_addr@plt
, and is the way it is because the 16 bytes it occupies leave space for backend/assembler/linker optimizations. Indeed, your compiler generates the following assembler for threadMain():
threadMain:
.LFB2:
.file 1 "thread.c"
.loc 1 14 0
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movq %rdi, -8(%rbp)
.loc 1 15 0
.byte 0x66
leaq obj@tlsgd(%rip), %rdi
.value 0x6666
rex64
call __tls_get_addr@PLT
movl $1, (%rax)
.loc 1 16 0
...
The assembler, GAS, then relaxes this code, which contains a function call (!), down to just two instructions. These are:
a mov having an fs:-segment override, and
a lea
, in the final assembly. They occupy between themselves 16 bytes in total, demonstrating why the General Dynamic Model instruction sequence is designed to require 16 bytes.
(gdb) disas/r threadMain
Dump of assembler code for function threadMain:
0x00000000004007f0 <+0>: 55 push %rbp
0x00000000004007f1 <+1>: 48 89 e5 mov %rsp,%rbp
0x00000000004007f4 <+4>: 48 83 ec 10 sub $0x10,%rsp
0x00000000004007f8 <+8>: 48 89 7d f8 mov %rdi,-0x8(%rbp)
0x00000000004007fc <+12>: 64 48 8b 04 25 00 00 00 00 mov %fs:0x0,%rax
0x0000000000400805 <+21>: 48 8d 80 f8 ff ff ff lea -0x8(%rax),%rax
0x000000000040080c <+28>: c7 00 01 00 00 00 movl $0x1,(%rax)
So far, everything has been done correctly. The problem now begins as GAS generates DWARF debug information for your particular assembler code.
While parsing line-by-line in binutils-x.y.z/gas/read.c, function void read_a_source_file (char *name), GAS encounters .loc 1 15 0, the statement that begins the next line, and runs the handler void dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED) in dwarf2dbg.c. Unfortunately, the handler does not unconditionally emit debug information for the current offset within the "fragment" (frag_now) of machine code it is currently building. It could have done this by calling dwarf2_emit_insn(0), but the .loc handler currently only does so if it sees multiple .loc directives consecutively. Instead, in our case it continues on to the next line, leaving the debug information unemitted.
On the next line it sees the .byte 0x66 directive of the General Dynamic sequence. This is not, in and of itself, part of an instruction, despite representing the data16 instruction prefix in x86 assembly. GAS acts upon it with the handler cons_worker(), and the fragment increases from 12 bytes to 13 in size.
On the next line it sees a true instruction, leaq, which is parsed by calling the macro assemble_one() that maps to void md_assemble (char *line) in gas/config/tc-i386.c. At the very end of that function, output_insn() is called, which itself finally calls dwarf2_emit_insn(0) and causes debug information to be emitted at last. A new Line Number Statement (LNS) is begun that claims that line 15 began at function-start-address plus previous fragment size, but since we passed over the .byte statement before doing so, the fragment is 1 byte too large, and the computed offset for the first instruction of line 15 is therefore 1 byte off.
Some time later GAS relaxes the Global Dynamic Sequence to the final instruction sequence that starts with mov fs:0x0, %rax. The code size and all offsets remain unchanged because both sequences of instructions are 16 bytes. The debug information is unchanged, and still wrong.
GDB, when it reads the Line Number Statements, is told that the prologue of threadMain(), which is associated with the line 14 on which is found its signature, ends where line 15 begins. GDB dutifully plants a breakpoint at that location, but unfortunately it is 1 byte too far.
When run without a breakpoint, the program runs normally, and sees
64 48 8b 04 25 00 00 00 00 mov %fs:0x0,%rax
. Correctly placing the breakpoint would involve saving and replacing the first byte of an instruction with int3 (opcode 0xcc), leaving
cc int3
48 8b 04 25 00 00 00 00 mov (0x0),%rax
. The normal step-over sequence would then involve restoring the first byte of the instruction, setting the program counter eip to the address of that breakpoint, single-stepping, re-inserting the breakpoint, then continuing the program.
However, when GDB plants the breakpoint at the incorrect address 1 byte too far, the program sees instead
64 cc fs:int3
8b 04 25 00 00 00 00 <garbage>
which is a weird but still valid breakpoint. That's why you didn't see SIGILL (illegal instruction).
Now, when GDB attempts to step over, it restores the instruction byte, sets the PC to the address of the breakpoint, and this is what it sees now:
64 fs: # CPU DOESN'T SEE THIS!
48 8b 04 25 00 00 00 00 mov (0x0),%rax # <- CPU EXECUTES STARTING HERE!
# BOOM! SEGFAULT!
Because GDB restarted execution one byte too far, the CPU does not decode the fs: instruction prefix byte, and instead executes mov (0x0),%rax with the default segment, which is ds: (data). This immediately results in a read from address 0, the null pointer. The SIGSEGV promptly follows.
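The byte juggling above can be replayed on a plain array. This is only a simulation under stated assumptions: plant_breakpoint, remove_breakpoint and skips_fs_prefix are hypothetical helpers, not GDB functions, and nothing here touches a real process.

```c
#include <stddef.h>
#include <string.h>

enum { INT3 = 0xCC, FS_PREFIX = 0x64 };

/* Bytes of "mov %fs:0x0,%rax" as shown in the disassembly. */
static const unsigned char MOV_FS[9] =
    { 0x64, 0x48, 0x8b, 0x04, 0x25, 0x00, 0x00, 0x00, 0x00 };

/* Overwrite code[offset] with int3 and return the byte that was there. */
unsigned char plant_breakpoint(unsigned char *code, size_t offset)
{
    unsigned char saved = code[offset];
    code[offset] = INT3;
    return saved;
}

/* Put the saved byte back. Returns the offset at which execution is
   resumed, which is the breakpoint address itself. */
size_t remove_breakpoint(unsigned char *code, size_t offset,
                         unsigned char saved)
{
    code[offset] = saved;
    return offset;
}

/* Nonzero if resuming at 'resume' starts decoding after the fs: prefix,
   so the load would use the default segment instead of fs. */
int skips_fs_prefix(const unsigned char *code, size_t resume)
{
    return code[0] == FS_PREFIX && resume > 0;
}
```

Planting at offset 0 and resuming there leaves the instruction intact. Planting at offset 1, as the bad line table makes GDB do, restores the bytes correctly but resumes after the 0x64 prefix, which is the failing case described above.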
All due credits to Mark Plotnick for essentially nailing this.
The solution that was retained is to binary-patch cc1, gcc's actual C compiler, to emit data16 instead of .byte 0x66. This results in GAS parsing the prefix and instruction combination as a single unit, yielding the correct offset in the debug information.
I'm trying to call a simple piece of assembly (as a test for something more complex later), however when I try and run the program it crashes (This program has stopped responding).
main.c:
#include <stdio.h>
#include <stdlib.h>
extern int bar(int param);
int main()
{
int i=8;
i = bar(i);
printf("Hello world! - %i\n",i);
return 0;
}
bar.S
.file "bar.S"
.text
.align 8
.global bar
bar:
add %rdi,1000;
mov %rax,%rdi;
ret;
I'm concerned that it might be something to do with the way my compiler is configured (I'm more used to the hand holding of Visual Studio than dealing with a real environment).
You are using AT&T syntax assembly but you are apparently not familiar with it. The simple solution would be to stick .intel_syntax noprefix into bar.S so you can use Intel syntax.
AT&T syntax uses reversed operand order and a different effective address format, among other things. You got a crash because add %rdi, 1000 means add [1000], rdi in Intel syntax, that is, add the content of rdi to memory location 1000, which is out of bounds. Presumably you wanted to do add $1000, %rdi. To return the value you need to swap the operands of the mov %rax, %rdi too.
This code is incorrect:
add %rdi,1000;
mov %rax,%rdi;
Remember that in AT&T syntax the operand order is source, destination. Also, immediate values should be prefixed by a $. So the code should be:
add $1000,%rdi
mov %rdi,%rax
I removed the semicolons since they're not necessary.
Also, since you seem to be compiling for Windows you should be following Microsoft's 64-bit calling convention, not the System V one. So the argument will be in rcx, not in rdi.
start with this
int bar ( int param )
{
return(param);
}
Compile it separately and link with main, and see what main is doing and passing; note that main is using edi, not rdi.
Now disassemble the function above.
0000000000000000 <bar>:
0: 89 f8 mov %edi,%eax
2: c3 retq
It uses edi and eax as well. Also note that this is AT&T syntax, not Intel, so it is backwards: the destination is on the right instead of the left.
so make different flavors of our own:
.global bark
bark:
mov %edi,%eax
addl $1000,%eax
retq
.global barf
barf:
addl $1000,%edi
mov %edi,%eax
retq
.global bar
bar:
add $1000,%edi
mov %edi,%eax
retq
assemble and link with main instead of the C version. And
./main
Hello world! - 1008
Basically, whatever compiler you are using, get it to generate similar/simple code which will follow its convention, then mimic that.
Note, I am using gcc not necessarily the same as what you are running, but the process is the same.
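As a sketch of that process for the function you actually want, start from a C version of bar and let your own compiler show which registers it picks. The flags below are an assumption about your toolchain; gcc -O2 -S bar.c writes the assembly to bar.s.

```c
/* Reference implementation of bar: compile this to assembly and copy
   the register usage and return sequence into the hand-written bar.S. */
int bar(int param)
{
    return param + 1000;
}
```

With the System V convention the argument should show up in %edi and the result in %eax; a compiler targeting Windows should use %ecx for the argument instead. With the main from the question, which passes 8, this version returns 1008.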
I want to overflow the array buffer[100] and I will be passing python script on bash shell on FreeBSD. I need machine code to pass as a string to overflow that buffer buffer[100] and make the program print its hostname to stdout.
Here is the code in C that I tried and gives the host name on the console. :
#include <stdio.h>
#include <unistd.h> /* gethostname */
int main()
{
char buff[256];
gethostname(buff, sizeof(buff));
printf("%s", buff);
return 0;
}
Here is the assembly that I got using gcc, but it is longer than I need: when I look at the machine code of the text section of the C program it is longer than 100 bytes, and I need machine code for the C program above that is less than 100 bytes.
.type main, @function
main:
pushl %ebp # saving the base pointer
movl %esp, %ebp # taking a snapshot of the stack pointer
subl $264, %esp;
addl $-8, %esp
pushl $256
leal -256(%ebp), %eax
pushl %eax
call gethostname
addl $16, %esp
addl $-8, %esp
leal -256(%ebp), %eax
pushl %eax
pushl $.LC0
call printf
addl $16, %esp
xorl %eax, %eax
jmp .L6
.p2align 2, 0x90
.L6:
leave
ret
.Lfe1:
.size main, .Lfe1-main
.ident "GCC: (GNU) c 2.95.4 20020320 [FreeBSD]"
A person has already done it on another computer and he has given me the ready made machine code which is 37 bytes and he is passing it in the format below to the buffer using perl script. I tried his code and it works but he doesn't tell me how to do it.
“\x41\xc1\x30\x58\x6e\x61\x6d\x65\x23\x23\xc3\xbc\xa3\x83\xf4\x69\x36\xw3\xde\x4f\x2f\x5f\x2f\x39\x33\x60\x24\x32\xb4\xab\x21\xc1\x80\x24\xe0\xdb\xd0”
I know that he did it on a different machine, so I cannot get the same code, but since we are both using exactly the same C function, the size of the machine code should be almost the same if not exactly the same. His machine code is 37 bytes, which he passes on the shell to overflow the gets() function in a binary file on FreeBSD 2.95 to print the hostname on stdout. I want to do the same thing. I have tried his machine code and it works, but he will not tell me how he got it. So what I am actually concerned with is the procedure for getting that code.
OK, I tried the methods suggested in the posts here, but just for the function gethostname() I got 130 characters of machine code. It did not include the printf() machine code. As I need to print the hostname to the console, that should also be included, but that will make the machine code longer. I have to fit the code in an array of 100 bytes, so the code should be less than 100 bytes.
Can someone write assembly code for the C program above that converts into machine code that is less than 100 bytes?
To get the machine code, you need to compile the program then disassemble. Using gcc for example do something like this:
gcc -o hello hello.c
objdump -D hello
The dump will show the machine code in bytes and the disassembly of that machine code.
Here is a simple, related example. You have to understand the difference between an object file and an executable file, but this should still demonstrate what I mean:
unsigned int myfun ( unsigned int x )
{
return(x+5);
}
gcc -O2 -c -o hello.o hello.c
objdump -D hello.o
Disassembly of section .text:
00000000 <myfun>:
0: e2800005 add r0, r0, #5
4: e12fff1e bx lr
FreeBSD is an operating system, not a compiler or assembler.
You want to assemble the assembly source into machine code, so you should use an assembler.
You can typically use GCC, since it's smart enough to know that for a filename ending in .s, it should run the assembler.
If you already have the code in an object file, you can use objdump to read out the code segment of the file.
The 37 bytes posted are complete junk.
If run under any version of Windows (Windows 2000 or later), I believe that the "outsb" and "insd" instructions (in a userland program) will cause a fault, because userland programs are not allowed to do port-level I/O directly.
Since machine code will not end in "vacuum", I added some \x90 -bytes (again NOP) after the posted code. That merely affects the argument of the last rcl -instruction (which in the given code ends prematurely; eg the code posted is not only rubbish, but also ends prematurely).
But microprocessors do not have their own intelligence, so they will (try to) execute whatever junk code you feed them. And the code starts with "inc ecx", a stupid move since we do not know what value ecx had before. Also, "shl dword ptr [eax],$58" is a "good" way to randomly corrupt memory (since the value of eax is also unknown).
And one of them is NOT even a valid byte (a byte should be represented as two hexadecimal digits).
The invalid "byte" is \xw3.
I replaced that invalid byte with \x90 (a NOP, if it is at the start of an instruction), and got:
00451B51 41 inc ecx
00451B52 C13058 shl dword ptr [eax],$58
00451B55 6E outsb
00451B56 61 popad
00451B57 6D insd
00451B58 652323 and esp,gs:[ebx]
00451B5B C3 ret
// code below is NEVER executed, since the line above does a RET.
00451B5C BCA383F469 mov esp,$69f483a3
00451B61 3690 nop // 36, w3 ????
00451B63 DE4F2F fimul word ptr [edi+$2f]
00451B66 5F pop edi
00451B67 2F das
00451B68 3933 cmp [ebx],esi
00451B6A 60 pushad
00451B6B 2432 and al,$32
00451B6D B4AB mov ah,$ab
00451B6F 21C1 and ecx,eax
00451B71 8024E0DB and byte ptr [eax],$db
00451B75 D09090909090 rcl [eax-$6f6f6f70],1
You get a nice hexdump of the text section of your object file with objdump -s -j .text.
Edited some more details:
You need to find out what the address of the function in your object code is. This is what objdump -t is for. In this case I am looking for the function main in a program "hello".
> objdump -t hello|grep main
> 0000000000400410 g F .text 000000000000002f main
Now I create a hexdump with objdump -s -j .text hello:
400410 4881ec08 010000be 00010000 31c04889 H...........1.H.
400420 e7e8daff ffff4889 e6bff405 400031c0 ......H.....@.1.
400430 e8abffff ff31c048 81c40801 0000c390 .....1.H........
400440 31ed4989 d15e4889 e24883e4 f0505449 1.I..^H..H...PTI
400450 c7c0e005 400048c7 c1500540 0048c7c7 ....@.H..P.@.H..
...
The first column contains the addresses. It starts with 400410, the address of the main function, but this may not always be the case. The following 4 columns are 16 bytes of machine code in hex, and the last column is the same 16 bytes in ASCII. Because a lot of bytes have no printable ASCII representation, there are a lot of dots. You need to use the 4 hexadecimal columns: \x48 \x81 \xec....
I have done this on a Linux system, but on FreeBSD you can do exactly the same; only the resulting machine code will be different.