Returning an int* from ASM

I'm writing a function in ASM which is supposed to copy the (constant) value 2 into every index of an array declared in .data. My code compiles, but I don't get any output through my C program. Here's the code:
.globl my_func
.globl _my_func
my_func:
_my_func:
movl %esp,%ebp
pushl %ebp
movl $0,%ecx
leal array,%eax
jmp continue
continue:
_continue:
movl $2,array(%ecx,4)
cmpl $1024,%ecx
jne incr
je finish
incr:
_incr:
addl $4,%ecx
jmp continue
finish:
_finish:
popl %ebp
ret
.data
.align 4
array: .fill 1024
It is called from here:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
extern int* my_func();
int main(int argc, const char * argv[])
{
int i = 0;
int* a = my_func();
for(i = 0; i < 1024/4; i++){
printf("%d\n", a[i]);
}
return 0;
}
As mentioned, the program does compile and run, but the main function does not output anything to the terminal. And yes, I know the code isn't optimal -- I'm currently following an introductory course in computer architecture and ASM, and I'm just checking out instructions and data.
I am assembling the code for IA-32 on an Intel Mac running OS X 10.9, using LLVM 5.1.
Thanks in advance.

The function prologue where you save the previous frame pointer and set it up for the new stack frame should be:
pushl %ebp
movl %esp,%ebp
Yours is in the opposite order, so when your function returns the caller's frame pointer will be incorrect.

Return values are normally passed in %eax, so by the time you reach finish, %eax must hold the address of the start of the memory you want to return.
FYI: you shouldn't need to declare each label twice. The leading underscore is only needed for public functions you want to call from C.
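For comparison, the behaviour the assembly routine is aiming for can be sketched in C. This is only an illustration: the names my_func_reference and ARRAY_LEN are made up here, and it assumes the 1024 bytes reserved by .fill 1024 are meant to be read as 256 four-byte ints, which is what the loop in main (1024/4) expects.

```c
#include <stddef.h>

#define ARRAY_LEN (1024 / 4)   /* assumption: 1024 bytes viewed as 4-byte ints */

static int array[ARRAY_LEN];   /* plays the role of `array` in .data */

/* Hypothetical C counterpart of my_func: store 2 in every element and
   return the address of the first element (the value that the assembly
   version must leave in %eax). */
int *my_func_reference(void)
{
    for (size_t i = 0; i < ARRAY_LEN; i++) {
        array[i] = 2;
    }
    return array;
}
```

Compiling a function like this with gcc -S -m32 and reading the output is one way to compare a hand-written loop against what a compiler generates.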

Related

What's the meaning of '#x.x' in assembly compiled by icc?

I compiled some simple code with the Intel icc compiler, and I noticed that there are numbers at the end of each line. I want to know what they mean.
For example, #3.12 in the following code.
#include <stdio.h>
int main() {
int a = 3, b;
scanf("%d", &b);
a = a + b;
printf("Hello, world! I am %d\n", a);
return 0;
}
...
main:
..B1.1: # Preds ..B1.0
# Execution count [1.00e+00]
..L1:
#3.12
pushl %ebp #3.12
movl %esp, %ebp #3.12
andl $-128, %esp #3.12
...

It is indeed the line and column of the corresponding source code. #3.12 is the opening { of the main function which makes sense since the shown statements are consistent with the start of a function.
If you insert an extra space before the { you will see that the output changes to #3.13; likewise the 3 changes to 4 if you insert an empty line before the main() function.

This is the function prologue: the standard sequence that sets up a function's stack frame. It saves the caller's frame pointer (%ebp) on the stack, makes %ebp point at the new frame, and then adjusts %esp to make room for the function's own data. The epilogue at the end of the function reverses these steps. Here is an example of the same pattern from another compiler:
push ebp
mov ebp, esp
sub esp, 8
...
mov esp, ebp
pop ebp
ret 0

passing string to an external assembly function

I'm trying to learn some assembly.
My goal is to create an external assembly function that reads an array of char, converts it to int, and then performs various operations, just to learn something.
I've made many attempts, but I think I'm missing the point.
code:
#include <stdio.h>
#define SIZE 5
extern int foo(char array[]);
int main(void){
char array[SIZE]={'0','1','1','0','1'};
printf("GAS said: %c\n", foo(array));
return 0;
}
assembly:
.data
.text
.global foo
foo:
pushl %ebp
movl %esp, %ebp
movl 8(%esp), %eax #saving in eax the pointer of the array
movl (%eax), %eax #saving in eax the first char of the array
popl %ebp
ret
Here is what I find strange.
When I use, as in this case,
printf("GAS said: %c\n", foo(array));
The output is, as expected, GAS said: 0
Based on this, I also expected that changing it to:
printf("GAS said: %i\n", foo(array));
would output GAS said: 48, but instead I get back what looks like a random address.
Also, in the assembly file, I can't explain why, if I try
cmpl $48, %eax
je LABEL
the jump never happens.
The only thing I can think of is that there is a problem with the size, since an int takes 4 bytes and a char only 1 byte, but I'm not sure.
So, how can I do the comparison and return an int to main in this case?
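The size hypothesis can be checked from C. The sketch below is not part of the original post, and the helper names load_dword and load_byte are hypothetical. The first helper copies four bytes into a 32-bit integer, as movl (%eax), %eax does; the second reads a single byte and zero-extends it, which is what movzbl (%eax), %eax would do. ASCII character codes are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Mimics `movl (%eax), %eax`: copies the four bytes starting at p into a
   32-bit integer. */
uint32_t load_dword(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Mimics `movzbl (%eax), %eax`: reads one byte and zero-extends it. */
uint32_t load_byte(const char *p)
{
    return (unsigned char)*p;
}
```

With the array from the question, the first four bytes are 0x30, 0x31, 0x31, 0x30, so load_dword(array) yields 0x30313130 (808530224). Its low byte is 0x30, which is why %c still prints 0, but the full value is not 48, so cmpl $48, %eax does not match. load_byte(array) yields 48.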

Write byte to file with x86 assembler

I want to write a function in assembly, callable from C, that writes a byte (char) to a file. Here is how the function would look in C:
void writebyte (FILE *f, char b)
{
fwrite(&b, 1, 1, f);
}
And here is the code that will call it:
#include <stdio.h>
extern void writebyte(FILE *, char);
int main(void) {
FILE *f = fopen("test.txt", "w");
writebyte(f, 1);
fclose(f);
return 0;
}
So far I came up with following assembler code:
.global writebyte
writebyte:
pushl %ebp
movl %esp, %ebp #standard params
pushl 12(%ebp) # pushing byte to the stack
pushl $1
pushl $1
pushl 8(%ebp) #file to write
call fwrite
popl %ebp
ret
I keep getting from gdb:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xffa9702c in ?? ()
How do I write such a function in assembly?
EDIT: I am using Ubuntu 16.04

According to the cdecl convention, arguments are pushed in reverse order, so f (the last argument of fwrite()) is pushed first and the pointer to b (the first argument) is pushed last. The caller must also clean up the stack after fwrite() returns.
As noted in the comments, b is received by value, but fwrite() needs a pointer to it. That pointer is simply ebp + 12, the address of the argument's stack slot.
This seems to work for me:
.global writebyte
writebyte:
//create new stack frame
pushl %ebp
movl %esp, %ebp
//push the four arguments to stack (in reverse order)
pushl 8(%ebp)
pushl $1
pushl $1
//get pointer of "b" argument (%ebp+12) and move it to %eax
leal 12(%ebp), %eax
pushl %eax
//call fwrite()
call fwrite
//remove arguments from stack and pop %ebp
leave
ret

Segmentation fault - what is the reason

I'm learning 32-bit assembly and I need help with my code. I'm trying to store 4 in a table at index 3, which is passed as an argument to the assembly code.
.code32
.equ KERNEL, 0x80 # Linux system functions entry
.equ WRITE, 0x04 # write data to file function
.equ EXIT, 0x01 # exit program function
.equ STDOUT, 1
.equ argTab, 8
.equ argLicz, 12
.equ argN, 16
.equ argZakres, 20
.text
.globl przelicz
.type przelicz, #function
przelicz:
pushl %ebp
movl %esp, %ebp
movl $2, %ecx
movl $4, %ebx
movl argTab(%ebp), %edx
movl %ebx, (%edx,%ecx,4)
movl %ebp, %esp
popl %ebp
ret
I execute it with C code:
#include <stdio.h>
int main(){
const static int n = 5;
int tab[n];
int a;
for(a = 0; a < n; ++a){
tab[a] = a;
}
int licz[n];
przelicz(tab, licz, 50, 50);
for(a = 0; a < n; ++a){
//printf("%d ", licz[a]);
}
}
When I run it I get the error: Segmentation fault (core dumped). I've read that this means I'm trying to access memory that doesn't exist. How can I solve this?

As I commented above, the problem is that the program is being built as a 64-bit process. This is a problem for two reasons:
x64-linux uses a different system call table than x86-linux. Since you aren't making a system call directly, this probably isn't the mistake here, but it's something to be aware of.
For example, write isn't 0x04 in x64-linux, it is 0x01. (See this table for x64-linux system call numbers).
x64-linux also has larger pointers. A pointer handled as a 32-bit value is missing its upper 32 bits, so the resulting address may point anywhere. The size change also affects values on a function's stack (they can sit at 8-byte offsets instead of 4). This is most likely what caused the problem in this code.
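One quick way to see which mode the C side was built in is to look at the pointer width. This is a minimal sketch, and the helper names pointer_bits and built_as_32bit are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Width, in bits, of a data pointer in the current build. */
size_t pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}

/* Returns 1 when pointers are 4 bytes wide, which is what the 8(%ebp) and
   12(%ebp) argument offsets in the assembly above assume, and 0 otherwise. */
int built_as_32bit(void)
{
    return sizeof(void *) == 4;
}
```

If pointer_bits() reports 64, the C caller and the 32-bit assembly disagree about how arguments are passed. Building both files as 32-bit code (for example with gcc -m32) is the usual way to make them agree.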

Trying to smash the stack

I am trying to reproduce the stack overflow results that I read about in Aleph One's article "Smashing the Stack for Fun and Profit" (can be found here: http://insecure.org/stf/smashstack.html).
My attempt to overwrite the return address doesn't seem to work.
C code:
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;
//Trying to overwrite return address
ret = buffer1 + 12;
(*ret) = 0x4005da;
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
disassembled main:
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000004005b0 <+0>: push %rbp
0x00000000004005b1 <+1>: mov %rsp,%rbp
0x00000000004005b4 <+4>: sub $0x10,%rsp
0x00000000004005b8 <+8>: movl $0x0,-0x4(%rbp)
0x00000000004005bf <+15>: mov $0x3,%edx
0x00000000004005c4 <+20>: mov $0x2,%esi
0x00000000004005c9 <+25>: mov $0x1,%edi
0x00000000004005ce <+30>: callq 0x400564 <function>
0x00000000004005d3 <+35>: movl $0x1,-0x4(%rbp)
0x00000000004005da <+42>: mov -0x4(%rbp),%eax
0x00000000004005dd <+45>: mov %eax,%esi
0x00000000004005df <+47>: mov $0x4006dc,%edi
0x00000000004005e4 <+52>: mov $0x0,%eax
0x00000000004005e9 <+57>: callq 0x400450 <printf@plt>
0x00000000004005ee <+62>: leaveq
0x00000000004005ef <+63>: retq
End of assembler dump.
I have hard-coded the return address to skip the x = 1; line, using a value taken from the disassembly (address 0x4005da). The intent of this exploit is to print 0, but instead it prints 1.
I have a very strong feeling that "ret = buffer1 + 12;" does not point at the return address. If that is the case, how can I determine where the return address is? Is gcc allocating more memory between the return address and the buffer?

Here's a guide I wrote for a friend a while back on performing a buffer overflow attack using gets. It goes over how to get the return address and how to use it to write over the old one:
Our knowledge of the stack tells us that the return address appears on the stack after the buffer you're trying to overflow. However, how far after the buffer the return address appears depends on the architecture you're using. In order to determine this, first write a simple program and inspect the assembly:
C code:
void function()
{
char buffer[4];
}
int main()
{
function();
}
Assembly (abridged):
function:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
leave
ret
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
call function
...
There are several tools that you can use to inspect the assembly code. First, of course, is compiling straight to assembly output from gcc using gcc -S main.c. This can be difficult to read since there are few or no hints about which code corresponds to the original C code. Additionally, there is a lot of boilerplate code that can be difficult to sift through. Another tool to consider is gdbtui. The benefit of using gdbtui is that you can inspect the assembly source while running the program and manually inspect the stack throughout the execution of the program. However, it has a steep learning curve.
The assembly inspection program that I like best is objdump. Running objdump -dS a.out gives the assembly source with the context from the original C source code. Using objdump, on my computer the offset of the return address from the character buffer is 8 bytes.
The function named function below takes its return address and adds 7 to it. The instruction that the return address originally pointed to is 7 bytes long, so adding 7 makes the return address point to the instruction immediately after the assignment.
In the example below, I overwrite the return address to skip the instruction x = 1.
simple C program:
void function()
{
char buffer[4];
/* return address is 8 bytes beyond the start of the buffer */
int *ret = buffer + 8;
/* assignment instruction we want to skip is 7 bytes long */
(*ret) += 7;
}
int main()
{
int x = 0;
function();
x = 1;
printf("%d\n",x);
}
Main function (x = 1 at 80483af is seven bytes long):
8048392: 8d4c2404 lea 0x4(%esp),%ecx
8048396: 83e4f0 and $0xfffffff0,%esp
8048399: ff71fc pushl -0x4(%ecx)
804839c: 55 push %ebp
804839d: 89e5 mov %esp,%ebp
804839f: 51 push %ecx
80483a0: 83ec24 sub $0x24,%esp
80483a3: c745f800000000 movl $0x0,-0x8(%ebp)
80483aa: e8c5ffffff call 8048374 <function>
80483af: c745f801000000 movl $0x1,-0x8(%ebp)
80483b6: 8b45f8 mov -0x8(%ebp),%eax
80483b9: 89442404 mov %eax,0x4(%esp)
80483bd: c70424a0840408 movl $0x80484a0,(%esp)
80483c4: e80fffffff call 80482d8 <printf@plt>
80483c9: 83c424 add $0x24,%esp
80483cc: 59 pop %ecx
80483cd: 5d pop %ebp
We know where the return address is and we have demonstrated that changing it can affect the code that is run. A buffer overflow can do the same thing by using gets and inputting the right character string so that the return address is overwritten with a new address.
In a new example below we have a function function which has a buffer filled using gets. We also have a function uncalled which never gets called. With the correct input, we can run uncalled.
#include <stdio.h>
#include <stdlib.h>
void uncalled()
{
puts("uh oh!");
exit(1);
}
void function()
{
char buffer[4];
gets(buffer);
}
int main()
{
function();
puts("program secure");
}
To run uncalled, inspect the executable using objdump or similar to find the address of the entry point of uncalled. Then append the address to the input buffer in the right place so that it overwrites the old return address. If your computer is little-endian (x86, etc.), you need to reverse the byte order of the address.
In order to do this correctly, I have a simple perl script below, which generates the input that will cause the buffer overflow that will overwrite the return address. It takes two arguments, first it takes the new return address, and second it takes the distance (in bytes) from the beginning of the buffer to the return address location.
#!/usr/bin/perl
print "x" x $ARGV[1]; # fill the buffer
print scalar reverse pack "H*", substr("0" x 8 . $ARGV[0], -8); # swap endian of input
print "\n"; # new line to end gets
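The same input can be built in C, which makes the byte layout explicit. This is a sketch rather than a translation of the script: the function name build_payload and its parameters are hypothetical, and it assumes a 32-bit target whose return address is 4 bytes wide and stored in little-endian order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes `offset` filler bytes, then the 32-bit address with its least
   significant byte first, then a newline. Returns the number of bytes
   written, or 0 if `out` (capacity `cap`) is too small. */
size_t build_payload(uint32_t address, size_t offset,
                     unsigned char *out, size_t cap)
{
    if (offset > cap || cap - offset < 5) {
        return 0;                                   /* need offset + 4 + 1 bytes */
    }
    memset(out, 'x', offset);                       /* fill the buffer */
    for (int i = 0; i < 4; i++) {                   /* little-endian byte order */
        out[offset + i] = (unsigned char)(address >> (8 * i));
    }
    out[offset + 4] = '\n';                         /* newline ends gets() */
    return offset + 5;
}
```

The two parameters mirror the script's arguments: the new return address, and the distance in bytes from the start of the buffer to the return address slot.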

You need to examine the stack to determine if buffer1+12 is actually the right address to be modifying. This sort of stuff isn't exactly very portable.
I'd probably also place some eye catchers in the code so you can see where the buffers are on the stack in relation to the return address:
char buffer1[5] = "1111";
char buffer2[10] = "2222";
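With GCC or Clang, the compiler can also report where the frame and the return address are, which gives something to compare the eye catchers against. The sketch below relies on the __builtin_frame_address and __builtin_return_address builtins of those compilers. The function name show_frame is hypothetical, and the distances it prints depend on the compiler, flags and architecture.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: prints the address of a local buffer, the frame
   address and the return address of the current call, then returns the
   return address. Marked noinline so that it gets its own frame. */
__attribute__((noinline))
void *show_frame(void)
{
    char buffer1[5] = "1111";                 /* eye catcher */
    void *frame = __builtin_frame_address(0);
    void *ret = __builtin_return_address(0);

    printf("buffer1 at        %p (\"%s\")\n", (void *)buffer1, buffer1);
    printf("frame address     %p\n", frame);
    printf("return address is %p\n", ret);
    return ret;
}
```

Comparing the printed buffer1 address with the frame address shows how much padding the compiler placed between the buffer and the saved registers in that particular build.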

You can figure this out by printing out the stack. Add code like this:
int* pESP;
__asm mov pESP, esp
The __asm directive is Visual Studio specific. Once you have the address of the stack you can print it out and see what is in there. Note that the stack will change when you do things or make calls, so you have to save the whole block of memory at once by first copying the memory at the stack address to an array, then you print out the array.
What you will find is all kinds of data belonging to the stack frame and to various runtime checks. By default VS will put guard code in the stack to prevent exactly what you are trying to do. If you print out the assembly listing for "function" you will see this. You need to set compiler switches to turn all this stuff off.

As an alternative to the methods suggested in other answers, you can figure this sort of thing out using gdb. To make the output a bit easier to read, I removed the buffer2 variable and changed buffer1 to 8 bytes so things are more aligned. We will also compile in 32-bit mode to make the addresses easier to read, and turn debugging on (gcc -m32 -g).
void function(int a, int b, int c) {
char buffer1[8];
char *ret;
so let's print the address of buffer1:
(gdb) print &buffer1
$1 = (char (*)[8]) 0xbffffa40
then let's print a bit past that and see what's on the stack.
(gdb) x/16x 0xbffffa40
0xbffffa40: 0x00001000 0x00000000 0xfecf25c3 0x00000003
0xbffffa50: 0x00000000 0xbffffb50 0xbffffa88 0x00001f3b
0xbffffa60: 0x00000001 0x00000002 0x00000003 0x00000000
0xbffffa70: 0x00000003 0x00000002 0x00000001 0x00001efc
Do a backtrace to see where the return address should be pointing:
(gdb) bt
#0 function (a=1, b=2, c=3) at foo.c:18
#1 0x00001f3b in main () at foo.c:26
and sure enough, there it is at 0xbffffa5c, 28 bytes (0x1c) past the start of buffer1:
(gdb) x/x 0xbffffa5c
0xbffffa5c: 0x00001f3b
