Calling a C main() function from an x86 32-bit assembly _start [duplicate] - c

This question already has answers here:
x86 Linux assembler get program parameters from _start
(1 answer)
Get argv[2] address in assembler x64
(1 answer)
Closed 5 years ago.
I am trying to write a homework assignment which is to:
write a simple Assembly program that all it does is call a C program,
and send to it the command line arguments so that it may run properly
with (argc and argv).
How can this be done? We were given this asm as part of the assignment:
section .text
global _start
extern main
_start:
;;code to setup argc and argv for C program's main()
call main
mov eax,1
int 0x80
So what I want to know is, where are argc and argv located? Also, do I just need to put the pointer to argc in the eax register like when returning a value to a regular C function and the C program will work the rest out?
In the end, after compiling my C program with the following Makefile (as I said, I am new to Assembly and this is the Makefile given to us by the teacher, I do not fully understand it):
%.o: %.asm
nasm -g -O1 -f elf -o $@ $<
%.o: %.c
gcc -m32 -g -nostdlib -fno-stack-protector -c -o $@ $<
all: lwca
lwca: lwc.o start.o
ld -melf_i386 -o $@ $^
Running ./lwca arg1 arg2 should result in argc = 3, argv[1] = arg1 and argv[2] = arg2.
ANSWER:
No answer quite solved my problem; in the end, what worked was:
pop dword ecx ; ecx = argc
mov ebx,esp ; ebx = argv
push ebx ; char** argv
push ecx ; int argc
call main
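As a rough model of why this works: on i386 Linux the kernel enters _start with argc at the top of the stack, followed by the argv pointers, a NULL terminator, and then the environment pointers. The C sketch below reads that layout from a fake stack array. The function names are made up for illustration and are not part of the assignment.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only.
   sp points at the word on top of the stack on entry to _start:
   sp[0] = argc, sp[1..argc] = argv[], then NULL, then envp[]. */

/* The value that `pop ecx` fetches in the answer above. */
int stack_argc(void *const *sp)
{
    return (int)(intptr_t)sp[0];
}

/* argv[i]: after the pop, esp points at argv[0], so esp itself is argv. */
char *stack_argv(void *const *sp, int i)
{
    return (char *)sp[1 + i];
}

/* envp[i]: skip argc, the argc argv pointers, and the NULL after them. */
char *stack_envp(void *const *sp, int i)
{
    return (char *)sp[1 + stack_argc(sp) + 1 + i];
}
```

With a stack built for ./lwca arg1 arg2, stack_argc gives 3 and stack_argv(sp, 1) is the pointer to "arg1", which matches what main() receives after the two pushes.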

Related

GDB-remote + qemu reports unexpected memory address for static C variable

I am remote-debugging code running in QEMU with GDB, based on an os-dev tutorial.
My version is here. The problem only happens when remote-debugging code inside qemu, not when building a normal executable to run directly inside GDB under the normal OS.
Code looks something like this:
#define BUFSIZE 255
static char buf[BUFSIZE];
void foo() {
// Making sure it's all zero.
for (int i = 0; i < BUFSIZE; i++) buf[i] = 0;
// Setting first char:
buf[0] = 'a';
// >> insert breakpoint right after setting the char <<
// Prints 'a'.
printf("%s", buf);
}
If I place a breakpoint at the marked spot and print the buffer with p buf, I get random values from random places, seemingly from my code section. If I get the address with p &buf, I get something that does not look correct, for two reasons:
If I do char* p_buf = buf and check the address with p p_buf, it gives me a totally different address, which is stable across executions (the other was not). When I then inspect that memory with x /255b 0x____, I can see the a followed by zeros (97 0 0 0 ... 0).
The next statement (printf("%s", buf);) does actually print a.
This leaves me believing it might be GDB not knowing the correct location if I only inspect the static variable.
Where should I start debugging this?
Details about the compile conditions:
Compile flags: -g -Wall -Wextra -pedantic -nostdlib -nostdinc -fno-builtin -fno-stack-protector -nostartfiles -nodefaultlibs -m32
qemu-system-i386
Gcc: i386 elf target
Example output from GDB:
(gdb) p buf
$1 = "dfghjkl;'`\000\\zxcvbnm,./\000*\000 ", '\000' <repeats 198 times>...
(gdb) p p_buf
$2 = 0x40c0 <buf+224> "a"
(gdb) p &buf
$3 = (char (*)[255]) 0x3fe0 <buf>
(gdb) info address buf
Symbol "buf" is static storage at address 0x3fe0.
Update 2:
Disassembled a version of the code that shows the discrepancy:
; void foo
0x19f1 <foo> push %ebp
0x19f2 <foo+1> mov %esp,%ebp
0x19f4 <foo+3> sub $0x10,%esp
; char* p_buf = char_buf; --> `p &char_buf` is 0x4040 (incorrect) but `p p_buf` is 0x4100
0x19f7 <foo+6> movl $0x4100,-0x4(%ebp)
; void* p_p_buf = (void*)p_buf; --> `p p_p_buf` gives 0x4100
0x19fe <foo+13> mov -0x4(%ebp),%eax
0x1a01 <foo+16> mov %eax,-0x8(%ebp)
; void* p_char_buf = (void*)&char_buf; --> `p p_char_buf` gives 0x4100
0x1a04 <foo+19> movl $0x4100,-0xc(%ebp)
; char_buf[0] = 'a'; --> correct address
0x1a0b <foo+26> movb $0x61,0x4100
; char_buf[1] = 'b'; --> correct address (asking `p &char_buf` here is still incorrectly 0x4040)
0x1a12 <foo+33> movb $0x62,0x4101
; void foo return
0x1a19 <foo+40> nop
0x1a1a <foo+41> leave
0x1a1b <foo+42> ret
My Makefile for building the project looks like:
C_SOURCES = $(wildcard kernel/*.c drivers/*.c)
C_HEADERS = $(wildcard kernel/*.h drivers/*.h)
OBJ = ${C_SOURCES:.c=.o kernel/interrupt_table.o}
CC = /home/itarato/code/os/i386elfgcc/bin/i386-elf-gcc
# GDB = /home/itarato/code/os/i386elfgcc/bin/i386-elf-gdb
GDB = /usr/bin/gdb
CFLAGS = -g -Wall -Wextra -ffreestanding -fno-exceptions -pedantic -fno-builtin -fno-stack-protector -nostartfiles -nodefaultlibs -m32
QEMU = qemu-system-i386
os-image.bin: boot/boot.bin kernel.bin
cat $^ > $@
kernel.bin: boot/kernel_entry.o ${OBJ}
i386-elf-ld -o $@ -Ttext 0x1000 $^ --oformat binary
kernel.elf: boot/kernel_entry.o ${OBJ}
i386-elf-ld -o $@ -Ttext 0x1000 $^
kernel.dis: kernel.bin
ndisasm -b 32 $< > $@
run: os-image.bin
${QEMU} -drive format=raw,media=disk,file=$<,index=0,if=floppy
debug: os-image.bin kernel.elf
${QEMU} -s -S -drive format=raw,media=disk,file=$<,index=0,if=floppy &
${GDB} -ex "target remote localhost:1234" -ex "symbol-file kernel.elf" -ex "tui enable" -ex "layout split" -ex "focus cmd"
%.o: %.c ${C_HEADERS}
${CC} ${CFLAGS} -c $< -o $@
%.o: %.asm
nasm $< -f elf -o $@
%.bin: %.asm
nasm $< -f bin -o $@
build: os-image.bin
echo Pass
clean:
rm -rf *.bin *.o *.dis *.elf
rm -rf kernel/*.o boot/*.bin boot/*.o
For me, this doesn't seem to happen:
Breakpoint 1, main () at test65.c:16
16 printf("%s", buf);
(gdb) p buf
$2 = "a", '\000' <repeats 253 times>
Where should I start debugging this?
It seems like there are two things that might go wrong:
1. GDB might be reading from the wrong location
I'm not sure what could cause this, but it is easy enough to verify. Check what address p &buf gives you. Then compare it to what you get from p_buf and also to what info address buf shows you.
Note that due to address space layout randomization, the address of static variables will change at the point when you start the process. So before the run command the address could be e.g. 0x4040 and then change to 0x555555558040 once the code is running:
(gdb) info address buf
Symbol "buf" is static storage at address 0x4040.
(gdb) run
....
Breakpoint 1, main () at test65.c:16
16 printf("%s", buf);
(gdb) p &buf
$1 = (char (*)[255]) 0x555555558040 <buf>
(gdb) info address buf
Symbol "buf" is static storage at address 0x555555558040.
2. GDB is reading the correct place, but the data is not there yet
This sounds like a typical debugging problem caused by compiler optimizations. For example, the compiler might move the store buf[0] = 'a' to after the point where your breakpoint lands, though it must perform it before printf() gets called. You could try compiling with -O0 to see if it changes anything.
You can also check the disassembly with disas command, to see what has executed up to that point:
(gdb) disas
Dump of assembler code for function main:
0x000055555555517b <+50>: movb $0x61,0x2ebe(%rip) # 0x555555558040 <buf>
=> 0x0000555555555182 <+57>: lea 0x2eb7(%rip),%rsi # 0x555555558040 <buf>
0x0000555555555189 <+64>: lea 0xe74(%rip),%rdi # 0x555555556004
0x0000555555555190 <+71>: mov $0x0,%eax
0x0000555555555195 <+76>: callq 0x555555555050 <printf@plt>
For me the breakpoint lands at the point right after movb sets 0x61 (letter a) to buf.
If you use stepi command until you are at callq printf instruction, you can be sure you see the buffer exactly like printf would see it.
This is an interesting problem. It comes down to the fact that the code generated by LD (the linker) for the ELF executable kernel.elf differs from the code generated by LD for kernel.bin when using the --oformat binary option. While one would expect these to be the same, they are not.
Put more simply, these Makefile rules do not produce the same code, although you might expect them to:
kernel.elf: boot/kernel_entry.o ${OBJ}
i386-elf-ld -o $@ -Ttext 0x1000 $^
and
kernel.bin: boot/kernel_entry.o ${OBJ}
i386-elf-ld -o $@ -Ttext 0x1000 $^ --oformat binary
It appears the difference is in how the linker is aligning the sections when used with and without --oformat binary. The ELF file (and the symbols used for debugging) are seen to be in one place while the binary file that is actually running in QEMU had code and data generated at different offsets.
I hadn't ever observed this issue because I use my own linker scripts and I always generate the binary file from the ELF executable with OBJCOPY rather than using LD to link twice. OBJCOPY can take an ELF executable and convert it to a binary file. The Makefile rules could be amended to look like:
kernel.bin: kernel.elf
i386-elf-objcopy -O binary $^ $@
kernel.elf: boot/kernel_entry.o ${OBJ}
i386-elf-ld -o $@ -Ttext 0x1000 $^
Doing it this way will ensure the binary file that is generated matches what was produced for the ELF executable.

Archive has no index; run ranlib to add one (when linking with a .a containing a MachO64 object file on Linux)

I tried to create a library and test it, but an error occurred.
error code:
./libasm.a: error adding symbols: Archive has no index; run ranlib to add one
collect2: error: ld returned 1 exit status
I compiled it like this.
nasm -f macho64 ft_strlen.s -o ft_strlen.o
ar rcs libasm.a ft_strlen.o
ranlib libasm.a
gcc main.c libasm.a
Below is the source file
;ft_strlen.s
segment .text
global ft_strlen
ft_strlen:
mov rax, 0
jmp count
count:
cmp BYTE [rdi + rax], 0
je exit
inc rax
jmp count
exit:
ret
/*main.c*/
#include <stdio.h>
int ft_strlen(char *str);
int main(void)
{
char *str = "hello world";
printf("%d \n", ft_strlen(str));
}
I am using ubuntu installed on wsl.
What am I doing wrong?
Generate object files for a Linux-based operating system (or perhaps more correctly, an ELF64 system) with nasm -f elf64 ft_strlen.s -o ft_strlen.o
For more information, run nasm -hf to see all valid output formats for nasm -f.
Small tip: the ranlib command is not needed, because the s modifier in ar rcs already writes the archive index.
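Once the object file is in the right format, a plain C version of the same loop can be handy for checking the assembly routine's results. This is only a reference sketch: the name ft_strlen_ref is made up here, and it returns size_t rather than the int used in the question's prototype.

```c
#include <stddef.h>

/* Hypothetical C reference for the loop in ft_strlen.s:
   count bytes until the terminating zero byte is found. */
size_t ft_strlen_ref(const char *str)
{
    size_t len = 0;              /* plays the role of rax        */

    while (str[len] != 0)        /* cmp BYTE [rdi + rax], 0 / je */
        len++;                   /* inc rax                      */
    return len;
}
```

Calling both functions on the same strings and comparing the results is a quick sanity check for the assembly version.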

Commands to compile ASM file with C program [duplicate]

This question already has an answer here:
32-bit absolute addresses no longer allowed in x86-64 Linux?
(1 answer)
Closed 4 years ago.
I am on a 64-bit Linux system and using NASM.
I'm trying to link my ASM file (hello.asm) with a C file (main.c) and build an executable.
I wrote an ASM file that prints "Hello" with printf, through a printHello function.
extern printf, exit
section .data
format db "Hello", 10, 0
section .text
global printHello
printHello:
sub rsp, 8
mov rsi, 0x12345677
mov rdi, format
xor rax, rax
call printf
mov rdi, 0
call exit
I wrote a simple main.c that calls my printHello function to print "Hello":
#include <stdio.h>
void printHello();
int main()
{
printHello();
}
My commands to compile:
$ nasm -f elf64 hello.asm
$ gcc -c main.c
$ gcc -o executable main.o hello.o
$ ./executable
And it prints:
./executable: Symbol `printf' causes overflow in R_X86_64_PC32 relocation
./executable: Symbol `exit' causes overflow in R_X86_64_PC32 relocation
[1] 6011 segmentation fault ./executable
I'm still learning ASM. Does the problem come from my commands or from my code?
I resolved the problem by using @Jester's solution:
gcc -no-pie -o executable main.o hello.o
Thanks also to Ped7g for the explanation.

Linking with cygwin

I have a small program that's made of an assembly function and a C function which calls it.
The program compiles and works perfectly on a UNIX system, but when using the makefile in Cygwin I get the following error:
gcc -m32 -g -c -o main.o main.c
gcc -g -m32 -o ass0 main.o myasm.o
main.o: In function `main':
/cygdrive/c/ass0/main.c:15: undefined reference to `_strToLeet'
collect2: error: ld returned 1 exit status
makefile:3: recipe for target 'ass0' failed
make: *** [ass0] Error 1
code of the main.c file :
#include <stdio.h>
# define MAX_LEN 100 // Maximal line size
extern int strToLeet (char*);
int main(void) {
char str_buf[MAX_LEN];
int str_len = 0;
printf("Enter a string: ");
fgets(str_buf, MAX_LEN, stdin); // Read user's command line string
str_len = strToLeet (str_buf); // Your assembly code function
printf("\nResult string:%s\nNumber of letters converted to Leet: %d\n",str_buf,str_len);
}
start of assembly code:
section .data ; data section, read-write
an: DD 0 ; this is a temporary var
section .text ; our code is always in the .text section
global strToLeet ; makes the function appear in global scope
extern printf ; tell linker that printf is defined elsewhere
strToLeet: ; functions are defined as labels
push ebp ; save Base Pointer (bp) original value
mov ebp, esp ; use base pointer to access stack contents
pushad ; push all variables onto stack
mov ecx, dword [ebp+8] ; get function argument
makefile code :
all: ass0
ass0: main.o myasm.o
gcc -g -m32 -o ass0 main.o myasm.o
main.o: main.c
gcc -m32 -g -c -o main.o main.c
myasm.o: myasm.s
nasm -g -f elf -l ass0list -o myasm.o myasm.s
Help would be most appreciated.
Solved by user 'tivn':
Try to modify your prototype to become extern int strToLeet (char*) asm ("strToLeet"); – tivn
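The error message hints at why this helps: on 32-bit Cygwin the C compiler asks the linker for _strToLeet (with a leading underscore), while the NASM object exports plain strToLeet. An asm label, a GCC/Clang extension, pins the linker-level name. The sketch below shows the declaration; the function body is a made-up C stand-in for myasm.s so the file links on its own, and it does not perform any Leet conversion.

```c
/* Asm label (GCC/Clang extension): the symbol the linker sees is exactly
   "strToLeet", with no target-specific underscore prefix added. */
extern int strToLeet(char *str) __asm__("strToLeet");

/* Hypothetical stand-in for the routine in myasm.s, so this sketch is
   self-contained. It converts nothing and reports zero conversions. */
int strToLeet(char *str)
{
    (void)str;
    return 0;
}
```

In the real project only the declaration belongs in main.c; the definition stays in the assembly file.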

Is function declaration essential to C programming?

I used to believe that a function defined in another file should be declared before it is used, but a recent programming experience changed my thinking. There are three files, C and ASM:
main.c
extern test_str;
/*extern myprint*/ --> If I add the line, gcc will report an error: called object ‘myprint’ is not a function
void Print_String() {
myprint("a simple test", test_str);
}
kernel.asm
extern Print_String
[section .text]
global _start
global test_str
test_str dd 14
_start:
call Print_String
jmp $
another.asm
[section .text]
global myprint
myprint:
mov edx, [esp + 8]
mov ecx, [esp + 4]
mov ebx, 1
mov eax, 4
int 0x80
ret
compile
nasm -f elf another.asm -o another.o
gcc -c -g main.c -o main.o
nasm -f elf kernel.asm -o kernel.o
ld -o final main.o kernel.o another.o
result
./final
a simple test
In my view, if I want to use the function myprint in main.c, I should declare it with extern beforehand, because myprint is defined in another file. But the result is exactly the opposite, as main.c above shows: if I add the line extern myprint, I get an error, whereas without that declaration I get the right result. What's more, I didn't define the function myprint in main.c, so why can I use it? Shouldn't I declare it beforehand?
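When you call a function without a prototype, the compiler assumes it returns int and guesses at its parameters from the call (newer compilers warn about this or reject it). The line extern myprint;, by contrast, declares an int variable named myprint, which is why calling it is an error. So you should declare it, but declare it as a function: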
void myprint(const char *, const char *); /* Or whatever. */
Well, you can use the function myprint with no error, even though it is not defined in main.c. This is because the compiler, while creating the object file, leaves a placeholder (an unresolved reference) for the symbol myprint in that object file.
This placeholder is replaced everywhere in the binary with the actual address of the function only during the linking phase. The linker consults the symbol tables of all the object files and resolves the symbol (wherever it is referenced) to the actual address.
You will certainly see warnings or errors with the -Wall -Werror options to gcc. You can get more insight using objdump as follows:
objdump -D main.o | less
Hope that helps to clear your doubt.
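To make the advice concrete, here is a sketch of main.c with both externals declared properly. The parameter types of myprint are an assumption read off another.asm ([esp+4] is loaded as the buffer and [esp+8] as the length for the write system call), and the two definitions at the bottom are hypothetical C stand-ins for kernel.asm and another.asm so the example builds without the assembler.

```c
#include <stddef.h>
#include <unistd.h>

/* Declarations main.c could carry (types are assumptions, see above). */
extern int test_str;                    /* `test_str dd 14` in kernel.asm */
int myprint(const char *msg, int len);  /* defined in another.asm         */

void Print_String(void)
{
    myprint("a simple test", test_str);
}

/* Hypothetical stand-ins for the assembly files. */
int test_str = 14;

int myprint(const char *msg, int len)
{
    /* Like the assembly: write len bytes to stdout (fd 1) and hand back
       whatever the system call returned. */
    return (int)write(1, msg, (size_t)len);
}
```

With these declarations in place, gcc -Wall compiles the call without an implicit-declaration warning, and a call with the wrong argument types is diagnosed at compile time.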
