x86_64 assembly string manipulation causes segmentation fault

I am trying to write a "strcat" function in assembly and can't get the values in the memory I pass it to change. My tests are crashing and I don't understand why. I can't seem to find any good documentation on x86_64 assembly in an easy to digest manner either.
global _ft_strcat
_ft_strcat:
push rbx
push rdx
mov rbx, rsi
mov rdx, rdi
parse:
cmp byte [rdx], 0
je concat
inc rdx
jmp parse
concat:
cmp BYTE[rbx], 0
je finish
mov dl, BYTE[rbx]
mov BYTE[rdx], dl
inc rdx
inc rbx
jmp concat
finish:
mov BYTE[rdx], 0
mov rax, rdi
pop rdx
pop rbx
ret
The above is the function I am trying to write and below is my test.
int main(void)
{
char buffer[50] = "Hello, ";
ft_strcat(buffer, "World!");
printf("%s\n", buffer);
return (0);
}
I left out things such as includes and my header because that is not relevant to the question from what I can see. I ran this through a debugger and noticed that at the end of my function, the string pointed to by the rdi register has not changed, but I do go through the loop in the concat label and it looks like the values being extracted from the string being pointed to by rsi is indeed being copied into the dl register.

Your pushes and pops don't match so your routine changes rbp and rbx contrary to the ABI requirement to preserve them.

My problem was my lack of understanding of how I was manipulating the lowest 8 bits of the rdx register. dl is the low byte of rdx, so loading each character into dl overwrote the low byte of my destination pointer. That meant I was not concatenating onto my string at all; I was writing into regions of memory I never intended to touch.
The code now looks like this
global _ft_strcat
_ft_strcat:
push rbx
push rdx
push rcx
xor rcx, rcx
mov rbx, rsi
mov rdx, rdi
parse:
cmp byte [rdx], 0
je concat
inc rdx
jmp parse
concat:
cmp BYTE[rbx], 0
je finish
mov cl, BYTE[rbx]
mov BYTE[rdx], cl
inc rdx
inc rbx
jmp concat
finish:
mov BYTE[rdx], 0
pop rcx
pop rdx
pop rbx
mov rax, rdi
ret
You will notice the addition of the rcx register and the use of its lower 8 bits (cl) for copying bytes over.
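The aliasing described in this answer can be modeled in C. This is only an illustrative sketch: the names write_low_byte and check_aliasing_example are hypothetical, the sample address is made up, and a uint64_t stands in for rdx. It shows that a write to dl replaces the low 8 bits of the 64-bit value and leaves the upper 56 bits alone, so a pointer held in rdx ends up pointing somewhere else.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "mov dl, value": dl is the low 8 bits of rdx, so the write
 * replaces only those bits and keeps the upper 56 bits unchanged.
 * (Hypothetical helper; uint64_t stands in for the rdx register.) */
uint64_t write_low_byte(uint64_t rdx, uint8_t value)
{
    return (rdx & ~(uint64_t)0xFF) | value;
}

/* Made-up destination address, then 'W' (0x57) is loaded into dl. */
void check_aliasing_example(void)
{
    uint64_t dest = 0x00007ffee3b4a917ULL;
    uint64_t after = write_low_byte(dest, 'W');

    assert(after == 0x00007ffee3b4a957ULL); /* only the low byte changed */
    assert(after != dest);                  /* the pointer no longer points at dest */
}
```

Using a separate register for the character, as the corrected routine does with cl, avoids this because rcx holds no pointer.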

Related

How to push arguments onto stack using MASM x64?

I know the x64 calling convention: the first four arguments are in rcx, rdx, r8, r9, and the rest are on the stack. But my question is: how do I push these arguments?
call_func PROC
push rbp
mov rbp, rsp
mov rbx, rcx ; move C function address to rbx
mov rcx, 1 ; some dummy value
mov rdx, 2 ; some dummy value
mov r8, 3 ; some dummy value
mov r9, 4 ; some dummy value
; and now I want to push fifth argument, but how?
call rbx ; call the function
mov rsp, rbp
pop rbp
ret
call_func ENDP
I have tried mov QWORD PTR [rsp + 20h], 1, but when returning from this asm function the RIP register is set to a weird value, like 0x0000000000000001. I know that RIP is the instruction pointer, but why is it being modified?
I have tried one more thing: letting the function take 6 arguments. When I pass the sixth argument with mov QWORD PTR [rsp + 28h], 1 the app is fine and the sixth argument is passed, but the fifth has a weird value.
As Jester said, before storing the arguments I need to allocate stack space for them, including the shadow space.
Final working code:
call_func PROC
push rbp
mov rbp, rsp
sub rsp, 32 ; allocate shadow space 'padding'
sub rsp, 16 ; allocate space for fifth and sixth argument
mov r11, rcx ; move C function address to r11
mov rcx, 1 ; some dummy value
mov rdx, 2 ; some dummy value
mov r8, 3 ; some dummy value
mov r9, 4 ; some dummy value
mov QWORD PTR [rsp + 20h], 5 ; push fifth argument
mov QWORD PTR [rsp + 28h], 6 ; push sixth argument
call r11 ; call the function
mov rsp, rbp
pop rbp
ret
call_func ENDP
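The offsets used in the working code follow from the layout of the outgoing argument area. The C sketch below only does that arithmetic; the function names arg_slot_offset and outgoing_area_size are hypothetical, and it assumes integer or pointer arguments under the Microsoft x64 convention with rsp already 16-byte aligned before the sub.

```c
#include <assert.h>
#include <stddef.h>

/* Offset from rsp, at the point of the call instruction, of the slot
 * for argument number n (1-based). Slots 1-4 are the shadow (home)
 * space for rcx, rdx, r8, r9; slots 5 and up hold stack arguments. */
size_t arg_slot_offset(unsigned n)
{
    assert(n >= 1);
    return (size_t)(n - 1) * 8;
}

/* Bytes to subtract from rsp for a call with nargs arguments: at least
 * the 32-byte shadow space, rounded up to a multiple of 16 so that rsp
 * stays 16-byte aligned. */
size_t outgoing_area_size(unsigned nargs)
{
    size_t slots = nargs < 4 ? 4 : nargs;
    size_t bytes = slots * 8;
    return (bytes + 15) & ~(size_t)15;
}
```

For six arguments this gives offsets 20h and 28h for the fifth and sixth argument and a 48-byte area, matching the two sub instructions (32 + 16) above.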

Understanding pointers in assembler from machine's view

Here is a basic program I wrote on the godbolt compiler, and it's as simple as:
#include<stdio.h>
void main()
{
int a = 10;
int *p = &a;
printf("%d", *p);
}
The results after compilation I get:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-12], 10
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
nop
leave
ret
Question: I understand pushing rbp, creating the stack frame by reserving a 16-byte block, how a value is moved from a register to a stack location and vice versa, and that the job of LEA is to compute an address. I got this part.
Problem:
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
lea -> gets the address of rbp-12 into rax,
then the value in rax (the address of rbp-12) is stored at rbp-8,
but the next line again moves the value at rbp-8 into rax. This seems redundant. Then again moving the value of rax to eax. I don't understand the amount of work here. Why couldn't I have done
lea rax, [rbp-12]
mov QWORD PTR [rbp-8], rax
mov eax, QWORD PTR [rbp-8]
and be done with it? In the original code, the address of rbp-12 is stored in rax, then rax is stored at rbp-8, then rbp-8 is loaded into rax again, and then rax is stored into eax. Couldn't we have just copied rbp-8 directly to eax? I guess not, but my question is why.
I know pointers involve dereferencing, and I understand how LEA grabs the address of rbp-12. But in the following lines I completely lost track of when the code switched from handling addresses to grabbing values from addresses. I also didn't understand any of the asm lines after that.
You're seeing very un-optimized code. Here's my line-by-line interpretation:
.LC0:
.string "%d" ; Format string for printf
main:
push rbp ; Save original base pointer
mov rbp, rsp ; Set base pointer to beginning of stack frame
sub rsp, 16 ; Allocate space for stack frame
mov DWORD PTR [rbp-12], 10 ; Initialize variable 'a'
lea rax, [rbp-12] ; Load effective address of 'a'
mov QWORD PTR [rbp-8], rax ; Store address of 'a' in 'p'
mov rax, QWORD PTR [rbp-8] ; Load 'p' into rax (even though it's already there - heh!)
mov eax, DWORD PTR [rax] ; Load 32-bit value of '*p' into eax
mov esi, eax ; Load value to print into esi
mov edi, OFFSET FLAT:.LC0 ; Load format string address into edi
mov eax, 0 ; Zero eax: for variadic calls, al gives the number of vector registers used (none here)
call printf ; Make the printf call
nop ; No-op (not sure why)
leave ; Remove the stack frame
ret ; Return
Compilers, when not optimizing, generate code like this as they parse the code you gave them. It's doing a lot of unnecessary stuff, but it is quicker to generate and makes using a debugger easier.
Compare this with the optimized code (-O2):
.LC0:
.string "%d" ; Format string for printf
main:
mov esi, 10 ; Don't need those variables -- just a 10 to pass to printf!
mov edi, OFFSET FLAT:.LC0 ; Load format string address into edi
xor eax, eax ; Zero eax; xor-zeroing has a shorter encoding than mov eax, 0
jmp printf ; Just jmp to printf -- it will handle the return
The optimizer found that the variables weren't necessary, so no stack frame is created. Nothing is left but the printf call! And that's done as a jmp since nothing else need be done here when the printf is complete.
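To connect the unoptimized listing back to the C source, here is a sketch in which each statement is annotated with the instruction it roughly corresponds to. The function name load_through_pointer and the temporaries tmp and value are hypothetical; they only make the two separate loads visible.

```c
#include <assert.h>

int load_through_pointer(void)
{
    int a = 10;        /* mov DWORD PTR [rbp-12], 10                       */
    int *p = &a;       /* lea rax, [rbp-12] ; mov QWORD PTR [rbp-8], rax   */
    int *tmp = p;      /* mov rax, QWORD PTR [rbp-8]  (load the address)   */
    int value = *tmp;  /* mov eax, DWORD PTR [rax]    (load the int there) */

    assert(tmp == &a); /* tmp holds an address, not the number 10 */
    return value;      /* this is what ends up in esi for printf  */
}
```

The last load cannot be replaced by reading [rbp-8] into eax directly, because [rbp-8] contains the pointer p (an address), not the int it points to. Dereferencing means using that address as the operand of a second load.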

how to pass a variable by reference to a c function `sscanf()`

The following code tries to read the command line arguments, scan each one with sscanf(), and use the result to emit UTF-8 text.
I'm failing to call sscanf() correctly and get a segfault at the line where I call this function.
I have already debugged this and I know where the problem is, but not how to solve it.
global main
extern puts
extern sscanf
extern printf
extern emit_utf_8
section .text
main:
cmp rdi, 2
jl argumentsError
add rsi, 8 ; skip the name of the program
forloop:
mov r12, rsi
push rdi
push rsi
push r12
sub rsp, 8 ; must align stack before call
; start of for bloc
xor rax, rax
mov rdi, qword [r12]
mov rsi, codePointFormat
mov rdx, qword [codePoint]
call sscanf
cmp rax, 1
je ifthen
jmp else
ifthen:
mov rdi, codePoint
call emit_utf_8
jmp endif
else:
mov rdi, incorrectFormat
mov rsi, r12
call printf
endif:
; end of for bloc
add rsp, 8 ; restore %rsp to pre-aligned value
pop r12
pop rsi
pop rdi
add rsi, 8 ; point to next argument
dec rdi ; count down
jnz forloop ; if not done counting keep going
ret
argumentsError: mov rdi, argumentsRequiredMessage
call puts
mov rdi, argumentDescription
call puts
xor rax, rax
inc rax
ret
section .data
argumentsRequiredMessage:
db "This program requires one or more command line arguments,", 0
argumentDescription: db "one for each code point to encode as UTF-8.", 0
incorrectFormat: db "(%s incorrect format)", 0
codePointFormat: db "U+%6X", 0
section .bss
codePoint: resb 8 ; The code point from sscanf should go here.
Is there a way to pass that third argument?
sscanf() signature:
int __cdecl sscanf(const char *const _buffer, const char *const _Format, ...)
I'm using Ubuntu 19.04, 64-bit.
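For comparison, this is how the same call looks in C. It is a sketch, not the asker's code: parse_code_point is a hypothetical name, and the format string is the one from the listing. The point it illustrates is that sscanf writes its result through the third argument, so that argument must be the address of the variable rather than the variable's current contents.

```c
#include <assert.h>
#include <stdio.h>

/* Parses text of the form "U+20AC". Returns 1 on success, 0 otherwise.
 * sscanf stores the result through `out`, so `out` must be an address. */
int parse_code_point(const char *text, unsigned int *out)
{
    assert(text != NULL && out != NULL);
    return sscanf(text, "U+%6X", out) == 1;
}
```

In NASM terms, passing the address would likely mean loading it with mov rdx, codePoint or lea rdx, [rel codePoint], whereas mov rdx, qword [codePoint] loads the 8 bytes currently stored there (zeros, since the variable lives in .bss).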

C function calling problems in assembly

section .data
text db 'Put a number',10,0
scanform db '%d'
number dw 0
section .text
extern printf,scanf
global main
main:
push rbp
mov rbp,rsp
push rdi
push rsi
push rbx
mov rdi,text
mov rax,0
call printf
mov rsi,number
mov rdi,scanform
mov rax,0
call scanf
pop rbx
pop rsi
pop rdi
ret
This is my code. I have been writing other programs all day without any problem, but now when I call scanf I get "Program received signal SIGSEGV, Segmentation fault" followed by "Specified first line and last line are in different files". I do not understand this message. Can someone help me?
You have the following issues:
You forgot to pop rbp.
You misalign the stack which needs to be 16 byte aligned.
You do not zero terminate your format string (thanks to Paul for pointing this out).
You use %d which writes a 4 byte integer but you only allocated 2 bytes with dw.
It is recommended to align integers to 4 bytes.
A possible fixed version:
section .data
number dd 0
text db 'Put a number',10,0
scanform db '%d', 0
section .text
extern printf,scanf
global main
main:
push rbp
mov rbp,rsp
push rdi
push rsi
push rbx
push rbx ; for alignment
mov rdi,text
mov rax,0
call printf
mov rsi,number
mov rdi,scanform
mov rax,0
call scanf
pop rbx
pop rbx
pop rsi
pop rdi
pop rbp
ret
Since rsi and rdi are caller-saved registers and rbx is not touched, you can simplify the code. I also changed to xor zeroing and rip-relative addressing as follows:
section .data
number dd 0
text db 'Put a number',10,0
scanform db '%d', 0
section .text
extern printf,scanf
global main
main:
push rbp
lea rdi, [rel text]
xor eax, eax
call printf
lea rsi, [rel number]
lea rdi, [rel scanform]
xor eax, eax
call scanf
pop rbp
ret
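The alignment point in this answer comes down to simple arithmetic, sketched here in C. The function names are hypothetical, and the sketch assumes the System V x86-64 rule that rsp is 16-byte aligned immediately before a call, so rsp % 16 == 8 on entry to the called function.

```c
#include <assert.h>

/* Returns 1 if rsp is 16-byte aligned after `pushes` 8-byte pushes and
 * `sub_bytes` bytes subtracted from rsp, counted from function entry.
 * The leading 8 is the return address pushed by the caller's call. */
int stack_is_aligned(unsigned pushes, unsigned sub_bytes)
{
    unsigned used = 8 + 8 * pushes + sub_bytes;
    return used % 16 == 0;
}

/* The three versions of main discussed above. */
void check_versions_from_answer(void)
{
    assert(!stack_is_aligned(4, 0)); /* original: rbp, rdi, rsi, rbx  */
    assert(stack_is_aligned(5, 0));  /* fixed: one extra push rbx     */
    assert(stack_is_aligned(1, 0));  /* simplified: push rbp only     */
}
```

An odd number of 8-byte pushes (with nothing else subtracted) restores alignment, which is why the single push rbp in the simplified version is enough.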

Nasm Assembly: I'm trying to copy a string and then search for a specific word in it, change it, and print it. Stuck on the copying part.

I'm stuck at figuring out how to copy the string source to target, which should be initialized to all zeroes. It appears as though I need to find the size of the string, start a counter register, push stringitem[counter] to the stack, and increment the counter register. I can't figure out how to even get started, let alone search for a word in the string.
Thanks!
bits 64
global main
extern printf
section .text
main:
; function setup
push rbp
mov rbp, rsp
sub rsp, 32
;
lea rdi, [rel message]
mov al, 0
call printf
;
lea rdi, [rel source]
mov al, 0
call printf
;
;mov edi, source
;mov esi, target
;lea rdi, [esi]
;mov al, 0
;call printf
;mov ecx,sizeof source -1
; mov esi,0
;L1:
; mov eax,source[esi];
; push eax
; inc esi
; loop L1
; function return
mov eax, 0
add rsp, 32
pop rbp
ret
section .data
message: db 'Project',0x0D,0x0a,'Author',0x0D,0x0a,0
source: db 0x0D,0x0a,"I can't figure out how to copy this text to target.",0x0D,0x0a,0
target: db '0000000000000000000000000000000000000000000',0x0D,0x0a,0
For your data memory layout this will do
lea rdi, [rel target]
lea rsi, [rel source]
mov rcx, target-source
cld
rep movsb
Otherwise as Jester said, a simple byte-to-byte copy will also do
lea rdi, [rel target]
lea rsi, [rel source]
cld
.copy:
lodsb
stosb
test al, al
jnz .copy
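The two copies above have close C counterparts, sketched below. The names copy_bytes and copy_string are hypothetical. The first mirrors rep movsb with an explicit count; the second mirrors the lodsb/stosb loop, which stops after it has copied the terminating zero byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counterpart of "rep movsb": copy exactly `count` bytes. */
void copy_bytes(char *dst, const char *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, count);
}

/* Counterpart of the lodsb/stosb loop: copy bytes up to and including
 * the terminating NUL. The caller must ensure dst is large enough. */
void copy_string(char *dst, const char *src)
{
    char c;

    assert(dst != NULL && src != NULL);
    do {
        c = *src++;      /* lodsb: load a byte, advance the source       */
        *dst++ = c;      /* stosb: store it, advance the destination     */
    } while (c != '\0'); /* test al, al / jnz .copy                      */
}
```

In both forms nothing checks the size of the destination, so target has to be at least as long as the number of bytes being copied.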
