I am getting a segfault at the movq (%rsi, %rcx) line.
I know you can't do a memory-to-memory mov, so I went through a temporary register: first (%rsi) into %rcx, then, in the loop, %rcx into (%rdi). Here is my code:
experimentMemset: #memset(void *ptr, int value, size_t num)
# args: ptr in %rdi, value in %rsi, num in %rdx
movq %rdi, %rax #sets rax to the first pointer, to return later
.loop:
cmp $0, (%rdx) #see if num has reached 0
je .end
cmpb $0, (%rdi) #see if string has ended also
je .end
movq %rsi, %rdi #copies value into rdi
inc %rdi #increments pointer to traverse string
dec %rdx #decrements the count, aka num
jmp .loop
.end:
ret
As you discovered, RDX holds a size (an integer count), not a pointer. It's passed by value, not by reference.
cmp $0, (%rdx)
compares not the register itself, but the memory location pointed to by it. It seems that %rdx is used as a counter, so you should compare the register itself.
test %rdx,%rdx ; je count_was_zero
There are other bugs, such as checking the contents of the write-only destination for zeros (memset doesn't stop at zero bytes) and not storing %sil into (%rdi): movq %rsi, %rdi overwrites the pointer itself instead. But this was the cause of the segfault in the current version of the question.
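A minimal C sketch of what the loop is meant to do, with each statement annotated with the assembly it would correspond to once the bugs above are fixed. The register mapping follows the question's comments; the byte store movb %sil, (%rdi) is the intended fix, not code from the question. It tests the count register itself, never reads the destination, and stores only the low byte of value, which is what memset specifies.

```c
#include <stddef.h>

/* Sketch of the intended experimentMemset loop, annotated with the
   corresponding (fixed) assembly. Register mapping follows the question. */
void *experimentMemset(void *ptr, int value, size_t num)
{
    unsigned char *p = ptr;                      /* %rdi: destination pointer */
    unsigned char byte = (unsigned char)value;   /* %sil: only the low byte is stored */

    while (num != 0) {                           /* test %rdx, %rdx ; je .end */
        *p = byte;                               /* movb %sil, (%rdi) */
        p++;                                     /* inc %rdi */
        num--;                                   /* dec %rdx */
    }
    return ptr;                                  /* %rax, saved at entry */
}
```

Passing a value such as 0x141 stores 0x41, because only the low byte of value is used.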
Related
I am writing an assembly loop to get the max number in an array. It loops like this:
start_loop:
# Get the current element in the array and move it to %rax
# movz suffixes: b = byte (1), w = word (2), l = long (4), q = quad (8 bytes)
movzwq data_items(,%rdi,2), %rax
# Check if the current element value is zero, if it is, jump to the end
cmp $0, %rax
jz exit
# Increment the array index as we want to continue the loop at the end
inc %rdi
# Compare the current value (rax) to the current max (rbx)
# WARNING: The `cmp` instruction is always backwards with ATT syntax!
# It reads as, "With respect to %rbx, the value of %rax is...(greater|less) than"
# So to see if a > b, do:
# cmp b, a
# jg
# Reference: https://stackoverflow.com/a/26191257/12283168
cmp %rbx, %rax
jge update_value
jmp start_loop
update_value:
mov %rax, %rbx
jmp start_loop
exit:
mov $1, %rax
int $0x80
My question is this part of the comparison code here:
jge update_value
jmp start_loop
update_value:
mov %rax, %rbx
jmp start_loop # <== can I get rid of this part?
Is there a way to not have to specify the jmp start_loop in the update_value section? For example, in a high level language I could do:
while (1) {
if (a > b)
update_value();
// continue
}
And not have to "jump back" to the while loop from update_value; I could just "continue". Is it possible to do something like this in assembly, or am I thinking about this incorrectly?
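One common way to get the "continue" behaviour is to invert the test so that the update sits on the fall-through path: cmp %rbx, %rax / jl start_loop / mov %rax, %rbx / jmp start_loop, which needs only one jump back to the top. The C sketch below mirrors that layout; the function name and the 0-terminated unsigned short array are assumptions chosen to match the movzwq load in the question.

```c
#include <stddef.h>

/* Hypothetical C mirror of the question's loop: scan 16-bit values until a 0
   terminator and return the largest one seen (0 if the first value is 0). */
long max_until_zero(const unsigned short *data)
{
    long max = 0;      /* plays the role of %rbx */
    size_t i = 0;      /* plays the role of %rdi */

    for (;;) {
        long cur = data[i++];   /* movzwq data_items(,%rdi,2), %rax ; inc %rdi */
        if (cur == 0)
            break;              /* jz exit */
        if (cur < max)
            continue;           /* inverted test: jl start_loop */
        max = cur;              /* mov %rax, %rbx, then the single jmp start_loop */
    }
    return max;
}
```

A branchless alternative is cmovge %rax, %rbx right after cmp %rbx, %rax, which removes the conditional branch around the update entirely.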
I am currently trying to learn assembly language, but I'm stuck.
Suppose I have this C code:
for ( int i = 100; i > 0; i-- ) {
// Some code
}
Now I want to do the same in assembly language. I tried it like this:
__asm__ ("movq $100, %rax;"
".loop:"
//Some code
"decq %rax;"
"cmpq $0, (%rax);"
"jnz .loop;"
);
Compiling and running it results in a segfault. It does not segfault if I remove the cmpq line, but then of course the program never terminates.
So basically my question is: what am I doing wrong here?
cmpq $0, (%rax)
This instruction will try to read memory at the address in rax.
rax will be 99 the first time. Address 99 is not mapped in, so your program segfaults.
You are intending to compare the value in rax to 0, so remove the parentheses.
cmpq $0, %rax
The following instruction:
cmpq $0, (%rax)
is accessing the memory address specified by the rax register, whose value is 99.
Address 99 lies in the first memory page, which is not mapped. Therefore, the access above results in a segmentation fault.
You don't want the indirection, instead you want:
cmpq $0, %rax
That is, you want to compare against the contents of rax, not the contents at the memory address specified by rax.
Consider however optimizing the cmp instruction away:
decq %rax immediately precedes the cmpq $0, %rax instruction, which sets ZF if rax is zero. The conditional jump is then taken based on the state of ZF:
decq %rax
cmpq $0, %rax
jnz .loop
The dec instruction affects the ZF flag (as cmp does), so if decrementing rax results in zero, ZF will be set. You can take advantage of that fact and place jnz directly after dec. You don't need cmp at all:
decq %rax
jnz .loop
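A runnable sketch of the corrected loop as GNU C extended inline asm, assuming GCC or Clang on x86-64 (a plain C fallback is compiled elsewhere). Compared with the question's version, it lets the compiler choose the counter register through "+r" constraints instead of silently overwriting %rax, declares the flags clobber ("cc"), and uses the local label 1: / 1b so the label cannot clash if the asm is emitted more than once. countdown_iterations is a hypothetical name, and incrementing a second counter stands in for "Some code".

```c
/* Hypothetical helper: runs the dec/jnz countdown starting at 100 and
   returns how many times the loop body executed. */
long countdown_iterations(void)
{
    long counter = 100;     /* like "movq $100, %rax", but the compiler picks the register */
    long iterations = 0;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        "1:\n\t"
        "incq %1\n\t"       /* stand-in for "Some code" */
        "decq %0\n\t"       /* counter--, sets ZF when it reaches 0 */
        "jnz 1b"            /* no cmp needed: branch on ZF from dec */
        : "+r"(counter), "+r"(iterations)
        :
        : "cc");
#else
    /* Portable fallback with the same do-while shape on other targets. */
    do {
        iterations++;
    } while (--counter != 0);
#endif
    return iterations;
}
```

Like the original asm, this dec/jnz form runs the body before the first test, so it matches for (i = 100; i > 0; i--) only because the start value is known to be positive; a count that might be 0 needs a check before entering the loop.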
Remove the parentheses around %rax to get the value of rax. Adding the parentheses basically tells the assembler, "hey! rax holds an address, please use the contents at that address".
Therefore,
cmpq $0, %rax
is what you need to do.
The goal of this function is to replicate the strupr C function. It gets stuck in an infinite loop and I cannot figure out why. As I understand it, the argument from the command line is passed as a single array of chars; I simply access each char, determine whether it has to be changed, and then print it (changed or not). The function receives a single parameter, which is that very argv value.
void lowerToUpper (char *msg);
.globl lowerToUpper
.type lowerToUpper, #function
lowerToUpper:
mov $0,%r12
lea (%rdi), %rdx
jmp .restartLoop
.restartLoop:
add $8, %rdx
cmp $0,%r13
jne .isValid
ret
.isValid:
cmp $97, %r13
jbe .printValue
jg .changeValue
.changeValue:
#sub $32, %r13
jmp .printValue
.printValue:
mov $format2, %edi
mov %r13, %rsi
mov $0, %eax
call printf
add $1,%rdx
jmp .restartLoop
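For comparison, a minimal C sketch of the intended behaviour, assuming ASCII text and an in-place conversion like strupr (the printing is left out). It advances the pointer one byte per iteration and loads the current character before testing it; the assembly above instead adds 8 and then 1 to %rdx, never loads a character into %r13, and keeps the pointer in %rdx, which printf is allowed to clobber.

```c
/* Sketch: upper-case ASCII letters in place, stopping at the terminating '\0'. */
void lowerToUpper(char *msg)
{
    for (char *p = msg; *p != '\0'; p++) {     /* advance 1 byte, not 8 */
        if (*p >= 'a' && *p <= 'z')            /* 'a' itself must be included */
            *p = (char)(*p - ('a' - 'A'));     /* 'a' - 'A' == 32 */
    }
}
```

In assembly the same structure means loading the byte with something like movzbl (%rdi), %eax each iteration and keeping the pointer in a callee-saved register (saved and restored) across the printf call.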
The goal of this function is to print the highest number in an array. It receives a pointer to an unsigned int (an array of numbers) and its length.
void printHigher (unsigned int *data, int len);
printHigher:
mov $0,%r14
mov $0,%r15
jmp .Loop
ret
.Loop:
mov (%rdi), %r8
cmp %r8,%rdi
jl .calculateMax
jmp .printMax
.calculateMax:
cmp %r8, %r15
jg .assignMax
add $1,%r14
add $8,(%rdi)
jmp .Loop
.assignMax:
mov %r8, %r15
.printMax:
mov $format, %edi
mov %r15, %rsi
mov $0, %eax
call printf
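A C sketch of the intended logic, for reference. maxOf is a hypothetical helper, and returning 0 for an empty array is an assumption. Note that the elements are 4-byte unsigned ints, so the pointer should advance by 4 and the comparison should be unsigned (ja/jb rather than jg/jl); the loop above instead compares %r8 against the pointer %rdi and adds 8 to the memory at (%rdi) rather than to %rdi.

```c
#include <stdio.h>

/* Hypothetical helper: largest of data[0..len-1], or 0 when len <= 0. */
unsigned int maxOf(const unsigned int *data, int len)
{
    unsigned int max = 0;              /* %r15 in the question */
    for (int i = 0; i < len; i++) {    /* i plays the role of %r14 */
        if (data[i] > max)             /* unsigned compare: ja/jbe, not jg/jl */
            max = data[i];
    }
    return max;
}

void printHigher(unsigned int *data, int len)
{
    printf("%u\n", maxOf(data, len));
}
```

An element such as 4000000000u is where the signed/unsigned choice matters: as a signed 32-bit value it would compare as negative.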
I'm getting a segmentation fault in the function, apparently due to the instruction
movq -8(%rbp), %rax, the one before the printf call. I can't understand why.
Note: this is not gcc-generated assembly but output from a compiler I am writing. The assembly code is very similar to what gcc generates.
.text
.globl main
.type main, #function
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl $2, -4(%rbp)
leaq -4(%rbp), %rax
movl %eax, %edi
movb $0, %al
call fcvt2
movl %eax, -4(%rbp)
leaq .LC0(%rip), %rdi
movl -4(%rbp), %esi
movb $0, %al
call printf
leave
ret
.globl fcvt2
.type fcvt2, #function
fcvt2:
pushq %rbp
movq %rsp, %rbp
subq $32, %rsp
movq %rdi, -8(%rbp)
leaq .LC1(%rip), %rdi
movq -8(%rbp), %rax
movl (%rax), %esi
movb $0, %al
call printf
movq -8(%rbp), %rax
movl (%rax), %edi
movl %edi, %eax
leave
ret
.section .rodata
.LC1:
.string "It should be : %d\f"
.LC0:
.string "%d\n"
And the C program is:
int fcvt2(int *ip) {
int i;
printf("It should be : %d\f", *ip);
return *ip;
}
void main() {
int i;
i = 2;
i = fcvt2(&i);
printf("%d\n",i);
return;
}
gdb output at fault point:
rax 0xffffdd4c 4294958412
rbx 0x0 0
rcx 0x7ffffff7 2147483639
rdx 0x7ffff7dd3780 140737351858048
rsi 0x7fffffffdd48 140737488346440
rdi 0xffffdd4c 4294958412
rbp 0x7fffffffdd30 0x7fffffffdd30
rsp 0x7fffffffdd00 0x7fffffffdd00
r8 0x0 0
r9 0x9 9
r10 0x7ffff7dd1b78 140737351850872
r11 0x246 582
r12 0x400430 4195376
r13 0x7fffffffde30 140737488346672
r14 0x0 0
r15 0x0 0
rip 0x40059c 0x40059c <fcvt2+20>
eflags 0x10206 [ PF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
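movl %eax, %edi in the caller truncates the pointer argument to fcvt2 to its low 32 bits: the address of i, which from rbp in your dump works out to 0x7fffffffdd4c, reaches fcvt2 as 0xffffdd4c (the value in rax). You actually segfault on movl (%rax), %esi, not on the instruction before it as you claimed. (Time for a refresher on your GDB skills?)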
leaq -4(%rbp), %rax computed the address correctly in %rax, but then your compiler forgot that it was a 64-bit pointer (to a 32-bit value) and copied only its low half with movl. Use movq %rax, %rdi instead (or, ideally, leaq -4(%rbp), %rdi directly into the argument register).
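A small C model of that truncation. The input 0x7fffffffdd4c is an assumption derived from the dump: fcvt2's rbp is 0x7fffffffdd30, main's frame starts 0x20 bytes above it (saved rbp, return address, and main's 16-byte local area), and i sits at -4 from main's rbp. movl_to_64 is a hypothetical name.

```c
#include <stdint.h>

/* Models "movl %eax, %edi": a 32-bit register write keeps the low 32 bits
   of the value and zero-extends them into the full 64-bit register. */
uint64_t movl_to_64(uint64_t value)
{
    return (uint64_t)(uint32_t)value;
}
```

movl_to_64(0x7fffffffdd4c) gives 0xffffdd4c (4294958412 in decimal), exactly the value gdb shows in rax, which is not a mapped address in this process.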
Off topic: if you don't need to preserve the upper bytes of EAX, movb $0, %al is less efficient than xor %eax, %eax. I think you're doing this for the x86-64 SysV variadic function convention, and you're right that only %al has to hold the number of XMM register args, not the whole %eax. But zeroing EAX is the most efficient way to zero AL. Of course, you don't need to do this at all for non-variadic functions, but your compiler is obviously still in the just-get-it-working phase, so doing it unconditionally isn't a correctness problem: you never need to pass anything else in rax, and function calls are always assumed to clobber rax.
(Also related: Haswell/Skylake partial registers have false dependencies: al isn't renamed separately from rax anymore)
I was given a function in assembly which basically converts uppercase letters to lowercase letters. Here is some of the assembly:
Q1:
pushq %rbp
movq %rsp, %rbp
subq $24, %rsp
movq %rdi, -24(%rbp)
movl $0, -4(%rbp)
movl $0, -8(%rbp)
jmp .L2
.L2:
movl -4(%rbp), %edx
movq -24(%rbp), %rax
addq %rdx, %rax
movzbl (%rax), %eax
testb %al, %al
jne .L4
...
Much of the rest is repetitive, but .L2 is what really confuses me. This is my logic so far:
We store param1 into -24(%rbp). We create local1 and local2, set them both to 0, and then jump to .L2. I move local1 into %edx and param1 into %rax. Now this is where things get confusing for me.
I was told that the next line, addq, results in local1 being a pointer to param1. I just reasoned that it adds local1 + param1 and stores the result in %rax. How is that possible?
Next is movzbl. From my understanding, we dereference %rax, so we get something like eax = (int) rax.
I was also told to think of it as converting a char to an int. Which one is true, and how do I know that I'm typecasting? What if %rax didn't have parentheses around it? Is it an int because it's 4 bytes and %eax is a 32-bit register? Thank you in advance for your help; I'm kind of lost here.
local1 is not a pointer, it's an index (a counter).
That code is doing something like:
void toupper(char* text)
{
int i = 0; /* at rbp-4 */
int j = 0; /* unused, at rbp-8 */
int ch; /* in eax */
while((ch = *(text + i)) != 0)
{
...
}
}
Note that in C pointer arithmetic *(text + i) is of course equivalent to text[i].
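The two instructions the question asks about can be modelled in C as below, assuming text is the char * parameter stored at -24(%rbp) and i is the int at -4(%rbp); load_char_at is a hypothetical helper. addq produces an address (text + i) in %rax rather than turning local1 into a pointer, and movzbl then reads one byte from that address.

```c
/* Hypothetical helper modelling one pass through .L2. */
int load_char_at(const char *text, int i)
{
    const char *addr = text + i;            /* movq -24(%rbp), %rax ; addq %rdx, %rax */
    return *(const unsigned char *)addr;    /* movzbl (%rax), %eax: byte -> int, zero-extended */
}
```

The loop condition while ((ch = text[i]) != 0) then corresponds to testb %al, %al followed by the conditional jump.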
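Yes, movzbl converts an unsigned char to an int; you can see that from the instruction name itself: MOVe with Zero extension, Byte to Long.
The parentheses denote pointer dereferencing: movzbl (%rax), %eax loads the byte stored at the address held in rax. Without them, the source operand is a register, and movzbl then requires an 8-bit one (for example movzbl %al, %eax).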
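The mnemonic, not the destination register alone, decides how the byte is widened. A sketch contrasting zero extension (movzbl) with sign extension (movsbl), which is what you would see when a byte is treated as a signed char; the function names are hypothetical.

```c
#include <stdint.h>

/* movzbl: zero-extend a byte to 32 bits (upper 24 bits become 0). */
int32_t zero_extend_byte(uint8_t byte)
{
    return (int32_t)byte;
}

/* movsbl: sign-extend a byte to 32 bits (bit 7 is copied into the upper bits). */
int32_t sign_extend_byte(uint8_t byte)
{
    return byte >= 0x80 ? (int32_t)byte - 256 : (int32_t)byte;
}
```

For an ASCII byte such as 0x41 both give 65; for 0xE9 zero extension gives 233 while sign extension gives -23.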