Char array from C to ASM x64 GAS - c

I've got an assignment to use an array from C in an ASM function.
I figured out that I need to pass the address of that array. But after that, how do I access the values of the array in ASM (e.g. array[0], array[1], etc.)?
C function:
#include <stdio.h>
void asm_function(char *address);
int main() {
char array[] = "Abc";
asm_function(array);
return 0;
}
ASM function:
.type asm_function, @function
.section .data
EXIT = 60
EXIT_SUCCESS = 1
BUFF_LENGTH = 512
format: .asciz "%s\n"
.section .bss
.lcomm buffer, BUFF_LENGTH
.section .text
.globl asm_function
asm_function:
movq %rdi, buffer
subq $8, %rsp
movq $0, %rax
movq buffer, %rsi
movq $format, %rdi
call printf #prints whole String
exit:
addq $8, %rsp
movq $EXIT, %rax
movq $EXIT_SUCCESS, %rdi
syscall
What I actually need is to access each of the chars separately.
I'd appreciate any hints.

Related

How is struct copied from stack to uninitialized data segment in GNU as?

Having this simple c:
#include <stdio.h>
struct foo{
int a;
char c;
};
static struct foo save_foo;
int main(){
struct foo foo = { 97, 'c', };
save_foo = foo;
printf("%c\n",save_foo.c);
}
Here the save_foo variable is in the bss segment, and in the main function I am trying to "copy" from the stack variable foo to the uninitialized save_foo. So I would expect both elements foo.a and foo.c to be copied into save_foo.a and save_foo.c.
However, the generated assembly:
.text
.local save_foo
.comm save_foo,8,8
.section .rodata
.LC0:
.string "%c\n"
.text
.globl main
.type main, @function
main:
endbr64
pushq %rbp #
movq %rsp, %rbp #,
subq $16, %rsp #,
# a.c:11: struct foo foo = { 97, 'c', };
movl $97, -8(%rbp) #, foo.a
movb $99, -4(%rbp) #, foo.c
# a.c:12: save_foo = foo;
movq -8(%rbp), %rax # foo, tmp86
##################################################################
#MISSING to copy foo.c to save_foo.c yet able to use that value
#movq -4(%rbp), %rcx
#movq %rcx, 4+save_foo(%rip)
##################################################################
movq %rax, save_foo(%rip) # tmp86, save_foo
# a.c:14: printf("%c\n",save_foo.c);
movzbl 4+save_foo(%rip), %eax # save_foo.c, _1
# a.c:14: printf("%c\n",save_foo.c);
movsbl %al, %eax # _1, _2
movl %eax, %esi # _2,
leaq .LC0(%rip), %rdi #,
movl $0, %eax #,
call printf@PLT #
movl $0, %eax #, _9
# a.c:15: }
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu 10.2.0-13ubuntu1) 10.2.0"
.section .note.GNU-stack,"",@progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:
Only one element (foo.a) appears to be copied, but foo.c is not. How is it possible for movzbl 4+save_foo(%rip), %eax to get the right value (99, which is ASCII 'c') when that value was never copied? (There is no mov from -4(%rbp), where the value is, to 4+save_foo(%rip) in the bss segment.) Shouldn't the value at 4+save_foo(%rip) be zero, since it is uninitialized?
The movq instruction copies 8 bytes, so the data of the entire struct foo is copied here:
movq -8(%rbp), %rax # foo, tmp86
movq %rax, save_foo(%rip) # tmp86, save_foo
movq -8(%rbp), %rax is an 8-byte reload of the whole struct. Note the l vs. q operand-size suffixes, as well as the register names which also indicate operand-size. (Assembly registers in 64-bit architecture)
When you ask GCC to copy a whole object by doing C struct assignment, it uses wider regs, up to 16-byte XMM regs, just like for memcpy. (Or for large-enough things, might insert a call memcpy instead of expanding it inline.)
Your proposed movq %rcx, 4+save_foo(%rip) would store 8 bytes, starting half way through the global, so it would write outside it.
If you wanted to do both halves separately like save_foo.a = foo.a; save_foo.c = foo.c;, you'd use %eax and %ecx, or %ecx twice. (With movl, not movq). Or maybe a movzbl byte load and a movb byte or movl dword store, depending on whether GCC chose to overwrite the padding or not in the destination, like it does when copying the whole object.

What is register %fs:<num> in gnu assembler?

This simple c code:
file bar.c:
#include <stdio.h>
#define BSIZE 5
typedef struct
{
int count;
int ar[BSIZE];
} foo;
int main()
{
foo f = {.count = 0};
printf("%ld\n",sizeof(foo));
}
This outputs 24 as the size of the struct (5*4 + 4), which is correct. The gas code is as follows:
.text
.section .rodata
.LC0:
.string "%ld\n"
.text
.globl main
.type main, @function
main:
endbr64
pushq %rbp #
movq %rsp, %rbp #,
subq $32, %rsp #,
# bar.c:12: {
movq %fs:40, %rax # MEM[(<address-space-1> long unsigned int *)40B], tmp85
movq %rax, -8(%rbp) # tmp85, D.2347
xorl %eax, %eax # tmp85
# bar.c:13: foo f = {.count = 0};
movq $0, -32(%rbp) #, f
movq $0, -24(%rbp) #, f
movq $0, -16(%rbp) #, f
# bar.c:14: printf("%ld\n",sizeof(foo));
movl $24, %esi #,
leaq .LC0(%rip), %rdi #,
movl $0, %eax #,
call printf@PLT #
movl $0, %eax #, _5
# bar.c:15: }
movq -8(%rbp), %rdx # D.2347, tmp86
subq %fs:40, %rdx # MEM[(<address-space-1> long unsigned int *)40B], tmp86
je .L3 #,
call __stack_chk_fail@PLT #
.L3:
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu 10.2.0-13ubuntu1) 10.2.0"
.section .note.GNU-stack,"",@progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:
Now I have multiple questions about this output:
Why is there subq $32, %rsp when the size of the struct is 24? Why isn't just 24 subtracted from the stack pointer? What are the other 8 bytes for, alignment?
What is movq %fs:40, %rax # MEM[(<address-space-1> long unsigned int *)40B], tmp85? What is the %fs register? What does the 40 mean, an offset? What does the compiler-generated comment suggest? There is no long unsigned int * datatype in my code.
These statements:
# bar.c:13: foo f = {.count = 0};
movq $0, -32(%rbp) #, f
movq $0, -24(%rbp) #, f
movq $0, -16(%rbp) #, f
I do not fully understand. From my struct definition, I guess
-32(%rbp) == count
-24(%rbp) == ar[0]
-20(%rbp) == ar[1]
-16(%rbp) == ar[2]
-12(%rbp) == ar[3]
-8(%rbp) == ar[4]
Is this the correct layout of struct foo on the stack? If not, how is it laid out?

Understanding pointer assignment in x86-64 Assembly Code

I am trying to understand assembly code. I am stuck on the part where the pointer is assigned, and on the code after the leaq instruction.
This is my C code:
#include <stdio.h>
#include<stdlib.h>
int main(){
int x=50;
int *y=&x;
return 0;
}
This is my corresponding ASSEMBLY code:
.file "AssemlyCode.c"
.def __main; .scl 2; .type 32; .endef
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $48, %rsp
.seh_stackalloc 48
.seh_endprologue
call __main
movl $50, -12(%rbp)
leaq -12(%rbp), %rax
movq %rax, -8(%rbp)
movl $0, %eax
addq $48, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (GNU) 5.4.0"
leaq -8(%rbp), %rax
movl %eax, -4(%rbp)
movl $0, %eax
addq $48, %rsp
popq %rbp
ret
leaq stores the address of variable x in register rax. x is an automatic variable on the stack, so its address is computed as an offset from the register that holds the stack frame pointer (rbp).
The mov that follows stores rax, i.e. the address of x, into the stack slot of y; this is the pointer assignment y = &x.
The next step puts the return value of main in the eax register (return 0).
The next two instructions are the function epilogue: they release the used stack space and restore the previous frame pointer.
The last instruction is a simple return.

Memory Allocation of Static String Literals

Consider the following struct:
struct example_t {
char * a;
char * b;
};
struct example_t test = {
"Chocolate",
"Cookies"
};
I am aware of the implementation-specific nature of the allocation of memory for the char*'s, but what of the string literals?
In this case, is there any guarantee from the C standard with regard to the adjacent placement of "Chocolate" and "Cookies"?
In most implementations I tested the two literals are not padded, and are directly adjacent.
This allows the struct to be copied quickly with a memcpy, although I suspect this behavior is undefined. Does anyone have any information on this topic?
In your example, there are no absolute guarantees of the adjacency/placement of the two string literals with respect to each other. GCC in this case happens to demonstrate such behavior, but it has no obligation to exhibit this behavior.
In this example, we see no padding, and we can even use undefined behavior to demonstrate adjacency of string literals. This works with GCC, but using alternate libc's or different compilers, you could get other behavior, such as detecting duplicate string literals across translation units and reducing redundancy to save memory in the final application.
Also, while the pointers you declared are of type char *, they really should be const char *, since the literals are typically placed in .rodata and writing to that memory is undefined behavior (in practice, usually a segfault).
Code Listing
#include <stdio.h>
#include <string.h>
struct example_t {
char * a;
char * b;
char * c;
};
int main(void) {
struct example_t test = {
"Chocolate",
"Cookies",
"And milk"
};
size_t len = strlen(test.a) + strlen(test.b) + strlen(test.c) + ((3-1) * sizeof(char));
char* t= test.a;
int i;
for (i = 0; i< len; i++) {
printf("%c", t[i]);
}
return 0;
}
Sample output
./a.out
ChocolateCookiesAnd milk
Output of gcc -S
.file "test.c"
.section .rodata
.LC0:
.string "Chocolate"
.LC1:
.string "Cookies"
.LC2:
.string "And milk"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %rbx
subq $72, %rsp
.cfi_offset 3, -24
movq $.LC0, -48(%rbp)
movq $.LC1, -40(%rbp)
movq $.LC2, -32(%rbp)
movq -48(%rbp), %rax
movq %rax, %rdi
call strlen
movq %rax, %rbx
movq -40(%rbp), %rax
movq %rax, %rdi
call strlen
addq %rax, %rbx
movq -32(%rbp), %rax
movq %rax, %rdi
call strlen
addq %rbx, %rax
addq $2, %rax
movq %rax, -64(%rbp)
movq -48(%rbp), %rax
movq %rax, -56(%rbp)
movl $0, -68(%rbp)
jmp .L2
.L3:
movl -68(%rbp), %eax
movslq %eax, %rdx
movq -56(%rbp), %rax
addq %rdx, %rax
movzbl (%rax), %eax
movsbl %al, %eax
movl %eax, %edi
call putchar
addl $1, -68(%rbp)
.L2:
movl -68(%rbp), %eax
cltq
cmpq -64(%rbp), %rax
jb .L3
movl $0, %eax
addq $72, %rsp
popq %rbx
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4"
.section .note.GNU-stack,"",@progbits
No, there is no guarantee for adjacent placement.
One occasion where actual compilers will place them far apart is if the same string literal appears in different places (as read-only objects) and the string combining optimization is enabled.
Example:
char *foo = "foo";
char *baz = "baz";
struct example_t bar = {
"foo",
"bar"
};
may well end up in memory as "foo" followed by "baz" followed by "bar".
Here is an example demonstrating a real-world scenario where the strings are not adjacent. GCC decides to reuse the string "Chocolate" from earlier.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
const char *a = "Chocolate";
const char *b = "Spinach";
struct test_t {
const char *a;
const char *b;
};
struct test_t test = {"Chocolate", "Cookies"};
int main(void)
{
printf("%p %p\n", (const void *) a, (const void *) b);
printf("%p %p\n", (const void *) test.a, (const void *) test.b);
return EXIT_SUCCESS;
}
Output:
0x400614 0x40061e
0x400614 0x400626
Here is an example of gcc behaviour where, even in that case, the strings do not end up adjacent in memory:
#include <stdio.h>
#include <stdlib.h>
char *s = "Cookies";
struct test {
char *a, *b, *c, *d;
};
struct test t = {
"Chocolate",
"Cookies",
"Milk",
"Cookies",
};
#define D(x) __FILE__":%d:%s: " x, __LINE__, __func__
#define P(x) do{\
printf(D(#x " = [%#p] \"%s\"\n"), x, x); \
} while(0)
int main()
{
P(t.a);
P(t.b);
P(t.c);
P(t.d);
return 0;
}
In this case, because the compiler reuses string literals it has already seen, the ones assigned to the structure fields do not end up adjacent.
This is the output of the program:
$ pru3
pru3.c:25:main: t.a = [0x8518] "Chocolate"
pru3.c:26:main: t.b = [0x8510] "Cookies"
pru3.c:27:main: t.c = [0x8524] "Milk"
pru3.c:28:main: t.d = [0x8510] "Cookies"
As you see, the pointers are even repeated for the "Cookies" value.
This was compiled with default options:
gcc -o pru3 pru3.c

Where is string data stored?

I wrote a small c program:
#include <stdio.h>
int main()
{
char s[] = "Hello, world!";
printf("%s\n", s);
return 0;
}
which compiles to (on my Linux machine):
.file "hello.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
movl $1819043144, -32(%rbp)
movl $1998597231, -28(%rbp)
movl $1684828783, -24(%rbp)
movw $33, -20(%rbp)
leaq -32(%rbp), %rax
movq %rax, %rdi
call puts
movl $0, %eax
movq -8(%rbp), %rdx
xorq %fs:40, %rdx
je .L3
call __stack_chk_fail
.L3:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2"
.section .note.GNU-stack,"",@progbits
I don't understand the assembly code, and I can't see the string message anywhere in it. So how does the executable know what to print?
It's here:
movl $1819043144, -32(%rbp) # 1819043144 = 0x6C6C6548 = "lleH"
movl $1998597231, -28(%rbp) # 1998597231 = 0x77202C6F = "w ,o"
movl $1684828783, -24(%rbp) # 1684828783 = 0x646C726F = "dlro"
movw $33, -20(%rbp) # 33 = 0x0021 = "\0!"
In this particular case the compiler generates inline instructions that build the string on the stack before the call (printf has been replaced by puts here). In other situations it may instead store the string constant in another section of memory. Bottom line: you cannot make any assumptions about how or where the compiler will generate and store string literals.
The string is here:
movl $1819043144, -32(%rbp)
movl $1998597231, -28(%rbp)
movl $1684828783, -24(%rbp)
This copies a bunch of values to the stack. Those values happen to be your string.
String constants are stored in the binary of your application. Exactly where is up to your compiler.
Assembly has no "string" concept, so a "string" is just a chunk of memory. It is stored somewhere in memory (the compiler decides where), and you manipulate it through its memory address (pointer).
If your string is constant, the compiler may encode it as immediate operands in the instructions instead of placing it in a data section, which can be faster. This is your case, as pointed out by Paul R:
movl $1819043144, -32(%rbp)
movl $1998597231, -28(%rbp)
movl $1684828783, -24(%rbp)
You cannot make assumptions about how the compiler will treat your string.
In addition to the above, the compiler can see that your string literal cannot be referenced directly (i.e. there can't be any valid pointers to your string), which is why it can just copy it inline. If however you assign a character pointer instead, i.e.
char *s = "Hello, world!";
The compiler will initialise a string literal somewhere in memory, since you can of course now point to it. This modification produces on my machine:
.LC0:
.string "Hello, world!"
.text
.globl main
.type main, @function
One assumption can be made about string literals: if a pointer is initialised to a literal, it will point to a static char array held somewhere in memory. As a result the pointer is valid in any part of the program, e.g. you can return a pointer to a string literal initialised in a function, and it will still be valid.

Resources