Run Hello World on ARM emulator - c

I'm trying to emulate an ARM CPU at the moment. I have read a lot about how to emulate a CPU. So far I have written down all the opcodes, registers, etc., and compiled a Hello World program written in C on an ARM machine to get the assembler code (to test my emulator).
Assembler
.arch armv8-a
.file "hello.c"
.section .text.startup, "ax", @progbits
.align 2
.p2align 3,,7
.global main
.type main, %function
main:
stp x29, x30, [sp, -16]!
adrp x1, .LC0
add x1, x1, :lo12:.LC0
mov w0, 1
add x29, sp, 0
bl __printf_chk
mov w0, 0
ldp x29, x30, [sp], 16
ret
.size main, .-main
.section .rodata.str1.8, "aMS", @progbits, 1
.align 3
.LC0:
.string "Hello World"
.ident "GCC: (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.10)"
.section .note.GNU-stack, "", @progbits
I have to load the code from the ROM into my emulated RAM, from which it will be executed. But how do I load it into the RAM? I don't understand how to split the opcodes and the registers.
Summary
I want to emulate an ARM CPU on an Intel CPU running Windows. To test my emulator I wrote a Hello World in C and compiled it to get the assembler code. The assembler code was generated on an ARM machine running Ubuntu 16.04. My question is how to fetch the assembler code with my emulator.
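One way to picture the loading step, as a minimal sketch rather than a complete design: the emulator does not consume the assembler text itself but the machine code produced from it (for example a raw image extracted from the object file with objcopy -O binary). A64 instructions are fixed 32-bit little-endian words, so fetching means reading four bytes at the program counter, and "splitting" an instruction means masking out its bit fields. The names RAM_SIZE, load_image, fetch32, add_imm and decode_add_imm are hypothetical, and only the ADD (immediate) encoding is decoded here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAM_SIZE 0x10000u            /* hypothetical size of the emulated RAM */

static uint8_t ram[RAM_SIZE];        /* emulated RAM, byte addressed */

/* Copy a raw machine-code image into emulated RAM at load_addr.
   Returns 0 on success, -1 if the image does not fit. */
int load_image(const uint8_t *image, size_t len, uint32_t load_addr)
{
    if (load_addr > RAM_SIZE || len > RAM_SIZE - load_addr)
        return -1;
    memcpy(&ram[load_addr], image, len);
    return 0;
}

/* Fetch one 32-bit instruction word stored little-endian at pc.
   The caller is assumed to keep pc + 3 inside the RAM array. */
uint32_t fetch32(uint32_t pc)
{
    return (uint32_t)ram[pc]
         | (uint32_t)ram[pc + 1] << 8
         | (uint32_t)ram[pc + 2] << 16
         | (uint32_t)ram[pc + 3] << 24;
}

typedef struct {
    int      is_64bit;  /* sf bit: 1 = X registers, 0 = W registers */
    unsigned rd;        /* destination register number (31 means SP here) */
    unsigned rn;        /* source register number (31 means SP here) */
    uint32_t imm;       /* immediate, after the optional LSL #12 */
} add_imm;

/* Returns 1 and fills *out if insn is ADD (immediate), otherwise 0. */
int decode_add_imm(uint32_t insn, add_imm *out)
{
    /* bits 30..23 must be 0 0 100010 for ADD (immediate) */
    if ((insn & 0x7F800000u) != 0x11000000u)
        return 0;
    out->is_64bit = (int)(insn >> 31);
    out->imm = (insn >> 10) & 0xFFFu;   /* imm12 field, bits 21..10 */
    if ((insn >> 22) & 1u)              /* sh bit: shift immediate left by 12 */
        out->imm <<= 12;
    out->rn = (insn >> 5) & 0x1Fu;      /* bits 9..5 */
    out->rd = insn & 0x1Fu;             /* bits 4..0 */
    return 1;
}
```

Assuming the twelve bytes FD 03 00 91, 00 00 80 52, C0 03 5F D6 (the encodings of add x29, sp, 0; mov w0, 0; ret) are loaded at address 0, fetch32(0) yields 0x910003FD and decode_add_imm reports rd = 29, rn = 31, imm = 0. A call such as bl __printf_chk would additionally need the C library or a stub inside the emulator, which this sketch leaves out.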

Related

Assembling armv8-a neon with gnu assembler

I am trying to assemble AArch64 NEON instructions with the GNU assembler. The example is from the NEON programming quick reference:
.text
.align 4
.global add_float_neon2
.type add_float_neon2, %function
add_float_neon2:
.L_loop:
ld1 {v0.4s}, [x1], #16
ld1 {v1.4s}, [x2], #16
fadd v0.4s, v0.4s, v1.4s
subs x3, x3, #4
st1 {v0.4s}, [x0], #16
bgt .L_loop
ret
When running the gnu assembler I get the following error:
arm-linux-gnueabi-as -march=armv8-a -mfpu=neon test.s
test.s: Assembler messages:
test.s:13: Error: bad instruction `ld1 {v0.4s},[x1],#16'
test.s:15: Error: bad instruction `ld1 {v1.4s},[x2],#16'
test.s:17: Error: bad instruction `fadd v0.4s,v0.4s,v1.4s'
test.s:19: Error: ARM register expected -- `subs x3,x3,#4'
test.s:21: Error: bad instruction `st1 {v0.4s},[x0],#16'
test.s:25: Error: bad instruction `ret'
It cannot assemble any instructions with the <Vd>.<T> operand formats, even if I try running it with a single instruction. What am I doing wrong?

Unused Reserved Stack Space in Intel x86 Assembly

I am just beginning to learn Intel's x86 assembly and compiled this simple "hello world" C program (without the CFI directives, for simplicity):
#include <stdio.h>
int main(int argc, char* argv[]) {
printf("hello world!");
return 0;
}
The following x86 code came out:
.file "helloworld.c"
.intel_syntax noprefix
.section .rodata
.LC0:
.string "hello world!"
.text
.globl main
.type main, @function
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR -4[rbp], edi
mov QWORD PTR -16[rbp], rsi
lea rdi, .LC0[rip]
mov eax, 0
call printf@PLT
mov eax, 0
leave
ret
.size main, .-main
.ident "GCC: (Debian 7.2.0-19) 7.2.0"
.section .note.GNU-stack,"",@progbits
The question: why are those 16 bytes reserved on the stack for local variables but not used in any way? The program behaves the same without those lines, so why were they generated?

How bitwise shift operator is Implemented in C. Is it atomic?

I want to know how the bitwise shift operators "<<" and ">>" are implemented in C. Are they atomic or not? Does C shift the whole word at once, or move every bit one by one?
Are there any dependencies on the compiler, operating system or computer architecture?
Does the C standard define how the shift operators must be implemented?
Example:
Let's say two threads are accessing the same data, and one of them modifies it by shifting it 3 bits. Is this 3-bit shift an atomic operation or not? Should I use locks to protect this modification?
EDIT: It's only a shift operator, no store instruction. The data is already in memory, so there is no load operation.
My processor: PowerPC MPC8569, e600 core architecture.
C only guarantees atomic access for _Atomic type variables, which were introduced in C11.
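For all other situations, there are never any guarantees of atomic access. You will have to disassemble the compiled code to see how many assembler instructions were generated. Typically, a single assembler instruction cannot be interrupted halfway, though that alone does not make a read-modify-write of memory atomic with respect to other cores.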
But your question doesn't make all that much sense, because there is no context. Where would the result of the shift go? Do you plan to store it somewhere? Then that's two operations: shift and store. Possibly also a load. If you write an algorithm which is not atomic in itself, how do you expect the compiler to magically make it atomic for you?
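If the load, shift and store do have to happen as one indivisible step, C11 atomics can express that without a lock. The following is a minimal sketch, assuming a C11 compiler with <stdatomic.h>; atomic_shift_left is a hypothetical helper name, and the shift count is assumed to be smaller than 32.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically replaces *p with (*p << n) and returns the new value.
   Assumes n < 32; larger shift counts are undefined behaviour in C. */
uint32_t atomic_shift_left(_Atomic uint32_t *p, unsigned n)
{
    uint32_t old = atomic_load(p);

    /* If another thread changed *p in the meantime, the compare-exchange
       fails and writes the current value into old, so the next attempt
       shifts the fresh value instead of a stale one. */
    while (!atomic_compare_exchange_weak(p, &old, old << n)) {
        /* retry */
    }
    return old << n;
}
```

The compare-exchange loop is used because C11 has no dedicated atomic shift operation; on targets where the type is not lock-free, the library may fall back to an internal lock.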
It depends on the processor that you are using.
If a bitwise shift instruction is present, as it is on x86 cores and on most 16-bit and 32-bit microcontrollers, then the shift itself is atomic.
If, however, you have an 8-bit microcontroller without a bit shift instruction, or you are trying to shift a large value (say 64 or 128 bits), the shift may well take quite a lot of code.
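To make the "quite a lot of code" case concrete, here is a rough sketch of the work involved in a 64-bit left shift on a machine that only has 32-bit registers. The type u64_parts and the function shl64_by_parts are hypothetical names for illustration; real compilers emit their own helper sequences.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as on a 32-bit machine. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_parts;

/* Shift a 64-bit value left by n bits using only 32-bit operations.
   The shift count is reduced modulo 64. */
u64_parts shl64_by_parts(u64_parts v, unsigned n)
{
    u64_parts r;

    n &= 63u;
    if (n == 0) {
        r = v;                                     /* nothing to do */
    } else if (n < 32) {
        /* bits shifted out of the low half move into the high half */
        r.hi = (v.hi << n) | (v.lo >> (32 - n));
        r.lo = v.lo << n;
    } else {
        /* the whole low half moves into the high half */
        r.hi = v.lo << (n - 32);
        r.lo = 0;
    }
    return r;
}
```

Every branch touches both halves, so another thread or an interrupt handler could observe one half updated and the other not.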
It depends upon which standard you are talking about.
AFAIU, the only atomic operations (explicitly defined as atomic) in C11 are the ones related to <stdatomic.h>.
You could imagine a terahertz processor with a 4-bit ALU; even a simple int32_t addition won't be atomic on it.
I wrote two programs. The first is:
#include<stdio.h>
int main()
{
int i = 5;
return 0;
}
The assembly code generated for it on the PowerPC architecture (code 1) is:
.file "hello.c"
.section ".text"
.align 2
.globl main
.type main, @function
main:
stwu 1,-32(1)
stw 31,28(1)
mr 31,1
li 0,5
stw 0,8(31)
li 0,0
mr 3,0
lwz 11,0(1)
lwz 31,-4(11)
mr 1,11
blr
.size main, .-main
.ident "GCC: (GNU) 4.2.2"
.section .note.GNU-stack,"",@progbits
The second program is:
#include<stdio.h>
int main()
{
int i = 5;
i = i<<1;
return 0;
}
The assembly code generated for it on the PowerPC architecture (code 2) is:
.file "hello.c"
.section ".text"
.align 2
.globl main
.type main, @function
main:
stwu 1,-32(1)
stw 31,28(1)
mr 31,1
li 0,5
stw 0,8(31)
lwz 0,8(31) // extra
slwi 0,0,1 // extra
stw 0,8(31) // extra
li 0,0
mr 3,0
lwz 11,0(1)
lwz 31,-4(11)
mr 1,11
blr
.size main, .-main
.ident "GCC: (GNU) 4.2.2"
.section .note.GNU-stack,"",@progbits
As you can see, there are three extra instructions, so the operation is not atomic.
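Why a separate load, shift and store matters can be shown deterministically. The following is a single-threaded model, not real threading: cpu_thread, step_load, step_shift, step_store, run_serialized and run_interleaved are hypothetical names, and the interleaving is chosen by hand to produce a lost update.

```c
#include <stdint.h>

/* Each simulated thread has its own register, like r0 in the listing. */
typedef struct {
    uint32_t reg;
} cpu_thread;

static void step_load(cpu_thread *t, const uint32_t *mem)  { t->reg = *mem; }  /* lwz  */
static void step_shift(cpu_thread *t)                       { t->reg <<= 1; }  /* slwi */
static void step_store(const cpu_thread *t, uint32_t *mem)  { *mem = t->reg; } /* stw  */

/* Thread A finishes all three steps before thread B starts. */
uint32_t run_serialized(uint32_t initial)
{
    uint32_t mem = initial;
    cpu_thread a, b;

    step_load(&a, &mem); step_shift(&a); step_store(&a, &mem);
    step_load(&b, &mem); step_shift(&b); step_store(&b, &mem);
    return mem;
}

/* Thread B loads the old value before thread A has stored its result. */
uint32_t run_interleaved(uint32_t initial)
{
    uint32_t mem = initial;
    cpu_thread a, b;

    step_load(&a, &mem);
    step_load(&b, &mem);      /* B still sees the initial value */
    step_shift(&a);
    step_store(&a, &mem);
    step_shift(&b);
    step_store(&b, &mem);     /* overwrites A's result */
    return mem;
}
```

Starting from 5, the serialized order ends with 20 (two shifts applied), while the interleaved order ends with 10 because one of the two shifts is lost.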
I also compiled this on an Intel i7 PC. Here are the results.
The assembly code generated for the first program is:
.file "hello.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $5, -4(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4"
.section .note.GNU-stack,"",@progbits
The assembly code generated for code 2 is:
.file "hello.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $5, -4(%rbp)
sall -4(%rbp) // only one extra instruction
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4"
.section .note.GNU-stack,"",@progbits
So my understanding is that the answer depends on the architecture being used.
Normally it should be SHLD/SHRD - Double Precision Shift (386+)
https://web.itu.edu.tr/kesgin/mul06/intel/instr/shld_shrd.html
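I think it's atomic because it's a single instruction. Otherwise you could use atomic types if your language version supports them: C11 provides <stdatomic.h> and C++11 provides std::atomic. (volatile by itself does not make an operation atomic.)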

Tell the compiler to translate a certain instruction differently?

I'm using armcc and arm-gcc for compiling my project for an ARM968 processor.
When returning from a function call, the return instruction is as follows:
Pop {ri-rj, pc}
All the pushed registers are popped in the same instruction. I want to change the instruction above into something like this:
Pop {ri-rj}
Pop {pc}
Can I instruct the assembler or the compiler to adhere to the rule above in any of the ARM tool chains when using Pop?
Depending on the exact nature of the problem, and whether you're critically dependent on a handful of newer instructions, another possibility might be to compile for ARMv4T. Prior to ARMv5T, ldm into the PC was not interworking, so the compiler won't emit that form of return instruction for a v4T target. For some simple test code, compiling with -march=armv5t generates a stack frame thus:
8: e92d4070 push {r4, r5, r6, lr}
...
4c: d8bd8070 pople {r4, r5, r6, pc}
whereas compiling the same thing with -march=armv4t uses the same prologue but with an indirect return sequence (the above was a conditional return from inside a loop, now it's also moved out to the end of the function):
48: da000006 ble 68 <func+0x68>
...
68: e8bd4070 pop {r4, r5, r6, lr}
6c: e12fff1e bx lr
Of course, whether that has the same effect as the two separate pops depends on what the underlying bug in the system is - if it's something like the timing between the data fetches of the ldm and the instruction fetches of the jump itself, then this might be sufficiently equivalent; it could conceivably be something quite different, like a broken memory system truncating AHB bursts above a certain size, so it's the number of registers transferred in a single ldm that's the issue, but in that case I'd also expect to see problems with things like memcpy which aren't so easily fixed.
You can tell gcc to stop after it has generated the assembler output. Then you can edit the assembler files manually or with sed, and pass the edited file to as (from binutils) to assemble it into an object file.
$ cat test.c
#include <stdio.h>
int main(int argc, char *argv[])
{
printf ("hello world\n");
return 0;
}
$ gcc test.c -o test.s -S
$ cat test.s
.file "test.c"
.section .rodata
.LC0:
.string "hello world"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $.LC0, %edi
call puts
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4"
.section .note.GNU-stack,"",@progbits
$ as test.s -o test.o
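The sed-style edit could also be done by a small filter program. Here is a minimal sketch in C, assuming the compiler prints the return as a line of the form pop {..., pc} (unconditional, registers separated by ", ", no trailing newline in the buffer). split_pop_pc is a hypothetical helper; conditional forms such as pople and ldm-based returns are deliberately left untouched.

```c
#include <stdio.h>
#include <string.h>

/* Rewrites "pop {r4, r5, pc}" into "pop {r4, r5}" followed by "pop {pc}".
   Any other line is copied to out unchanged.
   Returns 1 if the line was rewritten, 0 otherwise. */
int split_pop_pc(const char *line, char *out, size_t out_size)
{
    const char *p = line;
    const char *open;
    const char *tail = NULL;

    /* skip leading indentation */
    while (*p == ' ' || *p == '\t')
        p++;

    /* accept only the plain mnemonic "pop" followed by whitespace,
       so that conditional variants like "pople" are not touched */
    if (strncmp(p, "pop", 3) == 0 && (p[3] == ' ' || p[3] == '\t')) {
        open = strchr(p, '{');
        if (open != NULL)
            tail = strstr(open, ", pc}");
    }

    if (tail == NULL) {
        snprintf(out, out_size, "%s", line);
        return 0;
    }

    /* keep everything before ", pc}", close the list, then pop pc alone */
    snprintf(out, out_size, "%.*s}\n\tpop {pc}", (int)(tail - line), line);
    return 1;
}
```

Because pc is the highest-numbered register, it is loaded from the highest address of the popped block, so popping the lower registers first and pc afterwards reads the same stack slots in the same order. Whether the split actually avoids the underlying problem still depends on what that problem is.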

gcc Assembly code output different than expected

I am new to C and gcc. I'm trying to follow along with an example in Computer Systems: A Programmer's Perspective. The author says that the following code, when put into a file (code.c)
int accum = 0;
int sum(int x, int y)
{
int t = x + y;
accum += t;
return t;
}
and compiled with gcc as follows to output an assembly file
gcc -O2 -S code.c
will produce assembly code as follows
sum:
pushl %ebp
movl %esp,%ebp
movl 12(%ebp),%eax
addl 8(%ebp),%eax
addl %eax,accum
movl %ebp,%esp
popl %ebp
ret
However, on my machine (OS: Ubuntu 10.4 x64) I get the following:
.file "code.c"
.intel_syntax noprefix
.text
.p2align 4,,15
.globl sum
.type sum, @function
sum:
.LFB0:
.cfi_startproc
lea eax, [rdi+rsi]
add DWORD PTR accum[rip], eax
ret
.cfi_endproc
.LFE0:
.size sum, .-sum
.globl accum
.bss
.align 4
.type accum, @object
.size accum, 4
accum:
.zero 4
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",@progbits
Why am I seeing this difference?
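Because the book is 11 years old and gcc has changed a great deal since it was written. The book's listing is also 32-bit x86 code, which takes its arguments from the stack, whereas your compiler targets x86-64, where x and y arrive in registers (hence rdi and rsi in your output).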
