Segmentation fault when trying to fprintf after executing machine code with jit - c

Hi devs, could you help me? The project's goal is to translate byte code from a fictional architecture into an array of real machine code and run it with a JIT, but I get a segmentation fault when I try to save a certain part of the output to a file. The part of the code responsible for this:
uint32_t length = sysconf(4096);
void * memory = mmap(0 , length , PROT_NONE , MAP_PRIVATE | MAP_ANONYMOUS , -1 , 0);
//{machine array receives the translated machine code here...}
mprotect ( memory , length , PROT_WRITE ) ;
// copying the machine code array to the memory
memcpy ( memory , ( void *) ( machine ) , sizeof ( machine ) ) ;
mprotect ( memory , length , PROT_EXEC ) ;
const uint32_t (* jit ) (int32_t*, uint8_t*) = ( uint32_t (*) (int32_t*, uint8_t*) ) ( memory );
// running the machine code to produce the outputs
// &R is the array of registers that stores the output, and &mem contains the original byte
// code, which an instruction that modifies the original code writes to
(*jit)((int *)&R, (unsigned char *)&mem);
munmap(memory,length);
// printf/fprintf that causes the segmentation fault if we try to print n and ic[n]
// n = 0; - does not work to print the correct starting value for n
// fflush(stdout); - works to print the correct starting value for n
for(n = 0; n < 16; n++) {
// fprintf(output,"%02x:\n",n);
// fprintf(output,":%d\n",ic[n]);
fprintf(output,"%02x:%d\n",n,ic[n]);
// printf("%02x:%d\n",n,ic[n]);
// fflush(stdout);
}
for (k = 0; k < 16; k++) {
fprintf(output,"R[%d]=0x%08x\n",k,R[k]);
}
The original byte code translates to the instructions in this pseudo-assembly code. Here, the R's represent an array of registers that is passed to the real assembly code: R0 is at %rdi, R1 at %rdi+0x4, ..., R15 at %rdi+0x3C.
Some of these pseudo-instructions translate to one or more actual assembly instructions, and [Rn] represents the memory location that contains the byte code for the original architecture. So when the code accesses [Rn], it uses the current value of the register as the position from which to read the next 4 bytes (an instruction in the fantasy architecture is 4 bytes long).
mov R0, 0x006C
mov R1, 0x0001
mov R2, [R0]
cmp R15, R2
je 0x0030
mov R14, R2
jg 0x0000
jl 0x0000
add R13, R14
and R12, R13
or R11, R12
xor R10, R11
shl R10, 0x00
shr R10, 0x00
sub R2, R1
mov [R0], R2
jmp 0xFFC8
mov R1, 0x0004
add R0, R1
mov R2, [R0]
add R0, R1
mov R4, [R0]
add R0, R1
mov R8, [R0]
add R0, R1
mov R12, [R0]
jmp 0x0004
mov R3, R12
or R6, R13
add R10, R8
sub R15, R14
For the original architecture's instructions (00 to 0F) and 16 registers (R[0] to R[15]), the output should follow this model:
original instruction opcode: number of times executed
array of registers: value stored. Something like this:
00:2
01:1
...
0e:1
0f:1
R[0]=0x0000006c
R[1]=0x00000001
...
R[13]=0x03885533
R[14]=0x03885533
R[15]=0x00000000
The problem is that I keep getting a segmentation fault when I try to save the "opcode: number of executions" pairs. If I print only the "opcode:" part and the register:value pairs, there is no segmentation fault, but instead of printing the first opcode as "0:", it prints "6C:", which is the value of R[0] and of r12 (the asm register) according to gdb.
I have tried inserting push rbp and mov rbp, rsp before the generated code, and pop rbp and ret after it, but nothing works. Any ideas that could help? Is there any more information I could provide?
Thanks for the help and have a good day.
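Two things in the snippet above may be worth checking; this is a guess from the posted code, not a confirmed diagnosis. First, sysconf takes a name constant such as _SC_PAGESIZE rather than a size, so sysconf(4096) may not return a page size at all. Second, the System V x86-64 calling convention expects a called function to preserve rbx, rbp and r12 to r15; if the generated code overwrites r12 without saving and restoring it, a loop counter that the compiler kept in r12 would come back changed, which would be consistent with the 6C seen in gdb. A minimal sketch of the mapping step with error checks, assuming a POSIX system; jit_load is a hypothetical helper name, not part of the original project:

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map enough whole pages for `size` bytes of code, copy the code in while the
   mapping is writable, then switch the mapping to read + execute.
   Returns NULL on failure; on success *mapped_len is the length to munmap. */
void *jit_load(const void *code, size_t size, size_t *mapped_len)
{
    long page = sysconf(_SC_PAGESIZE); /* a name constant, not a byte count */
    if (page <= 0 || size == 0)
        return NULL;

    /* Round the requested size up to a multiple of the page size. */
    size_t len = (size + (size_t)page - 1) / (size_t)page * (size_t)page;

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, size);

    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }

    *mapped_len = len;
    return mem;
}
```

The returned pointer can then be cast to the function-pointer type from the question and called, and the mapping is released afterwards with munmap(mem, mapped_len) using the same length.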

Related

Writing in the Data Area of ARM

I am writing a bubble sort in ARM Assembly on the STM32.
AREA Data1,DATA,READWRITE
ARR DCD 1,3,2,4
How can I write a value to the array in the data area? I am using LDR R0,=ARR to load the address of ARR. However, when I check the memory, the values are all zero.
AREA main, CODE, READONLY
EXPORT __main
__main
MOV R4,#4; LENGTH OF ARRAY
MOV R5,#3; NUMBER OF ITERATIONS
MOV R6,#3; NUMBER OF COMPARISONS
LDR R0,=ARR; ADDRESS OF ARR[0] FROM DATA AREA
__LOOP1
CMP R5,#0
BEQ __done
CMP R6,#0
BEQ __L2;
LDR R2,[R0,#4]; R2=ARR[i+1]
LDR R1,[R0];
MOV R3,R1;
CMP R1,R2;
BLE __L1;
STR R2,[R1];
STR R3,[R0];
__L1
SUBS R6,#1;
ADDS R0,#4;
B __LOOP1;
__L2
SUBS R5,#1;
MOV R6,R5;
LDR R0,=ARR;
B __LOOP1;
__done
AREA A,DATA,READWRITE
ARR DCD 1,3,2,4
END
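For comparison, the algorithm the assembly above is aiming for can be sketched in C. This is only an illustration with a hypothetical function name, not a drop-in fix for the assembly. Note that both stores of a swap go to addresses derived from the array pointer (arr[j] and arr[j + 1]), never to an address taken from a loaded element value.

```c
#include <stddef.h>
#include <stdint.h>

/* Sort `n` 32-bit values in ascending order, in place. */
void bubble_sort(int32_t *arr, size_t n)
{
    for (size_t pass = 0; pass + 1 < n; pass++) {
        /* After each pass the largest remaining value has moved to the end,
           so the inner loop can stop one element earlier every time. */
        for (size_t j = 0; j + 1 < n - pass; j++) {
            if (arr[j] > arr[j + 1]) {
                int32_t tmp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = tmp;
            }
        }
    }
}
```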

What happens when a 64-bit value is passed as a parameter in a function on a 32-bit architecture?

I am facing a weird issue. I am passing a uint64_t offset to a function on a 32-bit architecture(Cortex R52). The registers are all 32-bit.
This is the function:
Sorry, I messed up the function declaration.
void read_func(uint8_t* read_buffer_ptr, uint64_t offset, size_t size);
main(){
// read_buffer : memory where something is read.
// read_buffer_ptr : points to read_buffer structure where value should be stored after reading value.
read_func(read_buffer_ptr, 0, sizeof(read_buffer));
}
In this function, the value stored in offset is not zero but some random value, which I also see in the registers (r5, r6). When I use offset as a 32-bit value, it works perfectly fine. The value is copied from r2, r3 into r5, r6.
Can you please let me know why this could be happening? Are there not enough registers?
The prototype posted is invalid, it should be:
void read_func(uint8_t *read_buffer_ptr, uint64_t offset, size_t size);
Similarly, the definition main() is obsolete: the implicit int return type is not supported as of C99, the function call has another syntax error with a missing )...
What happens when you pass a 64-bit argument on a 32-bit architecture is implementation-defined:
either 8 bytes of stack space are used to pass the value
or 2 32-bit registers are used to pass the least significant part and the most significant part
or a combination of both depending on the number of arguments
or some other scheme appropriate for the target CPU
In your code you pass 0, which has type int and presumably only 32 bits. This is not a problem if the prototype for read_func is correct and has been parsed before the function call; otherwise the behavior is undefined, and a C99 compiler should not even compile the code, but many compilers will just issue a warning and generate bogus code.
In your case (Cortex R52), the 64-bit argument is passed to read_func in registers r2 and r3.
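The register-pair scheme can be pictured with a short host-side sketch. It only models how a 64-bit value splits into two 32-bit words, with the low word going to the lower-numbered register on a little-endian target; the type and function names are made up for illustration. If the caller does not see the prototype, it passes 0 as a single 32-bit int, and the callee then reads two words the caller never set, which is one way to end up with random-looking values in offset.

```c
#include <stdint.h>

/* Two 32-bit words standing in for a register pair such as r2:r3. */
typedef struct {
    uint32_t lo; /* least significant 32 bits */
    uint32_t hi; /* most significant 32 bits */
} reg_pair;

/* Split a 64-bit value the way it would be spread over two 32-bit registers. */
reg_pair split_u64(uint64_t value)
{
    reg_pair p;
    p.lo = (uint32_t)(value & 0xFFFFFFFFu);
    p.hi = (uint32_t)(value >> 32);
    return p;
}

/* Reassemble the 64-bit value on the callee side. */
uint64_t join_u64(reg_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```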
The Cortex-R52 has a 32-bit address bus, so an offset cannot meaningfully be 64 bits. In address calculations only the lower 32 bits are used; the upper ones have no effect.
Example:
uint64_t foo(void *buff, uint64_t offset, uint64_t size)
{
unsigned char *cbuff = buff;
while(size--)
{
*(cbuff++ + offset) = size & 0xff;
}
return offset + (uint32_t)cbuff;
}
void *z1(void);
uint64_t z2(void);
uint64_t z3(void);
uint64_t bar(void)
{
return foo(z1(), z2(), z3());
}
foo:
push {r4, lr}
ldr lr, [sp, #8] //size
ldr r1, [sp, #12] //size
mov ip, lr
add r4, r0, r2 // cbuff + offset calculation r3 is ignored as not needed - processor has only 32bits address space.
.L2:
subs ip, ip, #1 //size--
sbc r1, r1, #0 //size--
cmn r1, #1
cmneq ip, #1
bne .L3
add r0, r0, lr
adds r0, r0, r2
adc r1, r3, #0
pop {r4, pc}
.L3:
strb ip, [r4], #1
b .L2
bar:
push {r0, r1, r4, r5, r6, lr}
bl z1
mov r4, r0 // buff
bl z2
mov r6, r0 // offset
mov r5, r1 // offset
bl z3
mov r2, r6 // offset
strd r0, [sp] // size passed on the stack
mov r3, r5 // offset
mov r0, r4 // buff
bl foo
add sp, sp, #8
pop {r4, r5, r6, pc}
As you can see, registers r2 and r3 contain the offset, r0 contains buff, and size is on the stack.

How to push variable in sdcc inline assembly?

I have this code in ThreadCreate():
int tmpPSW = newThID << 3;
__asm
PUSH A
PUSH _tmpPSW
__endasm;
This results in:
?ASlink-Warning-Undefined Global '_tmpPSW' referenced by module 'cooperative'
I don't get why. tmpPSW is clearly defined, but sdcc complains. What am I doing wrong here? Is there another way to push a C variable in sdcc inline assembly?
Also this is probably relevant. The .asm file generated contains allocation info:
;------------------------------------------------------------
;Allocation info for local variables in function 'ThreadCreate'
;------------------------------------------------------------
;fp Allocated to registers
;newThMask Allocated to registers r6 r7
;newThID Allocated to registers r5
;startSP Allocated to registers r3 r4
;tmp Allocated to registers
;tmpPSW Allocated to registers
;------------------------------------------------------------
Does this mean I ran out of registers? If so, how should I mitigate this?
EDIT:
Source of ThreadCreate():
// ThreadID is typedef'ed as char
ThreadID ThreadCreate(FunctionPtr fp) {
if (activeTh == 0b1111)
return -1;
// i.e. get rightmost bit 0 in bitmask
// https://stackoverflow.com/a/42747608/6306190
int newThMask = ~activeTh & (activeTh + 1);
activeTh |= newThMask;
ThreadID newThID = 0;
while (newThMask >>= 1) { newThID++; }
int startSP = (newThID ^ (1UL << 2)) << 4;
int tmp = SP;
SP = startSP;
int tmpPSW = newThID << 3;
__asm
PUSH DPL ;; push _fp (argument passed in as DPTR in SDCC)
PUSH DPH ;; push _fp
MOV A, #0
PUSH A ;; ACC
PUSH A ;; B
PUSH A ;; DPL
PUSH A ;; DPH
PUSH _tmpPSW ;; PSW
__endasm;
savedSP[newThID] = SP;
SP = tmp;
return newThID;
}
Generated assembly of ThreadCreate():
;------------------------------------------------------------
;Allocation info for local variables in function 'ThreadCreate'
;------------------------------------------------------------
;fp Allocated to registers
;newThMask Allocated to registers r6 r7
;newThID Allocated to registers r5
;startSP Allocated to registers r3 r4
;tmp Allocated to registers
;tmpPSW Allocated to registers
;------------------------------------------------------------
; cooperative.c:104: ThreadID ThreadCreate(FunctionPtr fp) {
; -----------------------------------------
; function ThreadCreate
; -----------------------------------------
_ThreadCreate:
; cooperative.c:110: if (activeTh == 0b1111)
mov a,#0x0f
cjne a,_activeTh,00121$
clr a
cjne a,(_activeTh + 1),00121$
sjmp 00122$
00121$:
sjmp 00102$
00122$:
; cooperative.c:111: return -1;
mov dpl,#0xff
ret
00102$:
; cooperative.c:119: int newThMask = ~activeTh & (activeTh + 1);
mov a,_activeTh
cpl a
mov r6,a
mov a,(_activeTh + 1)
cpl a
mov r7,a
mov a,#0x01
add a,_activeTh
mov r4,a
clr a
addc a,(_activeTh + 1)
mov r5,a
mov a,r4
anl ar6,a
mov a,r5
anl ar7,a
; cooperative.c:157: activeTh |= newThMask;
mov a,r6
orl _activeTh,a
mov a,r7
orl (_activeTh + 1),a
; cooperative.c:160: while (newThMask >>= 1) { newThID++; }
mov r5,#0x00
00103$:
mov ar3,r6
mov a,r7
mov c,acc.7
rrc a
xch a,r3
rrc a
xch a,r3
mov r4,a
mov ar6,r3
mov ar7,r4
mov a,r3
orl a,r4
jz 00105$
inc r5
sjmp 00103$
00105$:
; cooperative.c:161: int startSP = (newThID ^ (1UL << 2)) << 4;
mov ar3,r5
mov r4,#0x00
mov r6,#0x00
xrl ar3,#0x04
mov a,r4
swap a
anl a,#0xf0
xch a,r3
swap a
xch a,r3
xrl a,r3
xch a,r3
anl a,#0xf0
xch a,r3
xrl a,r3
mov r4,a
; cooperative.c:163: int tmp = SP;
mov r7,_SP
; cooperative.c:164: SP = startSP;
mov _SP,r3
; cooperative.c:176: __endasm;
PUSH DPL ;; push _fp (argument passed in as DPTR in 390)
PUSH DPH ;; push _fp
MOV A, #0
PUSH A ;; ACC
PUSH A ;; B
PUSH A ;; DPL
PUSH A ;; DPH
PUSH _tmpPSW ;; PSW
; cooperative.c:178: savedSP[newThID] = SP;
mov a,r5
add a,r5
add a,#_savedSP
mov r1,a
mov r4,_SP
mov r6,#0x00
mov @r1,ar4
inc r1
mov @r1,ar6
; cooperative.c:179: SP = tmp;
mov _SP,r7
; cooperative.c:180: return newThID;
mov dpl,r5
; cooperative.c:181: }
ret
The compiler optimized your variable away because, from the compiler's point of view, it was never used. Compare the generated assembly with the source to see this.
You might try other options. Because I don't have SDCC installed, I can only suggest:
Make the variable volatile.
Make the variable static, since one flavor of PUSH works with an address in the internal RAM.
Because inline assembly depends heavily on the surrounding code and the compiler, you could also use newThID in the assembly part and do the shift there.
Note 1: The generated assembly demonstrates how slow machine code gets if you use int without thinking. Limit your variables to the smallest appropriate data type.
Note 2: Don't make the variable global. static does what you want without exposing the variable globally: it places the variable in RAM so that it can be accessed.
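As an illustration of Note 1, the arithmetic from ThreadCreate() can be written with 8-bit types only. This is a plain C sketch that can be checked on a host compiler; the function names are hypothetical, and it assumes at most four threads, as in the question's activeTh == 0b1111 check.

```c
#include <stdint.h>

/* Mask with only the lowest clear bit of `active` set (0 if all 8 bits are set). */
uint8_t lowest_free_mask(uint8_t active)
{
    return (uint8_t)(~active & (active + 1));
}

/* Index of the single set bit in `mask` (0 for bit 0, 1 for bit 1, ...). */
uint8_t mask_to_id(uint8_t mask)
{
    uint8_t id = 0;
    while (mask >>= 1)
        id++;
    return id;
}

/* Initial stack pointer for a thread, same formula as startSP in the question. */
uint8_t start_sp(uint8_t id)
{
    return (uint8_t)((id ^ (1u << 2)) << 4);
}

/* Value pushed for PSW, same formula as tmpPSW in the question. */
uint8_t thread_psw(uint8_t id)
{
    return (uint8_t)(id << 3);
}
```

With every intermediate value held in a single byte, an 8-bit target has no second byte to carry through each operation, which is the overhead Note 1 points at.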

Accessing certain elements of an array in arm assembler

I have a problem which has been bugging me for multiple days now...
I call a function from C which is implemented in ARM assembler on a Raspberry Pi using the NEON unit. The signature looks like the following:
void doStuff(const uint32_t key[4])
I can load all the values into d-registers using VLD4.32 {d6-d9}, [r0].
The problem is that I have to use a value at a certain index of the array which is calculated at runtime. So I have to access the array at an index which I only know at runtime.
In c, the code I want to achieve would look like this:
// calculations
int i = ... // 'i' is the index of value in the array
int result = key[i];
In assembler I tried this:
VMOV r8, s22 ;# copy the calculated index into an arm register
MOV r8, r8, LSL #0x2;# multiply with 4
ADD r8, r5, r8 ;# add offset to base address
VLDR.32 d14, [r8] ;# load from address into d-register
I also tried multiplying with 2 and 32 instead of 4. But I always get the value 3.
I got it working with this stupid and very slow solution:
;# <--- very slow and ugly --->
VLD4.32 {d6-d9}, [r1] ;# load 4x32bit from address *r1
VMOV r6, s22 ;# r6 now contains the offset which is either 0,1,2 or 3
CMP r6, #0x0 ;# 3 - 0 == 0 -> Z set
BEQ equal0
CMP r6, #0x1
BEQ equal1
CMP r6, #0x2
BEQ equal2
VMOV d12, d9 ;# has to be 3
B continue
equal0:
VMOV d12, d6
B continue
equal1:
VMOV d12, d7
B continue
equal2:
VMOV d12, d8
B continue
continue:
;# <--- --->
I basically have an if for every possible number and then select the corresponding register.
Thanks!
Edit:
Okay, it works with VLD1.32 d14, [r8]. I do not quite understand why it won't work with VLDR.32, though.
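For reference, the address arithmetic that the VMOV/MOV/ADD sequence above is meant to perform can be written out in C. This is a sketch with a hypothetical helper name: the index is scaled by 4 (the size of a uint32_t) and added to the base address, and exactly one 32-bit word is read from there.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read key[index] by computing the byte address explicitly. */
uint32_t load_word_at(const uint32_t *key, uint32_t index)
{
    const uint8_t *base = (const uint8_t *)key;
    const uint8_t *addr = base + ((size_t)index << 2); /* index * 4 bytes */
    uint32_t value;
    memcpy(&value, addr, sizeof value); /* load a single 32-bit word */
    return value;
}
```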

ARM Division by 10 Save remainder and quotient

This is a question on homework from CS2400 MSU Denver
Hello,
I have a program that reads keys from the user until they have either entered a non HEX character or entered a max of 8 HEX characters. As keys are entered I maintain a sum of hex values being entered by the user by multiplying the sum register by 16 and adding the new hex value.
This part is all fine and dandy, no help needed. I am having trouble taking this final result, in HEX, and converting it to DEC. I know I need to divide by 10 only I don't know how I can accomplish this.
Please help me determine how to divide by 10 and save the quotient and remainder. Thanks.
AREA HW6, CODE
ENTRY
Divsor EQU 10
MAIN
MOV R1, #0 ; Clear register to be used as symbols received counter
MOV R2, #0 ; Clear register to be used as temp result
LDR R4, =DecStr ; Load address of DecStr
LDR R5, =TwosComp ; Load address of TwosComp
LDR R6, =RvsDecStr
BL READ_CHARS ; Read characters from the keyboard
BL TO_DECIMAL ; Is R2 negative ?
SWI 0x11
READ_CHARS
CMP R1, #8 ; Check if necessary to read another key
BEQ DONE_READ_CHARS ; User has entered 8 hex symbols
SWI 4 ; [R0] <--- Key from keyboard (ASCII)
CMP R0, #'0' ; Verify digit is valid
BLO DONE_READ_CHARS
CMP R0, #'9' ; Verify digit is valid
BHI CHECK_HEX
SUB R0, R0, #'0' ; Obtain Hex equivalent of ASCII char 0-9
B STORE_INPUT
CHECK_HEX
CMP R0, #'A'
BLO DONE_READ_CHARS ; Invalid Hex symbol
CMP R0, #'F'
BHI DONE_READ_CHARS ; Invalid Hex symbol
SUB R0, R0, #'A'
ADD R0, R0, #0xA ; Adding ten to receive Hex equivalent of ASCII A-F
STORE_INPUT
MOV R3, R2, LSL#4 ; *16
ADD R2, R3, R0 ; Add valid Hex symbol to temp result
ADD R1, R1, #1 ; Increase symbols received counter
B READ_CHARS ; Get next key
DONE_READ_CHARS
MOV PC, LR ; Return to BL READ_CHARS ( MAIN )
TO_DECIMAL
TST R2, #2, 2
BEQ POSITIVE
STRB #'-', [R4], #1 ; Store - as first byte in DecStr
MVN R2, R2 ; [R2] <- 1's complement of R2
ADD R2, R2, #1 ; [R2] <- 2's complement of R2
POSITIVE
STR R2, [R5] ; Store all entered hex values in memory at TwosComp
LDR R7, [R5] ; Initial quotient
udiv10
LDRB R7, [R5], #1 ; Load a byte of TwosComp
CMP R7, #0
BEQ DONE_TO_DECIMAL
DONE_TO_DECIMAL
MOV PC, LR
AREA data1, DATA
TwosComp
DCD 0
DecStr % 12
RvsDecStr
% 11
ALIGN
END
You can do this easily with shift-and-subtract, like elementary long division. There are also many division algorithms on this site and on Google:
How does one do integer (signed or unsigned) division on ARM?
Assembly mod algorithm on processor with no division operator
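A minimal sketch of the shift-and-subtract approach in C, specialised to a divisor of 10; the names are illustrative. It walks the dividend from the most significant bit down, shifting each bit into a running remainder and subtracting 10 whenever the remainder allows it. The loop uses only shifts, compares and subtractions, so it should carry over to ARM assembly fairly directly.

```c
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} divmod10_result;

/* Unsigned division by 10 using shift-and-subtract (binary long division). */
divmod10_result udivmod10(uint32_t n)
{
    divmod10_result r = {0, 0};
    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next bit of the dividend. */
        r.remainder = (r.remainder << 1) | ((n >> bit) & 1u);
        /* The remainder was below 10 before the shift, so one subtraction suffices. */
        if (r.remainder >= 10u) {
            r.remainder -= 10u;
            r.quotient |= (uint32_t)1 << bit;
        }
    }
    return r;
}
```

Calling it repeatedly on the quotient yields the decimal digits from least significant to most significant, which would fit a reversed string buffer such as the question's RvsDecStr.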
But if you only want to convert from hexadecimal to decimal, then double dabble may fit your needs. It converts a number to packed BCD without any division.
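Double dabble can be sketched in C as follows; the function name is illustrative. For each input bit, starting from the most significant, every BCD digit that is 5 or more gets 3 added, then the whole BCD value is shifted left by one and the next input bit is shifted in. A 32-bit input needs at most 10 decimal digits, so the packed result fits in a uint64_t.

```c
#include <stdint.h>

/* Convert a 32-bit binary value to packed BCD (one decimal digit per nibble). */
uint64_t double_dabble(uint32_t n)
{
    uint64_t bcd = 0;
    for (int bit = 31; bit >= 0; bit--) {
        /* "Dabble": add 3 to every BCD digit that is 5 or greater. */
        for (int digit = 0; digit < 10; digit++) {
            uint64_t value = (bcd >> (4 * digit)) & 0xFu;
            if (value >= 5)
                bcd += (uint64_t)3 << (4 * digit);
        }
        /* "Double": shift left and bring in the next input bit. */
        bcd = (bcd << 1) | ((n >> bit) & 1u);
    }
    return bcd;
}
```

Each nibble of the result can then be turned into an ASCII character by adding '0'.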
