I assemble a simple c program to mips and try to understand the assembly code. By comparing with c code, I almost understand the it but still get some problems.
I use mips-gcc to generate assembly code: $ mips-gcc -S -O2 -fno-delayed-branch -I/usr/include lab3_ex3.c -o lab3_ex3.s
Here is my guess about how the assembly code works:
main is the entry of the program.
$6 is the address of source array.
$7 is the address of dest array.
$3 is the size of source array.
$2 is the variable k and is initialized to 0.
$L3 is the loop
$5 and $4 are addresses of source[k] and dest[k].
sw $3,0($5) is equivalent to store source[k] in $3.
lw $3,4($4) is equivalent to assign source[k] to dest[k].
addiu $2,$2,4 is equivalent to k++.
bne $3, $0, $L3 means that if source[k] is zero then exits the loop otherwise jump to lable $L3.
$L2 just do some clean up work.
Set $2 to zero.
Jump to $31 (return address).
My problems is:
What .frame $sp,0,$31 does?
Why lw $3,4($4) instead of lw $3,0($4)
What is the notation%lo(source)($6) means? ($hi and $lo$ registers are used in multiply so why they are used here?)
Thanks.
C
int source[] = {3, 1, 4, 1, 5, 9, 0};
int dest[10];
int main ( ) {
int k;
for (k=0; source[k]!=0; k++) {
dest[k] = source[k];
}
return 0;
}
Assembly
.file 1 "lab3_ex3.c"
.section .mdebug.eabi32
.previous
.section .gcc_compiled_long32
.previous
.gnu_attribute 4, 1
.text
.align 2
.globl main
.set nomips16
.ent main
.type main, #function
main:
.frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
.mask 0x00000000,0
.fmask 0x00000000,0
lui $6,%hi(source)
lw $3,%lo(source)($6)
beq $3,$0,$L2
lui $7,%hi(dest)
addiu $7,$7,%lo(dest)
addiu $6,$6,%lo(source)
move $2,$0
$L3:
addu $5,$7,$2
addu $4,$6,$2
sw $3,0($5)
lw $3,4($4)
addiu $2,$2,4
bne $3,$0,$L3
$L2:
move $2,$0
j $31
.end main
.size main, .-main
.globl source
.data
.align 2
.type source, #object
.size source, 28
source:
.word 3
.word 1
.word 4
.word 1
.word 5
.word 9
.word 0
.comm dest,40,4
.ident "GCC: (GNU) 4.4.1"
Firstly, main, $L3 and $L2 are labels for 3 basic blocks. You are roughly correct about their functions.
Question 1: What is .frame doing
This is not a MIPS instruction. It is metadata describing the (stack) frame for this function:
The stack is pointed to by $sp, an alias for $29.
and the size of the stack frame (0, since the function has neither local variables, nor arguments on the stack). Further, the function is simple enough that it can work with scratch registers and does not need to save callee-saved registers $16-$23.
the old return address ($31 for MIPS calling convention)
For more information regarding the MIPS calling convention, see this doc.
Question 2: Why lw $3,4($4) instead of lw $3,0($4)
This is due to an optimization of the loop. Normally, the sequence of loads and stores would be :
load source[0]
store dest[0]
load source[1]
store dest[1]
....
You assume that the loop is entirely in $L3, and that contains load source[k] and store dest[k]. It isn't. There are two clues to see this:
There is a load in the block main which does not correspond to any load outside the loop
Within the basic block $L3, the store is before the load.
In fact, load source[0] is performed in the basic-block named main. Then, the loop in the basic block $L3 is store dest[k];load source[k+1];. Therefore, the load uses an offset of 4 more than the offset of the store, because it is loading the integer for the next iteration.
Question 3: What is the lo/hi syntax?
This has to do with instruction encodings and pointers. Let us assume a 32-bit architecture, i.e. a pointer is 32 bits. Like most fixed-size instruction ISAs, let us assume that the instruction size is also 32 bits.
Before loading and storing from the source/dest arrays, you need to load their pointers into registers $6 and $7 respectively. Therefore, you need an instruction to load a 32-bit constant address into a register. However, a 32-bit instruction must contain a few bits to encode opcodes (which operation the instruction is), destination register etc. Therefore, an instruction has less than 32 bits left to encode constants (called immediates). Therefore, you need two instructions to load a 32-bit constant into a register, each loading 16 bits. The lo/hi refer to which half of the constant is loaded.
Example: Assume that dest is at address 0xabcd1234. There are two instructions to load this value into $7.
lui $7,%hi(dest)
addiu $7,$7,%lo(dest)
lui is Load Upper immediate. It loads the top 16 bits of the address of dest (0xabcd) into the top 16 bits of $7. Now, the value of $7 is 0xabcd0000.
addiu is Add Immediate Unsigned. It adds the lower 16 bits of the address of dest (0x1234) with the existing value in $7 to get the new value of $7. Thus, $7 now holds 0xabcd0000 + 0x1234 = 0xabcd1234, the address of dest.
Similarly, lw $3,%lo(source)($6) loads from the address pointed to by $6 (which already holds the top 16 bits of the address of source) at an offset of %lo(source) (the bottom 16 bits of that address). Effectively, it loads the first word of source.
Related
I build a 32-bit RISC-V CPU with Harvard architecture and I want to write programs for it in C. I have a RISC-V compiler set (https://xpack.github.io/riscv-none-embed-gcc/) that can do just that and works fine - for most things. The problem starts when I want to work with global variables, global arrays, etc, because these types get copied to RAM on boot/reset by the start script.
Here is a block diagram of my CPU: (This will be important later. Just note that the Instruction memory = FLASH and Data memory = RAM)
(If you are interested about my CPU, I made a video about it: https://www.youtube.com/watch?v=KzSaFFpBPDM)
Example:
A typical program will look something like this:
#include <stdint.h>
int static_var_1 = 2;
int static_var_2 = 4;
int main(void)
{
int var = static_var_1 + static_var_2;
}
And its objdump something like this:
/opt/xpack-riscv-none-embed-gcc-10.1.0-1.1/riscv-none-embed/bin/objdump build/APP.elf -D
build/APP.elf: file format elf32-littleriscv
Disassembly of section .text:
00000000 <_start>:
0: 00080137 lui sp,0x80
4: ffc10113 addi sp,sp,-4 # 7fffc <_estack>
8: 00c000ef jal ra,14 <main>
c: 0040006f j 10 <_exit>
00000010 <_exit>:
10: 0000006f j 10 <_exit>
00000014 <main>:
14: fe010113 addi sp,sp,-32
18: 00812e23 sw s0,28(sp)
1c: 02010413 addi s0,sp,32
20: 00002703 lw a4,0(zero) # 0 <_start>
24: 00402783 lw a5,4(zero) # 4 <static_var_2>
28: 00f707b3 add a5,a4,a5
2c: fef42623 sw a5,-20(s0)
30: 00000793 li a5,0
34: 00078513 mv a0,a5
38: 01c12403 lw s0,28(sp)
3c: 02010113 addi sp,sp,32
40: 00008067 ret
Disassembly of section .data:
00000000 <static_var_1>:
0: 0002 c.slli64 zero
...
00000004 <static_var_2>:
4: 0004 0x4
...
Disassembly of section ._user_heap_stack:
00000008 <._user_heap_stack>:
...
(the <_start> is a part of my start script that will initialize stack pointer)
The catch:
These are the two instructions that tries to load the global variables:
20: 00002703 lw a4,0(zero) # 0 <_start>
24: 00402783 lw a5,4(zero) # 4 <static_var_2>
But there is a problem - they were never put into RAM, so the CPU will most likely end up with some garbage data, which is unacceptable.
The solution?
Somebody suggested linker relaxation as part of my previous question (RISC-V: Global variables), again, that doesn't seem to be the case, but I can still be wrong though!
From my research, most of the "classic" CPUs use a start script, where the copying takes place, but as this is not a von-neuman architecture, I don't have the FLASH memory mapped to data memory and therefor cannot be read by the program (see the block diagram above). The output program must contain the variables already decoded as executable instructions, for example if we want value 0x4 in RAM at position 0x0, It can be decoded to:
addi t0, zero, 0x4
sw t0, 0(zero)
Re-building my CPU as von-neuman would require much more gates and ICs and this is a discrete build where every IC counts.
Doing it by hardware is for me the worst solution as I stated above, so if it can be done in software I'm all for it - and It can! Obviously, there is a solution, but by far the ugliest: Compile the code, extract the data (with python), generate a new startup script with these variables decoded by the python script and compile it again.
I really don't want to go that route, so if it can be done by modifying startup script, linker, etc, it would be really, really great.
AVR ICs are basically Harvard architecture (though modified) so do they something differently that we can learn from?
I am wondering that in assembly (in my case Y86), is it possible to have an array inside of an array? And if it is, how would I access the elements inside of that array. I know you dereference arrays to get their elements, but that's only with one array in a stack. Is there a way to get an element inside of an array inside of an array.
Example because that's challenging to explain:
Normal grab of an element:
array1:
.long 0x0
.long 0x0
.long 0x0
.long 0x0
Main:
pushl %ebp
rrmovl %esp,%ebp
irmovl array1,%edx #store array1 on the stack
pushl %edx
mrmovl (%edx), %eax #get the first element of array1
rrmovl %ebp, %esp
popl %ebp
ret
Now say I have this:
array1:
.long 0x0
.long 0x0
.long 0x0
.long 0x0
array2:
.long array1
Am I able to access array2 element one and then access array1's elements?
The pushl %edx does not store the array to the stack, but the memory address of first element.
In your other example the first element of array2 is 32 bit integer value, which is equal to memory address of array1, so in C language terms the array2 is array of pointers.
When you fetch first element of array2 into some register, you have "pointer" (memory address) in it, and by fetching value from that address you will fetch first element of array1 (or you can modify it by some offset to fetch further elements).
This "array of pointers to arrays" pattern is often used when you have several arrays of same/similar type with different lengths and you want to store them continuously in memory, for example:
array0:
.long 1, 2, 3
array1:
.long 4
array2:
.long 5, 6, 7, 8
array3:
.long 9, 10
; notice the values in memory form a continuous block of 10 long values,
; so you can also access them through "array0" as single array of 10 values
mainArray:
.long array0, array1, array2, array3
Now if you want value "[2, 3]", i.e. the value "8", you can't simply multiply row value 2 by "column size" like in the matrix16x16 example, because rows don't have fixed length, so you will instead calculate offset into mainArray first, like (I will use x86 AT&T syntax, because I don't know Y86, but you should be able to get the idea, as they are basically the same instructions, just Y86 has more limited instruction set and you have more verbose and cryptic syntax with more prefix/suffix parts of instruction name):
; edi = 2 (row), esi = 3 (column)
movl mainArray(, %edi, 4), %ebx ; ebx = mainArray[2] (*4 because pointers are 32 bit)
; (in x86 AT&T syntax the target memory address is "edi*4 + mainArray")
; here ebx = array2 (and "array2" is symbolic name for memory address value)
; it is NOT whole array in single register, just the memory address of first element
movl (%ebx, %esi, 4), %eax ; eax = 8 (array2[3]) (again *4 because longs are used)
; (the target memory address is "ebx + esi*4")
Sorry for not using y86, but as I said, I don't know it... If you have hard time to decipher the x86 example, try to describe your difficulties in comment, I may try eventually to fix the syntax to y86, or maybe somebody else will suggest fixes...
Am I able to access array2 element one and then access array1's elements?
Yes, of course, those values are just ordinary 32 bit integers (memory addresses too, on your y86 platform), so you can of course fetch that address of sub-array from top array, and then fetch the value from that sub-array address to reach "value". Try to check in debugger, how the memory looks after defining your array, and how those values represent your original source code.
The assembly is sort of so simple and trivial, that it is quite difficult/tedious to write complex abstractions in it, but as long as we are talking about single instruction or memory access, expect the thing to be super simple. If you see some complexity there, you are probably misunderstanding what is happening under the hood, it's all just 0/1 bit values, and moving them around a bit (usually in common quantities like 8, 16, 32 or 64, for other group of bits you need often several instructions to get the desired result, while these above are natively supported as byte/short/long/...). The complexity comes from the how-to-write-that-algorithm with simple copy/plus/minus instructions only.
Currently I am working with a RISC-V processor implementation. I need to run partially hand-crafted assembly code. (Finally there will be dynamic code injection.) For this purpose I have to understand the basics of function calls within RISC-V assembly.
I found this topic very helpful: confusion about function call stack
But I am still struggling with the stack layout for a function call. Please consider the following c-code:
void some_func(int a, int b, int* c){
int cnt = a;
for(;cnt > 0;cnt--){
*c += b;
}
}
void main(){
int a = 5;
int b = 6;
int c = 0;
some_func(a,b,&c);
}
This program implements a basic multiplication by a sequence of additions. The derived assembly code (riscv64-unknown-elf-gcc -nostartfiles mul.c -o mul && riscv64-unknown-elf-objdump -D mul) looks like this:
0000000000010000 <some_func>:
10000: fd010113 addi sp,sp,-48
10004: 02813423 sd s0,40(sp)
10008: 03010413 addi s0,sp,48
1000c: fca42e23 sw a0,-36(s0)
10010: fcb42c23 sw a1,-40(s0)
10014: fcc43823 sd a2,-48(s0)
10018: fdc42783 lw a5,-36(s0)
1001c: fef42623 sw a5,-20(s0)
10020: 0280006f j 10048 <some_func+0x48>
10024: fd043783 ld a5,-48(s0)
10028: 0007a703 lw a4,0(a5)
1002c: fd842783 lw a5,-40(s0)
10030: 00f7073b addw a4,a4,a5
10034: fd043783 ld a5,-48(s0)
10038: 00e7a023 sw a4,0(a5)
1003c: fec42783 lw a5,-20(s0)
10040: fff7879b addiw a5,a5,-1
10044: fef42623 sw a5,-20(s0)
10048: fec42783 lw a5,-20(s0)
1004c: fcf04ce3 bgtz a5,10024 <some_func+0x24>
10050: 00000013 nop
10054: 02813403 ld s0,40(sp)
10058: 03010113 addi sp,sp,48
1005c: 00008067 ret
0000000000010060 <main>:
10060: fe010113 addi sp,sp,-32
10064: 00113c23 sd ra,24(sp)
10068: 00813823 sd s0,16(sp)
1006c: 02010413 addi s0,sp,32
10070: 00500793 li a5,5
10074: fef42623 sw a5,-20(s0)
10078: 00600793 li a5,6
1007c: fef42423 sw a5,-24(s0)
10080: fe042223 sw zero,-28(s0)
10084: fe440793 addi a5,s0,-28
10088: 00078613 mv a2,a5
1008c: fe842583 lw a1,-24(s0)
10090: fec42503 lw a0,-20(s0)
10094: f6dff0ef jal 10000 <some_func>
10098: 00000013 nop
1009c: 01813083 ld ra,24(sp)
100a0: 01013403 ld s0,16(sp)
100a4: 02010113 addi sp,sp,32
100a8: 00008067 ret
The important steps that need clarification are: (some_func(int,int,int))
10060: fe010113 addi sp,sp,-32
10064: 00113c23 sd ra,24(sp)
10068: 00813823 sd s0,16(sp)
1006c: 02010413 addi s0,sp,32
and: (main())
10000: fd010113 addi sp,sp,-48
10004: 02813423 sd s0,40(sp)
10008: 03010413 addi s0,sp,48
From my understanding: The stack pointer is moved to make space for return-address and parameters. (main might be a special case here.) How are the passed arguments treated when located on the stack? How are they obtained back? In general, the methodology is clear to me, but how would I hand-code this segment in order to work.
Regarding the related topic, the stack should look somewhat like
| ??? |
| params for some_func() <???> |
| ra of some_func() |
| locals of main() <int c> |
| locals of main() <int b> |
| locals of main() <int a> |
| params for main() <None> |
But that is pretty much it. Can anybody point out, how this is arranged, and how these two listings (function call) co-related?
What you want to know is specified by the RISC-V calling conventions.
Main points:
Function arguments are usually passed in the a0 to a7 registers, not on the stack. An argument is only passed via the stack if there is no room in the a* registers left.
Some registers are caller saved while others are callee saved (cf. Table 26.1, Chapter 26 RISC-V Assembly Programmer’s Handbook, in the RISC-V base specification, 2019-06-08 ratified). That means before calling a function the caller has to save all caller saved registers to the stack if it wants to retain their content. Similarly, the called function has to save all callee saved registers to the stack if it wants to use them for its own purposes.
The first few parameters, type permitting, are passed in registers so they don't even appear on the stack. Other than that, it's unclear what you really want to know. If you do get some arguments that are on the stack, they stay there even after you adjust the stack pointer so you can still address them relative to the adjusted stack pointer or a frame pointer (here $s0 apparently).
The important steps, that need clarification are:
10060: fe010113 addi sp,sp,-32 # allocate space
10064: 00113c23 sd ra,24(sp) # save $ra
10068: 00813823 sd s0,16(sp) # save $s0
1006c: 02010413 addi s0,sp,32 # set up $s0 as frame pointer
I needed to translate the follwing C code to MIPS64:
#include <stdio.h>
int main() {
int x;
for (x=0;x<10;x++) {
}
return 0;
}
I used codebench to crosscompile this code to MIPS64. The following code was created:
.file 1 "loop.c"
.section .mdebug.abi32
.previous
.gnu_attribute 4, 1
.abicalls
.option pic0
.text
.align 2
.globl main
.set nomips16
.set nomicromips
.ent main
.type main, #function
main:
.frame $fp,24,$31 # vars= 8, regs= 1/0, args= 0, gp= 8
.mask 0x40000000,-4
.fmask 0x00000000,0
.set noreorder
.set nomacro
addiu $sp,$sp,-24
sw $fp,20($sp)
move $fp,$sp
sw $0,8($fp)
j $L2
nop
$L3:
lw $2,8($fp)
addiu $2,$2,1
sw $2,8($fp)
$L2:
lw $2,8($fp)
slt $2,$2,10
bne $2,$0,$L3
nop
move $2,$0
move $sp,$fp
lw $fp,20($sp)
addiu $sp,$sp,24
j $31
nop
.set macro
.set reorder
.end main
.size main, .-main
.ident "GCC: (Sourcery CodeBench 2012.03-81) 4.6.3"
To check if the code works as expected, I usually use the WINMIPS64 simulator. For one or other reason this simulator does not want to accept this code. It appears that every line of code is wrong. I have been looking at this issue for over a day. I hope someone can help me out with this. What is wrong with this assembly code for the mips64 architecture?
From page 7 of the WINMIPS64 documentation:
The following assembler directives are supported
.data - start of data segment
.text - start of code segment
.code - start of code segment (same as .text)
.org <n> - start address
.space <n> - leave n empty bytes
.asciiz <s> - enters zero terminated ascii string
.ascii <s> - enter ascii string
.align <n> - align to n-byte boundary
.word <n1>,<n2>.. - enters word(s) of data (64-bits)
.byte <n1>,<n2>.. - enter bytes
.word32 <n1>,<n2>.. - enters 32 bit number(s)
.word16 <n1>,<n2>.. - enters 16 bit number(s)
.double <n1>,<n2>.. - enters floating-point number(s)
Get rid of everything that's not in the above list, as it won't run in the simulator.
You'll need to move the .align to before .text
WINMIPS64 expects daddi/daddui instead of addi/addiu, again as per the documentation.
As per the documentation, move $a, $b is not a supported mnemonic. Replace them with daddui $a, $b, 0 instead.
slt needs to be slti.
Finally, the simulator expects an absolute address for j, but you've given it a register. Use jr instead.
At this point I get an infinite loop. This is because the stack pointer doesn't get initialized. The simulator only gives you 0x400 bytes of memory, so go ahead and initialize the stack to 0x400:
.text
daddui $sp,$0,0x400
Now it runs. Since you're running the code by itself, nothing will be in the return register and the final jr $31 will just bring it back to the beginning.
Here's my version:
.align 2
.text
daddui $sp,$0,0x400
main:
daddui $sp,$sp,-24
sw $fp,20($sp)
daddui $fp,$sp,0
sw $0,8($fp)
j $L2
nop
$L3:
lw $2,8($fp)
daddui $2,$2,1
sw $2,8($fp)
$L2:
lw $2,8($fp)
slti $2,$2,10
bne $2,$0,$L3
nop
daddui $2,$0,0
daddui $sp,$fp,0
lw $fp,20($sp)
daddui $sp,$sp,24
jr $31
nop
Consider getting either another compiler or another simulator, because these two clearly hate each other.
Pardon me if this question has been posed before. I looked for answers to similar questions, but I'm still puzzled with my problem. So I will shoot the question anyway.
I'm using a C library called libexif for image data. I run my application (which uses this library) both on my Linux desktop and my MIPS board.
For a particular image file when I try to fetch the created time, I was getting an error/invalid value. On debugging further I saw that for this particular image file, I was not getting the tag (EXIF_TAG_DATE_TIME) as expected.
This library has several utility functions. Most functions are structured like below
int16_t
exif_get_sshort (const unsigned char *buf, ExifByteOrder order)
{
if (!buf) return 0;
switch (order) {
case EXIF_BYTE_ORDER_MOTOROLA:
return ((buf[0] << 8) | buf[1]);
case EXIF_BYTE_ORDER_INTEL:
return ((buf[1] << 8) | buf[0]);
}
/* Won't be reached */
return (0);
}
uint16_t
exif_get_short (const unsigned char *buf, ExifByteOrder order)
{
return (exif_get_sshort (buf, order) & 0xffff);
}
When the library tries to investigate the presence of tags in raw data, it calls exif_get_short() and assigns the value returned to a variable which is of type enum (int).
In the error case, exif_get_short() which is supposed to return unsigned value (34687) returns a negative number (-30871) which messes up the whole tag extraction from the image data.
34687 is outside the range of maximum representable int16_t value. And therefore leads to an overflow. When I make this slight modification in code, everything seems to work fine
uint16_t
exif_get_short (const unsigned char *buf, ExifByteOrder order)
{
int temp = (exif_get_sshort (buf, order) & 0xffff);
return temp;
}
But since this is a pretty stable library and in use for quite some time, it led me to believe that I may be missing something here. Moreover this is the general way the code is structured for other utility functions as well. Ex: exif_get_long() calls exif_get_slong(). I would then have to change all utility functions.
What is confusing me is that when I run this piece of code on my linux desktop for the error file, I see no problems and things work fine with the original library code. Which led to me believe that perhaps UINT16_MAX and INT16_MAX macros have different values on my desktop and MIPS board. But unfortunately, thats not the case. Both print identical values on the board and desktop. If this piece of code fails, it should fail also on my desktop.
What am I missing here? Any hints would be much appreciated.
EDIT:
The code which calls exif_get_short() goes something like this:
ExifTag tag;
...
tag = exif_get_short (d + offset + 12 * i, data->priv->order);
switch (tag) {
...
...
The type ExifTag is as follows:
typedef enum {
EXIF_TAG_GPS_VERSION_ID = 0x0000,
EXIF_TAG_INTEROPERABILITY_INDEX = 0x0001,
...
...
}ExifTag ;
The cross compiler being used is mipsisa32r2el-timesys-linux-gnu-gcc
CFLAGS = -pipe -mips32r2 -mtune=74kc -mdspr2 -Werror -O3 -Wall -W -D_REENTRANT -fPIC $(DEFINES)
I'm using libexif within Qt - Qt Media hub (actually libexif comes along with Qt Media hub)
EDIT2: Some additional observations:
I'm observing something bizarre. I have put print statements in exif_get_short(). Just before return
printf("return_value %d\n %u\n",exif_get_sshort (buf, order) & 0xffff, exif_get_sshort (buf, order) & 0xffff);
return (exif_get_sshort (buf, order) & 0xffff);
I see the following o/p:
return_value 34665 34665
I then also inserted print statements in the code which calls exif_get_short()
....
tag = exif_get_short (d + offset + 12 * i, data->priv->order);
printf("TAG %d %u\n",tag,tag);
I see the following o/p:
TAG -30871 4294936425
EDIT3 : Posting assembly code for exif_get_short() and exif_get_sshort() taken on MIPS board
.file 1 "exif-utils.c"
.section .mdebug.abi32
.previous
.gnu_attribute 4, 1
.abicalls
.text
.align 2
.globl exif_get_sshort
.ent exif_get_sshort
.type exif_get_sshort, #function
exif_get_sshort:
.set nomips16
.frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
.mask 0x00000000,0
.fmask 0x00000000,0
.set noreorder
.set nomacro
beq $4,$0,$L2
nop
beq $5,$0,$L3
nop
li $2,1 # 0x1
beq $5,$2,$L8
nop
$L2:
j $31
move $2,$0
$L3:
lbu $2,0($4)
lbu $3,1($4)
sll $2,$2,8
or $2,$2,$3
j $31
seh $2,$2
$L8:
lbu $2,1($4)
lbu $3,0($4)
sll $2,$2,8
or $2,$2,$3
j $31
seh $2,$2
.set macro
.set reorder
.end exif_get_sshort
.align 2
.globl exif_get_short
.ent exif_get_short
.type exif_get_short, #function
exif_get_short:
.set nomips16
.frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
.mask 0x00000000,0
.fmask 0x00000000,0
.set noreorder
.cpload $25
.set nomacro
lw $25,%call16(exif_get_sshort)($28)
jr $25
nop
.set macro
.set reorder
.end exif_get_short
Just for completeness, the ASM code taken from my linux machine
.file "exif-utils.c"
.text
.p2align 4,,15
.globl exif_get_sshort
.type exif_get_sshort, #function
exif_get_sshort:
.LFB1:
.cfi_startproc
xorl %eax, %eax
testq %rdi, %rdi
je .L2
testl %esi, %esi
jne .L8
movzbl (%rdi), %edx
movzbl 1(%rdi), %eax
sall $8, %edx
orl %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L8:
cmpl $1, %esi
jne .L2
movzbl 1(%rdi), %edx
movzbl (%rdi), %eax
sall $8, %edx
orl %edx, %eax
.L2:
rep
ret
.cfi_endproc
.LFE1:
.size exif_get_sshort, .-exif_get_sshort
.p2align 4,,15
.globl exif_get_short
.type exif_get_short, #function
exif_get_short:
.LFB2:
.cfi_startproc
jmp exif_get_sshort#PLT
.cfi_endproc
.LFE2:
.size exif_get_short, .-exif_get_short
EDIT4: Hopefully my last update :-)
ASM code with compiler option set to -O1
exif_get_short:
.set nomips16
.frame $sp,32,$31 # vars= 0, regs= 1/0, args= 16, gp= 8
.mask 0x80000000,-4
.fmask 0x00000000,0
.set noreorder
.cpload $25
.set nomacro
addiu $sp,$sp,-32
sw $31,28($sp)
.cprestore 16
lw $25,%call16(exif_get_sshort)($28)
jalr $25
nop
lw $28,16($sp)
andi $2,$2,0xffff
lw $31,28($sp)
j $31
addiu $sp,$sp,32
.set macro
.set reorder
.end exif_get_short
One thing the MIPS assembly shows (though I'm not an expert in MIPS assembly, so there's a decent chance I'm missing something or otherwise wrong) is that the exif_get_short() function is just an alias for the exif_get_sshort() function. All that exif_get_short() does is jump to the address of the exif_get_sshort() function.
The exif_get_sshort() function sign extends the 16 bit value it's returning to the full 32-bit register used for the return. There's nothing wrong with that - it's actually probably what the MIPS ABI specifies (I'm not sure).
However, since the exif_get_short() function just jumps to the exif_get_sshort() function, it has no opportunity to clear the upper 16 bits of the register.
So when the 16 bit value 0x8769 is being returned from the buffer (whether from exif_get_sshort() or exif_get_short()), the $2 register used to return the function result contains 0xffff8769, which can have the following interpretations:
as a 32-bit signed int: -30871
as a 32-bit `unsigned int: 4294936425
as a 16-bit signed int16_t: -30871
as a 16-bit unsigned uint16_t: 34665
If the compiler is supposed to ensure that the $2 return register has a top 16-bit set to zero for a uint16_t return type, then it has a bug in the code it's emitting for exif_get_short() - instead of jumping to exif_get_sshort(), it should call exif_get_sshort() and clear the upper half of $2 before returning.
From the description of the behavior you're seeing, it looks like the code calling exif_get_short() expects that the $2 resister used for the return value will have the upper 16 bits cleared so that the entire 32-bit register can be used as-is for the 16-bit uint16_t value.
I'm not sure what the MIPS ABI specifies (but I'd guess that it specifies that the upper 16 bits of the $2 register should eb cleared by exif_get_short()), but there seems to be either a code generation bug that exif_get_short() doesn't ensure $2 is entirely correct before it returns or a bug where the caller of exif_get_short() assumes that the full 32-bits of $2 are valid when only 16 bits are.
This is broken on so many levels, I don't know where to begin. Just look at what is done here:
The unsigned chars are read from the buffer.
They are assigned to a signed int16_t in exif_get_sshort.
This is assigned to an unsigned uint16_t in exif_get_short.
This is finally assigned to an enum which is of type signed int.
I'd say it's a miracle it works at all.
First, the assignment from the chars to int16_t is done with the values, not with the representation:
return ((buf[0] << 8) | buf[1]);
Which already throws you into the pit of undefined behaviour when the result actually is negative. Plus, it only works when the signed int representation of the implementation is the same as the one used in the file format (two's complement, I guess). It will fail for one's complement and sign-magnitude. So check what's the case for the MIPS implementation.
The clean way would be the other way around: Assign two chars from the buffer to an uint16_t, which will be well defined operations, and use this to return an int16_t. You could then care for further corrections of the value for different representations, if necessary.
Additionally, here:
if (!buf) return 0;
0 is a very bad choice of a return value, because it is a valid enum constant:
EXIF_TAG_GPS_VERSION_ID = 0x0000,
If this is expected to be the default for invalidity, then the constant should be returned, not the magic number. Though this seems to be a generic function to return an int16_t, thus some other error mechanism should be used here.
For your specific question, well, follow the flow of conversions between signed and unsigned on your MIPS implementation, including the default promotions done, and examine all the intermediate values to find the point where it breaks. Your MIPS uses 32 bit ints, not 16 bit, right? Check INT_MAX and UINT_MAX.