I have fed this simple code to gcc:
volatile signed char x, y, z;
void test()
{
x = 0x31;
y = x + 3;
}
Volatile has been added just to keep gcc from optimizing the accesses away (optimization is set to -O0 anyway).
The resulting MIPS code was:
x:
y:
z:
test():
addiu $sp,$sp,-8
sw $fp,4($sp)
move $fp,$sp
lui $2,%hi(x)
li $3,49 # 0x31
sb $3,%lo(x)($2)
lui $2,%hi(x)
lbu $2,%lo(x)($2)
seb $2,$2
andi $2,$2,0x00ff
addiu $2,$2,3
andi $2,$2,0x00ff
seb $3,$2
lui $2,%hi(y)
sb $3,%lo(y)($2)
nop
move $sp,$fp
lw $fp,4($sp)
addiu $sp,$sp,8
j $31
nop
For y = x + 3, gcc loads the byte as unsigned, then sign-extends it, and then ands it with 0x00ff?
Why not simply load it using lb (which is supposed to sign-extend it)?
GCC does the same for signed halfwords (using 0xffff, of course).
I'm not particularly adept at reading MIPS assembly, but do note that you have compiled with -O0, which is supposed to generate unoptimized code. That more or less means code that implements the exact semantics of the C abstract machine. In particular,
0x31 is a constant of type int
the assignment x = 0x31 includes an implicit conversion (in this case) of the right-hand int operand to the type of the left-hand operand (signed char)
3 is also a constant of type int
evaluating x + 3 involves performing the integer promotions on the arguments, and in particular, converting the signed char value of x to (signed) int, then performing the addition
assigning the result to y involves another implicit conversion from int to signed char
In principle, all of those conversions and promotions that are implicit in the C code need to be performed explicitly by the unoptimized assembly program, and that appears roughly to be what you are seeing.
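To make this concrete, here is the same statement with every implicit conversion spelled out by hand (a sketch of the abstract-machine semantics only; the variable names other than x and y are mine):
volatile signed char x, y;
void test(void)
{
    x = (signed char)0x31;        /* int constant converted to signed char */
    int promoted = (int)x;        /* integer promotion of the signed char value */
    int sum = promoted + 3;       /* the addition happens in int */
    y = (signed char)sum;         /* result converted back to signed char */
}
At -O0, each of those steps tends to show up literally in the assembly: the load, the sign extension, the add, and the truncation back to a byte.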
Overall, it's not very useful to ask why the assembly output of a non-optimizing compilation is not as efficient as you think it could be. If you want more efficient code, enable optimization.
gcc -march=native -O3 gives the following code on an Intel machine:
test:
.seh_endprologue
movb $49, x(%rip)
movzbl x(%rip), %eax
addl $3, %eax
movb %al, y(%rip)
ret
.seh_endproc
.comm z, 1, 0
.comm y, 1, 0
.comm x, 1, 0
.ident "GCC: (GNU) 6.4.0"
There's nothing wrong with your code. gcc/MIPS just isn't very good at code generation, which is not too surprising since it receives a lot less care and attention than gcc/Intel. In my experience clang usually generates better MIPS code than gcc does.
Related
I wonder if it's faster for the processor to negate a number or to do a subtraction. For example:
Is
int a = -3;
more efficient than
int a = 0 - 3;
In other words, is a negation equivalent to subtracting from 0? Or is there a special CPU instruction that negates faster than a subtraction?
I suppose that the compiler does not optimize anything.
From the C language point of view, 0 - 3 is an integer constant expression and those are always calculated at compile-time.
Formal definition from C11 6.6/6:
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts.
Knowing that these are calculated at compile time is important when writing readable code. For example if you want to declare a char array to hold 5 characters and the null terminator, you can write char str[5+1]; rather than 6, to get self-documenting code telling the reader that you have considered null termination.
Similarly when writing macros, you can make use of integer constant expressions to perform parts of the calculation at compile time.
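For instance (a minimal sketch; the macro names are made up):
#define PAYLOAD_LEN 5
#define BUF_LEN (PAYLOAD_LEN + 1)   /* integer constant expression, folded at compile time */

char str[BUF_LEN];                  /* identical object code to char str[6]; */
_Static_assert(BUF_LEN == 6, "evaluated at translation time");   /* C11 */
None of this produces any run-time arithmetic; these contexts require expressions that can be evaluated during translation.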
(This answer is about negating a runtime variable, like -x or 0-x where constant-propagation doesn't result in a compile-time constant value for x. A constant like 0-3 has no runtime cost.)
I suppose that the compiler does not optimize anything.
That's not a good assumption if you're writing in C. Both are equivalent for any non-terrible compiler because of how integers work, and it would be a missed-optimization bug if one compiled to more efficient code than the other.
If you actually want to ask about asm, then how to negate efficiently depends on the ISA.
But yes, most ISAs can negate with a single instruction, usually by subtracting from an immediate or implicit zero, or from an architectural zero register.
e.g. 32-bit ARM has an rsb (reverse-subtract) instruction that can take an immediate operand. rsb rdst, rsrc, #123 does dst = 123-src. With an immediate of zero, this is just negation.
x86 has a neg instruction: neg eax is exactly equivalent to eax = 0-eax, setting flags the same way.
3-operand architectures with a zero register (hard-wired to zero) can just do something like MIPS subu $t0, $zero, $t0 to do t0 = 0 - t0. It has no need for a special instruction because the $zero register always reads as zero. Similarly AArch64 removed RSB but has a xzr / wzr 64/32-bit zero register. (Although it also has a pseudo-instruction called neg which subtracts from the zero register).
You could see most of this by using a compiler. https://godbolt.org/z/B7N8SK
But you'd have to actually compile to machine code and disassemble because gcc/clang tend to use the neg pseudo-instruction on AArch64 and RISC-V. Still, you can see ARM32's rsb r0,r0,#0 for int negate(int x){return -x;}
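To summarize with a tiny example (instruction selection varies by compiler version and flags, so treat the assembly comments as illustrative, not authoritative):
int negate(int x) { return -x; }
/* Roughly what optimizing compilers emit for negate():
   x86-64:   mov eax, edi / neg eax
   ARM32:    rsb r0, r0, #0
   MIPS:     subu $v0, $zero, $a0
   AArch64:  neg w0, w0            (an alias of sub w0, wzr, w0)
*/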
Both are compile time constants, and will generate the same constant initialisation in any reasonable compiler regardless of optimisation.
For example at https://godbolt.org/z/JEMWvS the following code:
void test( void )
{
int a = -3;
}
void test2( void )
{
int a = 0-3;
}
Compiled with gcc 9.2 x86-64 -std=c99 -O0 generates:
test:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], -3
nop
pop rbp
ret
test2:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], -3
nop
pop rbp
ret
Using -Os, the code:
void test( void )
{
volatile int a = -3;
}
void test2( void )
{
volatile int a = 0-3;
}
generates:
test:
mov DWORD PTR [rsp-4], -3
ret
test2:
mov DWORD PTR [rsp-4], -3
ret
The volatile is necessary to prevent the compiler from removing the unused variables.
As static data it is even simpler:
int a = -3;
int b = 0-3;
outside of a function generates no executable code, just initialised data objects (initialisation is different from assignment):
a:
.long -3
b:
.long -3
Assignment of the above statics:
a = -4 ;
b = 0-4 ;
is still a compiler-evaluated constant:
mov DWORD PTR a[rip], -4
mov DWORD PTR b[rip], -4
The take-home here is:
If you are interested, try it and see (with your own compiler or Godbolt set for your compiler and/or architecture),
don't sweat the small stuff, let the compiler do its job,
constant expressions are evaluated at compile time and have no run-time impact,
writing weird code in the belief you can better the compiler is almost always pointless. Compilers work better with idiomatic code the optimiser can recognise.
It's hard to tell whether you are asking if subtraction is faster than negation in a general sense, or about this specific case of implementing negation via subtraction from zero. I'll try to answer both.
General Case
For the general case, on most modern CPUs these operations are both very fast: usually each only taking a single cycle to execute, and often having a throughput of more than one per cycle (because CPUs are superscalar). On all recent AMD and Intel CPUs that I checked, both sub and neg execute at the same speed for register and immediate arguments.
Implementing -x
As regards your specific question of implementing the -x operation, it would usually be slightly faster to implement this with a dedicated neg operation than with a sub, because with neg you don't have to prepare a zero register. For example, a negation function int neg(int x) { return -x; } would look something like this with the neg instruction:
neg:
mov eax, edi
neg eax
... while implementing it in terms of subtraction would look something like:
neg:
xor eax, eax
sub eax, edi
Well ... sub didn't come out looking any worse there, but that's mostly a quirk of the calling convention and of the fact that x86 uses a one-operand, destructive neg: the result needs to be in eax, so in the neg case one instruction is spent just moving the value into the right register and one doing the negation. The sub version takes two instructions to perform the negation itself: one to zero a register and one to do the subtraction. It just so happens that this lets you avoid the ABI shuffling, because you get to choose the zero register as the result register.
Still, this ABI related inefficiency wouldn't persist after inlining, so we can say in some fundamental sense that neg is slightly more efficient.
Now, many ISAs may not have a neg instruction at all, so the question is more or less moot: they may have a hardwired zero register instead, so you'd implement negation via subtraction from that register, and there is no cost to set up the zero.
I have an issue with the C code below, into which I have included SPARC assembly. The code is compiled and run on Debian 9.0 Sparc64. It does a simple summation and prints the result, which should equal nLoop.
The problem is that for an initial number of iterations greater than 1e9, the final sum is systematically equal to 1410065408. I don't understand why, since I explicitly gave the sum variable the type unsigned long long int, so sum can be anywhere in the range [0, 18,446,744,073,709,551,615].
For example, for nLoop = 1e10, I expect sum to be equal to 1e10.
Does the issue come from the included SPARC assembly code, which might not handle 64-bit variables (as input or output)?
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
int i;
// Init sum
unsigned long long int sum = 0ULL;
// Number of iterations
unsigned long long int nLoop = 10000000000ULL;
// Loop with Sparc assembly into C source
asm volatile ("clr %%g1\n\t"
"clr %%g2\n\t"
"mov %1, %%g1\n" // %1 = input parameter
"loop:\n\t"
"add %%g2, 1, %%g2\n\t"
"subcc %%g1, 1, %%g1\n\t"
"bne loop\n\t"
"nop\n\t"
"mov %%g2, %0\n" // %0 = output parameter
: "=r" (sum) // output
: "r" (nLoop) // input
: "g1", "g2"); // clobbers
// Print results
printf("Sum = %llu\n", sum);
return 0;
}
How can I fix this range problem and use 64-bit variables in the SPARC assembly code?
PS: I tried compiling with gcc -m64; the issue remains.
Update 1
As requested by @zwol, below is the SPARC assembly output generated with: gcc -O2 -m64 -S loop.c -o loop.s
.file "loop.c"
.section ".text"
.section .rodata.str1.8,"aMS",#progbits,1
.align 8
.LC0:
.asciz "Sum = %llu\n"
.section .text.startup,"ax",#progbits
.align 4
.global main
.type main, #function
.proc 04
main:
.register %g2, #scratch
save %sp, -176, %sp
sethi %hi(_GLOBAL_OFFSET_TABLE_-4), %l7
call __sparc_get_pc_thunk.l7
add %l7, %lo(_GLOBAL_OFFSET_TABLE_+4), %l7
sethi %hi(9764864), %o1
or %o1, 761, %o1
sllx %o1, 10, %o1
#APP
! 13 "loop.c" 1
clr %g1
clr %g2
mov %o1, %g1
loop:
add %g2, 1, %g2
subcc %g1, 1, %g1
bne loop
nop
mov %g2, %o1
! 0 "" 2
#NO_APP
mov 0, %i0
sethi %gdop_hix22(.LC0), %o0
xor %o0, %gdop_lox10(.LC0), %o0
call printf, 0
ldx [%l7 + %o0], %o0, %gdop(.LC0)
return %i7+8
nop
.size main, .-main
.ident "GCC: (Debian 7.3.0-15) 7.3.0"
.section .text.__sparc_get_pc_thunk.l7,"axG",#progbits,__sparc_get_pc_thunk.l7,comdat
.align 4
.weak __sparc_get_pc_thunk.l7
.hidden __sparc_get_pc_thunk.l7
.type __sparc_get_pc_thunk.l7, #function
.proc 020
__sparc_get_pc_thunk.l7:
jmp %o7+8
add %o7, %l7, %l7
.section .note.GNU-stack,"",#progbits
UPDATE 2:
As suggested by @Martin Rosenau, I made the following modifications:
loop:
add %g2, 1, %g2
subcc %g1, 1, %g1
bpne %icc, loop
bpne %xcc, loop
nop
mov %g2, %o1
But at compilation, I get:
Error: Unknown opcode: `bpne'
What could be the reason for this compilation error ?
subcc %%g1, 1, %%g1
bne loop
Your problem is the bne instruction:
Unlike x86-64 CPUs, Sparc64 CPUs don't have separate instructions for 32-bit and 64-bit subtraction:
If you subtract 1 from 0x12345678 the result is 0x12345677, and if you subtract 1 from 0xF00D12345678 the result is 0xF00D12345677. So as long as you only use the lower 32 bits of a register, a 64-bit subtraction has the same effect as a 32-bit subtraction.
Therefore Sparc64 CPUs do not have separate instructions for 64-bit and 32-bit addition, subtraction, multiplication, left shift, etc.
They only have distinct 32-bit and 64-bit instructions for operations where the upper 32 bits influence the lower 32 bits (e.g. right shift).
The zero flag, however, does depend on whether you look at the result of the subcc operation as a 32-bit or a 64-bit value.
To solve this problem the Sparc64 CPUs have each of the integer flags (zero, overflow, carry, sign) twice:
The 32-bit zero flag will be set if the lower 32 bits of a register are zero; the 64-bit zero flag will be set if all 64 bits of a register are zero.
To be compatible with existing 32-bit programs the bne instruction will check the 32-bit zero flag, not the 64-bit zero flag.
is systematically equal to 1410065408
1e10 = 0x200000000 + 1410065408, so after 1410065408 steps the counter reaches the value 0x200000000, whose lower 32 bits are all zero, and bne stops jumping.
For 1e11, however, you should get 1215752192 rather than 1410065408 as a result, because 1e11 = 0x1700000000 + 1215752192.
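You can reproduce that arithmetic directly in C: since the loop stops the first time the lower 32 bits of the counter reach zero, the observed sum is simply nLoop modulo 2^32.
#include <stdio.h>
int main(void)
{
    printf("%llu\n", 10000000000ULL & 0xFFFFFFFFULL);   /* 1e10 -> 1410065408 */
    printf("%llu\n", 100000000000ULL & 0xFFFFFFFFULL);  /* 1e11 -> 1215752192 */
    return 0;
}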
bne
There is a new instruction named bpne which has up to 4 arguments!
In the simplest variant (with only two arguments) the instruction should (I have not used Sparc for 5 years now, so I'm not sure) work like this:
bpne %icc, loop # Like bne (based on the 32-bit result)
bpne %xcc, loop # Like bne, but based on the 64-bit result
EDIT
Error: Unknown opcode: 'bpne'
I just tried using GNU assembler:
GNU assembler names the new instruction bne - just like the old one:
bne loop # Old variant
bne %icc, loop # New variant based on the 32-bit result
bne %xcc, loop # (New variant) Based on the 64-bit result
subcc %g1, 1, %g1
bpne %icc, loop
bpne %xcc, loop
nop
The first bpne (or bne) makes no sense: whenever the first line would take the jump, the second line would take it too. And if you don't use .reorder (which is the default anyway), you would also need a nop between the two branch instructions...
The code should look like this (assuming your assembler also spells bpne as bne):
subcc %g1, 1, %g1
bne %xcc, loop
nop
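Dropped back into the OP's C file, the whole statement would then read as follows (an untested sketch: note the doubled %% that extended-asm templates require, and the added "cc" clobber since subcc writes the condition codes):
asm volatile ("clr %%g1\n\t"
              "clr %%g2\n\t"
              "mov %1, %%g1\n"
              "loop:\n\t"
              "add %%g2, 1, %%g2\n\t"
              "subcc %%g1, 1, %%g1\n\t"
              "bne %%xcc, loop\n\t"      /* branch on the 64-bit zero flag */
              "nop\n\t"
              "mov %%g2, %0\n"
              : "=r" (sum)
              : "r" (nLoop)
              : "g1", "g2", "cc");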
Try "bne %xcc, loop" which should branch based on the 64 bit result.
I thought that since the condition is a >= 3, the compiler should use jl (less).
But gcc used jle (less or equal).
That makes no sense to me; why did the compiler do this?
You're getting mixed up by a transformation the compiler made on the way from the C source to the asm implementation. gcc's output implements your function this way:
a = 5;
if (a<=2) goto ret0;
return 1;
ret0:
return 0;
It's all clunky and redundant because you compiled with -O0, so it stores a to memory and then reloads it, so you could modify it with a debugger if you set a breakpoint and still have the code "work".
See also How to remove "noise" from GCC/clang assembly output?
Compilers generally prefer to reduce the magnitude of a comparison constant, so it's more likely to fit in a sign-extended 8-bit immediate instead of needing a 32-bit immediate in the machine code.
We can get some nice compact code by writing a function that takes an arg, so it won't optimize away when we enable optimizations.
int cmp(int a) {
return a>=128; // In C, a boolean converts to int as 0 or 1
}
gcc -O3 on Godbolt, targetting the x86-64 ABI (same as your code):
xorl %eax, %eax # whole RAX = 0
cmpl $127, %edi
setg %al # al = (edi>127) ? 1 : 0
ret
So it transformed a >=128 into a >127 comparison. This saves 3 bytes of machine code, because cmp $127, %edi can use the cmp $imm8, r/m32 encoding (cmp r/m32, imm8 in Intel syntax in Intel's manual), but 128 would have to use cmp $imm32, r/m32.
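For illustration, the assembled bytes (per the Intel SDM encodings; shown here as a sketch):
cmp edi, 127     ; 83 FF 7F             3 bytes: opcode 83 /7 with imm8
cmp edi, 128     ; 81 FF 80 00 00 00    6 bytes: opcode 81 /7 with imm32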
BTW, comparisons and conditions read naturally in Intel syntax, but are backwards in AT&T syntax. For example, cmp edi, 127 / jg is taken if edi > 127.
But in AT&T syntax it's written cmp $127, %edi, so you have to mentally reverse the operands, or think of > as <.
The assembly code is comparing a to two, not three. That's why it uses jle: if a is less than or equal to two, it logically follows that a is NOT greater than or equal to 3, and therefore 0 should be returned.
Pardon me if this question has been posed before. I looked for answers to similar questions, but I'm still puzzled with my problem. So I will shoot the question anyway.
I'm using a C library called libexif for image data. I run my application (which uses this library) both on my Linux desktop and my MIPS board.
For a particular image file, when I try to fetch the creation time, I get an error/invalid value. On debugging further I saw that for this particular image file I was not getting the tag (EXIF_TAG_DATE_TIME) as expected.
This library has several utility functions. Most functions are structured like below
int16_t
exif_get_sshort (const unsigned char *buf, ExifByteOrder order)
{
if (!buf) return 0;
switch (order) {
case EXIF_BYTE_ORDER_MOTOROLA:
return ((buf[0] << 8) | buf[1]);
case EXIF_BYTE_ORDER_INTEL:
return ((buf[1] << 8) | buf[0]);
}
/* Won't be reached */
return (0);
}
uint16_t
exif_get_short (const unsigned char *buf, ExifByteOrder order)
{
return (exif_get_sshort (buf, order) & 0xffff);
}
When the library tries to investigate the presence of tags in the raw data, it calls exif_get_short() and assigns the returned value to a variable of type enum (int).
In the error case, exif_get_short(), which is supposed to return the unsigned value 34665, returns a negative number (-30871), which messes up the whole tag extraction from the image data.
34665 is outside the range of representable int16_t values and therefore overflows on conversion. When I make this slight modification to the code, everything seems to work fine:
uint16_t
exif_get_short (const unsigned char *buf, ExifByteOrder order)
{
int temp = (exif_get_sshort (buf, order) & 0xffff);
return temp;
}
But since this is a pretty stable library that has been in use for quite some time, this led me to believe that I may be missing something here. Moreover, this is the general way the code is structured for the other utility functions as well; e.g. exif_get_long() calls exif_get_slong(). I would then have to change all the utility functions.
What confuses me is that when I run this piece of code on my Linux desktop with the error file, I see no problems and things work fine with the original library code. That led me to believe that perhaps the UINT16_MAX and INT16_MAX macros have different values on my desktop and on the MIPS board. Unfortunately, that's not the case: both print identical values on the board and on the desktop. If this piece of code fails, it should fail on my desktop as well.
What am I missing here? Any hints would be much appreciated.
EDIT:
The code which calls exif_get_short() goes something like this:
ExifTag tag;
...
tag = exif_get_short (d + offset + 12 * i, data->priv->order);
switch (tag) {
...
...
The type ExifTag is as follows:
typedef enum {
EXIF_TAG_GPS_VERSION_ID = 0x0000,
EXIF_TAG_INTEROPERABILITY_INDEX = 0x0001,
...
...
}ExifTag ;
The cross compiler being used is mipsisa32r2el-timesys-linux-gnu-gcc
CFLAGS = -pipe -mips32r2 -mtune=74kc -mdspr2 -Werror -O3 -Wall -W -D_REENTRANT -fPIC $(DEFINES)
I'm using libexif within Qt - Qt Media hub (actually libexif comes along with Qt Media hub)
EDIT2: Some additional observations:
I'm observing something bizarre. I have put print statements in exif_get_short(), just before the return:
printf("return_value %d\n %u\n",exif_get_sshort (buf, order) & 0xffff, exif_get_sshort (buf, order) & 0xffff);
return (exif_get_sshort (buf, order) & 0xffff);
I see the following o/p:
return_value 34665 34665
I then also inserted print statements in the code which calls exif_get_short()
....
tag = exif_get_short (d + offset + 12 * i, data->priv->order);
printf("TAG %d %u\n",tag,tag);
I see the following o/p:
TAG -30871 4294936425
EDIT3: Posting the assembly code for exif_get_short() and exif_get_sshort(), taken on the MIPS board
.file 1 "exif-utils.c"
.section .mdebug.abi32
.previous
.gnu_attribute 4, 1
.abicalls
.text
.align 2
.globl exif_get_sshort
.ent exif_get_sshort
.type exif_get_sshort, #function
exif_get_sshort:
.set nomips16
.frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
.mask 0x00000000,0
.fmask 0x00000000,0
.set noreorder
.set nomacro
beq $4,$0,$L2
nop
beq $5,$0,$L3
nop
li $2,1 # 0x1
beq $5,$2,$L8
nop
$L2:
j $31
move $2,$0
$L3:
lbu $2,0($4)
lbu $3,1($4)
sll $2,$2,8
or $2,$2,$3
j $31
seh $2,$2
$L8:
lbu $2,1($4)
lbu $3,0($4)
sll $2,$2,8
or $2,$2,$3
j $31
seh $2,$2
.set macro
.set reorder
.end exif_get_sshort
.align 2
.globl exif_get_short
.ent exif_get_short
.type exif_get_short, #function
exif_get_short:
.set nomips16
.frame $sp,0,$31 # vars= 0, regs= 0/0, args= 0, gp= 0
.mask 0x00000000,0
.fmask 0x00000000,0
.set noreorder
.cpload $25
.set nomacro
lw $25,%call16(exif_get_sshort)($28)
jr $25
nop
.set macro
.set reorder
.end exif_get_short
Just for completeness, the ASM code taken from my linux machine
.file "exif-utils.c"
.text
.p2align 4,,15
.globl exif_get_sshort
.type exif_get_sshort, #function
exif_get_sshort:
.LFB1:
.cfi_startproc
xorl %eax, %eax
testq %rdi, %rdi
je .L2
testl %esi, %esi
jne .L8
movzbl (%rdi), %edx
movzbl 1(%rdi), %eax
sall $8, %edx
orl %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L8:
cmpl $1, %esi
jne .L2
movzbl 1(%rdi), %edx
movzbl (%rdi), %eax
sall $8, %edx
orl %edx, %eax
.L2:
rep
ret
.cfi_endproc
.LFE1:
.size exif_get_sshort, .-exif_get_sshort
.p2align 4,,15
.globl exif_get_short
.type exif_get_short, #function
exif_get_short:
.LFB2:
.cfi_startproc
jmp exif_get_sshort#PLT
.cfi_endproc
.LFE2:
.size exif_get_short, .-exif_get_short
EDIT4: Hopefully my last update :-)
ASM code with compiler option set to -O1
exif_get_short:
.set nomips16
.frame $sp,32,$31 # vars= 0, regs= 1/0, args= 16, gp= 8
.mask 0x80000000,-4
.fmask 0x00000000,0
.set noreorder
.cpload $25
.set nomacro
addiu $sp,$sp,-32
sw $31,28($sp)
.cprestore 16
lw $25,%call16(exif_get_sshort)($28)
jalr $25
nop
lw $28,16($sp)
andi $2,$2,0xffff
lw $31,28($sp)
j $31
addiu $sp,$sp,32
.set macro
.set reorder
.end exif_get_short
One thing the MIPS assembly shows (though I'm not an expert in MIPS assembly, so there's a decent chance I'm missing something or otherwise wrong) is that the exif_get_short() function is just an alias for the exif_get_sshort() function. All that exif_get_short() does is jump to the address of the exif_get_sshort() function.
The exif_get_sshort() function sign extends the 16 bit value it's returning to the full 32-bit register used for the return. There's nothing wrong with that - it's actually probably what the MIPS ABI specifies (I'm not sure).
However, since the exif_get_short() function just jumps to the exif_get_sshort() function, it has no opportunity to clear the upper 16 bits of the register.
So when the 16 bit value 0x8769 is being returned from the buffer (whether from exif_get_sshort() or exif_get_short()), the $2 register used to return the function result contains 0xffff8769, which can have the following interpretations:
as a 32-bit signed int: -30871
as a 32-bit unsigned int: 4294936425
as a 16-bit signed int16_t: -30871
as a 16-bit unsigned uint16_t: 34665
If the compiler is supposed to ensure that the $2 return register has its top 16 bits set to zero for a uint16_t return type, then it has a bug in the code it's emitting for exif_get_short(): instead of jumping to exif_get_sshort(), it should call exif_get_sshort() and clear the upper half of $2 before returning.
From the description of the behavior you're seeing, it looks like the code calling exif_get_short() expects that the $2 register used for the return value will have the upper 16 bits cleared, so that the entire 32-bit register can be used as-is for the 16-bit uint16_t value.
I'm not sure what the MIPS ABI specifies (I'd guess it specifies that the upper 16 bits of the $2 register should be cleared by exif_get_short()), but there seems to be either a code-generation bug, in that exif_get_short() doesn't ensure $2 is entirely correct before it returns, or a bug where the caller of exif_get_short() assumes the full 32 bits of $2 are valid when only 16 bits are.
This is broken on so many levels, I don't know where to begin. Just look at what is done here:
The unsigned chars are read from the buffer.
They are assigned to a signed int16_t in exif_get_sshort.
This is assigned to an unsigned uint16_t in exif_get_short.
This is finally assigned to an enum which is of type signed int.
I'd say it's a miracle it works at all.
First, the assignment from the chars to int16_t is done with the values, not with the representation:
return ((buf[0] << 8) | buf[1]);
That drops you into implementation-defined territory as soon as the result is out of range: converting a value that doesn't fit into int16_t is implementation-defined (or raises an implementation-defined signal), per C11 6.3.1.3p3. Plus, it only works when the implementation's signed integer representation is the same as the one used in the file format (two's complement, I guess). It will fail for one's complement and sign-magnitude. So check which is the case for the MIPS implementation.
The clean way would be the other way around: assign the two chars from the buffer to a uint16_t, which is a well-defined operation, and use that to build the int16_t return value. You could then further correct the value for different representations, if necessary.
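A minimal sketch of that direction, keeping libexif's signatures (this rewrite is mine, not the library's):
uint16_t
exif_get_short (const unsigned char *buf, ExifByteOrder order)
{
    if (!buf) return 0;
    switch (order) {
    case EXIF_BYTE_ORDER_MOTOROLA:
        return (uint16_t) (((unsigned) buf[0] << 8) | buf[1]);
    case EXIF_BYTE_ORDER_INTEL:
        return (uint16_t) (((unsigned) buf[1] << 8) | buf[0]);
    }
    /* Won't be reached */
    return 0;
}

int16_t
exif_get_sshort (const unsigned char *buf, ExifByteOrder order)
{
    /* Converting values above INT16_MAX to int16_t is implementation-defined;
       on two's-complement targets it wraps as expected. */
    return (int16_t) exif_get_short (buf, order);
}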
Additionally, here:
if (!buf) return 0;
0 is a very bad choice of a return value, because it is a valid enum constant:
EXIF_TAG_GPS_VERSION_ID = 0x0000,
If this is expected to be the default for invalid input, then the constant should be returned, not the magic number. But since this seems to be a generic function for returning an int16_t, some other error mechanism should be used here; see the sketch below.
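One possible shape for such a mechanism (a hypothetical API, not part of libexif):
/* Returns 0 on success and -1 on a null buffer, so a valid tag value of 0
   can no longer be confused with failure. */
int
exif_get_sshort_checked (const unsigned char *buf, ExifByteOrder order,
                         int16_t *out)
{
    if (!buf || !out) return -1;
    *out = exif_get_sshort (buf, order);
    return 0;
}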
For your specific question: follow the flow of conversions between signed and unsigned on your MIPS implementation, including the default promotions that are applied, and examine all the intermediate values to find the point where it breaks. Your MIPS uses 32-bit ints, not 16-bit, right? Check INT_MAX and UINT_MAX.
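A small probe you could compile on both machines to watch that chain (a sketch; 0x8769 is the tag value from your debug output):
#include <stdio.h>
#include <stdint.h>
#include <limits.h>
int main (void)
{
    int16_t  s = (int16_t) 0x8769;  /* implementation-defined: -30871 on two's complement */
    uint16_t u = (uint16_t) s;      /* well defined: 34665 */
    int      t = u;                 /* value-preserving with 32-bit int: 34665 */
    printf ("%d %u %d INT_MAX=%d\n", (int) s, (unsigned) u, t, INT_MAX);
    return 0;
}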
I just tried out a simple C program using an if statement and analyzed its assembly. However, its behavior differs a lot when the -O2 flag is used for compilation.
The C code is:
#include<stdio.h>
int main(int argc, char **argv) {
int a;
if(a<0) {
printf("A is less than 0\n");
}
}
And the corresponding assembly is:
main:
push %ebp
mov %ebp, %esp
sub %esp, 8
and %esp, -16
sub %esp, 16
test %eax, %eax
js .L4
leave
ret
.p2align 4,,15
.L4:
sub %esp, 12
push OFFSET FLAT:.LC0
call puts
add %esp, 16
leave
ret
.size main, .-main
.section .note.GNU-stack,"",#progbits
.ident "GCC: (GNU) 3.4.6"
I read that the test instruction basically performs a logical AND of its two operands. I also read that the js instruction jumps when there has been a change of sign in the previous instruction. So testing eax against eax would give 0 or 1, and the jump would depend on that.
I fail to understand how this is being used here for branching.
Could someone explain how this works?
JS doesn't jump when there is a change in sign; it jumps if the sign flag is 1.
The sign flag is set when the result of the last operation was negative (negative numbers in two's complement have their most significant bit set to 1).
So if the AND operation was between two negative integers (e.g. -1 & -1), the most significant bit of the result will be 1 (the sign flag), so the jump is taken. If the value were positive, the most significant bit would be 0 and the jump would not be taken.
Well, I don't know what you expect, but your program has undefined behavior since a is not initialized, so the assembler output could be literally anything.
The Intel manuals are good for this. This is what they document for the TEST instruction:
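(Paraphrasing the SDM's Operation section from memory, so check the current manual for the exact wording:)
TEMP := SRC1 AND SRC2;
SF := MSB(TEMP);
IF TEMP = 0 THEN ZF := 1; ELSE ZF := 0; FI;
PF := BitwiseXNOR(TEMP[0:7]);
CF := 0;
OF := 0;
(AF is undefined.)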
SF is the sign flag, the one that's tested by the JS opcode. It is set to the most significant bit of eax here, the sign bit. The jump is thus taken when eax contains a negative number.
If eax is negative, the flags will reflect that after the test instruction and the jump will be taken. That sends control to .L4, which does the printing. Otherwise, we leave.
test eax, eax will set the zero flag if eax == 0.
The js instruction then checks the sign flag, which is basically the (a < 0) test.
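Schematically, mapping back to the question's code (the compiler turned the printf of a constant string into a puts call, as the listing shows):
if (a < 0)                          /* test %eax, %eax : SF = bit 31 of a */
    printf("A is less than 0\n");   /* js .L4 is taken when SF == 1 */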
Looks like when you specify -O2, the compiler keeps your int a in a register as a speed optimization.
Since you never initialize a, the assembly never writes anything to eax, and it instead holds whatever value was last left there.
The other answers explain how test is working as a branching mechanism.