add r1, r1, r0, lsl #3
I am not sure about the operation executed with this instruction.
Does it mean:
Add r0 and r1, then left-shift the result by 3 bits and save it in r1,
or
Add r1 and (r0 left-shifted by 3 bits), then save the result in r1?
Thanks in advance.
Most of the arithmetic and logical instructions in the Thumb and Thumb-2 instruction sets take three arguments. The first is the destination register, the second is the first operand (also a register), and the third is the flexible second operand. (If the destination register is omitted, the first operand is used as the destination.)
In almost all cases, the flexible second operand can be:
A register
A register with a shift or rotate applied, where the available shifts are
ASR #n (arithmetic shift right)
LSL #n (logical shift left)
LSR #n (logical shift right)
ROR #n (rotate right)
RRX (rotate right one bit through carry)
A constant in the format #constant, where #constant can be:
Any constant that can be produced by shifting an 8-bit value left by any number of bits within a 32-bit word
Any constant of the form 0x00XY00XY
Any constant of the form 0xXY00XY00
Any constant of the form 0xXYXYXYXY
So in your case, r0, lsl #3 is the second operand to the add instruction, and so the shift is performed before the add.
For more information see the ARM developer documentation.
I just started to learn C, and I feel a little bit confused.
I have some questions:
If I have the following code:
signed char x = 56;
// In the RAM, I will see 00111000 yes/no?
signed char z = -56;
// In the RAM, I will see 11001000 yes/no?
unsigned char y = 200;
// In the RAM, I will see 11001000 yes/no?
I have the following code:
if (z<0){
printf("0 is bigger then z ");
}
After compiling, how do the assembly instructions know that z is -56 and not 200? (Are there special ASM instructions for signed and unsigned?)
As I mentioned in question 1, the value of both z and y is 11001000, and there is no indication of whether it is signed or unsigned.
Apologies if I didn't find the right way to ask my question; I hope you understand me.
Thanks
Numbers are stored in binary. Negative numbers are usually stored in two's-complement form, but the C language allows other representations. So this one:
signed char z = -56;
// In the RAM, I will see 11001000 yes/no?
Usually yes, but maybe not on some exotic platforms.
The second question is very implementation-specific. For example, a comparison against zero on x86 may be performed as a self-comparison that sets the flags register; for an unsigned comparison, the sign flag (SF) is ignored.
The compiler will generate the appropriate instructions for the signed and unsigned cases. I think it might be better to see an example. The following code
void foobar();
void foo(unsigned char a)
{
if (a < 10)
foobar();
}
void bar(char a)
{
if (a < 10)
foobar();
}
will translate to the following MIPS code with GCC 5.4 using the -O3 flag.
foo:
andi $4,$4,0x00ff
sltu $4,$4,10
bne $4,$0,$L4
nop
j $31
nop
$L4:
j foobar
nop
bar:
sll $4,$4,24
sra $4,$4,24
slt $4,$4,10
bne $4,$0,$L7
nop
j $31
nop
$L7:
j foobar
nop
This is the interesting part of the foo function (which uses the unsigned char type):
foo:
andi $4,$4,0x00ff
sltu $4,$4,10
As you can see, the sltu instruction is used, which is the unsigned version of slt. (You don't really have to know what it does.)
Meanwhile, if we look at the relevant part of the bar function:
bar:
sll $4,$4,24
sra $4,$4,24
slt $4,$4,10
You can see that slt is used, which treats its register operands as signed. The sll and sra pair performs sign extension; it is needed here because the operand a is signed, while in the unsigned case it is not.
So you can see that different instructions are generated depending on the signedness of the operands.
The compiler will generate different instructions depending on whether it is an unsigned or signed type, and that is what tells the processor which way to treat it. So yes, there are separate instructions for signed and unsigned. With Intel processors, there are also separate instructions depending on the width (char, short, int).
there is a special ASM instructions for signed and unsigned?
Yes, hardware generally has machine code instructions (or instruction sequences) that can
sign extend a byte to word size
zero extend a byte to word size
compare signed quantities for the various relations <, <=, >, >=
compare unsigned quantities for the various relations <, <=, >, >=
how the assembly instructions know if z is -56 and not 200?
In high level languages we associate a type with a variable. From then on the compiler knows the default way to interpret code that uses the variable. (We can override or change that default interpretation using a cast at usages of the variable.)
In machine code, there are only bytes, either in memory or in CPU registers. So, it is not how it is stored that matter (for signed vs. unsigned), but what instructions are used to access the storage. The compiler will use the right set of machine code instructions every time the variable is accessed.
While we store lots of things in memory, the processor has no concept of variable declarations. The processor only sees machine code instructions, and interprets all data types through the eyes of the instruction it is being told to execute.
As an assembly programmer, it is your job to apply the proper instructions (here signed vs. unsigned) to the same variable each time it is used. Using a byte as a signed variable and later as an unsigned variable, is a logic bug that is easy to do in assembly language.
Some assemblers will help if you use the wrong size to access a variable, but none that I know help if you use the proper size but incorrect signed-ness.
Computers do not know or care about such things. Unsigned and signed are only relevant to the programmer. The value 0xFF can at the same time be -1, 255, an address or a portion of an address, part of a floating-point number, and so on; the computer does not care. How the programmer conveys their interpretation of the bits is through the program. Note that addition and subtraction also do not care about signed vs. unsigned, because in two's complement they use the same logic. But for other instructions, like multiplies (where the result is larger than the inputs) or divides (where the result is smaller than the inputs), there are unsigned and signed versions of the instructions; or your processor may have only one and you have to synthesize the other (or none, and you have to synthesize both).
int fun0 ( void )
{
return(5);
}
unsigned int fun1 ( void )
{
return(5);
}
00000000 <fun0>:
0: e3a00005 mov r0, #5
4: e12fff1e bx lr
00000008 <fun1>:
8: e3a00005 mov r0, #5
c: e12fff1e bx lr
no special bits nor nomenclature...bits is bits.
The compiler, driven by the user's indication of signed vs. unsigned in the high-level language, chooses instructions and data values that cause the ALU to output flags indicating greater-than, less-than, and equal (through single flags or combinations of them); a conditional branch can then be taken based on those flags. Often, but not always, the compiler will generate the opposite condition: for "if z < 0 then do something", the compiler will say "if z >= 0 then jump over the something".
I have just a little theoretical curiosity. The == operator in C returns 1 in case of positive equality, 0 otherwise. My knowledge of assembly is very limited. However I was wondering if it could be possible, theoretically, to implement a new operator that returns ~0 in case of positive equality, 0 otherwise – but at one condition: it must produce the same number of assembly instructions as the == operator. It's really just a theoretical curiosity, I have no practical uses in mind.
EDIT
My question targets x86 CPUs, however I am very curious to know if there are architectures that natively do that.
SECOND EDIT
As Sneftel has pointed out, nothing similar to the SETcc instructions, but able to convert flag-register bits into 0/~0 values (instead of the classical 0/1), exists. So the answer to my question seems to be no.
THIRD EDIT
A little note. I am not trying to represent a logical true as ~0; I am trying to understand whether a logical true can also, optionally, be represented as ~0 when needed, without further effort, within a language that already normally represents true as 1. And for this I had hypothesized a new operator that “returns” numbers, not booleans (the natural logical true “returned” by == remains represented as 1); otherwise I would have asked whether == could be re-designed to “return” ~0 instead of 1. You can think of this new operator as half-belonging to the family of bitwise operators, which “return” numbers, not booleans (and by booleans I don't mean boolean data types; I mean anything outside the number pair 0/1, which is what a boolean is intended to be in C as the result of a logical operation).
I know that all of this might sound futile, but I had warned: it is a theoretical question.
However here my question seems to be addressed explicitly:
Some languages represent a logical one as an integer with all bits set. This representation can be obtained by choosing the logically opposite condition for the SETcc instruction, then decrementing the result. For example, to test for overflow, use the SETNO instruction, then decrement the result.
So it seems there is no direct instruction, since using SETNE and then decrementing means adding one more instruction.
EDIT: as other people are pointing out, there are some flavors of "conditionally assign 0/1" out there. Kind of undermines my point :) Apparently, the 0/1 boolean type admits a slightly deeper optimization than a 0/~0 boolean.
The "operator returns a value" notion is a high level one, it's not preserved down to the assembly level. That 1/0 may only exist as a bit in the flags register, or not even that.
In other words, assigning the C-defined value of the equality operator to an int-sized variable is not a primitive at the assembly level. If you write x = (a == b), the compiler might implement it as something like this (pseudo-assembly; a real x86 CMOVcc takes a register or memory source, not an immediate):
cmp a, b ; set the Z flag
cmovz x, 1 ; if equal, assign 1
cmovnz x, 0 ; if not equal, assign 0
Or it can be done with conditional jumps. As you can see, assigning ~0 as the value for TRUE would take the same instructions, just with a different operand.
None of the architectures that I'm familiar with implement equality comparison as "assign a 1 or 0 to a general purpose register".
There is no assembly implementation of a C operator. For instance, there is no x86 instruction which compares two arguments and results in a 0 or 1, only one which compares two arguments and puts the result in a bit in the flag register. And that's not usually what happens when you use ==.
Example:
void foo(int a, int b) {
if(a == b) { blah(); }
}
produces the following assembly, more or less:
foo(int, int):
cmp %edi, %esi
je .L12
rep ret
.L12:
jmp blah()
Note that nothing in there involves a 0/1 value. If you want that, you have to really ask for it:
int bar(int a, int b) {
return a == b;
}
which becomes:
bar(int, int):
xor %eax, %eax
cmp %edi, %esi
sete %al
ret
I suspect the existence of the SETcc instructions is what prompted your question, since they convert flag register bits into 0/1 values. There is no corresponding instruction which converts them into 0/~0: GCC instead does a clever little DEC to map them. But in general, the result of == exists only as an abstract and optimizer-determined difference in machine state between the two.
Incidentally, I would not be surprised at all if some x86 implementations chose to fuse SETcc and a following DEC into a single micro-op; I know this is done with other common instruction pairs. There is no simple relationship between a stream of instructions and a number of cycles.
For just one extra cycle you can simply negate the output.
Internally in 8086, the comparison operations only exist in the flags. Getting the value of the flags into a variable takes extra code. It is pretty much the same code whether you want true as 1 or -1. Generally a compiler doesn't actually generate the value 0 or 1 when evaluating an if statement, but uses the Jcc instructions directly on the flags generated by comparison operations. https://pdos.csail.mit.edu/6.828/2006/readings/i386/Jcc.htm
With 80386, SETcc was added, which only ever sets 0 or 1 as the answer, so that is the preferred arrangement if the code insists on storing the answer. https://pdos.csail.mit.edu/6.828/2006/readings/i386/SETcc.htm
And there are lots of new compare instructions that save results to registers going forward. The flags have been seen as a bottleneck for instruction pipeline stalls in modern processors, and very much are disfavoured by code optimisation.
Of course there are all sorts of tricks you can do to get 0, 1, or -1 given a particular set of values to compare. Needless to say the compiler has been optimised to generate 1 for true when applying these tricks, and wherever possible, it doesn't actually store the value at all, but just reorganises your code to avoid it.
SIMD vector comparisons do produce vectors of 0 / -1 results. This is the case on x86 MMX/SSE/AVX, ARM NEON, PowerPC Altivec, etc. (They're 2's-complement machines, so for an element with all bits set I like to write -1 instead of ~0.)
e.g. pcmpeqd xmm0, xmm1 replaces each element of xmm0 with xmm0[i] == xmm1[i] ? -1 : 0;
This lets you use them as AND masks, because SIMD code can't branch separately on each vector element without unpacking to scalar and back. It has to be branchless. How to use if condition in intrinsics
e.g. to blend 2 vectors based on a condition, without SSE4.1 pblendvb / blendvps, you'd compare and then AND / ANDNOT / OR. e.g. from Substitute a byte with another one
__m128i mask = _mm_cmpeq_epi8(inp, val); // movdqa xmm1, xmm0 / PCMPEQB xmm1, xmm2
// zero elements in the original where there was a match (that we want to replace)
inp = _mm_andnot_si128(mask, inp); // inp &= ~mask; // PANDN xmm0, xmm1
// zero elements where we keep the original
__m128i tmp = _mm_and_si128(newvals, mask); // newvals & mask; // PAND xmm3, xmm1
inp = _mm_or_si128(inp, tmp); // POR xmm0, xmm1
But if you want to count matches, you can simply subtract the compare result; total -= -1 for each match avoids having to negate the vector elements. How to count character occurrences using SIMD
Or to conditionally add something, instead of actually blending, just do total += (x & mask), because 0 is the identity element for operations like ADD (and some others like XOR and OR).
See How to access a char array and change lower case letters to upper case, and vice versa and Convert a String In C++ To Upper Case for examples in C with intrinsics and x86 asm.
All of this has nothing to do with C operators and implicit conversion from boolean to integer.
In C and C++, operators return a boolean true/false condition, which in asm for most machines for scalar code (not auto-vectorized) maps to a bit in a flag register.
Converting that to an integer in a register is a totally separate thing.
But fun fact: MIPS doesn't have a flags register: it has some compare-and-branch instructions for simple conditions like reg == reg or reg != reg (beq and bne). And branch on less-than-zero (branch on the sign bit of one register): bltz $reg, target.
(And an architectural $zero register that always reads as zero, so you can use that to implement branch if reg != 0 or reg == 0.)
For more complex conditions, you use slt (set on less-than) or sltu (set on less-than-unsigned) to compare into an integer register. Like slt $t4, $t1, $t0 implements t4 = t1 < t0, producing a 0 or 1. Then you can branch on that being 0 or not, or combine multiple conditions with boolean AND / OR before branching on that. If one of your inputs is an actual bool that's already 0 or 1, it can be optimized into this without an slt.
Incomplete instruction listing of classic MIPS instructions (not including pseudo-instructions like blt that assemble to slt into $at + bne): http://www.mrc.uidaho.edu/mrc/people/jff/digital/MIPSir.html
But MIPS32r6 / MIPS64r6 changed this: instructions generating truth values now generate all zeroes or all ones instead of just clearing/setting the 0-bit, according to https://en.wikipedia.org/wiki/MIPS_architecture#MIPS32/MIPS64_Release_6. MIPS32/64 r6 is not binary compatible with previous MIPS ISAs; it also rearranged some opcodes, so because of this change it's not even asm-source compatible! But it's a definite change for the better.
Fun fact, there is an undocumented 8086 SALC instruction (set AL from carry) that's still supported in 16/32-bit mode by modern Intel (and AMD?) CPUs.
It's basically like sbb al,al without setting flags: AL = CF ? -1 : 0. http://os2museum.com/wp/undocumented-8086-opcodes.
Subtract-with-borrow with the same input twice does x-x - CF on x86, where CF is a borrow for subtraction. And x-x is of course always zero. (On some other ISAs, like ARM, the carry flag meaning is opposite for subtraction, C set means "no borrow".)
In general, you can do sbb edx,edx (or any register you want) to convert CF into a 0 / -1 integer. But this only works for CF; the carry flag is special and there's nothing equivalent for other flags.
Some AMD CPUs even recognize sbb same,same as independent of the old value of the register, only dependent on CF, like xor-zeroing. On other CPUs it still has the same architectural effect, but with a microarchitectural false dependency on the old value of EDX.
So I have some code
#define umul_ppmm(w1, w0, u, v) \
asm ("mulq %3" \
: "=a" (w0), "=d" (w1) \
: "0" ((uint64_t)(u)), "rm" ((uint64_t)(v)))
I'm trying to debug it and understand how it works.
Currently, I am looking at this PDF for reference on mulq.
My understanding so far is that it multiplies two 64-bit numbers together, which would be w0 and u, and then stores the result of that multiplication in w0 and w1.
My main questions are:
According to This GCC assembly guide on Simple Constraints 'a' and 'd' in "=a" and "=d" are address and data registers respectively. How does that play in here and what exactly does that mean?
What does "0" mean in this case? That same guide says that "An operand that matches the specified operand number is allowed." What would be the matching operand here?
How does v come into play? If at all?
Printing out the variables before and after the function call results in
w1 w0 u v
2097147 549755813889 17179869183 4611684961865433149
4294966311 17179869183 13835060159816138691 4611684961865433149
The mulq instruction implicitly produces result in the a and d registers (normally known as rax and rdx)
Operands are indexed from zero. "0" thus means same place as the first operand, which is w0. mulq implicitly uses rax as one of the input operands, hence the matching constraint. Could have written it out as "a" again.
v is the operand %3 which is the only explicit operand referenced in the mulq instruction. The code multiplies u and v so of course it "comes into play".
You printed the values in the wrong order: on the second line you swapped w0 and u, as u and v are unchanged input operands.
u*v=w1*2^64+w0, that is 17179869183*4611684961865433149=4294966311*2^64+13835060159816138691
I took a computer organization course a year ago, and now I have a follow-up to it, 'Computer Architecture'. I am using the 3rd edition of John Hennessy's book 'Computer Architecture: A Quantitative Approach'. I went through the MIPS ISA but still need some help; can you explain this piece of code in greater detail?
Source code:
for(i=1000; i>0; i--)
x[i] = x[i] + s;
Assembly code:
Loop: L.D F0, 0(R1) ; F0 = array element
ADD.D F4, F0, F2 ; add scalar
S.D F4, 0(R1) ; store result
DADDUI R1, R1,# -8 ; decrement address pointer
BNE R1, R2, Loop ; branch if R1 != R2
This is given as an example for loop unrolling to exploit ILP, and I have a few doubts. I get that the array starts at Mem[0+R1] and goes backwards until Mem[R2+8] (as given in the text); is there any reason for this, or did they just pick this location arbitrarily?
Also, why use DADDUI (unsigned) when we are adding a signed number (-8)?
Please give a detailed overview of this so that I can follow along with the rest of the topics.
Thanks
The memory accesses are performed to the addresses, and in the order, specified by the loop in the source code.
The daddiu instruction is sufficient to perform such address arithmetic. The "negative" value accomplishes subtraction in two's-complement. Addresses are neither negative nor positive; they are just bit-patterns. Refer to an ISA reference to learn more about MIPS and instructions.
The 16-bit signed immediate is added to the 64-bit value in GPR rs and
the 64-bit arithmetic result is placed into GPR rt . No Integer
Overflow exception occurs under any circumstances.
…
The term “unsigned” in the instruction name is a misnomer; this operation is 64-bit modulo arithmetic that does not
trap on overflow. It is appropriate for unsigned arithmetic such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
The example is not optimized or unrolled. It's just a literal translation of the source.
I know there are some other questions similar to this, but I'm still having trouble understanding the () part of it. Could someone spell this syntax out for me? Thanks.
cmp %eax,0x80498d4(,%ebx,4)
cmp is the compare assembly instruction. It performs a comparison of its two arguments by subtracting one operand from the other (in AT&T syntax, the first operand is subtracted from the second) and sets the CPU's EFLAGS register accordingly. The EFLAGS register can then be used for conditional branching, conditional moves, etc.
First argument: %eax (the value in the %eax register).
Second argument: 0x80498d4(,%ebx,4). This is read as offset(base, index, scale). In your example, the second argument is the memory location at address 0x80498d4 + base + (value in the %ebx register) * 4, where 4 is the scaling factor.
Note: the base is omitted here, and an omitted base defaults to 0.
You can take a look at http://docs.oracle.com/cd/E19120-01/open.solaris/817-5477/ennby/index.html for more information on this x86 assembly syntax.