Can someone explain ARM bitwise operations to me? - arm

Can someone explain ARM bit-shifts to me like I'm five? I have a very poor understanding of anything that involves non-decimal number systems so understanding the concepts of bit shifts and bitwise operators is difficult for me.
What would each of the following cases do and why (what would end up in R3 and what happens behind the scenes on the bit level)?
/** LSL **/
mov r0, #1
mov r3, r0, LSL#10
/** LSR **/
mov r0, #1
mov r3, r0, LSR#10
/** ORR **/
mov r0, #1
mov r1, #4
orr r3, r1, r0
/** AND **/
mov r0, #1
mov r1, #4
and r3, r1, r0
/** BIC **/
mov r0, #1
mov r1, #4
bic r3, r1, r0
PS. Do not explain it in terms of C bitwise operators. I don't know what they do either (the >>, <<, |, & ones).

Truth tables, two inputs, the two numbers on the left and one output, the number on the right:
OR
a b c
0 0 0
0 1 1
1 0 1
1 1 1
the left two inputs a and b represent the four possible combinations of inputs, no more no less that is the list.
Consider a 1 to mean true and 0 to mean false. And the word OR in this case means if a OR b is true then c is true. And as you see in the table, horizontally if either a or b is true then c is true.
AND
a b c
0 0 0
0 1 0
1 0 0
1 1 1
And means they both have to be true if a AND b are both true then c is true. There is only one case where that exists above.
Now take two bytes 0x12 and 0x34, which in decimal are 18 and 52, but we don't really care much about decimal. We care about binary: 0x12 is 0b00010010 and 0x34 is 0b00110100. The bitwise operators like AND and OR and XOR in assembly language mean you take one bit from each operand and that gives the result in the same bit location. It's not like add where you have things like this plus that equals blah carry the one.
so we line up the bits
0b00010010 0x12
0b00110100 0x34
So tilt your head sideways like you are going to take a bite out of a taco held in your left hand and visualize the truth table above. If we look at the two bits on the right they are 0 and 0, the next two bits are 1 and 0 and so on. So if we wanted to do an OR operation, the rule is if either a or b is true then c, the result, is true.
0b00010010
0b00110100
OR ==========
0b00110110
Head tilted to the right, least significant bit (the bit in the ones column in the number) 0 or 0 = 0, neither one is set. next column (the twos column) 1 or 0 = 1 at least one is true. and so on so
0x12 OR 0x34 = 0x36
In arm assembly that would be
mov r0,#0x12
mov r1,#0x34
orr r2,r0,r1
after the or operation r2 would hold the value 0x36.
Now lets and those numbers
0b00010010
0b00110100
AND ==========
0b00010000
Remembering our truth table and the rule both a and b have to be true (a 1) we tilt our head to the right, 0 and 0 is 0, both are not true. and by inspection only one column has both inputs with a 1, the 16s column. this leaves us with 0x12 AND 0x34 = 0x10
In arm assembly that would be
mov r0,#0x12
mov r1,#0x34
and r2,r0,r1
Now we get to the BIC instruction, which stands for bitwise clear, which hopefully will make sense in a bit. BIC on the ARM is a ANDed with NOT b. NOT is another truth table, but with only one input and one output
NOT
a c
0 1
1 0
With only one input we have only two choices 0 and 1, 1 is true 0 is false. NOT means if not a then c is true. when a is not true c is true, when a is true c is not true. Basically it inverts.
What the bic does is have two inputs a and b, the operation is c = a AND (NOT b) so the truth table for that would be:
a AND (NOT b)
a b c
0 1 0
0 0 0
1 1 0
1 0 1
I started with the AND truth table and then NOTted the b bits: where b was a 0 in the AND truth table I made it a 1, and where b was a 1 in the AND truth table I made it a 0.
So the bic operation on 0x12 and 0x34 is
0b00010010
0b00110100
BIC ==========
0b00000010
Why is it called bit clear? Understanding that makes it much easier to use. Look at the truth table and think of the first and second inputs. Where the second input, b, is a 1 the output is 0. Where the second input, b, is a 0, the output is a itself unmodified. So what that truth table or operation is saying is: anywhere b is set, clear (zero) those bits in a. So if I have the number 0x1234 and I want to zero the lower 8 bits, I would BIC that with 0x00FF. And your next question is why not AND that with 0xFF00? (Analyze the AND truth table and see that wherever b is a 1 you keep the a value as is, and wherever b is a 0 you zero the output.) The ARM uses 32 bit registers, and a fixed 32 bit instruction set, at least traditionally. The immediate instructions
mov r0,#0x12
on ARM are limited to an 8 bit value rotated by an even number of bit positions within the 32 bit number (will get to shifting in a bit). So if I had the value 0x12345678 and wanted to zero out the lower 8 bits I could do this
; assume r0 already has 0x12345678
bic r0,r0,#0xFF
or
; assume r0 already has 0x12345678
mov r1,#0xFF000000
orr r1,r1,#0x00FF0000
orr r1,r1,#0x0000FF00
;r1 now contains the value 0xFFFFFF00
and r0,r0,r1
or
; assume r0 already contains 0x12345678
ldr r1,my_byte_mask
and r0,r0,r1
my_byte_mask: .word 0xFFFFFF00
which is not horrible, compared to using a move and two orrs, but still burns more clock cycles than the bic solution because you burn the extra memory cycle reading my_byte_mask from ram, which can take a while.
or
; assume r0 already contains 0x12345678
mvn r1,#0xFF
and r0,r0,r1
This last one is not a bad compromise. Note that mvn in the ARM documentation is bitwise not immediate, which means rx = NOT(immediate). The immediate here is 0xFF. NOT(0xFF) means invert all the bits; it is a 32 bit register we are going to, so 0xFFFFFF00 is the result of NOT(0xFF) and that is what the register r1 gets before doing the and.
So that is why bic has a place in the ARM instruction set, because sometimes it takes fewer instructions or clock cycles to mask (mask = AND used to make some bits zeros) using the bic instruction instead of the and instruction.
I used the word mask as a concept to make bits in a number zero leaving the others alone. ORing can be thought of as making bits in a number one while leaving the others alone: if you look at the OR truth table, any time b is a 1 then c is a 1. So 0x12345678 OR 0x000000FF results in 0x123456FF, the bits in the second operand are set. Yes it is also true that anytime a is set in the OR truth table then the output is set, but a lot of the time when you use these bitwise operations you have one operand you want to do something to: set a certain number of bits to one without modifying the rest, or set a certain number of bits to zero without modifying the rest, or zero all the bits except for a certain number of bits. When used that way you have one operand coming in, which is what you want to operate on, and you create the second operand based on what you want the overall effect to be. For example in C if we wanted to zero the lower byte and keep the rest we could have a one parameter in, one parameter out function:
unsigned int clear_lower_byte ( unsigned int a )
{
    return(a&(~0xFF));
}
~ means NOT so ~0xFF, for 32 bit numbers means 0xFFFFFF00 then & means AND, so we return a & 0xFFFFFF00. a was the only real operand coming in and we invented the second one based on the operation we wanted to do...Most bitwise operations you can swap the operands in the instruction and everything turns out okay, instructions like ARM's bic though the operands are in a certain order, just like a subtract you have to use the correct order of operands.
Shifting...there are two kinds, logical and arithmetic. Logical is easiest and is what you get when you use >> or << on unsigned values in C.
Start with 0x12 which is 0b00010010. Shifting that three locations to the left (0x12<<3) means
00010010 < our original number 0x12
0010010x < shift left one bit location
010010xx < shift left another bit location
10010xxx < shift left a third bit location
What bits get "shifted in" to the empty locations, the x'es above, varies based on the operation. For C programming it is always zeros:
00010010 < our original number 0x12
00100100 < shift left one bit location
01001000 < shift left another bit location
10010000 < shift left a third bit location
But sometimes (usually every instruction set supports a rotate as well as a shift) there are other ways to shift and the differences have to do with what bit you shift into the empty spot, and also sometimes the bit you shifted off the end doesnt always just disappear sometimes you save that in a special bit holder location.
Some instruction sets only have a single bit shift meaning for each instruction you program you can only shift one bit, so the above would be 3 instructions, one bit at a time. Other instruction sets, like arm, allow you to have a single instruction and you specify in the instruction how many bits you want to shift in that direction. so a shift left of three
mov r0,#0x12
mov r3,r0,lsl#3 ; shift the contents of r0 3 bits to the left and store in r3
This varying of what you shift in is demonstrated between lsr and asr, logical shift right and arithmetic shift right (you will see that there is no asl, arithmetic shift left because that makes no sense, some assemblers will allow you to use an asl instruction but encode it as a lsl).
A LOGICAL shift right:
00010010 - our original number 0x12
x0001001 - shifted right one bit
xx000100 - shifted right another bit
xxx00010 - shifted right another bit
As with C there is a version that shifts in zeros, that is the logical shift right, shifting in zeros
00010010 - our original number 0x12
00001001 - shifted right one bit
00000100 - shifted right another bit
00000010 - shifted right another bit
ARITHMETIC shift right means preserve the "sign bit". What is the sign bit? That gets into twos complement numbers, which you also need to learn if you have not. Basically if you consider the bit pattern/value to be a twos complement number then the most significant bit, the one on the left, is the sign bit: if it is 0 the number is positive and if it is 1 the number is negative. You may have noticed that a shift left by one bit is the same as multiplying by 2 and a shift right is the same as dividing by 2. 0x12 >> 1 = 0x9, 18 >> 1 = 9. But what if we were to shift a minus 2 to the right one? Minus two is 0xFE using bytes, or 0b11111110. Using the C style logical shift right 0xFE >> 1 = 0x7F, or in decimal -2 >> 1 = 127. We cannot solve that in C in a single operation, unfortunately, but in assembly we can using an arithmetic shift, assuming your instruction set has one, which the arm does
ARITHMETIC shift right
s1100100 - our starting value s is the sign bit whatever that is 0 or 1
ss110010 - one shift right
sss11001 - another shift right
ssss1100 - another shift right
So if the sign bit s was a 0 when we started, if the number was 01100100 then
01100100 - our starting value
00110010 - one shift right
00011001 - another shift right
00001100 - another shift right
but if that sign bit had been a one
11100100 - our starting value
11110010 - one shift right
11111001 - another shift right
11111100 - another shift right
And we can solve the 0xFE shifted right one:
11111110 - 0xFE a minus 2 in twos complement for a byte
11111111 - shifted right one
so in pseudo code 0xFE ASR 1 = 0xFF, -2 ASR 1 = -1. -2 divided by 2 = -1
The last thing, which you need to read up on yourself, has to do with rotates and/or what happens to the bit that is shifted off the end. With a shift right the lsbit is shifted "off the end" of the number like blocks being slid off a table, and the one that falls off might just go into the "bit bucket" (ether, heaven or hell, one of these places where bits go to die when they disappear from this world). But some instructions in some instruction sets will take that bit being shifted off and put it in the Carry flag (read up on add and subtract), not because it is a carry necessarily but because there are status bits in the alu and the Carry bit is one that kinda makes sense.
Now what a rotate is: let's say you had an 8 bit processor and you rotated one bit. The bit falling off the end lands in the Carry bit, AND the bit shifting in the other side is what was in the carry bit before the operation. Basically it is musical chairs: the bits are walking around the chairs with one person left standing, the person standing is the carry bit, the people in chairs are the bits in the register.
Why is this useful at all? Let's say we had an 8 bit processor like the Atmel AVR for example but wanted to do a 64 bit shift. 64 bits takes 8, 8 bit, registers; say I have my 64 bit number in those 8 registers and I want to do a 64 bit shift left one bit. I would start with the least significant byte and do an lsl, which shifts a zero in, but the bit shifting out goes into the carry bit. Then on the next most significant byte I do a rol, rotate left one bit: the bit coming in is the bit going out of the prior byte and the bit going out goes to the carry bit. I repeat the rol instruction for the other bytes. Looking at a 16 bit shift:
00100010 z0001000 - our original number
00100010 z 00010000 - lsl the least significant byte, the ms bit z is in carry
0100010z 00010000 - rotate left the most significant byte pulling the z bit from carry
00100010z0001000 - if it had been a 16 bit register
0100010z00010000 - a logical shift left on a 16 bit with a zero coming in on the right
that is what the rotates are for and that is why the assembly manual bothers to tell you what flags are modified when you perform a logical operation.

I'll do the first one and then maybe you can try and work out the rest using a similar approach:
/** LSL **/
mov r0, #1          ; r0 = 0000 0000 0000 0000 0000 0000 0000 0001
mov r3, r0, LSL#10  ; r3 = r0 logically shifted left by 10 bit positions
                    ;    = 0000 0000 0000 0000 0000 0100 0000 0000
                    ;                                ^           ^
                    ;                                +<<<<<<<<<<<+
                    ;                             shift left 10 bits
Note however that if you don't yet understand boolean operations such as OR (|), AND (&), etc, then you will have a hard time understanding the corresponding ARM instructions (ORR, AND, etc).

Related

CarryFrom operation on arm processors

The ARM Architecture manual says for the ADC instruction to set the C (carry) flag in the CPSR if the S flag is set and a carry "occurred". From the book (page 155):
C Flag = CarryFrom(Rn + shifter_operand + C Flag)
And according to the glossary, CarryFrom is defined as follows:
CarryFrom
Returns 1 if the addition specified as its parameter caused a carry (true result is bigger
than 2^(32)−1, where the operands are treated as unsigned integers), and returns 0 in all other cases.
This delivers further information about an addition which occurred earlier in the pseudo-code. The addition is not repeated.
Now I'm wondering if the CarryFrom operation is the same as an overflow check. Can anyone explain to me how I can "emulate" the CarryFrom operation or how it works?
Simply binary addition, x is the carry in to the operation and y is the carry out. For a normal add carry in is a 0 and for a normal subtract carry in is a 1. (adders are used to do subtraction, one of the features of twos complement)
y x
1111
+ 0001
======
11110
1111
+ 0001
======
0000
So the result is 0000, the carry out is a 1. Some architectures (of all of them x86, arm, mips, pdp11, 6502, ...)(yep, I know about mips in this context) invert the carry out for a subtract and leave it not inverted for addition. In this case you are asking about ADC, so that is addition so it should not be modified by any architecture.
And 4 bits or 40 does not matter it all works the same.
So if you want to add 0x0F and 0x01 but you only have a 4 bit adder (again think 64 bits and 32 instead of 8 and 4, it all works the same).
We start with normal addition of the lower bits
11110
1111
+ 0001
======
0000
Then we do an add with carry and use the carry out of the prior addition as the carry in to the second (or next one since you can do this for as much code/memory space as you have)
1
0000
+ 0000
======
00001
0000
+ 0000
======
0001
And the end result is 0x10. 0x0F + 0x01 = 0x10
The first add here happens to have an unsigned overflow as indicated by a non-zero carry out/carry flag. If you focus only on that. If the adc also had an unsigned overflow then the whole result is bad as it won't fit in the number of bits. (if the programmer considered these to be unsigned values, if signed then you look at the V bit for overflow but the Carry still cascades from the first ADD to the ADC and then from each ADC to the next until you have covered the width of the higher level operation).

distinguishes between signed and unsigned in machine code

I was reading a text book saying:
It is important to note how machine code distinguishes between signed
and unsigned values. Unlike in C, it does not associate a data type
with each program value. Instead, it mostly uses the same
(assembly)instructions for the two cases, because many arithmetic
operations have the same bit-level behavior for unsigned and
two’s-complement arithmetic.
I don't understand what it means, could anyone provide me an example?
For example, this code:
int main() {
int i = -1;
if(i < 9)
i++;
unsigned u = -1; // Wraps around to UINT_MAX value
if(u < 9)
u++;
}
gives following output on x86 GCC:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], -1 ; i = -1
cmp DWORD PTR [rbp-4], 8 ; i comparison
jg .L2 ; i comparison
add DWORD PTR [rbp-4], 1 ; i addition
.L2:
mov DWORD PTR [rbp-8], -1 ; u = -1
cmp DWORD PTR [rbp-8], 8 ; u comparison
ja .L3 ; u comparison
add DWORD PTR [rbp-8], 1 ; u addition
.L3:
mov eax, 0
pop rbp
ret
Notice how it uses the same instructions on initialization (mov) and increment (add) for variables i and u. This is because the bit pattern changes identically for unsigned and 2's complement.
Comparison also uses the same instruction cmp, but jump decision has to be different, because values where the highest bit is set are different on the types: jg (jump if greater) on signed, and ja (jump if above) on unsigned.
What instructions are chosen, depends on the architecture and the compiler.
On Intel Processors (x86 family) and others that have FLAGS, you get bits in those FLAGS that tell you how the last operation worked. The name of the FLAGS vary a little between processors, but in general you have two important ones in regard to arithmetic: CF and OF.
CF is the Carry bit (often called C on other processors).
OF is the Overflow bit (often called V on other processors).
More or less, CF represents an unsigned overflow and OF represents a signed overflow. When the processor does the ADD operation, it has one extra bit, which is CF. So, if you add two 64 bit numbers, the result without wrapping may need 65 bits. That is the carry. The OF flag is computed from the highest bit (so bit 63 in a 64 bit number), using 3 logical operations against that bit in the two sources and the destination.
There is an example of how CF works with 4 bit registers:
R1 = 1010
R2 = 1101
R3 = R1 + R2 = 1 0111
^
+---- carry (CF)
The extra 1 doesn't fit in R3 so it gets put in the CF bit instead. As a side note, the MIPS processor does not have any FLAGS. It's up to you to determine whether a carry is generated (which you can do using XOR and such on the two sources and the destination).
However, in C (and C++), there is no verification of overflow on your integer types (at least not by default.) So in other words, the CF and OF flags are ignored for all your operations except the four compare operators (<, <=, >, >=).
As shown in the example presented by @user694733, the difference is whether a jg or ja will be used. Each of the 16 jump instructions will test various flags to know whether to jump or not. That combination is really what makes the difference.
Another interesting aspect is the difference between ADC and ADD. In one case you add with the carry and the other you don't. It's probably not used as much now that we have 64 bit computers, but to add two 64 bit numbers with a 32 bit processor, it would add the lower 32 bits as unsigned 32 bit numbers and then add the upper 32 bit numbers (signed or unsigned as may be the case) plus the carry from the first operation.
Say you have two 64 bit numbers in 32 bit registers (ECX:EAX and EDX:EBX), you would add them like this:
ADD EAX, EBX
ADC ECX, EDX
Here EDX is added to ECX, plus 1 more if EAX + EBX had an unsigned overflow (a carry, meaning that the sum of EAX and EBX properly needs 33 bits because the result doesn't fit in 32 bits; the CF flag is that 33rd bit).
To be noted, the Intel processors have:
A Zero bit: ZF (whether the result is zero or not,)
CF is called "Borrow" when subtracting (for SBC, SBB,) and
Also the AF bit which is used for "decimal number operations" (which no one in their right mind uses.) That AF bit tells you that there is an overflow in the decimal operation. Something like that. I never used that one. I find their use too complicated/cumbersome. Also, the bit is still there in amd64 but the instructions setting it were removed (see DAA for example).
The beauty of twos complement is that addition (and as a result subtraction, since that uses an adder, again part of the beauty of twos complement) does not care about signed vs unsigned: the same bit patterns added together produce the same result. 0xFE + 0x01 = 0xFF, which is -2 + 1 = -1 and also 254 + 1 = 255. Same input bits, same result pattern.
Twos complement helps for only a percentage. Not all. add/subtract but not necessarily multiply and divide. Bitwise of course bits is bits. But (right) shifts desire a difference, but does C deliver?
The comparisons are very sensitive. The equal and not equal, zero and not zero tests are single flag tests and will work. But unsigned less than and signed less than do not use/test the same set of flags. The less than and greater than, with or without equal applied to them, do not work the same way with unsigned vs signed. Likewise signed overflow and unsigned overflow (often just called the carry bit) are computed differently from each other. And in some instruction sets the carry bit is inverted when the operation is a subtract, but not always, so for comparisons you need to know whether it is a borrow bit on subtract or always just the carry out unmodified.
Multiplication and likely division are "it depends". For an N bit times N bit equals N bit result, signed and unsigned both work, but N bit times N bit equals 2*N bit (the only really useful hardware multiply) requires a signed and an unsigned version to have the hardware/instruction do all the work, otherwise you have to break the operands up into parts if you don't have both flavors. A simple paper and pencil grade school multiplication will show why, leave that to the reader to figure out.
You don't need us at all you can easily provide your own example and see from the compiler output when there is a difference and when there isn't.
int32_t fun0 ( int32_t a, int32_t b ) { return a+b; }
int32_t fun1 ( int32_t a, int32_t b ) { return a*b; }
int32_t fun2 ( int32_t a, int32_t b ) { return a^b; }
uint32_t fun3 ( uint32_t a, uint32_t b ) { return a+b; }
uint32_t fun4 ( uint32_t a, uint32_t b ) { return a*b; }
uint32_t fun5 ( uint32_t a, uint32_t b ) { return a^b; }
uint32_t fun6 ( uint64_t a, uint64_t b ) { return a+b; }
uint32_t fun7 ( uint64_t a, uint64_t b ) { return a*b; }
uint32_t fun8 ( uint64_t a, uint64_t b ) { return a^b; }
uint64_t fun9 ( uint64_t a, uint64_t b ) { return a*b; }
int64_t fun10 ( int64_t a, int64_t b ) { return a*b; }
uint64_t fun11 ( uint32_t a, uint32_t b ) { return a*b; }
int64_t fun12 ( int32_t a, int32_t b ) { return a*b; }
int32_t comp0 ( int32_t a, int32_t b ) { return a<b; }
uint32_t comp1 ( uint32_t a, uint32_t b ) { return a<b; }
plus other operators and combinations.
EDIT
Okay the real answer...rather than making you do the work.
I want to add -2 and +1
11111110
+ 00000001
============
finish it
00000000
11111110
+ 00000001
============
11111111
-2 + 1 = -1
What about 254 + 1
00000000
11111110
+ 00000001
============
11111111
hmmm...same bits in same bits out, but how I interpret those bits as a programmer varies widely.
You can try as many legal values as you want (ones that don't overflow the result) and you will see that the addition result does not know nor care about signed vs unsigned operands. Part of the beauty of twos complement.
Subtraction is just addition in logic, some may have learned "invert and add one" want to know what the bit pattern 11111111 is you invert 00000000 and add 1 00000001 so 11111111 is -1. But how does addition really work with two operands as shown above you really need a three bit adder three bits in and two bits out the result and carry out, so there is a carry in, two operand bits a result and carry out. What if we go back to grade school as well...
-32 - 3 = (-32) + (-3), apply the invert and add one to the 3 and we get (-32) + (~3) + 1
1
11100000
+ 11111100
==============
and thats how a computer does that math, inverts the carry in and the second operand. SOME invert the carry out because a 1 on carry out when the adder is used as a subtractor means no borrow, but a 0 means a borrow happened. so some instruction sets will invert the carry out some will not. this is hugely important for this topic.
Likewise the carry out bit is computed based on the addition of the msbits of the operands and the carry in to that position, it is the carry out of that addition.
abcxxxxxx
dxxxxxxx
+ exxxxxxx
============
f
a the carry out is the carry out when adding bits b+d+e. This is also known as the unsigned overflow flag when this is an addition operation and the operands are considered to be unsigned values. But the signed overflow flag is determined by whether b and a are equal or not equal.
In what situations does this happen.
bde af
000 00
001 01
010 01
011 10 <--
100 01 <--
101 10
110 10
111 11
So you can read that as: if the carry in is not equal to the carry out for the msbit, there is a signed overflow. At the same time you can say that if the msbits of the operands are equal and the msbit of the result is not equal to those operand bits, then signed overflow is true. If you generate a table of signed numbers and their results and which ones overflow this will start to be clear. You don't have to do 8 bit by 8 bit, 256 * 256 combinations; take 3 or 4 bit numbers, synthesize your own addition routines that are 3 or 4 bits wide, and that smaller number of combinations will be enough.
So while addition and subtraction themselves, as far as the result bits go, do not know signed from unsigned, the flags (if you have a processor that uses them), the C or carry flag and the V or overflow flag, do have a signedness based use case. The carry flag itself can have one of two definitions when produced by a subtract, depending on the instruction set, and since comparisons are generally done with a subtraction that carry definition matters to how the flags are then used.
Greater than or less than while using a subtract to determine how they are used and the result itself is not affected by signedness how the flags are interpreted very much are.
Take some four bit positive numbers.
1101 - 1100 (13 - 12)
1100 - 1100 (12 - 12)
1011 - 1100 (11 - 12)
11111
1101
+ 0011
=======
0001
carry out 1, zero flag 0, v = 0, n = 0
11111
1100
+ 0011
========
0000
carry out 1, zero flag 1, v = 0, n = 0
00111
1011
+ 0011
========
1111
carry out 0, zero flag 0, v = 0, n = 1
(n is the msbit of the result, the sign bit 1 means signed negative number, zero means signed positive number)
cz
10 greater than but not equal
11 equal
00 less than but not equal
same bit patterns
1101 - 1100 (-3 - -4)
1100 - 1100 (-4 - -4)
1011 - 1100 (-5 - -4)
cz
10 greater than but not equal
11 equal
00 less than but not equal
so far nothing changed.
but if I examine all the combinations
#include <stdio.h>
int main ( void )
{
    unsigned int ra;
    unsigned int rb;
    unsigned int rc;
    unsigned int rx;
    unsigned int v;
    unsigned int n;
    int sa,sb;
    for(ra=0;ra<0x10;ra++)
    for(rb=0;rb<0x10;rb++)
    {
        for(rx=8;rx;rx>>=1) if(rx&ra) printf("1"); else printf("0");
        printf(" - ");
        for(rx=8;rx;rx>>=1) if(rx&rb) printf("1"); else printf("0");
        rc=ra-rb;
        printf(" = ");
        // print the result rc here, not rb a second time
        for(rx=8;rx;rx>>=1) if(rx&rc) printf("1"); else printf("0");
        printf(" c=%u",(rc>>4)&1); // bit 4 of ra-rb, set when a borrow happened
        printf(" n=%u",(rc>>3)&1);
        n=(rc>>3)&1;
        if((rc&0xF)==0) printf(" z=1"); else printf(" z=0");
        // signed overflow on a subtract: the operand signs differ and
        // the result sign differs from the first operand
        v=0;
        if((ra&8)!=(rb&8))
        {
            if((ra&8)!=(rc&8)) v=1;
        }
        printf(" v=%u",v);
        printf(" (%2u - %2u)",ra,rb);
        sa=ra;
        if(sa&8) sa-=16; // sign extend the 4 bit value
        sb=rb;
        if(sb&8) sb-=16;
        printf(" (%+2d - %+2d)",sa,sb);
        if(rc&0x10) printf(" C ");
        if(n==v) printf(" NV ");
        printf("\n");
    }
}
you can find fragments within the output that show the problem.
0000 - 0110 = 1010 c=1 n=1 z=0 v=0 ( 0 - 6) (+0 - +6) C
0000 - 0111 = 1001 c=1 n=1 z=0 v=0 ( 0 - 7) (+0 - +7) C
0000 - 1000 = 1000 c=1 n=1 z=0 v=1 ( 0 - 8) (+0 - -8) C NV
0000 - 1001 = 0111 c=1 n=0 z=0 v=0 ( 0 - 9) (+0 - -7) C NV
0000 - 1010 = 0110 c=1 n=0 z=0 v=0 ( 0 - 10) (+0 - -6) C NV
0000 - 1011 = 0101 c=1 n=0 z=0 v=0 ( 0 - 11) (+0 - -5) C NV
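For unsigned, 0 is less than 6, 7, 8, 9... and the c printed here is set on every one of those lines (this program takes bit 4 of the C subtraction, which makes it a borrow, the inverse of the adder carry out used in the cz tables above). But with the same bit patterns taken as signed, 0 is less than 6 and 7 but greater than -8, -7, -6..., and c does not change between those cases.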
What is not obvious necessarily until you stare at it a lot or just cheat and look at ARMs documentation for signed if N == V it is a signed greater than or equal. for N != V it is a signed less than. don't need to examine the carry out. particularly the signed bit pattern problems 0000 and 1000 don't work with the carry like other bit patterns.
Hmm, I wrote this all up in other questions before. Anyway, multiply both does and doesn't care about unsigned and signed.
Using your calculator 0xF * 0xF = 0xE1. The biggest 4 bit number times the biggest 4 bit number gives an 8 bit number, we need twice as many bits to cover all the bit patterns.
1111
* 1111
=================
1111
1111
1111
+ 1111
=================
11100001
so we see the addition that results is at least 2n-1 bits, if you end up with a carry off that last bit then you end up with 2n bits.
but, what is -1 * -1? its equal to 1 right? what are we missing?
unsigned has implied zeros
00001111
* 1111
=================
00001111
00001111
00001111
+00001111
=================
00011100001
but signed the sign is extended
11111111
* 1111
=================
11111111
11111111
11111111
+11111111
=================
00000000001
so sign matters with multiply?
0xC * 0x3 = 0xF4 or 0x24.
#include <stdio.h>
int main ( void )
{
unsigned int ra;
unsigned int rb;
unsigned int rc;
unsigned int rx;
int sa;
int sb;
int sc;
for(ra=0;ra<0x10;ra++)
for(rb=0;rb<0x10;rb++)
{
sa=ra;
if(ra&8) sa|=0xFFFFFFF0;
sb=rb;
if(rb&8) sb|=0xFFFFFFF0;
rc=ra*rb;
sc=sa*sb;
if((rc&0xF)!=(sc&0xF))
{
for(rx=8;rx;rx>>=1) if(rx&ra) printf("1"); else printf("0");
printf(" ");
for(rx=8;rx;rx>>=1) if(rx&rb) printf("1"); else printf("0");
printf("\n");
}
}
}
and there is no output. as expected. the bits abcd * 1111
abcd
1111
===============
aaaaabcd
aaaaabcd
aaaaabcd
aaaaabcd
================
four bits in on each operand if I only care about the lower four bits out
abcd
1111
===============
abcd
bcd
cd
d
================
how the operand sign extends does not matter as far as the result is concerned
Now knowing that a significant portion of the possible combinations of n bit times n bit equals n bit overflow it doesnt help you much to do such a thing in any code you want to be useful.
int a,b,c;
c = a * b;
not very useful except for smaller numbers.
But the reality is, as far as multiply goes, if the result is the same size as the operands then signed vs unsigned does not matter; if the result is the proper twice the size of the operands then you need a separate signed multiply instruction/operation and an unsigned one. You can certainly cascade/synthesize the n*n=2n with an n*n=n instruction as you will see in some instruction sets.
Bitwise operators, xor, or, and: these are bitwise, they don't/can't care about sign.
Shift left: start with abcd, shift one bcd0, shift two cd00 and so on. Not very interesting. Shift right though desires to have separate arithmetic and logical shift right, where for arithmetic the msbit is duplicated as the shift in bit, and for logical a zero shifts in. Arithmetic: abcd aabc aaab aaaa. Logical: abcd 0abc 00ab 000a 0000.
But we dont have two kinds of shift right in C. But when doing addition and subtraction directly, bits is bits, the beauty of twos complement. When doing a comparison which is a subtract then the flags used are different for signed vs unsigned for a number of the comparisons, get the older ARM architectural reference manual, I think they call it the armv5 one, even though it goes back to the armv4 and up to the armv6.
There is a section called "The condition field" and a table, this very nicely shows at least for the ARM flags the flag combinations for both unsigned this and that, signed this and that and the ones that dont care about signedness (equal, not equal, etc) wont say anything.
Understand/remember that some instruction sets not only invert the carry in bit and second operand on a subtract but will also invert the carry out bit. so if a carry bit is used on something signed then it is inverted. the stuff I did above where I tried to use the term carry out instead of carry flag, the carry flag would be inverted for some other instruction sets and the unsigned greater than and less than table flips over.
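As a hedged illustration of how one subtract serves both kinds of comparison, the sketch below computes the four flags the way ARM does for cmp a,b (carry set means no borrow; other instruction sets may invert that, as noted above). The struct and function names are invented here, the condition names in the comments are the ARM ones.

```c
#include <stdint.h>

/* Flags as ARM sets them after cmp a,b, which computes a - b. */
struct flags { int n, z, c, v; };

static struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    struct flags f;
    f.n = (int)(r >> 31);                    /* msbit of the result */
    f.z = (r == 0);
    f.c = (a >= b);                          /* ARM convention: set means no borrow */
    f.v = (int)(((a ^ b) & (a ^ r)) >> 31);  /* signed overflow of a - b */
    return f;
}

/* The same flags, read differently depending on what the programmer meant. */
static int unsigned_lower(struct flags f)  { return !f.c; }               /* LO / CC */
static int unsigned_higher(struct flags f) { return f.c && !f.z; }        /* HI */
static int signed_less(struct flags f)     { return f.n != f.v; }         /* LT */
static int signed_greater(struct flags f)  { return !f.z && f.n == f.v; } /* GT */
```

Comparing 1 with 0xFFFFFFFF gives "unsigned lower" and "signed greater" from the very same subtract, because 0xFFFFFFFF is either a huge number or -1 depending on who is looking at it.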
Division is not as easy to show, you have to do long division, etc. I will leave that one to the reader.
Not all documentation is as good as the table I am referring to in ARMs docs. Other processor documentation may or may not make the unsigned vs signed distinction, they might just say jump if greater than and you may have to experimentally figure out what that means. Now that you know all of this you may have already figured out you dont for example need a branch if unsigned greater than or equal. That just means branch if not less than so you can
cmp r0,r1
or
cmp r1,r0
and just use branch if carry to cover the unsigned less than, unsigned less than or equal, unsigned greater than, unsigned greater than or equal cases. Although you might upset some programmers doing that because you were trying to save some bits in the instruction.
Saying ALL of that, the processor never distinguishes signed from unsigned. These are concepts that only mean something to the programmer, processors are very stupid. Bits is bits, the processor doesnt know if these bits are an address, if they are a variable if they are a character in a string, a floating point number (being implemented with a soft float library in fixed point), these interpretations are only meaningful to the programmer not the processor. The processor does not "distinguish between unsigned and signed in machine code", the programmer has to properly place bits that are meaningful to the programmer and then select the right instructions and sequences of instructions to perform the task the programmer wants performed. Some 32 bit number in a register is only an address when those bits are used to address something with a load or store, once that one clock cycle where they are sampled to be delivered to an address bus they are an address, before and after that they are just bits. When you increment that pointer in your program they are not an address they are just bits you are adding some other bits to. You can certainly build a MIPS like instruction set with no flags, and only N bit to N bit multiplies, only have a jump if two registers are equal or not equal instruction no other greater than or less than type instructions and still be able to make useful programs just like instruction sets that go overboard with those things unsigned this flag and signed that flag, unsigned this instruction and signed that instruction.
A not so popular but sometimes talked about in school, maybe there was a real instruction set or many that did this is a non-twos complement solution and that pretty much means sign and magnitude a sign bit and an unsigned value so +3 is 0011 and -3 is 1011 for a four bit register that burns one bit for sign when doing signed math. You then as with twos complement have to sit down with pencil and paper and work through the math operations, grade school style, then implement those in logic. Does this result in a separate unsigned and signed add? twos complement 4 bit registers we can do 0-15 and -8 to +7 for sign magnitude we can declare unsigned is 0 - 15 but signed is -7 to +7. An exercise for the reader, the question/quote had to do with twos complement.
Check out Two's Complement and its arithmetic operations, it is signed numbers in binary.
Two's complement is the most common method of representing signed
integers on computers. In this scheme, if the binary number 010(2)
encodes the signed integer 2(10), then its two's complement, 110(2),
encodes the inverse: -2(10). In other words, to reverse the sign of any
integer in this scheme, you can take the two's complement of its
binary representation.
That way it is possible to have arithmetic operations between positive and negative binary values.
Two's Complement Python code snippet:
def twos_complement(input_value, num_bits):
    '''Calculates a two's complement integer from the given input value's bits'''
    mask = 2**(num_bits - 1)
    return -(input_value & mask) + (input_value & ~mask)
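For readers following the C examples elsewhere in this thread, roughly the same idea can be sketched in C. This is not a translation of any particular library routine. Unlike the Python above it first throws away everything above num_bits, and it assumes num_bits is between 1 and 31.

```c
#include <stdint.h>

/* Interpret the low num_bits of value as a twos complement number.
   Assumes 1 <= num_bits <= 31. */
static int32_t twos_complement(uint32_t value, unsigned num_bits)
{
    uint32_t sign = (uint32_t)1 << (num_bits - 1);   /* weight of the sign bit */
    uint32_t field = value & ((sign << 1) - 1);      /* keep only num_bits bits */
    /* the sign bit counts negatively, all lower bits count positively */
    return (int32_t)(field & (sign - 1)) - (int32_t)(field & sign);
}
```

With three bits, 110 comes out as -2 and 010 as 2, matching the quoted description.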

Assembly Loop Through Each Bit of Register Value

I have a register $t0 that has some integer stored to it. For example, say I store the integer 1100 to it. The binary representation of this value is 0000010001001100. Of course it may extend to 32 bits for 32 bit register but that is easily done.
I am trying to implement a loop in assembly that iterates through each bit of the register value and checks if it is a 1 or 0. How would one do this?
Perhaps I misunderstand the nature of a register. It is my understanding that the register stores a 32 bit number, yes? Does that mean each bit is stored at a specific address?
I have tried using shift for the register and checking the bit but that has failed. I also looked into the lb command but this loads the byte, not the bit. So what would one do?
some basics:
most(all?) shift instructions shift out the bit into the carry flag
most(all?) CPUs have a branch command, to jump to the location on carry flag set
combining this, you could do the following:
load register1,< the value to be tested >
load register2, 0
Lrepeat:
compare register1 with 0
jump if zero Lexit
shift right register1
jump no carry Lskip
increase register2
Lskip:
jump Lrepeat
Lexit: ; when you end up here, your register2
; holds the count of bit register1 had set
some optimization still can be done:
Lrepeat:
compare register1 with 0
jump if zero Lexit
shift right register1
jump no carry Lskip <-- note: here you jump ...
increase register2
Lskip:
jump Lrepeat <-- ... to jump again!
Lexit:
=====>
Lrepeat:
compare register1 with 0
jump if zero Lexit
shift right register1
jump no carry Lrepeat <-- so you can optimize it this way
increase register2
Lskip:
jump Lrepeat
Lexit:
some CPUs have an "add carry" instruction,
e.g. 6502:
ADC register,value ; register = register + value + Carry flag
this could be used to avoid the branch (conditional jump), by adding 0 ( plus Carry of course) to register2 each loop
shift right register1
add with carry register2,0 ; may look weird, but this adds a
; 1 if the carry is set, which is
; exactly what we want here
jump Lrepeat
note that you don't need to know the register size! you just loop until the register is 0, which can save you a lot of time e.g. if your value is something like 0000 0000 0000 0000 0000 0000 0001 1001
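The pseudocode above might look like this in C. C has no carry flag, so the sketch looks at the low bit just before it is shifted out, which amounts to the same thing. The function name and the optional iterations output are additions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits of value, stopping as soon as only zeros are left.
   If iterations is not NULL it receives the number of loop passes. */
static unsigned count_set_bits(uint32_t value, unsigned *iterations)
{
    unsigned count = 0;
    unsigned loops = 0;
    while (value != 0) {        /* compare register1 with 0 */
        count += value & 1u;    /* the bit about to be shifted out (the "carry") */
        value >>= 1;            /* shift right register1 */
        loops++;
    }
    if (iterations != NULL) {
        *iterations = loops;
    }
    return count;
}
```

For the value 0001 1001 this takes five passes rather than 32, and the 1100 from the question takes eleven.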
On any processor, set up a loop to count the number of bits in a register, in this case 32. On each pass through the loop, AND the register of interest with 1. Then add the result to an accumulator, and finally shift the register. That gives you the number of set bits.
The precise instructions vary from processor to processor. To loop you normally set a label, decrement a counter, then execute an instruction with a name like branch_not_equal_to zero (BNZ, BNEQ0 something like that). The and will have a name like AND, ANDI (and immediate). The ADD might be ADD, ADC (add with carry). The shift will be something like ASR (arithmetic shift right) LSR (logical shift right), you might have to pass it 1 to say shift only one place. But all processors will allow you to read out register bits in essentially this way.
1) Have the register that you want to iterate through be denoted as R0. So, for example, the least significant bits are R0= 1011 0101.
2) use a second register R1 holding the single-bit mask, R1=0000 0001.
3) AND R1 with R0 and then right shift R0(so the next iteration checks the next bit of R0).
4) Let R3 be a third register that increments by 1 IF the AND operation results in a 1 (that is, you run into a 1 in R0). ELSE, loop again to check the next bit in R0.
You could iterate through the entire 32 bits or a length of your choice by having a decrementing loop counter .
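A minimal C sketch of those steps, assuming a 32 bit register and a fixed loop count. The names r0 and r3 follow the steps, visit_bits and the output buffer are made up so the result of each bit test is visible.

```c
#include <stdint.h>

/* Visit all 32 bits of r0, least significant first.
   out receives the bits as text, most significant bit first. */
static unsigned visit_bits(uint32_t r0, char out[33])
{
    unsigned r3 = 0;                    /* how many 1 bits were seen */
    for (int i = 0; i < 32; i++) {      /* decrementing counter in the asm version */
        uint32_t bit = r0 & 1u;         /* AND with the mask 0000 0001 */
        out[31 - i] = bit ? '1' : '0';  /* this is where you would act on the bit */
        r3 += bit;
        r0 >>= 1;                       /* move the next bit into position */
    }
    out[32] = '\0';
    return r3;
}
```

Calling it with 0xB5 (1011 0101) reports five set bits and fills the buffer with 24 zeros followed by 10110101.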

Understanding PowerPC rlwinm instruction

So I finally convinced myself to try and learn/use PowerPC (PPC).
Everything is going well and most information was found online.
However, when looking at some examples I came across this:
rlwinm r3, r3, 0,1,1
How would I do this in C?
I tried doing some research, but couldn't find anything that helped me out.
Thanks in advance!
rlwinm stands for "Rotate Left Word Immediate then aNd with Mask", and its correct usage is
rlwinm RA, RS, SH, MB, ME
As per the description page:
RA Specifies target general-purpose register where result of operation is stored.
RS Specifies source general-purpose register for operation.
SH Specifies shift value for operation.
MB Specifies begin value of mask for operation.
ME Specifies end value of mask for operation.
BM Specifies value of 32-bit mask.
And
If the MB value is less than the ME value + 1, then the mask bits
between and including the starting point and the end point are set to
ones. All other bits are set to zeros.
If the MB value is the same as
the ME value + 1, then all 32 mask bits are set to ones.
If the MB value is greater than the ME value + 1, then all of the mask bits
between and including the ME value +1 and the MB value -1 are set to
zeros. All other bits are set to ones.
So in your example the source and target are the same. Shift amount is 0, so no shift. And MB=ME=1, so the first case applies, such that the mask becomes all zeros with bit number 1 as 1, while numbering from MSB=0: 0x40000000.
In C we can write it as simple as
a &= 0x40000000;
assuming a is 32-bit variable.
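For the general case, the three mask rules quoted above can be sketched in C as follows. The function names are made up for this example, bits are numbered from MSB=0 as in the description, and mb and me are assumed to be in the range 0 to 31.

```c
#include <stdint.h>

/* Mask with bits mb..me set, numbering from MSB = 0.
   When mb > me the run of ones wraps around the end of the word. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb  = 0xFFFFFFFFu >> mb;         /* bits mb..31 */
    uint32_t up_to_me = 0xFFFFFFFFu << (31 - me);  /* bits 0..me  */
    return (mb <= me) ? (from_mb & up_to_me) : (from_mb | up_to_me);
}

/* Rotate left by sh bits; sh = 0 is handled without an out of range shift. */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31;
    return (x << sh) | (x >> ((32 - sh) & 31));
}

/* Sketch of rlwinm RA, RS, SH, MB, ME: returns the value for RA. */
static uint32_t rlwinm_sketch(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}
```

With sh=0, mb=1, me=1 the mask comes out as 0x40000000, the same constant used in the one-liner above.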
rlwinm rotates the value of a register left by the specified number, performs an AND and stores the result in a register.
Example: rlwinm r3, r4, 5, 0, 31
r4 is the source register which is rotated by 5 and before the rotated result is placed in r3, it is also ANDed with a bit mask of only 1s since the interval between 0 and 31 is the entire 32-bit value.
Example taken from here.
For a C implementation you may want to take a look at how to rotate left and how to AND which should be trivial to build together now. Something like the following should work:
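/* assumes unsigned int is 32 bits wide; unsigned avoids sign-extending right shifts */
unsigned int rotateLeft(unsigned int input, unsigned int shift) {
    shift &= 31; /* a shift of 0 must not turn into a shift by 32 */
    return (input << shift) | (input >> ((32 - shift) & 31));
}
unsigned int rlwinm(unsigned int input, unsigned int shift, unsigned int mask) {
    return rotateLeft(input, shift) & mask;
}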

Shifts in 2 32-bit registers

void rotate( unsigned long mask[], int rotateCnt );
This function rotates the current 64-bit mask (mask[]) by rotateCnt places. If the rotateCnt is positive, rotate left; if the rotateCnt is negative, rotate right. Only the lower 6 bits of the rotateCnt should be used for the rotateCnt.
But I have to do the rotate, in 2 32-bit registers that simulates 1 64 bit register, logically performing 64 bit operations across two 32-bit registers. They told me to do 2 loops, but I can't figure this out? Any h
As you're using x86, take a look at shld and shrd. You won't need loops (why they asked for loops is beyond me).
Update
Here's a DevStudio 2005 style function that uses inline assembler to do what you want. Do not present this as a solution without fully understanding how it all works (especially how the negative counts do a right rotate) because it will be incredibly easy for your teachers to spot that you copied this without knowing how it works (i.e. Teacher: "How does this work?", You: "Errr..." => FAIL).
void Rotate
(
unsigned *value, // pointer to two 32bit integers
int count // number of bits to rotate: >= 0 left, < 0 = right
)
{
__asm
{
mov esi,value
mov eax,[esi]
mov edx,[esi+4]
mov ecx,count
mov ebx,eax
shld eax,edx,cl
shld edx,ebx,cl
test cl,32
jz noswap
xchg edx,eax
noswap:
mov [esi],eax
mov [esi+4],edx
}
}
There are probably quicker instructions for this, but here's the idea... If you're rotating left:
1. take the most significant rotateCnt bits from the high-order register, shift them right 32-rotateCnt bits, and stash the result somewhere
2. shift the high-order register left by rotateCnt bits
3. take the most significant rotateCnt bits from the low-order register, shift them right 32-rotateCnt bits, and add the result to the high-order register
4. shift the remaining bits in the low-order register left by rotateCnt bits and add the bits that you saved in step 1
I'm sure you can see how to extend this process to any number of registers. If rotateCnt can be larger than 32 bits, you'll have to work a little harder, especially in the general case (n registers instead of just 2). One thing that may help is to notice that rotating left by n bits is the same as rotating right by (size-n) bits.
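A sketch of that procedure in C, for a rotate count strictly between 0 and 32. The function name is invented, and the two words are passed as separate pointers because which array element holds the high word is not stated in the question.

```c
#include <stdint.h>

/* Rotate the 64 bit value hi:lo left by n bits, assuming 0 < n < 32. */
static void rotl64_steps(uint32_t *hi, uint32_t *lo, unsigned n)
{
    uint32_t saved = *hi >> (32 - n);   /* top n bits of the high word, moved to the bottom */
    *hi <<= n;                          /* make room in the high word */
    *hi |= *lo >> (32 - n);             /* top n bits of the low word land in that room */
    *lo = (*lo << n) | saved;           /* low word takes the bits that fell off the top */
}
```

Because the vacated bits are always zero after the shift, "add" and OR give the same result here.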
From your comments, I see that you're supposed to use a loop. You can always apply the rotate procedure described above 1 bit at a time for rotateCnt iterations. In that case, you'd obviously change rotateCnt in the description above to 1.
A single bit rotate is simply a single bit shift with carries out of one word being applied to the next word with a special case that a carry out of the high word gets applied to the low word.
It may help you to consider a picture of what needs to happen in certain scenarios. I'll use 4-bit words below and I'll assume the rotate is to the left; the same concepts apply to whatever word size you might use:
// Note '-' in the carry column means "don't care"
//
// starting value (in binary):
'high' 'low'
carry word carry word
- 1 0 0 0 - 1 0 0 1
// after shift left of each word:
1 0 0 0 0 1 0 0 1 0
// apply the carry out of the low word
// to the high word:
1 0 0 0 1 - 0 0 1 0
// apply the carry out of the high word
// to the low word
- 0 0 0 1 - 0 0 1 1
To use this basic operation to rotate multiple positions, just loop the appropriate number of times.
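The picture above, widened to 32 bit words and wrapped in a loop, could look roughly like this in C. The function names are made up. Masking the count to its low 6 bits is what the original assignment asks for, and it also takes care of negative counts.

```c
#include <stdint.h>

/* Rotate hi:lo left by one bit: each word shifts left by one and
   receives the carry out of the other word. */
static void rotl64_one(uint32_t *hi, uint32_t *lo)
{
    uint32_t carry_hi = *hi >> 31;   /* carry out of the high word */
    uint32_t carry_lo = *lo >> 31;   /* carry out of the low word */
    *hi = (*hi << 1) | carry_lo;
    *lo = (*lo << 1) | carry_hi;
}

/* Rotate by count places using only the low 6 bits of count.
   A count of -3 becomes 61 left rotates, which equals 3 right rotates. */
static void rotate64(uint32_t *hi, uint32_t *lo, int count)
{
    unsigned n = (unsigned)count & 63u;
    while (n--) {
        rotl64_one(hi, lo);
    }
}
```

This is the slow but obvious version, which also makes it a convenient reference to test a non-looping version against.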
Note that this can be done without any loops at all by applying the right set of bitmasks and shifts. Basically you can get all the bits that will carry out of a word in one shot without looping. A looping version is probably more straightforward to implement - you might consider doing that first and using it as a verification test if you decide to improve it to a non-looping version.
think about how you would do this in C for example, then translate that to asm.
Using 32 bit variables to do a single bit shift left for example, assuming ra is the upper 32 bits and rb the lower
if(rb&0x80000000) { ra<<=1; ra|=1; rb<<=1; }
else { ra<<=1; rb<<=1; }
For a rotate you might do something along these lines
if(rb&0x80000000)
{
if(ra&0x80000000) { ra<<=1; ra|=1; rb<<=1; rb|=1; }
else { ra<<=1; ra|=1; rb<<=1; }
}
else
{
if(ra&0x80000000) { ra<<=1; rb<<=1; rb|=1; }
else { ra<<=1; rb<<=1; }
}
You can then wrap a loop around one of those and do it N times.
Or say an 8 bit shift left
ra=(ra<<8)|(rb>>(32-8));
rb<<=8;
Or say an N bit shift left
ra=(ra<<n)|(rb>>(32-n));
rb<<=n;
Or an n bit rotate left (which for this 64 bit value is the same as a 64-n bit rotate right)(there is a reason why some processors only have a rotate right and the left is virtual or vice versa).
temp=ra>>(32-n);
ra=(ra<<n)|(rb>>(32-n));
rb=(rb<<n)|temp;
Then look at the instruction set and see what is available and matches what you are doing.
In short to shift bits you need to take the bit on one side and put it in the next bit. If you align yourself on some boundary like a variable or register there is no difference you take the bit from one side and shift it into the other, it may take more code as the instruction set or programming language doesnt support it directly doesnt mean you cant do it. Just like you can perform a 2048 bit multiply on an 8 bit processor with no multiply instruction, just takes more code than other processors, but it is very doable.

Resources