Understanding PowerPC rlwinm instruction - c

So I finally convinced myself to try and learn/use PowerPC (PPC).
Everything is going well and most information was found online.
However, when looking at some examples I came across this:
rlwinm r3, r3, 0,1,1
How would I do this in C?
I tried doing some research, but couldn't find anything that helped me out.
Thanks in advance!

rlwinm stands for "Rotate Left Word Immediate then aNd with Mask", and its correct usage is
rlwinm RA, RS, SH, MB, ME
As per the description page:
RA Specifies target general-purpose register where result of operation is stored.
RS Specifies source general-purpose register for operation.
SH Specifies shift value for operation.
MB Specifies begin value of mask for operation.
ME Specifies end value of mask for operation.
BM Specifies value of 32-bit mask.
And
If the MB value is less than the ME value + 1, then the mask bits between and including the starting point and the end point are set to ones. All other bits are set to zeros.
If the MB value is the same as the ME value + 1, then all 32 mask bits are set to ones.
If the MB value is greater than the ME value + 1, then all of the mask bits between and including the ME value + 1 and the MB value - 1 are set to zeros. All other bits are set to ones.
So in your example the source and target are the same register. The shift amount is 0, so nothing is rotated. MB=ME=1, so the first case applies: the mask is all zeros except bit number 1, which, numbering from MSB=0, gives 0x40000000.
In C we can write it as simply as
a &= 0x40000000;
assuming a is 32-bit variable.
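The three mask rules quoted above can be sketched in C for arbitrary MB and ME values. This is only an illustration: the names ppc_mask, rotl32 and rlwinm_sketch are made up for this example, and MB, ME and SH are assumed to be in the range 0 to 31.

```c
#include <stdint.h>

/* Build the rlwinm mask from MB and ME (bit 0 is the most significant bit).
   Hypothetical helper; assumes 0 <= mb, me <= 31. */
uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;        /* ones in bits MB..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me); /* ones in bits 0..ME  */
    /* MB <= ME: one contiguous run of ones; otherwise the run wraps around. */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Rotate left using unsigned arithmetic so that no shift is undefined. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Rotate, then AND with the mask described by MB and ME. */
uint32_t rlwinm_sketch(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}
```

With these helpers, rlwinm r3, r3, 0,1,1 corresponds to r3 = rlwinm_sketch(r3, 0, 1, 1), which reduces to the a &= 0x40000000 shown above.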

rlwinm rotates the value of a register left by the specified number, performs an AND and stores the result in a register.
Example: rlwinm r3, r4, 5, 0, 31
r4 is the source register which is rotated by 5 and before the rotated result is placed in r3, it is also ANDed with a bit mask of only 1s since the interval between 0 and 31 is the entire 32-bit value.
Example taken from here.
For a C implementation you may want to take a look at how to rotate left and how to AND which should be trivial to build together now. Something like the following should work:
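#include <stdint.h>
/* Rotate a 32-bit value left; unsigned types keep every shift well defined. */
uint32_t rotateLeft(uint32_t input, unsigned shift) {
    shift &= 31; /* a rotate by 32 is the same as a rotate by 0 */
    return (input << shift) | (input >> ((32 - shift) & 31));
}
/* mask is the 32-bit mask described by MB and ME */
uint32_t rlwinm(uint32_t input, unsigned shift, uint32_t mask) {
    return rotateLeft(input, shift) & mask;
}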

Related

How to find the leftmost 6 bits of a number

I am working on a project and it asks me to find the instruction and register from the given input using bit operators. For example:
Given: 0x316ac000 => The output will be lt R5 R10 R11
R: represent the register
I tried to convert it on paper. The way I did it was to first convert that input to binary (ignoring the 0x), then group the leftmost 6 bits, which gives me decimal 12, and 12 is sub based on the table.
So how can I actually code it? or the logic on this
Thank you for your help!
Binary is a notation for humans.
To get bits from a number, use the bit shift and bit mask operations. For example, to get bits 3-5 of a value:
(value >> 3) & 0x7
That is, shift right three bits (to get rid of bits 0, 1, and 2) and then bit mask (bitwise AND) with 7, which is 111 in binary.
You can compute a mask by a bit shift and subtract:
(1 << N_bits) - 1
So we can write a function:
unsigned long get_bits( int start, int stop, unsigned long value )
{
unsigned long mask = (1UL << (stop - start + 1)) - 1;
return (value >> start) & mask;
}
BTW, bit shifts are tricky with integers. Use unsigned values.
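Applied to the value from the question, the same shift-and-mask idea might look like the sketch below. The opcode being the top 6 bits comes from the question; the three 4-bit register fields that follow it are an assumption inferred from the expected output (R5 R10 R11), and extract_field is a hypothetical helper name.

```c
#include <stdint.h>

/* Return `width` bits of `value`, starting at bit `lsb` (bit 0 = least significant).
   Hypothetical helper; assumes lsb < 32 and lsb + width <= 32. */
uint32_t extract_field(uint32_t value, unsigned lsb, unsigned width)
{
    /* A shift by 32 would be undefined, so the full-width mask is handled separately. */
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lsb) & mask;
}
```

For 0x316ac000, extract_field(v, 26, 6) gives the opcode 12, and extract_field(v, 22, 4), extract_field(v, 18, 4) and extract_field(v, 14, 4) give 5, 10 and 11 under the assumed field layout.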

What does mask variable do in this CRC checksum calculation?

The question is about code in figure 14-6 in here.
The mask is calculated as:
mask = -(crc & 1)
Why do we & crc with 1 and then make result negative? The Figure 14-5 does not have this mask variable, why?
Edit:
So since this point is clear, why do we have this line also:
crc = crc ^ byte;
This line is not present in Figure 14-5.
Can this program be used if the generator polynomial length is not multiple of 8 bits?
What that does is check the least significant bit of crc and then negate it. The effect is that if the bit is zero the mask will be zero (that is, all zeroes) and if the bit is one the mask will be -1 (that is, all ones). This is used to conditionally xor with 0xEDB88320.
The other solution instead uses if to make that condition.
The second trick they're using in the second solution is to do the xor for the bit check in one operation for all eight bits. In the first example they use (int)(crc^byte) < 0 (which means a check for the XOR of the most significant bit or the sign bit), they then shift both crc and byte one bit to the left and do the same on next bit. In the second example they do the XOR eight bits at a time and then checks each bit of the result.
To see what happens, consider if we change the first example to:
for(j=0; j<=7; j++) {
crc = crc ^ mask_sign_bit(byte);
if( (int)crc < 0 )
crc = (crc << 1) ^ 0x04C11DB7;
else
crc = crc << 1;
byte = byte << 1;
}
where mask_sign_bit masks out every bit except the sign bit, the sign of crc ^ byte becomes the same as crc ^ mask_sign_bit(byte) so the consequence of the if statement becomes the same. Then when shifting crc to the left one step the bit modified by crc = crc ^ mask_sign_bit(byte) will be lost.
This operation turns the least significant bit into a mask.
For example, for an 8-bit value (for simplicity) we have:
00000000 -> 00000000
00000001 -> 11111111
Using unary minus complicates the circuitry of the CRC function massively, which otherwise requires no addition operations. It can be implemented as function of addition, as follows
-x = ~x + 1
Some architectures might support a bit-vector "broadcast" operation, to send the least significant bits to all others bits, which will give huge performance gain.
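To see that the mask form and the if form do the same thing, here is a small sketch. The function names are invented for illustration, and crc32_sketch assumes the common reflected CRC-32 conventions (initial value 0xFFFFFFFF, final inversion), which may differ in detail from the figure being discussed.

```c
#include <stddef.h>
#include <stdint.h>

/* One bit step written with an explicit branch. */
uint32_t step_with_if(uint32_t crc)
{
    if (crc & 1u)
        return (crc >> 1) ^ 0xEDB88320u;
    return crc >> 1;
}

/* The same step, branch-free: mask is all ones when the low bit is 1, else 0. */
uint32_t step_with_mask(uint32_t crc)
{
    uint32_t mask = 0u - (crc & 1u);   /* unsigned negation wraps, so this is well defined */
    return (crc >> 1) ^ (0xEDB88320u & mask);
}

/* Bitwise CRC-32 that XORs the whole byte in first, then checks eight low bits. */
uint32_t crc32_sketch(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int j = 0; j < 8; j++)
            crc = step_with_mask(crc);
    }
    return ~crc;
}
```

Writing the negation as 0u - (crc & 1u) keeps everything unsigned, so no signed overflow is involved.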

How to determine if carry out occurs in C

I'm writing an ARM11 emulator and now I'm trying to set the CPSR flags, which are N (negative result), Z (zero), C (carry out) and V (overflow).
this is what the spec says:
The C bit in logical operations (and, eor, orr, teq, tst and mov) will be set to the carry out from
any shift operation (i.e.the result from the barrel shifter). In arithmetic operations (add, sub, rsb
and cmp) the C bit will be set to the carry out of the bit 31 of the ALU.
My question is, how do I determine the carry out from logical and arithmetic operations?
Operations work on two uint32_t, e.g. my eor operation simply returns x ^ y, and after that I need to set the CPSR flags.
EDIT:
For addition, C is set to 1 if the addition produced a carry (unsigned overflow); it is set to 0 otherwise.
For subtraction (including comparison), the bit C is set to 0 if the subtraction produced a borrow; otherwise it is set to 1.
Logical operations are a bit of a red herring here - obviously eor r0, r1, r2 isn't going to produce overflow or carry. However, it's not the logical operations themselves that we care about:
The C bit in logical operations (and, eor, orr, teq, tst and mov) will be set to the carry out from any shift operation (i.e.the result from the barrel shifter).
Remember that optional shift on any data processing instruction? Given eor r0, r1, r2 lsl #3, the carry you care about is whatever r2 lsl #3 generates. However you're implementing flag-setting for shifts*, do that.
* if you're stuck on that too, I saw plenty of good ideas in a quick flick through the related questions over there -->
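A minimal sketch of the carry rules quoted in the question, using only uint32_t arithmetic. The function names are hypothetical, and lsl_carry assumes a shift amount from 1 to 32; a shift amount of 0 has to be handled separately by the emulator.

```c
#include <stdint.h>

/* Addition: C = 1 if the 32-bit sum wrapped around (unsigned overflow). */
int add_carry(uint32_t x, uint32_t y)
{
    return (uint32_t)(x + y) < x;
}

/* Subtraction x - y: C = 0 if a borrow occurred, otherwise 1. */
int sub_carry(uint32_t x, uint32_t y)
{
    return x >= y;
}

/* LSL by n (1 <= n <= 32): the carry out is the last bit shifted off the top. */
int lsl_carry(uint32_t x, unsigned n)
{
    return (int)((x >> (32 - n)) & 1u);
}
```

For eor r0, r1, r2 lsl #3, the C flag would then come from lsl_carry(r2, 3), not from the XOR itself.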
I posit that the code snippet below is probably as close as you're going to get with standard C and using logic operations to determine carry out and signed arithmetic overflow. This approach is an adaptation of how look ahead carry circuits are generated for arbitrary word lengths in FPGA's.
The basic sequence is to first determine which bit pairs will generate a carry and which will propagate a carry. In this presentation, the initial carry-in is presumed to be zero. A mask is marched along the "generate" and "propagate" words and with some logic and previous carry, determine carry to the next iteration. At the end of iteration, the carry flag will be set (or not) depending on the word pair bits to be added. The downside, is that this programming loop would be repeated every time you wanted to determine carry out and overflow for a given word pair - no such penalty in physical circuits or FPGA.
As a bonus, it's super easy to determine an overflow flag, which indicates whether the result of the 2's complement addition is representable in the word.
See the reference links below.
The code is for 32-bit integers, however could be adapted for longer or shorter types.
http://www.righto.com/2012/12/the-6502-overflow-flag-explained.html
https://en.wikipedia.org/wiki/Carry-lookahead_adder
#include <stdbool.h>
#include <stdint.h>
// Global carry and overflow flags
// Set by carryLookahead()
bool carry, ov;
// Determines presence of carry out and overflow from 2's complement addition
//
bool carryLookahead(int32_t f1, int32_t f2){
unsigned long mask;
unsigned long g,p;
unsigned char i;
// uses & sets global carry and ov flag variables
mask=1;
carry=ov=false; // initial carry and overflow flag assumed to be zero
g = f1 & f2; // bit pairs that will generate carry
p = f1 | f2; // bit pairs that will propagate a carry
for(i=0; i < 32; ++i, mask <<= 1){
ov=carry; // set ov to last carry
carry = (g&mask) || (p&mask) && carry; // use logical rather than bitwise logic to set the current carry;
ov=ov^carry; // ov is xor of last and current carries
}
return(carry);
}
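For comparison, the same two flags can also be computed without a loop. This is an alternative technique, not the carry-lookahead approach above: the carry is read from a 64-bit sum, and the overflow test relies on the fact that signed overflow happens exactly when both operands have a sign different from the sign of the result. The function names are made up for this sketch.

```c
#include <stdint.h>

/* Carry out of bit 31, taken from bit 32 of a wider sum. */
int add_carry_wide(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a + b;
    return (int)(wide >> 32);
}

/* Signed overflow: both operands differ in sign from the 32-bit result. */
int add_overflow(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;   /* wraps modulo 2^32, which is well defined */
    return (int)(((a ^ sum) & (b ^ sum)) >> 31);
}
```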

How to create mask with least significant bits set to 1 in C

Can someone please explain this function to me?
A mask with the least significant n bits set to 1.
Ex:
n = 6 --> 0x2F, n = 17 --> 0x1FFFF // I don't get these at all, especially how n = 6 --> 0x2F
Also, what is a mask?
The usual way is to take a 1, and shift it left n bits. That will give you something like: 00100000. Then subtract one from that, which will clear the bit that's set, and set all the less significant bits, so in this case we'd get: 00011111.
A mask is normally used with bitwise operations, especially and. You'd use the mask above to get the 5 least significant bits by themselves, isolated from anything else that might be present. This is especially common when dealing with hardware that will often have a single hardware register containing bits representing a number of entirely separate, unrelated quantities and/or flags.
A mask is a common term for an integer value that is bit-wise ANDed, ORed, XORed, etc with another integer value.
For example, if you want to extract the 8 least significant bits of an int variable, you do variable & 0xFF. 0xFF is a mask.
Likewise if you want to set bits 0 and 8, you do variable | 0x101, where 0x101 is a mask.
Or if you want to invert the same bits, you do variable ^ 0x101, where 0x101 is a mask.
To generate a mask for your case you should exploit the simple mathematical fact that if you add 1 to your mask (the mask having all its least significant bits set to 1 and the rest to 0), you get a value that is a power of 2.
So, if you generate the closest power of 2, then you can subtract 1 from it to get the mask.
Positive powers of 2 are easily generated with the left shift << operator in C.
Hence, 1 << n yields 2^n. In binary it's 10...0 with n 0s.
(1 << n) - 1 will produce a mask with n lowest bits set to 1.
Now, you need to watch out for overflows in left shifts. In C (and in C++) you can't legally shift a variable left by as many bit positions as the variable has, so if ints are 32-bit, 1<<32 results in undefined behavior. Signed integer overflows should also be avoided, so you should use unsigned values, e.g. 1u << 31.
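Putting the shift-and-subtract idea together with the overflow caveat, a 32-bit version might look like the sketch below. The name low_mask32 is hypothetical; the only special case is n = 32, where the shift itself would be undefined.

```c
#include <stdint.h>

/* Mask with the n least significant bits set, for 0 <= n <= 32. */
uint32_t low_mask32(unsigned n)
{
    /* 1u << 32 is undefined for a 32-bit type, so return all ones directly. */
    return (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
}
```

For n = 6 this gives 0x3F, and for n = 17 it gives 0x1FFFF.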
For both correctness and performance, the best way to accomplish this has changed since this question was asked back in 2012 due to the advent of BMI instructions in modern x86 processors, specifically BLSMSK.
Here's a good way of approaching this problem, while retaining backwards compatibility with older processors.
This method is correct, whereas the current top answers produce undefined behavior in edge cases.
Clang and GCC, when allowed to optimize using BMI instructions, will condense gen_mask() to just two ops. With supporting hardware, be sure to add compiler flags for BMI instructions:
-mbmi -mbmi2
#include <inttypes.h>
#include <stdio.h>
uint64_t gen_mask(const uint_fast8_t msb) {
const uint64_t src = (uint64_t)1 << msb;
return (src - 1) ^ src;
}
int main() {
uint_fast8_t msb;
for (msb = 0; msb < 64; ++msb) {
printf("%016" PRIx64 "\n", gen_mask(msb));
}
return 0;
}
First, for those who only want the code to create the mask:
uint64_t bits = 6;
uint64_t mask = ((uint64_t)1 << bits) - 1;
// Results in 0b111111 (or 0x03F)
Thanks to @Benni who asked about using bits = 64. If you need the code to support this value as well, you can use:
uint64_t bits = 6;
uint64_t mask = (bits < 64)
? ((uint64_t)1 << bits) - 1
: (uint64_t)0 - 1;
For those who want to know what a mask is:
A mask is usually a name for value that we use to manipulate other values using bitwise operations such as AND, OR, XOR, etc.
Short masks are usually represented in binary, where we can explicitly see all the bits that are set to 1.
Longer masks are usually represented in hexadecimal, that is really easy to read once you get a hold of it.
You can read more about bitwise operations in C here.
I believe your first example should be 0x3f.
0x3f is hexadecimal notation for the number 63 which is 111111 in binary, so that last 6 bits (the least significant 6 bits) are set to 1.
The following little C program will calculate the correct mask:
#include <stdarg.h>
#include <stdio.h>
int mask_for_n_bits(int n)
{
int mask = 0;
for (int i = 0; i < n; ++i)
mask |= 1 << i;
return mask;
}
int main (int argc, char const *argv[])
{
printf("6: 0x%x\n17: 0x%x\n", mask_for_n_bits(6), mask_for_n_bits(17));
return 0;
}
0x2F is 0010 1111 in binary - this should be 0x3f, which is 0011 1111 in binary and which has the 6 least-significant bits set.
Similarly, 0x1FFFF is 0001 1111 1111 1111 1111 in binary, which has the 17 least-significant bits set.
A "mask" is a value that is intended to be combined with another value using a bitwise operator like &, | or ^ to individually set, unset, flip or leave unchanged the bits in that other value.
For example, if you combine the mask 0x3F with some value n using the & operator, the result will have zeroes in all but the 6 least significant bits, and those 6 bits will be copied unchanged from the value n.
In the case of an & mask, a binary 0 in the mask means "unconditionally set the result bit to 0" and a 1 means "set the result bit to the input value bit". For an | mask, an 0 in the mask sets the result bit to the input bit and a 1 unconditionally sets the result bit to 1, and for an ^ mask, an 0 sets the result bit to the input bit and a 1 sets the result bit to the complement of the input bit.
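The three kinds of mask described above can be tried out with a few one-line helpers. The names are invented for this sketch.

```c
#include <stdint.h>

/* & mask: a 1 keeps the input bit, a 0 forces the result bit to 0. */
uint32_t keep_bits(uint32_t value, uint32_t mask)   { return value & mask; }

/* | mask: a 1 forces the result bit to 1, a 0 keeps the input bit. */
uint32_t set_bits(uint32_t value, uint32_t mask)    { return value | mask; }

/* ^ mask: a 1 flips the input bit, a 0 keeps it. */
uint32_t toggle_bits(uint32_t value, uint32_t mask) { return value ^ mask; }
```

For instance, keep_bits(0xABCD, 0x3F) leaves only the low 6 bits (0x0D), and set_bits(0x1200, 0x101) turns on bits 0 and 8 (0x1301).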

Can someone explain ARM bitwise operations to me?

Can someone explain ARM bit-shifts to me like I'm five? I have a very poor understanding of anything that involves non-decimal number systems so understanding the concepts of bit shifts and bitwise operators is difficult for me.
What would each of the following cases do and why (what would end up in R3 and what happens on behind the scenes on the bit level)?
/** LSL **/
mov r0, #1
mov r3, r0, LSL#10
/** LSR **/
mov r0, #1
mov r3, r0, LSR#10
/** ORR **/
mov r0, #1
mov r1, #4
orr r3, r1, r0
/** AND **/
mov r0, #1
mov r1, #4
and r3, r1, r0
/** BIC **/
mov r0, #1
mov r1, #4
bic r3, r1, r0
PS. Do not explain it in terms of C bitwise operators. I don't know what they do either (the >>, <<, |, & ones).
Truth tables, two inputs, the two numbers on the left and one output, the number on the right:
OR
a b c
0 0 0
0 1 1
1 0 1
1 1 1
the left two inputs a and b represent the four possible combinations of inputs, no more no less that is the list.
Consider a 1 to mean true and 0 to mean false. And the word OR in this case means if a OR b is true then c is true. And as you see in the table, horizontally if either a or b is true then c is true.
AND
a b c
0 0 0
0 1 0
1 0 0
1 1 1
And means they both have to be true if a AND b are both true then c is true. There is only one case where that exists above.
Now take two bytes 0x12 and 0x34 which in decimal are 18 and 52 but we dont really care much about decimal. we care about binary 0x12 is 0b00010010 and 0x34 is 0b00110100. The bitwise operators like AND and OR and XOR in assembly language mean you take one bit from each operand and that gives the result in the same bit location. Its not like add where you have things like this plus that equals blah carry the one.
so we line up the bits
0b00010010 0x12
0b00110100 0x34
So tilt your head sideways like you are going to take a bite out of a taco held in your left hand and visualize the truth table above. If we look at the two bits on the right they are 0 and 0, the next two bits are 1 and 0 and so on. So if we wanted to do an OR operation, the rule is if either a or b is true then c, the result, is true
0b00010010
0b00110100
OR ==========
0b00110110
Head tilted to the right, least significant bit (the bit in the ones column in the number) 0 or 0 = 0, neither one is set. next column (the twos column) 1 or 0 = 1 at least one is true. and so on so
0x12 OR 0x34 = 0x36
In arm assembly that would be
mov r0,#0x12
mov r1,#0x34
orr r2,r0,r1
after the or operation r2 would hold the value 0x36.
Now lets and those numbers
0b00010010
0b00110100
AND ==========
0b00010000
Remembering our truth table and the rule both a and b have to be true (a 1) we tilt our head to the right, 0 and 0 is 0, both are not true. and by inspection only one column has both inputs with a 1, the 16s column. this leaves us with 0x12 AND 0x34 = 0x10
In arm assembly that would be
mov r0,#0x12
mov r1,#0x34
and r2,r0,r1
Now we get to the BIC instruction, which stands for bit clear; hopefully that will make sense in a bit. BIC on the ARM is a ANDed with NOT b. NOT is another truth table, but with only one input and one output
NOT
a c
0 1
1 0
With only one input we have only two choices 0 and 1, 1 is true 0 is false. NOT means if not a then c is true. when a is not true c is true, when a is true c is not true. Basically it inverts.
What the bic does is have two inputs a and b, the operation is c = a AND (NOT b) so the truth table for that would be:
a AND (NOT b)
a b c
0 1 0
0 0 0
1 1 0
1 0 1
I started with the AND truth table and then NOTted the b bits: where b was a 0 in the AND truth table I made it a 1, and where b was a 1 in the AND truth table I made it a 0.
So the bic operation on 0x12 and 0x34 is
0b00010010
0b00110100
BIC ==========
0b00000010
Why is it called bit clear? Understanding that makes it much easier to use. If you look at the truth table and think of the first and second inputs. Where the second, b, input is a 1 the output is 0. where the second input, b, is a 0, the output is a itself unmodified. So what that truth table or operation is doing is saying anywhere b is set clear or zero those bits in A. So if I have the number 0x1234 and I want to zero the lower 8 bits, I would BIC that with 0x00FF. And your next question is why not AND that with 0xFF00? (analyze the AND truth table and see that wherever b is a 1 you keep the a value as is, and wherever b is a 0 you zero the output). The ARM uses 32 bit registers, and a fixed 32 bit instruction set, at least traditionally. The immediate instructions
mov r0,#0x12
in ARM are limited to an 8-bit value rotated by an even number of bits within the 32-bit word (we will get to shifting in a bit). So if I had the value 0x12345678 and wanted to zero out the lower 8 bits I could do this
; assume r0 already has 0x12345678
bic r0,r0,#0xFF
or
; assume r0 already has 0x12345678
mov r1,#0xFF000000
orr r1,r1,#0x00FF0000
orr r1,r1,#0x0000FF00
;r1 now contains the value 0xFFFFFF00
and r0,r0,r1
or
; assume r0 already contains 0x12345678
ldr r1,my_byte_mask
and r0,r0,r1
my_byte_mask: .word 0xFFFFFF00
which is not horrible, compared to using a move and two orrs, but still burns more clock cycles than the bic solution because you burn the extra memory cycle reading my_byte_mask from ram, which can take a while.
or
; assume r0 already contains 0x12345678
mvn r1,#0xFF
and r0,r0,r1
This last one is not a bad compromise. Note that mvn in the ARM documentation is bitwise NOT immediate, which means rx = NOT(immediate). The immediate here is 0xFF. NOT(0xFF) means invert all the bits; it is a 32 bit register we are going to, so 0xFFFFFF00 is the result of NOT(0xFF) and that is what the register r1 gets before doing the and.
So that is why bic has a place in the ARM instruction set, because sometimes it takes fewer instructions or clock cycles to mask (mask = AND used to make some bits zeros) using the bic instruction instead of the and instruction.
I used the word mask as a concept to make bits in a number zero leaving the others alone. ORing can be thought of as making bits in a number one while leaving the others alone; if you look at the OR truth table, any time b is a 1 then c is a 1. So 0x12345678 OR 0x000000FF results in 0x123456FF: the bits in the second operand are set. Yes, it is also true that anytime a is set in the OR truth table then the output is set, but a lot of the time when you use these bitwise operations you have one operand you want to do something to: set a certain number of bits to one without modifying the rest, or set a certain number of bits to zero without modifying the rest, or zero all the bits except for a certain number of bits. When used that way you have one operand coming in, which is what you want to operate on, and you create the second operand based on what you want the overall effect to be. For example, in C, if we wanted to zero the lower byte and keep the rest we could have a one parameter in, one parameter out function:
unsigned int clear_lower_byte ( unsigned int a )
{
return(a&(~0xFF));
}
~ means NOT so ~0xFF, for 32 bit numbers means 0xFFFFFF00 then & means AND, so we return a & 0xFFFFFF00. a was the only real operand coming in and we invented the second one based on the operation we wanted to do...Most bitwise operations you can swap the operands in the instruction and everything turns out okay, instructions like ARM's bic though the operands are in a certain order, just like a subtract you have to use the correct order of operands.
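For readers who find C easier to experiment with than an assembler, the worked examples above can be mirrored with the hypothetical helpers below. They model only the data result of each instruction on 32-bit values, not flags or immediate encoding limits.

```c
#include <stdint.h>

uint32_t arm_orr(uint32_t a, uint32_t b) { return a | b; }   /* orr rd, a, b */
uint32_t arm_and(uint32_t a, uint32_t b) { return a & b; }   /* and rd, a, b */
uint32_t arm_bic(uint32_t a, uint32_t b) { return a & ~b; }  /* bic rd, a, b: a AND (NOT b) */
uint32_t arm_mvn(uint32_t b)             { return ~b; }      /* mvn rd, b:    NOT b */
```

With these, arm_bic(0x12345678, 0xFF) and arm_and(0x12345678, arm_mvn(0xFF)) both produce 0x12345600, which is the equivalence the bic versus mvn-plus-and discussion relies on. Operand order matters only for arm_bic.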
Shifting...there are two kinds, logical, and arithmetic. logical is easiest and is what you get when you use >> or << in C.
Start with 0x12 which is 0b00010010. Shifting that three locations to the left (0x12<<3) means
00010010 < our original number 0x12
0010010x < shift left one bit location
010010xx < shift left another bit location
10010xxx < shift left a third bit location
What bits get "shifted in" to the empty locations, the x'es above, varies based on the operation. For C programming it is always zeros:
00010010 < our original number 0x12
00100100 < shift left one bit location
01001000 < shift left another bit location
10010000 < shift left a third bit location
But sometimes (usually every instruction set supports a rotate as well as a shift) there are other ways to shift and the differences have to do with what bit you shift into the empty spot, and also sometimes the bit you shifted off the end doesnt always just disappear sometimes you save that in a special bit holder location.
Some instruction sets only have a single bit shift meaning for each instruction you program you can only shift one bit, so the above would be 3 instructions, one bit at a time. Other instruction sets, like arm, allow you to have a single instruction and you specify in the instruction how many bits you want to shift in that direction. so a shift left of three
mov r0,#0x12
mov r3,r0,lsl#3 ; shift the contents of r0 3 bits to the left and store in r3
This varying of what you shift in is demonstrated between lsr and asr, logical shift right and arithmetic shift right (you will see that there is no asl, arithmetic shift left because that makes no sense, some assemblers will allow you to use an asl instruction but encode it as a lsl).
A LOGICAL shift right:
00010010 - our original number 0x12
x0001001 - shifted right one bit
xx000100 - shifted right another bit
xxx00010 - shifted right another bit
As with C there is a version that shifts in zeros, that is the logical shift right, shifting in zeros
00010010 - our original number 0x12
00001001 - shifted right one bit
00000100 - shifted right another bit
00000010 - shifted right another bit
ARITHMETIC shift right means preserve the "sign bit". What is the sign bit? That gets into twos complement numbers, which you also need to learn if you have not. Basically, if you consider the bit pattern/value to be a twos complement number then the most significant bit, the one on the left, is the sign bit. If it is 0 the number is positive and if it is 1 the number is negative. You may have noticed that a shift left by one bit is the same as multiplying by 2 and a shift right is the same as dividing by 2. 0x12 >> 1 = 0x9, 18 >> 1 = 9, but what if we were to shift a minus 2 to the right one? Minus two is 0xFE using bytes, or 0b11111110. Using the C style logical shift right, 0xFE >> 1 = 0x7F, or in decimal -2 >> 1 = 127. A logical shift cannot fix that (and in C what >> does to a negative signed value is implementation-defined), but in assembly we can use an arithmetic shift, assuming your instruction set has one, which the ARM does
ARITHMETIC shift right
s1100100 - our starting value s is the sign bit whatever that is 0 or 1
ss110010 - one shift right
sss11001 - another shift right
ssss1100 - another shift right
So if the sign bit s was a 0 when we started, if the number was 01100100 then
01100100 - our starting value
00110010 - one shift right
00011001 - another shift right
00001100 - another shift right
but if that sign bit had been a one
11100100 - our starting value
11110010 - one shift right
11111001 - another shift right
11111100 - another shift right
And we can solve the 0xFE shifted right one:
11111110 - 0xFE a minus 2 in twos complement for a byte
11111111 - shifted right one
so in pseudo code 0xFE ASR 1 = 0xFF, -2 ASR 1 = -1. -2 divided by 2 = -1
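The difference between the two right shifts can be sketched on bytes in portable C. The helper names are made up, and the shift amount is assumed to be between 0 and 7; the arithmetic version fills the vacated high bits by hand instead of relying on how a particular compiler treats >> on negative values.

```c
#include <stdint.h>

/* Logical shift right: zeros come in from the left. */
uint8_t lsr8(uint8_t v, unsigned n)
{
    return (uint8_t)(v >> n);
}

/* Arithmetic shift right: copies of the sign bit come in from the left. */
uint8_t asr8(uint8_t v, unsigned n)
{
    uint8_t shifted = (uint8_t)(v >> n);
    uint8_t fill    = (uint8_t)~(0xFFu >> n);   /* the top n bits set */
    return (v & 0x80u) ? (uint8_t)(shifted | fill) : shifted;
}
```

Here lsr8(0xFE, 1) is 0x7F while asr8(0xFE, 1) is 0xFF, matching the -2 ASR 1 = -1 example, and asr8(0xE4, 3) is 0xFC, matching the 11100100 table.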
The last thing you need to read up on your own has to do with rotates and/or what happens to the bit that is shifted off the end. a shift right the lsbit is shifted "off the end" of the number like blocks being slid of a table and the one that falls off might just go into the "bit bucket" (ether, heaven or hell, one of these places where bits go to die when they disappear from this world). But some instructions in some instruction sets will take that bit being shifted off and put it in the Carry flag (read up on add and subtract), not because it is a carry necessarily but because there are status bits in the alu and the Carry bit is one that kinda makes sense. Now what a rotate is, is lets say you had an 8 bit processor and you rotated one bit, the bit falling off the end lands in the Carry bit, AND the bit shifting in the other side is what was in the carry bit before the operation. Basically it is musical chairs, the bits are walking around the chairs with one person left standing, the person standing is the carry bit, the people in chairs are the bits in the register. Why is this useful at all? lets say we had an 8 bit processor like the Atmel AVR for example but wanted to do a 64 bit shift. 64 bits takes 8, 8 bit, registers, say I have my 64 bit number in those 8 registers and I want to do a 64 bit shift left one bit. I would start with the least significant byte and do an lsl which shifts a zero in but the bit shifting out goes into the carry bit. then the next most significant byte I do a rol, rotate left one bit, the bit coming in is the bit going out of the prior byte and the bit going out goes to the carry bit. I repeat the rol instruction for the other bytes, looking at a 16 bit shift:
00100010 z0001000 - our original number
00100010 z 00010000 - lsl the least significant byte, the ms bit z is in carry
0100010z 00010000 - rotate left the most significant byte pulling the z bit from carry
00100010z0001000 - if it had been a 16 bit register
0100010z00010000 - a logical shift left on a 16 bit with a zero coming in on the right
that is what the rotates are for and that is why the assembly manual bothers to tell you what flags are modified when you perform a logical operation.
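The two-byte walk-through above can be simulated in C to check that chaining lsl and rol through a carry gives the same answer as one wide shift. The function name is hypothetical and the carry is modelled as a plain variable rather than a real flag.

```c
#include <stdint.h>

/* Shift a 16-bit value, held as two bytes, left by one bit using byte operations. */
uint16_t shl16_via_bytes(uint8_t hi, uint8_t lo)
{
    unsigned carry = (lo >> 7) & 1u;        /* lsl low byte: top bit falls into carry */
    lo = (uint8_t)(lo << 1);                /* a zero comes in on the right */
    hi = (uint8_t)((hi << 1) | carry);      /* rol high byte: carry comes in on the right */
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

For any hi and lo, the result equals the 16-bit value (hi:lo) shifted left by one with the top bit discarded; for example, bytes 0x22 and 0x88 give 0x4510.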
I'll do the first one and then maybe you can try and work out the rest using a similar approach:
/** LSL **/
mov r0, #1 ; r0 = 0000 0000 0000 0000 0000 0000 0000 0001
mov r3, r0, LSL#10 ; r3 = r0 logically shifted left by 10 bit positions
= 0000 0000 0000 0000 0000 0100 0000 0000
^ ^
+<<<<<<<<<<<+
shift left 10 bits
Note however that if you don't yet understand boolean operations such as OR (|), AND (&), etc, then you will have a hard time understanding the corresponding ARM instructions (ORR, AND, etc).

Resources