C MSB to LSB explanation

Could someone explain to me how this algorithm converts MSB to LSB or LSB to MSB on a 32-bit system?
unsigned char b = x;
b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;
I've seen hex values end with LU or just U in code before; what do they mean?
Thanks!

Presumably, a char has eight bits, so unsigned char b = x takes the low eight bits of x.
The mask with 0x22110 extracts bits 4, 8, 13, and 17 (numbering from 0 for the least significant bit). So, in the multiplication by 0x0802, we only care about what it places at those bits. In 0x802, bits 1 and 11 are on, so this multiplication places a copy of the eight bits of b in bits 1 through 8 and another copy in bits 11 through 18. There is no overlap, so there are no effects from adding bits that overlap in more general multiplications.
From this product, we take these bits:
Bit 4, which is bit 3 of b. (Bit 4 from the copy starting at bit 1, so bit 4–1 = 3 of b.)
Bit 8, which is bit 7 of b. (8–1 = 7.)
Bit 13, which is bit 2 of b. (13–11 = 2.)
Bit 17, which is bit 6 of b. (17–11 = 6.)
Similarly, the mask by 0x88440 extracts bits 6, 10, 15, and 19. The multiplication by 0x8020 places a copy of b in bits 5 to 12 and another copy in bits 15 to 22. From this product, we take these bits:
Bit 6, which is bit 1 of b.
Bit 10, which is bit 5 of b.
Bit 15, which is bit 0 of b.
Bit 19, which is bit 4 of b.
Then we OR those together, producing:
Bit 4, which is bit 3 of b.
Bit 6, which is bit 1 of b.
Bit 8, which is bit 7 of b.
Bit 10, which is bit 5 of b.
Bit 13, which is bit 2 of b.
Bit 15, which is bit 0 of b.
Bit 17, which is bit 6 of b.
Bit 19, which is bit 4 of b.
Call this result t.
We are going to multiply that by 0x10101, shift right by 16, and assign to b. The assignment converts to unsigned char, so only the low eight bits are kept. The low eight bits after the shift are bits 24 to 31 before the shift. So we only care about bits 24 to 31 in the product.
The multiplier 0x10101 has bits 0, 8, and 16 set. Thus, bit 24 in the result is the sum of bits 24, 16, and 8 in t, plus any carry from elsewhere. However, there is no carry: Observe that none of the set bits in t are eight apart, as the bits in the multiplier are. Therefore, none of them can directly contribute to the same bit in the product. Each bit in the product is the result of at most one bit in t. We just need to figure out which bit that is.
Bit 24 must come from bit 8, 16, or 24 in t. Only bit 8 can be set, and it is bit 7 from b. Deducing all the bits this way:
Bit 24 is bit 8 in t, which is bit 7 in b.
Bit 25 is bit 17 in t, which is bit 6 in b.
Bit 26 is bit 10 in t, which is bit 5 in b.
Bit 27 is bit 19 in t, which is bit 4 in b.
Bit 28 is bit 4 in t, which is bit 3 in b.
Bit 29 is bit 13 in t, which is bit 2 in b.
Bit 30 is bit 6 in t, which is bit 1 in b.
Bit 31 is bit 15 in t, which is bit 0 in b.
Thus, bits 24 to 31 in the product are bits 7 down to 0 in b, so the eight bits finally produced are the bits of b in reversed order.

View b as an 8-bit value abcdefgh where each of those letters is a single bit (0 or 1), with a the most significant bit and h the least significant. Then look at what each of the operations does to those bits:
b * 0x0802LU = 00000abcdefgh00abcdefgh0
b * 0x0802LU & 0x22110LU = 000000b000f0000a000e0000
b * 0x8020LU = 0abcdefgh00abcdefgh00000
b * 0x8020LU & 0x88440LU = 0000d000h0000c000g000000
((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU))
= 0000d0b0h0f00c0a0g0e0000
so at this point, it has shuffled the bits and spread them out.
(....) * 0x10101LU = d0b0h0f00c0a0g0e0000
+ d0b0h0f00c0a0g0e000000000000
+ d0b0h0f00c0a0g0e00000000000000000000
= d0b0h0f0dcbahgfedcbahgfe0c0a0g0e0000
(...) * 0x10101LU >> 16 = d0b0h0f0dcbahgfedcba
b = hgfedcba
The multiply is equivalent to a shift/add/add (the constant has 3 bits set), which lines up the bits where they need to end up. Then the final shift and truncation to 8 bits gives you the bit-reversed result.
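If you want to convince yourself, here is a minimal sketch of my own (not part of the answer above) that checks the expression against a plain loop-based reversal for every possible byte value:

#include <stdio.h>

/* Reverse the bits of one byte with a plain loop, for comparison. */
static unsigned char reverse_loop(unsigned char x)
{
    unsigned char r = 0;
    for (int i = 0; i < 8; i++)
        r = (unsigned char)((r << 1) | ((x >> i) & 1));
    return r;
}

int main(void)
{
    for (unsigned v = 0; v < 256; v++) {
        unsigned char b = (unsigned char)v;
        b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU))
            * 0x10101LU >> 16;
        if (b != reverse_loop((unsigned char)v))
            printf("mismatch at %u\n", v);
    }
    printf("done\n");
    return 0;
}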

To answer your second question: the U suffix gives the constant an unsigned type, and the L suffix gives it a long type, so LU together makes the constant an unsigned long.
I'm working on your first question.

It's difficult to visualize what this algorithm is doing when you look at it as multiplications and hex. It becomes more clear when you convert it to binary and replace the multiplications with an equivalent sum of shift operations. Essentially what it is doing is it is spreading out parts of the byte by shifting and masking it, and then implementing a parallel half-adder that reconstructs the parts in place, which happens to be the reverse of where they started.
For example,
b * 0x0802 = b << 11 | b << 1
Plug in some values (in binary) for b and follow along.
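As a quick way to see that equivalence, here is a small sketch of my own (the value 0xB5 is just an arbitrary test byte) printing both forms:

#include <stdio.h>

int main(void)
{
    unsigned long b = 0xB5;  /* 10110101 in binary, an arbitrary test byte */

    /* 0x0802 has bits 1 and 11 set; 0x8020 has bits 5 and 15 set,
       so each multiplication is just two shifted copies ORed together. */
    printf("%lx %lx\n", b * 0x0802LU, (b << 11) | (b << 1));
    printf("%lx %lx\n", b * 0x8020LU, (b << 15) | (b << 5));
    return 0;
}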

Related

Is the least significant bit (LSB) always the "first" bit?

I'm reading Modern C (version Feb 13, 2018) and on page 42 it says that the bit with index 4 is the least significant bit. Shouldn't the bit with index 0 be the least significant bit? (Same question about the MSB.)
Which is right? What's the correct terminology?
Their definition of "most significant bit" and "least significant bit" is misleading:
8 bit binary number :  1 1 1 1 0 0 0 0
Bit number             7 6 5 4 3 2 1 0
                       |     |       |
                       |     |       least significant bit
                       |     |
                       |     least significant bit that is 1
                       |
                       most significant bit that is 1, and also just the most significant bit
The book's definition does not align with common/typical/mainstream/correct usage. See Wikipedia, for instance:
In computing, the least significant bit (LSB) is the bit position in a binary integer giving the units value, that is, determining whether the number is even or odd.
The book, on the other hand, seems to consider only bits that are 1, so that in an 8-bit byte representing the number 16, which we can write:
00010000
the bit that is 1 has index 4 (it's b4 in the book's notation), and then it claims that that particular number's LSB is four.
The proper definition just uses LSB to denote the bit whose place value is 1, i.e. the "units" bit, and with that the LSB is the rightmost bit. This definition is the more useful one, and I really think the book is wrong.
They're using an unusual definition of LSB and MSB, which only refers to the bits that are set to 1. So in the case of 240, the first 1 bit is b4, not b0, because b0 through b3 are all 0.
I'm not sure why the book considers this definition of LSB/MSB to be useful. It's not generally interesting for integers, although it does come into play in floating point. Floating point numbers are scaled so integers above 1 have the low-order zero bits shifted away, and the exponent is incremented to make up for this (conversely, fractions have their high-order bits shifted away, and the exponent is decremented).
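If it helps to see the mainstream definition in code, here is a small sketch of my own (not from the book) that prints the units bit of 240 and then the index of its lowest set bit, which is what the book appears to call the LSB:

#include <stdio.h>

int main(void)
{
    unsigned v = 240;   /* 11110000 in binary */

    printf("units bit of %u: %u\n", v, v & 1u);   /* 0, because 240 is even */

    /* Index of the lowest bit that is 1 -- what the book appears to call the LSB. */
    unsigned i = 0;
    while (v != 0 && ((v >> i) & 1u) == 0)
        i++;
    printf("lowest set bit of %u is b%u\n", v, i);   /* b4 */
    return 0;
}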

Understanding the bitwise negation of signed numbers and negative number representation in a typical system

Suppose I have a signed char num = 15 and I do num = ~num. Then, as per this post, we will get -16.
~(00001111) = (11110000)
But if I consider the MSB as the sign bit, shouldn't the answer be -112? How come this results in -16? Why are the second and third set bits from the left being ignored?
Can anyone please clarify.
EDIT
I want a more detailed explanation of why the following program results in -16 and not -112:
#include <stdio.h>

int main()
{
    char num = 15;
    num = ~num;
    printf("%d\n", num);
    return 0;
}
I expected it as 1(1110000) = -(1110000) = -112
~(00001111) = (11110000)
What you're doing is using the MSb (Most Significant bit) as a flag to decide to put a '-' sign, and then reading the rest of the bits normally. Instead, the MSb is a flag to do two things: put a '-' sign, and then NOT the value and add one (two's complement), before printing out the rest of the bits.
This comes from the overflow/underflow nature of fixed-length bit values:
00000010 - 1 = 00000001 (2-1=1)
00000001 - 1 = 00000000 (1-1=0)
00000000 - 1 = 11111111 (0-1=-1)
C allows for three different representations of signed integers, but the most common is "two's complement". However, I'll briefly discuss "one's complement" as well to illustrate how there is a relationship between them.
One's complement
One's complement integers are split into sign and value bits. To use the 8-bit representation of the integer 19 as an example:
S|64 32 16 8 4 2 1
0| 0 0 1 0 0 1 1 = 19
Using the bitwise complement operator ~ flips all of the bits of the integer, including the sign bit:
S|64 32 16 8 4 2 1
1| 1 1 0 1 1 0 0 = ~19
When the sign bit is set, the interpretation of 1 and 0 bits is reversed (0=on, 1=off), and the value is considered negative. This means the value above is:
-(16 + 2 + 1) = -19
Two's complement
Unlike one's complement, an integer is not divided into a sign bit and value bits. Instead, what is regarded as a sign bit adds -2^(b - 1) to the rest of the value, where b is the number of bits. To use the example of an 8-bit representation of ~19 again:
-128 64 32 16 8 4 2 1
1 1 1 0 1 1 0 0 = ~19
-128 + 64 + 32 + 8 + 4
= -128 + 108
= -(128 - 108)
= -20
The relationship between them
The value of -19 is 1 more than -20 arithmetically, and this follows a generic pattern in which any value of -n in two's complement is always one more than the value of ~n, meaning the following always holds true for a value n:
-n = ~n + 1
~n = -n - 1 = -(n + 1)
This means that you can simply look at the 5-bit value 15, negate it and subtract 1 to get ~15:
~15 = (-(15) - 1)
= -16
-16 for a 5-bit value in two's complement is represented as:
-16 8 4 2 1
1 0 0 0 0 = -16
Flipping the bits using the ~ operator yields the original value 15:
-16 8 4 2 1
0 1 1 1 1 = ~(-16) = -(-16 + 1) = -(-15) = 15
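A tiny sketch of my own, assuming a two's complement machine (which is what essentially every current platform uses), that prints the identity for a few sample values:

#include <stdio.h>

int main(void)
{
    int values[] = { 0, 1, 15, 19, 100 };

    for (int i = 0; i < 5; i++) {
        int n = values[i];
        /* On a two's complement machine, ~n is always -(n + 1). */
        printf("n = %3d   ~n = %4d   -(n + 1) = %4d\n", n, ~n, -(n + 1));
    }
    return 0;
}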
Restrictions
I feel I should mention arithmetic overflow regarding two's complement. I'll use the example of a 2-bit signed integer to illustrate. There are 2^2=4 values for a 2-bit signed integer: -2, -1, 0, and 1. If you attempt to negate -2, it won't work:
-2 1
1 0 = -2
Writing +2 in plain binary yields 1 0, the same as the representation of -2 above. Because of this, +2 is not possible for a 2-bit signed integer. Using the equations above also reveals the same issue:
// Flip the bits to obtain the value of ~(-2)
~(-2) = -(-2 + 1)
~(-2) = 1
// Substitute 1 in place of ~(-2) to find the result of -(-2)
-(-2) = ~(-2) + 1
-(-2) = 1 + 1
-(-2) = 2
While this makes sense mathematically, the fact is that 2 is outside the representable range of values (only -2, -1, 0, and 1 are allowed). That is, adding 1 to 01 (1) results in 10 (-2). There's no way to magically add an extra bit in hardware to yield a new sign bit position, so instead you get an arithmetic overflow.
In more general terms, you cannot negate an integer in which only the sign bit is set with a two's complement representation of signed integers. On the other hand, you cannot even represent a value like -2 in a 2-bit one's complement representation because you only have a sign bit, and the other bit represents the value 1; you can only represent the values -1, -0, +0, and +1 with one's complement.
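To see that wraparound in code without invoking signed overflow (which is undefined behaviour in C), here is a sketch of my own that negates the 8-bit pattern 0x80 using unsigned arithmetic, where wraparound is well defined:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t x = 0x80;                 /* the 8-bit two's complement pattern for -128 */
    uint8_t neg = (uint8_t)(~x + 1);  /* two's complement negation: flip the bits, add 1 */

    /* The result is 0x80 again: -128 has no positive counterpart in 8 bits. */
    printf("negating 0x%02X gives 0x%02X\n", (unsigned)x, (unsigned)neg);
    return 0;
}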

What does (size + 7) & ~7 mean?

I'm reading the Multiboot2 specification. You can find it here. Compared to the previous version, it names all of its structures "tags". They're defined like this:
3.1.3 General tag structure
Tags constitutes a buffer of structures following each other padded on u_virt size. Every structure has
following format:
        +-------------------+
u16     | type              |
u16     | flags             |
u32     | size              |
        +-------------------+
type is divided into 2 parts. Lower contains an identifier of
contents of the rest of the tag. size contains the size of tag
including header fields. If bit 0 of flags (also known as
optional) is set if bootloader may ignore this tag if it lacks
relevant support. Tags are terminated by a tag of type 0 and size
8.
Then later in example code:
for (tag = (struct multiboot_tag *) (addr + 8);
     tag->type != MULTIBOOT_TAG_TYPE_END;
     tag = (struct multiboot_tag *) ((multiboot_uint8_t *) tag
                                     + ((tag->size + 7) & ~7)))
The last part confuses me. In Multiboot 1, the code was substantially simpler, you could just do multiboot_some_structure * mss = (multiboot_some_structure *) mbi->some_addr and get the members directly, without confusing code like this.
Can somebody explain what ((tag->size + 7) & ~7) means?
As mentioned by chux in his comment, this rounds tag->size up to the nearest multiple of 8.
Let's take a closer look at how that works.
Suppose size is 16:
00010000 // 16 in binary
+00000111 // add 7
--------
00010111 // results in 23
The expression ~7 takes the value 7 and inverts all bits. So:
00010111 // 23 (from previous step)
&11111000 // bitwise-AND ~7
--------
00010000 // results in 16
Now suppose size is 17:
00010001 // 17 in binary
+00000111 // add 7
--------
00011000 // results in 24
Then:
00011000 // 24 (from previous step)
&11111000 // bitwise-AND ~7
--------
00011000 // results in 24
So if the lower 3 bits of size are all zero, i.e. size is already a multiple of 8, (size+7)&~7 sets those bits and then clears them again, with no net effect. But if any of those bits is 1, adding 7 carries into bit 3 (the 8's place), the lower bits are then cleared, and the number is rounded up to the next multiple of 8.
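If you want to experiment with the idiom, here is a minimal helper sketch of my own (not from the Multiboot sources):

#include <stdio.h>

/* Round n up to the next multiple of 8; for any power-of-two
   alignment, replace 7 with (alignment - 1). */
static unsigned round_up_8(unsigned n)
{
    return (n + 7u) & ~7u;
}

int main(void)
{
    for (unsigned n = 14; n <= 18; n++)
        printf("%u -> %u\n", n, round_up_8(n));   /* 14..16 -> 16, then 17, 18 -> 24 */
    return 0;
}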
~ is a bitwise NOT. & is a bitwise AND.
Assuming 16 bits are used:
7 is 0000 0000 0000 0111
~7 is 1111 1111 1111 1000
Anything and'd with a 0 is 0. Anything and'd with 1 is itself. Thus
N & 0 = 0
N & 1 = N
So when you AND with ~7, you essentially clear the lowest three bits and all of the other bits remain unchanged.
Thanks to @chux for the answer. As he notes, it rounds the size up to a multiple of 8, if needed. This is very similar to a technique used in 15bpp drawing code:
//+7/8 will cause this to round up...
uint32_t vbe_bytes_per_pixel = (vbe_bits_per_pixel + 7) / 8;
Here's the reasoning:
Things were pretty simple up to now but some confusion is introduced
by the 16bpp format. It's actually 15bpp since the default format is
actually RGB 5:5:5 with the top bit of each u_int16 being unused. In
this format, each of the red, green and blue colour components is
represented by a 5 bit number giving 32 different levels of each and
32768 possible different colours in total (true 16bpp would be RGB
5:6:5 where there are 65536 possible colours). No palette is used for
16bpp RGB images - the red, green and blue values in the pixel are
used to define the colours directly.
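In the 15 bpp case above, that +7 is what makes the rounding come out right: (15 + 7) / 8 = 2 bytes per pixel, whereas plain integer division would give 15 / 8 = 1 and drop a byte.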
& ~7 sets the last three bits to 0

Converting 8 bits to a scaled 12 bits equivalent

I need to convert an 8 bit number (0 - 255 or #0 - #FF) to its 12 bit equivalent (0 - 4095 or #0 - #FFF)
I don't want to do just a straight conversion of the same number; I want to represent the same scale, but in 12 bits.
For example:-
0xFF in 8 bits should convert to 0xFFF in 12 bits
0x0 in 8 bits should convert to 0x0 in 12 bits
0x7F in 8 bits should convert to 0x7FF in 12 bits
0x24 in 8 bit should convert to 0x249 in 12 bits
Are there any specific algorithms or techniques that I should be using?
I am coding in C
Try x << 4 | x >> 4.
This has been updated by the OP, changed from x << 4 + x >> 4
If you are able to go through a larger domain then this may help:
b = a * ((1 << 12) - 1) / ((1 << 8) - 1)
It is ugly but preserves the scaling almost exactly as requested. Of course you can write the constants (4095 and 255) directly.
What about:
x = x ? ((x + 1) << 4) - 1 : 0
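Here is a quick sketch of my own (not from any of the answers above) comparing the bit-replication and exact-scaling approaches on the sample inputs from the question:

#include <stdio.h>

int main(void)
{
    unsigned samples[] = { 0x00, 0x24, 0x7F, 0xFF };

    for (int i = 0; i < 4; i++) {
        unsigned x = samples[i];
        unsigned replicated = (x << 4) | (x >> 4);   /* shift up, repeat the high nibble at the bottom */
        unsigned scaled     = x * 4095u / 255u;      /* exact linear scaling */
        printf("0x%02X -> replicate 0x%03X, scale 0x%03X\n", x, replicated, scaled);
    }
    return 0;
}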
I use the equation of a line, y = mx + c, assuming the low end of the range is zero.
You can scale your data by a factor of m (multiply to increase the range, divide to decrease it).
Example: my ADC data was 12 bit, with an integer range of 0 to 4095, and I wanted to shrink it to the range 0 to 255.
m = (y2 - y1) / (x2 - x1)
m = (4095 - 0) / (255 - 0)
m = 16.06 ≈ 16
So the data received in 12 bits is divided by 16 to convert it to 8 bits.
This conversion is linear in nature.
Hope this is also a good idea.
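In code, that 12-bit-to-8-bit shrink is just a divide by 16, which is the same as a right shift by 4 (a sketch of my own):

#include <stdio.h>

int main(void)
{
    unsigned adc = 4095;        /* 12-bit ADC reading, 0..4095 */
    unsigned out = adc >> 4;    /* divide by 16 -> 0..255 */

    printf("%u -> %u\n", adc, out);   /* prints 4095 -> 255 */
    return 0;
}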

Can anyone explain why '>>2' shift means 'divided by 4' in C code?

I know and understand the result.
For example:
7 (decimal) = 00000111 (binary)
7 >> 2 = 00000001 (binary)
00000001 (binary) is the same as 7 / 4 = 1
So 7 >> 2 = 7 / 4
But I'd like to know how this logic was created.
Can anyone elaborate on this logic?
(Maybe it just popped up in a genius' head?)
And are there any other similar logics like this ?
It didn't just pop up in a genius's head. Right-shifting a binary number divides it by 2 and left-shifting multiplies it by 2, because 10 in binary is 2. Multiplying a number by 10 (be it binary, decimal, or hexadecimal) appends a 0 to the number, which is effectively a left shift. Similarly, dividing by 10 (or by 2 in binary) removes the last digit, which is effectively a right shift. This is how the logic really works.
There is plenty of such bit-twiddlery (a word I invented a minute ago) in the computer world.
http://graphics.stanford.edu/~seander/bithacks.html Here is for the starters.
This is my favorite book: http://www.amazon.com/Hackers-Delight-Edition-Henry-Warren/dp/0321842685/ref=dp_ob_image_bk on bit-twiddlery.
It is actually defined that way in the C standard.
From section 6.5.7:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...]
the value of the result is the integral part of the quotient of E1 / 2^E2
On most architectures, x >> 2 is only equal to x / 4 for non-negative numbers. For negative numbers, it usually rounds the opposite direction.
Compilers have always been able to optimize x / 4 into x >> 2. This technique is called "strength reduction", and even the oldest compilers can do this. So there is no benefit to writing x / 4 as x >> 2.
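You can see that difference for negative operands with a short sketch of my own (note that right-shifting a negative value is implementation-defined in C, though mainstream compilers give you an arithmetic shift):

#include <stdio.h>

int main(void)
{
    int x = -7;

    /* Division truncates toward zero; an arithmetic right shift rounds
       toward negative infinity, so the two disagree for negative values. */
    printf("-7 / 4  = %d\n", x / 4);    /* -1 */
    printf("-7 >> 2 = %d\n", x >> 2);   /* typically -2 */
    return 0;
}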
Elaborating on Aniket Inge's answer:
Number: 307 (decimal) = 100110011 (binary)
How multiplying by 10 works in the decimal system:
10 * 307
= 10 * (3*10^2 + 7*10^0)
= 3*10^(2+1) + 7*10^(0+1)
= 3*10^3 + 7*10^1
= 3070
= 307 shifted left by one decimal digit
Similarly, multiplying by 2 in binary:
2 * 100110011
= 2 * (1*2^8 + 1*2^5 + 1*2^4 + 1*2^1 + 1*2^0)
= 1*2^(8+1) + 1*2^(5+1) + 1*2^(4+1) + 1*2^(1+1) + 1*2^(0+1)
= 1*2^9 + 1*2^6 + 1*2^5 + 1*2^2 + 1*2^1
= 1001100110 (binary)
= 100110011 << 1
I think you are confused by the "2" in:
7 >> 2
and are thinking it should divide by 2.
The "2" here means shift the number ("7" in this case) "2" bit positions to the right.
Shifting a number 1 bit position to the right will have the effect of dividing by 2:
8 >> 1 = 4 // In binary: (00001000) >> 1 = (00000100)
and shifting a number 2 bit positions to the right will have the effect of dividing by 4:
8 >> 2 = 2 // In binary: (00001000) >> 2 = (00000010)
It's inherent in the binary number system used in computers.
A similar logic: left shifting n times means multiplying by 2^n.
An easy way to see why it works, is to look at the familiar decimal ten-based number system, 050 is fifty, shift it to the right, it becomes 005, five, equivalent to dividing it by 10. The same thing with shifting left, 050 becomes 500, five hundred, equivalent to multiplying it by 10.
All the other numeral systems work the same way.
They do that because shifting is more efficient than actual division. You're just moving all the digits to the right or left, logically multiplying or dividing by 2 per shift.
If you're wondering why 7/4 = 1, that's because the remainder (3/4) is truncated off so that the result is an integer.
Just my two cents: I did not see any mention to the fact that shifting right does not always produce the same results as dividing by 2. Since right shifting rounds toward negative infinity and integer division rounds to zero, some values (like -1 in two's complement) will just not work as expected when divided.
It's because >> and << operators are shifting the binary data.
Binary value 1000 is the double of binary value 0100
Binary value 0010 is the quarter of binary value 1000
You can call it the idea of a genius mind or just a necessity of the computer's design.
To my belief, a computer as a device never really divides or multiplies numbers; at the lowest level it only adds and shifts bits around. You can write an algorithm that tells your computer to multiply or subtract, but when it comes to actual processing, the result is ultimately produced by shifting or adding bits.
You can simply think of it this way: to get the result of a number divided by 4, the computer right shifts the bits two places and gives the result:
7 in 8-bit binary = 00000111
Shift Right 2 places = 00000001 // (Which is for sure equal to Decimal 1)
Further examples:
//-- We can divide 9 by four by Right Shifting 2 places
9 in 8-bit binary = 00001001
Shift right 2 places: 00000010 // (Which is equal to 9/4 or Decimal 2)
A person with deep knowledge of assembly language programming could explain it with more examples. If you want to understand the real sense behind all this, I guess you need to study bit-level arithmetic and computer assembly language.
