C - Using bit-shift operators for base conversion

I'm trying to convert some data from hex to base64 in C. I found an algorithm online, but I would really like to know how it works rather than just implementing it and firing it off. If someone could explain how the following works, I would appreciate it. I have been reading about the shift operators and I don't seem to understand them as well as I thought I did; it's not quite clicking for me.
for (x = 0; x < dataLength; x += 3)
{
/* these three 8-bit (ASCII) characters become one 24-bit number */
n = data[x] << 16;
if((x+1) < dataLength)
n += data[x+1] << 8;
if((x+2) < dataLength)
n += data[x+2];
/* this 24-bit number gets separated into four 6-bit numbers */
n0 = (uint8_t)(n >> 18) & 63;
n1 = (uint8_t)(n >> 12) & 63;
n2 = (uint8_t)(n >> 6) & 63;
n3 = (uint8_t)n & 63;
This code was taken from Wikibooks; it is NOT mine. I'm just trying to understand the bit shifting and how it allows me to convert the data.
Thank you for your help, I really appreciate it.
Source: Base64

First of all, the input data is not hex as you say. It's simply data stored as bytes. The code will give you the base64 representation of it (although the code you posted lacks the part which will map n0, n1, n2, n3 to printable ASCII characters).
Suppose the first three bytes of the input are (in binary representation, each letter represents a 0 or 1):
abcdefgh, ijklmnop, qrstuvwx
The first part of the code will combine them to a single 24-bit number. This is done by shifting the first one 16 bits to the left and the second one 8 bits to the left and adding:
  abcdefgh0000000000000000   (abcdefgh << 16)
+ 00000000ijklmnop00000000   (ijklmnop << 8)
+ 0000000000000000qrstuvwx
  ------------------------
  abcdefghijklmnopqrstuvwx
Then it separates this into four 6-bit numbers by shifting and and'ing. For example, the second number is calculated by shifting 12 bits to the right and and'ing with 111111:
n = abcdefghijklmnopqrstuvwx
n>>12 = 000000000000abcdefghijkl
63 = 000000000000000000111111
And'ing gives:
000000000000000000ghijkl
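The two steps described above can be sketched as a pair of small helpers. The names pack24, sextet, and check_pack24 are made up for this illustration and are not part of the Wikibooks code; the sample bytes 0xAB, 0xCD, 0xEF are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Combine three bytes into one 24-bit value, first byte in the top bits. */
uint32_t pack24(uint8_t b0, uint8_t b1, uint8_t b2)
{
    return ((uint32_t)b0 << 16) | ((uint32_t)b1 << 8) | (uint32_t)b2;
}

/* Extract 6-bit group idx (0 = leftmost, 3 = rightmost) by shifting and masking. */
uint8_t sextet(uint32_t n, int idx)
{
    return (uint8_t)((n >> (18 - 6 * idx)) & 63);
}

/* Hypothetical self-check: 0xABCDEF is 101010 111100 110111 101111 in binary. */
void check_pack24(void)
{
    uint32_t n = pack24(0xAB, 0xCD, 0xEF);
    assert(n == 0xABCDEF);
    assert(sextet(n, 0) == 42);
    assert(sextet(n, 1) == 60);
    assert(sextet(n, 2) == 55);
    assert(sextet(n, 3) == 47);
}
```

Using | instead of + gives the same result here, because the three shifted bytes never occupy the same bit positions.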

OK, here is a bit of explanation.
data is an array of chars, and a char is usually 8 bits (for example, 01010101).
n is a 32-bit number (for example, 01011111000011110000111100001111).
Remember that n is 32 bits and each element of data is only 8 bits. Let's go through the first line:
n = data[x] << 16;
<< has higher precedence than =, so the shift is evaluated first.
data[x] << 16 means: take the value of data[x] and move its bits 16 positions to the left. The char is promoted to int before the shift, so the bits are not lost off the end of an 8-bit value.
Suppose data[x] = 'a', which is represented by 01100001 in memory (1 byte). Moving it 16 bits to the left gives:
n = 00000000 01100001 00000000 00000000
next we have
if((x+1) < dataLength)
n += data[x+1] << 8;
this says move the next char data[x+1] 8 bits and add it to n; so lets move it 8 bits first
( I assumed it was 'a' again)
00000000 00000000 01100001 00000000
(this is done in some register in your processor)
now lets add it to n
00000000 01100001 01100001 00000000
next part is
if((x+2) < dataLength)
n += data[x+2];
lets do the same thing here, notice there is no bit shifting, since the last 8bits of n are free!! all we need to do is add it to n
b = 01100010 (assumed data[x+2] = 'b')
adding it to n
00000000 01100001 01100001 01100010
Great, so now we have a 24-bit number (n is actually 32 bits wide, but the low 24 bits are what we need).
Next part:
n0 = (uint8_t)(n >> 18) & 63;
(Note that n0 is only 8 bits wide, a single unsigned byte.)
Take n, move it to the right by 18 bits, and "and" it with 63:
n = 00000000 01100001 01100001 01100010
n moved 18bits to right is 00000000 00000000 00000000 00011000
Now the shifted value is cast to an 8-bit unsigned integer (uint8_t),
so it becomes 00011000.
The last part is the & operator (bitwise AND):
00011000 &
00111111
n0= 00011000
now repeat this for the rest
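Repeating the walk-through for all four groups and mapping each index to a character might look like the sketch below. The alphabet string is the standard base64 alphabet, which the posted snippet leaves out; encode_triplet and check_encode_triplet are hypothetical names, and the sketch assumes exactly three input bytes (no padding).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard base64 alphabet: index 0 is 'A', index 63 is '/'. */
static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode exactly three input bytes as four base64 characters plus '\0'. */
void encode_triplet(const unsigned char in[3], char out[5])
{
    uint32_t n = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];

    out[0] = alphabet[(n >> 18) & 63];
    out[1] = alphabet[(n >> 12) & 63];
    out[2] = alphabet[(n >> 6) & 63];
    out[3] = alphabet[n & 63];
    out[4] = '\0';
}

/* Hypothetical self-check using the bytes from the walk-through: 'a', 'a', 'b'. */
void check_encode_triplet(void)
{
    char out[5];
    encode_triplet((const unsigned char *)"aab", out);
    /* The four indices are 24, 22, 5 and 34, i.e. 'Y', 'W', 'F', 'i'. */
    assert(strcmp(out, "YWFi") == 0);
}
```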

Related

Why does this for loop in C program print Hi 11 times?

#include <stdio.h>
int main()
{
int i = 1024;
for (; i; i >>= 1)
printf("Hi");
return 0;
}
Why does the for loop print Hi 11 times? I don't understand.
The expression i >>= 1 is equivalent to i = i / 2 for a non-negative i.
So if initially i is equal to 1024 then the body of the loop will be executed for the values
1024 512 256 128 64 32 16 8 4 2 1
that is 11 times. The value of the expression 1 / 2 is equal to 0 due to the integer arithmetic.
You could check that by changing the call of printf like
#include <stdio.h>
int main( void )
{
int i = 1024;
for (; i; i >>= 1) printf( "%d ", i );
putchar( '\n' );
}
Example of the bitwise right shift operator:
int x = 1024;
int z = x >>= 1; /* shifts x right by one bit and stores the result in both x and z */
/* printing z now prints 512, i.e. half of the original x */
So if you keep updating z in a for loop, the way i is updated in your example, you will see the value go as low as 1 and then become 0 (an integer shift discards the bit that falls off), which ends the loop.
# include <stdio.h>
# include <iostream>
int main()
{
int i = 1024;
for (; i; i >>= 1)
std::cout << "\n" << i;
return 0;
}
In your code, i = 1024 (2^10) can be halved 11 times before it reaches 0, hence the loop prints "Hi" 11 times, for i = 1024, 512, ..., 2, 1.
Check this link for more details:
https://en.cppreference.com/w/cpp/language/operator_assignment
The >>= operator shifts the bits of the binary number to the right by the value on its right side, and assigns the result back to the variable. Here i is shifted to the right by 1. The least significant (rightmost) bit is discarded.
The loop condition in the middle of the for is just i, which is equivalent to i != 0 due to non-zero values being considered "truthy" and zero "falsey" in C.
1024 is 0b10000000000 in binary, i.e., a 1 followed by 10 zeroes. It will take 11 shifts to the right until the 1 "falls off" and leaves only zeroes, which causes the loop condition to no longer be true.
(As noted in other answers, shifting to the right by one bit is equivalent to integer division by two since the value of each binary digit further to the left is indeed one power of two greater than the previous (ones, twos, fours, eights, etc.), just like in decimal the values are powers of ten (ones, tens, hundreds, etc.). However, this is not really relevant to the question, since we can remain in the world of bits, as described above. Likewise even though integer division discards the fractional part, that is not why 1 >> 1 is zero…)
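To stay in the world of bits, the number of loop iterations is simply the position of the highest set bit plus one. A minimal sketch that counts the iterations (count_iterations and its check function are names invented for this example):

```c
#include <assert.h>

/* Count how many times the body of `for (; i; i >>= 1)` runs. */
int count_iterations(unsigned int start)
{
    int count = 0;
    for (unsigned int i = start; i; i >>= 1)
        count++;
    return count;
}

/* Hypothetical self-check. */
void check_count_iterations(void)
{
    assert(count_iterations(1024) == 11); /* 1 followed by 10 zeroes */
    assert(count_iterations(1023) == 10); /* ten 1 bits */
    assert(count_iterations(1) == 1);
    assert(count_iterations(0) == 0);     /* condition is false immediately */
}
```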
#include <stdio.h>
int main()
{
for (int i = 1024; i>0; i >>= 1) {
printf("%d ", i);
}
// The loop above proceeds as follows:
// 1024 (decimal) is 00000100 00000000 (binary)
//
// i >>= 1 shifts 00000100 00000000 one bit to the right (a 0 bit enters on the left; this is a shift, not a rotation)
// 00000100 00000000 = 1024 (STEP 1) > START
// 00000010 00000000 = 512 (STEP 2)
// 00000001 00000000 = 256 (STEP 3)
// 00000000 10000000 = 128 (STEP 4)
// 00000000 01000000 = 64 (STEP 5)
// 00000000 00100000 = 32 (STEP 6)
// 00000000 00010000 = 16 (STEP 7)
// 00000000 00001000 = 8 (STEP 8)
// 00000000 00000100 = 4 (STEP 9)
// 00000000 00000010 = 2 (STEP 10)
// 00000000 00000001 = 1 (STEP 11) > FINISH
}

what is the meaning of k-=(k & (-k)) in c? [duplicate]

I encountered this statement in a function that calculates a sum. Please help.
int get_sum(int x) {
int p = 0, k;
for (k = x; k > 0; k -= k & -k)
p += bit[k];
return p;
}
This expression:
k -= (k & (-k))
Is a tricky way of taking the least significant bit that is set in a positive number and clearing that bit. It is dependent on two's complement representation of negative numbers.
The first part, k & (-k) isolates the least significant bit that is set. For example:
1 & -1:
00000001
& 11111111
--------
00000001
2 & -2:
00000010
& 11111110
--------
00000010
24 & -24:
00011000
& 11101000
--------
00001000
When this value is subtracted from the original k, it clears that bit as a result.
So as the loop progresses, the value of k is reduced 1 bit at a time, starting with the lowest. So if for example x was 52, k would be 52, then 48 (52 - 4), then 32 (48 - 16), and would exit at 0 (32 - 32).
As to why the program is doing this, that depends entirely on the definition of bit and what it stores.
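The sequence of k values can be checked with a short sketch. It leaves out the bit array, since its contents are unknown; lowest_set_bit, strip_bits, and the check function are names made up for this example, and two's complement is assumed as in the explanation above.

```c
#include <assert.h>

/* Isolate the least significant set bit of a positive k. */
int lowest_set_bit(int k)
{
    return k & -k;
}

/* Record the values k takes in a loop shaped like the one in the question.
   Returns how many values were written to out (at most max). */
int strip_bits(int x, int out[], int max)
{
    int count = 0;
    for (int k = x; k > 0 && count < max; k -= lowest_set_bit(k))
        out[count++] = k;
    return count;
}

/* Hypothetical self-check with x = 52 (binary 110100). */
void check_strip_bits(void)
{
    int seen[8];
    int n = strip_bits(52, seen, 8);
    assert(n == 3);
    assert(seen[0] == 52 && seen[1] == 48 && seen[2] == 32);
    assert(lowest_set_bit(24) == 8);
}
```

The loop body runs once per set bit of x, so three times for 52.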

Bit Shifting - Finding nth byte in a number [duplicate]

I know you can get the first byte by using
int x = number & ((1<<8)-1);
or
int x = number & 0xFF;
But I don't know how to get the nth byte of an integer.
For example, 1234 is 00000000 00000000 00000100 11010010 as 32bit integer
How can I get all of those bytes? first one would be 210, second would be 4 and the last two would be 0.
int x = (number >> (8*n)) & 0xff;
where n is 0 for the first byte, 1 for the second byte, etc.
For the (n+1)th byte in whatever order they appear in memory (which is also least- to most- significant on little-endian machines like x86):
int x = ((unsigned char *)(&number))[n];
For the (n+1)th byte from least to most significant on big-endian machines:
int x = ((unsigned char *)(&number))[sizeof(int) - 1 - n];
For the (n+1)th byte from least to most significant (any endian):
int x = ((unsigned int)number >> (n << 3)) & 0xff;
Of course, these all assume that n < sizeof(int), and that number is an int.
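Applied to the 1234 example from the question, the shift-and-mask version can be sketched like this. get_byte and check_get_byte are hypothetical names, and a 32-bit unsigned value with n in the range 0 to 3 is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Return byte n of number, where n = 0 is the least significant byte.
   Assumes n < 4 so the shift count stays below the width of the type. */
uint8_t get_byte(uint32_t number, unsigned n)
{
    return (uint8_t)((number >> (8 * n)) & 0xFF);
}

/* Hypothetical self-check: 1234 is 0x000004D2. */
void check_get_byte(void)
{
    assert(get_byte(1234, 0) == 210);
    assert(get_byte(1234, 1) == 4);
    assert(get_byte(1234, 2) == 0);
    assert(get_byte(1234, 3) == 0);
}
```

Because this works on the value rather than on memory, the result does not depend on the machine's endianness.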
int nth = (number >> (n * 8)) & 0xFF;
Shift the byte you want down into the lowest byte position, then take it in the "familiar" manner.
If you want a byte, wouldn't the better solution be:
unsigned char x = (unsigned char)(number >> (8 * n));
(C has no built-in byte type, so unsigned char, or uint8_t from <stdint.h>, stands in for it.) This way you return and work with a byte instead of an int, so you use less memory, and you don't need the bitwise AND with 0xff just to mask the result down to a byte, because on the usual 8-bit-char platforms the conversion keeps only the low 8 bits. I also saw that the person asking the question used an int in their example, but that doesn't make it right.
I know this question was asked a long time ago, but I just ran into this problem, and I think that this is a better solution regardless.
#include <stddef.h>
#include <stdint.h>

// Swaps bytes in place; it would have been better to swap the higher and lower bytes directly
uint32_t reverseBytes(uint32_t value) {
    uint32_t temp;
    size_t size = sizeof(uint32_t);
    for (size_t i = 0; i < size / 2; i++) {
        // get byte i
        temp = (value >> (8 * i)) & 0xff;
        // put the higher byte into the lower byte (0xffu keeps the shifts unsigned, avoiding signed overflow)
        value = (value & ~(0xffu << (8 * i))) | ((value & (0xffu << (8 * (size - i - 1)))) >> (8 * (size - 2 * i - 1)));
        // move the lower byte, which was saved in temp, into the higher byte
        value = (value & ~(0xffu << (8 * (size - i - 1)))) | (temp << (8 * (size - i - 1)));
    }
    return value;
}
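For comparison, a loop-free way to reverse the four bytes is to move each byte straight to its destination with one shift and one mask. This is a different formulation from the in-place loop above, shown only as a sketch; reverse_bytes_simple and its check function are invented names.

```c
#include <assert.h>
#include <stdint.h>

/* Move each of the four bytes directly to its mirrored position. */
uint32_t reverse_bytes_simple(uint32_t v)
{
    return (v >> 24)                    /* byte 3 -> byte 0 */
         | ((v >> 8) & 0x0000FF00u)     /* byte 2 -> byte 1 */
         | ((v << 8) & 0x00FF0000u)     /* byte 1 -> byte 2 */
         | (v << 24);                   /* byte 0 -> byte 3 */
}

/* Hypothetical self-check. */
void check_reverse_bytes_simple(void)
{
    assert(reverse_bytes_simple(0x12345678u) == 0x78563412u);
    assert(reverse_bytes_simple(0x000000FFu) == 0xFF000000u);
}
```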

How do I perform a circular rotation of a byte?

I'm trying to implement a function that performs a circular rotation of a byte to the left and to the right.
I wrote the same code for both operations. For example, if you are rotating left 1010 becomes 0101. Is this right?
unsigned char rotl(unsigned char c) {
int w;
unsigned char s = c;
for (w = 7; w >= 0; w--) {
int b = (int)getBit(c, w);//
if (b == 0) {
s = clearBit(s, 7 - w);
} else if (b == 1) {
s = setBit(s, 7 - w);
}
}
return s;
}
unsigned char getBit(unsigned char c, int n) {
return c = (c & (1 << n)) >> n;
}
unsigned char setBit(unsigned char c, int n) {
return c = c | (1 << n);
}
unsigned char clearBit(unsigned char c, int n) {
return c = c &(~(1 << n));
}
There is no rotation operator in C, but if you write:
unsigned char rotl(unsigned char c)
{
return (c << 1) | (c >> 7);
}
then, according to this: http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf (page 56), compilers will figure out what you want to do and perform the rotation in only one (very fast) instruction.
Reading the answers and comments so far, there seems to be some confusion about what you are trying to accomplish - this may be because of the words you use. In bit manipulation, there are several "standard" things you can do. I will summarize some of these to help clarify different concepts. In all that follows, I will use abcdefgh to denote 8 bits (could be ones or zeros) - and as they move around, the same letter will refer to the same bit (maybe in a different position); if a bit becomes definitely 0 or 1, I will denote it as such.
1) Bit shifting: This is essentially a "fast multiply or divide by a power of 2". The symbol used is << for "left shift" (multiply) or >> for right shift (divide). Thus
abcdefgh >> 2 = 00abcdef
(equivalent to "divide by four") and
abcdefgh << 3 = abcdefgh000
(equivalent to "multiply by eight" - and assuming there was "space" to shift the abc into; otherwise this might result in an overflow)
2) Bit masking: sometimes you want to set certain bits to zero. You do this by doing an AND operation with a number that has ones where you want to preserve a bit, and zeros where you want to clear a bit.
abcdefgh & 01011010 = 0b0de0g0
Or if you want to make sure certain bits are one, you use the OR operation:
abcdefgh | 01011010 = a1c11f1h
3) Circular shift: this is a bit trickier - there are instances where you want to "move bits around", with the ones that "fall off at one end" re-appearing at the other end. There is no symbol for this in C, and no "quick instruction" (although most processors have a built-in instruction which assembler code can take advantage of for FFT calculations and such). If you want to do a "left circular shift" by three positions:
circshift(abcdefgh, 3) = defghabc
(note: there is no circshift function in the standard C libraries, although it exists in other languages - e.g. Matlab). By the same token a "right shift" would be
circshift(abcdefgh, -2) = ghabcdef
4) Bit reversal: Sometimes you need to reverse the bits in a number. When reversing the bits, there is no "left" or "right" - reversed is reversed:
reverse(abcdefgh) = hgfedcba
Again, there isn't actually a "reverse" function in standard C libraries.
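The first two items (shifting and masking) map directly onto C operators. A small sketch with one concrete byte, 10010011, in place of abcdefgh; all function names here are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Right shift: bits fall off on the right, zeros enter on the left. */
uint8_t shift_right(uint8_t x, unsigned n) { return (uint8_t)(x >> n); }

/* Left shift, truncated back to 8 bits, so the top bits are lost. */
uint8_t shift_left(uint8_t x, unsigned n) { return (uint8_t)(x << n); }

/* AND keeps only the bits where mask has a 1. */
uint8_t keep_bits(uint8_t x, uint8_t mask) { return (uint8_t)(x & mask); }

/* OR forces a 1 wherever mask has a 1. */
uint8_t force_bits(uint8_t x, uint8_t mask) { return (uint8_t)(x | mask); }

/* Hypothetical self-check with x = 10010011 and mask = 01011010. */
void check_shift_and_mask(void)
{
    uint8_t x = 0x93;
    assert(shift_right(x, 2) == 0x24);   /* 00100100 */
    assert(shift_left(x, 3) == 0x98);    /* 10011000 */
    assert(keep_bits(x, 0x5A) == 0x12);  /* 00010010 */
    assert(force_bits(x, 0x5A) == 0xDB); /* 11011011 */
}
```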
Now, let's take a look at some tricks for implementing these last two functions (circshift and reverse) in C. There are entire websites devoted to "clever ways to manipulate bits" - see for example this excellent one for a wonderful collection of "bit hacks", although some of these may be a little advanced...
unsigned char circshift(unsigned char x, int n) {
return (x << n) | (x >> (8 - n));
}
This uses two tricks from the above: shifting bits, and using the OR operation to set bits to specific values. Let's look at how it works, for n = 3 (note - I am ignoring bits above the 8th bit since the return type of the function is unsigned char):
(abcdefgh << 3) = defgh000
(abcdefgh >> (8 - 3)) = 00000abc
Taking the bitwise OR of these two gives
defgh000 | 00000abc = defghabc
Which is exactly the result we wanted. Note, however, that in C a << n is not the same as a >> (-n): shifting by a negative count is undefined behavior. As written, circshift is only safe for 0 <= n <= 8; to rotate right by k positions, rotate left by 8 - k instead.
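Since shifting by a negative count is undefined behavior in C, one way to let a single function rotate in both directions is to reduce the count into the range 0 to 7 first. This is a sketch under that design choice; rotate8 and check_rotate8 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate an 8-bit value left by n positions; a negative n rotates right. */
uint8_t rotate8(uint8_t x, int n)
{
    unsigned s = (unsigned)(((n % 8) + 8) % 8); /* map any n into 0..7 */

    /* x is promoted to int, so x >> 8 (when s == 0) is well defined and 0. */
    return (uint8_t)((x << s) | (x >> (8 - s)));
}

/* Hypothetical self-check with x = 10110001. */
void check_rotate8(void)
{
    assert(rotate8(0xB1, 3) == 0x8D);  /* 10001101 */
    assert(rotate8(0xB1, -2) == 0x6C); /* 01101100 */
    assert(rotate8(0xB1, 0) == 0xB1);
    assert(rotate8(0xB1, 8) == 0xB1);
}
```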
Now let's look at the reverse function. There are "fast ways" and "slow ways" to do this. Your code above gave a "very slow" way - let me show you a "very fast" way, assuming that your compiler allows the use of 64 bit (long long) integers.
unsigned char reverse(unsigned char b) {
return (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;
}
You may ask yourself "what just happened"??? Let me show you:
b = abcdefgh
* 0x0000000202020202 = 00000000 00000000 0000000a bcdefgha bcdefgha bcdefgha bcdefgha bcdefgh0
& 0x0000010884422010 = 00000000 00000000 00000001 00001000 10000100 01000010 00100000 00010000
= 00000000 00000000 0000000a 0000f000 b0000g00 0c0000h0 00d00000 000e0000
Note that we now have all the bits exactly once - they are just in a rather strange pattern. The modulo 1023 division "collapses" the bits of interest on top of each other - it's like magic, and I can't explain it. The result is indeed
hgfedcba
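One way to see what the modulo does: 1023 is 2^10 - 1, so 2^10 leaves remainder 1 when divided by 1023, and taking a value modulo 1023 adds up its 10-bit "digits" (just as taking a decimal number modulo 9 adds up its digits). After the mask, the surviving bits a, f, b, g, c, h, d, e sit at bit positions 40, 35, 31, 26, 22, 17, 13 and 4, which are 0, 5, 1, 6, 2, 7, 3 and 4 modulo 10. Each bit therefore lands in its own slot of the sum, with no carries, and the total is hgfedcba. The sketch below checks this for every byte; the helper names are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The multiply-and-mask step of the one-liner. */
uint64_t spread_bits(uint8_t b)
{
    return (b * 0x0202020202ULL) & 0x010884422010ULL;
}

/* Add up the base-1024 "digits" (10-bit groups) of v. */
uint64_t sum_10bit_groups(uint64_t v)
{
    uint64_t sum = 0;
    while (v) {
        sum += v & 0x3FF;
        v >>= 10;
    }
    return sum;
}

/* Plain bit-by-bit reversal, used as a reference. */
uint8_t reverse_slow(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++)
        if (b & (1u << i))
            r |= (uint8_t)(0x80u >> i);
    return r;
}

/* Hypothetical self-check over all 256 byte values. */
void check_mod_1023(void)
{
    for (unsigned b = 0; b < 256; b++) {
        uint64_t v = spread_bits((uint8_t)b);
        assert(v % 1023 == sum_10bit_groups(v));
        assert(v % 1023 == reverse_slow((uint8_t)b));
    }
}
```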
A slightly less obscure way to achieve the same thing (less efficient, but works for larger numbers quite efficiently) recognizes that if you swap adjacent bits, then adjacent bit pairs, then adjacent nibbles (4 bit groups), etc - you end up with a complete bit reversal. In that case, a byte reversal becomes
unsigned char bytereverse(unsigned char b) {
b = (b & 0x55) << 1 | (b & 0xAA) >> 1; // swap adjacent bits
b = (b & 0x33) << 2 | (b & 0xCC) >> 2; // swap adjacent pairs
b = (b & 0x0F) << 4 | (b & 0xF0) >> 4; // swap nibbles
return b;
}
In this case the following happens to byte b = abcdefgh:
b & 0x55 = abcdefgh & 01010101 = 0b0d0f0h << 1 = b0d0f0h0
b & 0xAA = abcdefgh & 10101010 = a0c0e0g0 >> 1 = 0a0c0e0g
OR these two to get badcfehg
Next line:
b & 0x33 = badcfehg & 00110011 = 00dc00hg << 2 = dc00hg00
b & 0xCC = badcfehg & 11001100 = ba00fe00 >> 2 = 00ba00fe
OR these to get dcbahgfe
last line:
b & 0x0F = dcbahgfe & 00001111 = 0000hgfe << 4 = hgfe0000
b & 0xF0 = dcbahgfe & 11110000 = dcba0000 >> 4 = 0000dcba
OR these to get hgfedcba
Which is the reversed byte you were after. It should be easy to see how just a couple more lines (similar to the above) get you to a reversed integer (32 bits). As the size of the number increases, this trick becomes more and more efficient, comparatively.
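The "couple more lines" for 32 bits could look like the following sketch: the same three swaps with wider masks, then a swap of adjacent bytes and a swap of the two 16-bit halves. reverse32 and check_reverse32 are names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse all 32 bits by swapping progressively larger groups. */
uint32_t reverse32(uint32_t v)
{
    v = ((v & 0x55555555u) << 1) | ((v & 0xAAAAAAAAu) >> 1); /* adjacent bits */
    v = ((v & 0x33333333u) << 2) | ((v & 0xCCCCCCCCu) >> 2); /* adjacent pairs */
    v = ((v & 0x0F0F0F0Fu) << 4) | ((v & 0xF0F0F0F0u) >> 4); /* nibbles */
    v = ((v & 0x00FF00FFu) << 8) | ((v & 0xFF00FF00u) >> 8); /* bytes */
    v = (v << 16) | (v >> 16);                               /* 16-bit halves */
    return v;
}

/* Hypothetical self-check. */
void check_reverse32(void)
{
    assert(reverse32(0x00000001u) == 0x80000000u);
    assert(reverse32(0x000000B9u) == 0x9D000000u);
    assert(reverse32(reverse32(0x12345678u)) == 0x12345678u);
}
```

Doubling the width adds only one more line, which is why the approach scales well.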
I trust that the answer you were looking for is "somewhere" in the above. If nothing else I hope you have a clearer understanding of the possibilities of bit manipulation in C.
If, as according to your comments, you want to shift one bit exactly, then one easy way to accomplish that would be this:
unsigned char rotl(unsigned char c)
{
return((c << 1) | (c >> 7));
}
What your code does is reversing the bits; not rotating them. For instance, it would make 10111001 into 10011101, not 01110011.
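The difference is easy to confirm with the example byte. The sketch below condenses the question's loop (copy bit w to position 7 - w) into one function and puts it next to the one-bit rotation; mirror_bits, rotl1, and the check function are names made up here.

```c
#include <assert.h>
#include <stdint.h>

/* What the question's loop computes: bit w of c ends up at position 7 - w. */
uint8_t mirror_bits(uint8_t c)
{
    uint8_t s = 0;
    for (int w = 7; w >= 0; w--)
        if ((c >> w) & 1)
            s |= (uint8_t)(1u << (7 - w));
    return s;
}

/* Rotate left by exactly one bit. */
uint8_t rotl1(uint8_t c)
{
    return (uint8_t)((c << 1) | (c >> 7));
}

/* Hypothetical self-check with 10111001. */
void check_mirror_vs_rotate(void)
{
    assert(mirror_bits(0xB9) == 0x9D); /* 10011101: reversed */
    assert(rotl1(0xB9) == 0x73);       /* 01110011: rotated */
}
```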

Setting individual bits in byte by group of bits

For example:
We have a byte A: XXXX XXXX
We have a byte B: 0000 0110
Now suppose we want to take 4 bits from a specific position in byte B and put them at a specific position in byte A, so that the result is:
We have a byte A: 0110 XXXX
I'm still searching for a magic function that does this, without success.
I found something similar and am reworking it, but I still can't get it to do what I need:
unsigned int i, j; // positions of bit sequences to swap
unsigned int n; // number of consecutive bits in each sequence
unsigned int b; // bits to swap reside in b
unsigned int r; // bit-swapped result goes here
unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1); // XOR temporary
r = b ^ ((x << i) | (x << j));
As an example of swapping ranges of bits suppose we have b = 00101111 (expressed in binary) and we want to swap the n = 3 consecutive bits starting at i = 1 (the second bit from the right) with the 3 consecutive bits starting at j = 5; the result would be r = 11100011 (binary).
This method of swapping is similar to the general purpose XOR swap trick, but intended for operating on individual bits. The variable x stores the result of XORing the pairs of bit values we want to swap, and then the bits are set to the result of themselves XORed with x. Of course, the result is undefined if the sequences overlap.
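Wrapped in a function, the quoted snippet can be checked against its own example. swap_bit_ranges and its check function are hypothetical names; as the text says, the two ranges must not overlap.

```c
#include <assert.h>

/* Swap the n bits starting at position i with the n bits starting at j. */
unsigned swap_bit_ranges(unsigned b, unsigned i, unsigned j, unsigned n)
{
    /* x has a 1 wherever the two ranges differ. */
    unsigned x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1);

    /* Flipping exactly those positions in both ranges swaps them. */
    return b ^ ((x << i) | (x << j));
}

/* Hypothetical self-check: b = 00101111, i = 1, j = 5, n = 3. */
void check_swap_bit_ranges(void)
{
    assert(swap_bit_ranges(0x2F, 1, 5, 3) == 0xE3); /* 11100011 */
}
```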
It's hard to understand your requirements exactly, so correct me if I'm wrong:
You want to take the last 4 bits of a byte (B) and add them to the first four bits of byte A? You use the term 'put inside' but it's unclear what you mean exactly by it (if not adding, do you mean replace?).
So assuming addition is what you want you could do something like this:
A = A | (B << 4);
This shifts B by 4 bits to the left (thereby ending up with 01100000) and then 'adds' it to A (using OR). Any bits already set in the top half of A stay set.
byte A: YYYY XXXX
byte B: 0000 0110
and you want 0110 XXXX
so AND A with 00001111 then copy the last 4 bits of B (first shift then OR)
a &= 0x0F; //now a is XXXX
a |= (b << 4); //shift B to 01100000 then OR to get your result
If you wanted 0110 YYYY, just shift a to the right by 4 instead of ANDing, then OR in the shifted b as before:
a >>= 4;
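Both variants fit in one-line functions. This is a sketch with invented names (set_high_nibble, move_high_to_low); it also masks b with 0x0F before shifting, which the answer does not do, so that stray high bits in b cannot leak into the result.

```c
#include <assert.h>
#include <stdint.h>

/* YYYY XXXX and 0000 BBBB -> BBBB XXXX */
uint8_t set_high_nibble(uint8_t a, uint8_t b)
{
    return (uint8_t)((a & 0x0F) | ((b & 0x0F) << 4));
}

/* YYYY XXXX and 0000 BBBB -> BBBB YYYY */
uint8_t move_high_to_low(uint8_t a, uint8_t b)
{
    return (uint8_t)((a >> 4) | ((b & 0x0F) << 4));
}

/* Hypothetical self-check with A = 1010 0111 and B = 0000 0110. */
void check_nibbles(void)
{
    assert(set_high_nibble(0xA7, 0x06) == 0x67);
    assert(move_high_to_low(0xA7, 0x06) == 0x6A);
}
```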
Found a solution:
x = ((b >> i) ^ (r >> j)) & ((1U << n) - 1);
r = r ^ (x << j);
where b is the source byte, r is the second (destination) byte, and i and j are the bit indexes to copy from and to, respectively.
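As a function, this copies n bits from position i of b into position j of r and leaves the rest of r alone. A minimal sketch with hypothetical names (copy_bits, check_copy_bits), using 8-bit values as in the question:

```c
#include <assert.h>
#include <stdint.h>

/* Copy the n bits of b starting at bit i into r starting at bit j. */
uint8_t copy_bits(uint8_t b, uint8_t r, unsigned i, unsigned j, unsigned n)
{
    /* x has a 1 wherever the source bits differ from the destination bits. */
    unsigned x = ((b >> i) ^ (r >> j)) & ((1U << n) - 1);

    /* Flipping those positions in r makes its range equal to the source range. */
    return (uint8_t)(r ^ (x << j));
}

/* Hypothetical self-check: put the low 4 bits of B = 0000 0110 into the
   high 4 bits of A = 1111 1111. */
void check_copy_bits(void)
{
    assert(copy_bits(0x06, 0xFF, 0, 4, 4) == 0x6F);
    assert(copy_bits(0x06, 0x00, 0, 4, 4) == 0x60);
}
```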
