Hex to Octal Conversion Program Without Using Decimal or Binary - c

Today I was just playing around with basic conversions from one base to another. I googled some code for converting from hex to octal, and I noticed that it mostly uses an intermediate conversion to either decimal or binary and then back to octal. Is it possible to write my own function for converting a hex string to an octal string without using any intermediate conversion? Also, I do not want to use built-in printf options like %x or %o. Thanks for your inputs.

Of course it is possible. A number is a number no matter what numeric system it is in. The only problem is that people are used to decimal and that is why they understand it better. You may convert from any base to any other.
EDIT: more info on how to perform the conversion.
First note that 3 hexadecimal digits map to exactly 4 octal digits. So having the number of hexadecimal digits you may find the number of octal digits easily:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
// Numeric value of one hex digit; accepts upper- and lowercase letters.
int get_val(char hex_digit) {
    if (hex_digit >= '0' && hex_digit <= '9') {
        return hex_digit - '0';
    } else if (hex_digit >= 'a' && hex_digit <= 'f') {
        return hex_digit - 'a' + 10;
    } else {
        return hex_digit - 'A' + 10;
    }
}
void convert_to_oct(const char* hex, char** res) {
    int hex_len = strlen(hex);
    int oct_len = (hex_len/3) * 4;
    int i;
    // One hex digit left that is 4 bits or 2 oct digits.
    if (hex_len%3 == 1) {
        oct_len += 2;
    } else if (hex_len%3 == 2) { // 2 hex digits map to 3 oct digits
        oct_len += 3;
    }
    (*res) = malloc((oct_len+1) * sizeof(char));
    (*res)[oct_len] = 0; // don't forget the terminating char.
    int oct_index = oct_len - 1; // position we are changing in the oct representation.
    // Complete groups of 3 hex digits, from the right. The test is i - 2 >= 0
    // so that the leftmost complete group is processed too.
    for (i = hex_len - 1; i - 2 >= 0; i -= 3) {
        (*res)[oct_index] = get_val(hex[i]) % 8 + '0';
        (*res)[oct_index - 1] = (get_val(hex[i])/8 + (get_val(hex[i-1])%4) * 2) + '0';
        (*res)[oct_index - 2] = get_val(hex[i-1])/4 + (get_val(hex[i-2])%2)*4 + '0';
        (*res)[oct_index - 3] = get_val(hex[i-2])/2 + '0';
        oct_index -= 4;
    }
    // if hex_len is not divisible by 3 we have to take care of the extra digits:
    if (hex_len%3 == 1) {
        (*res)[oct_index] = get_val(hex[0])%8 + '0';
        (*res)[oct_index - 1] = get_val(hex[0])/8 + '0';
    } else if (hex_len%3 == 2) {
        (*res)[oct_index] = get_val(hex[1])%8 + '0';
        // top bit of hex[1] plus the low two bits of hex[0], shifted up by one
        (*res)[oct_index - 1] = get_val(hex[1])/8 + (get_val(hex[0])%4)*2 + '0';
        (*res)[oct_index - 2] = get_val(hex[0])/4 + '0';
    }
}
Also here is the example on ideone so that you can play with it: example.

It's a little tricky as you will be converting groups of 4 bits to groups of 3 bits - you'll probably want to work with 12 bits at a time, i.e. 3 hex digits to 4 octal digits and you'll then have to deal with any remaining bits separately.
E.g. to convert 5274 octal to hex:
  5    2    7    4
 101  010  111  100
 |||/   \\//   \|||
 1010   1011   1100
  A      B      C

All numbers in a computer's memory are base 2. So whenever you want to actually DO something with the values (mathematical operations), you'll need them as ints, floats, etc. So it's handy, or may come in handy in the future, to do the conversion via computable types.
I'd avoid direct string to string conversions, unless the values can be too big to fit into a numeric variable. It is surprisingly hard to write a reliable converter from scratch.
(Using base 10 makes very little sense in a binary computer.)

Yes, you can do it relatively easily: four octal digits always convert to three hex digits, so you can split your string into groups of three hex digits, and process each group from the back. If you do not have enough hex digits to complete a group of three, add leading zeros.
Each hex digit gives you four bits; take the last three, and convert them to octal. Add the next four, and take three more bits to octal. Add the last group of four - now you have six bits in total, so convert them to two octal digits.
This avoids converting the entire number to a binary, although there will be a "sliding" binary window used in the process of converting the number.
Consider an example: converting 62ABC to octal. Divide into groups of three digits: 062 and ABC (note the added zero in front of 62 to make a group of three digits).
Start from the back:
C, or 1100, gets chopped into 1 and 100, making octal 4, and 1 extra bit for the next step
B, or 1011, gets chopped into 10 for the next step and 11 for this step. The 1 from the previous step is attached on the right of 11, making an octal 7
A, or 1010, gets chopped into 101 and 0. The 10 from the previous step is attached on the right, making 010, or octal 2. The 101 is octal 5, so we have 5274 so far.
2 becomes 2 and 0 for the next step;
6 becomes 4 and 01 for the next step;
0 becomes 0 and 1 (because 01 from the previous step is added).
The final result is 01425274.
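The sliding window described above can be sketched in C roughly as follows. This is only an illustration under a few assumptions: the name hex_to_oct_window is made up for this example, the result is allocated with malloc and must be freed by the caller, and no leading zero is added to pad the input to a multiple of three digits. That means "062ABC" gives 01425274 as in the walkthrough, while "62ABC" gives 1425274.

```c
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: returns a malloc'ed octal string (caller frees),
   or NULL on invalid input or allocation failure. */
char *hex_to_oct_window(const char *hex) {
    size_t hex_len = strlen(hex);
    size_t oct_len = (hex_len * 4 + 2) / 3;   /* ceil(4 * hex_len / 3) */
    char *oct = malloc(oct_len + 1);
    if (oct == NULL) return NULL;
    oct[oct_len] = '\0';

    unsigned window = 0;   /* pending bits, never more than 6 of them */
    int bits = 0;          /* number of pending bits in the window */
    size_t out = oct_len;  /* the output is filled from the right */

    for (size_t i = hex_len; i-- > 0; ) {
        int v = hex_digit_value(hex[i]);
        if (v < 0) { free(oct); return NULL; }
        window |= (unsigned)v << bits;   /* new bits go above the leftovers */
        bits += 4;
        while (bits >= 3) {              /* emit every complete octal digit */
            oct[--out] = (char)('0' + (window & 7u));
            window >>= 3;
            bits -= 3;
        }
    }
    if (bits > 0) {                      /* 1 or 2 bits left at the very top */
        oct[--out] = (char)('0' + window);
    }
    return oct;
}
```

Calling hex_to_oct_window("ABC") should produce "5274", matching the first three steps of the example. The window never holds more than six bits, so the whole number is never converted to binary at once.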

Seems like a pretty straightforward task to me... You want a hex string and you want to convert it to an octal string. Let's take the ASCII hex and convert it to an int type to work with:
char hex_value[] = "0x123";
int value = strtol(hex_value,NULL,16);
After strtol the value is just an integer and no longer tied to any base. To express it in another base there's simple math that can be done (the numbers below are written in hex):
123 / 8 = 24 R 3
24 / 8 = 4 R 4
4 / 8 = 0 R 4
This tells us that 123 (base 16) == 443 (base 8), so all we have to do is write that math into a basic function and put the final value back into a string:
#include <stdio.h>
#include <stdlib.h>
char * convert_to_oct(int hex)
{
    long long ret = 0, place = 1;                 // octal digits stored as a decimal number
    int quotient = 0, remainder = 0, dividend = hex, counter = 0;
    char * ret_str;                               // returned string
    while(dividend > 0){                          // while we have something to divide
        quotient = dividend / 0x8;                // get the quotient
        remainder = dividend - quotient * 0x8;    // get the remainder
        ret += remainder * place;                 // add the remainder (shifted) into our return value
        place *= 10;                              // next decimal place (replaces pow(10, counter))
        counter++;                                // increment our shift
        dividend = quotient;                      // get ready for the next divide operation
    }
    ret_str = malloc(counter + 2);                // digits plus the terminating null (and room for "0")
    sprintf(ret_str, "%lld", ret);                // store the result
    return ret_str;
}
So this function will convert a hex (int) value into an octal string. You could call it like:
int main()
{
    char hex_value[] = "0x123";
    char * oct_value;
    int value = strtol(hex_value,NULL,16);
    // sanity check, see what the value should be before the convert
    printf("value is %x, auto convert via printf gives %o\n", value, value);
    oct_value = convert_to_oct(value);
    printf("value is %s\n", oct_value);
    free(oct_value);
    return 0;
}

Every octal digit holds 3 bits of information. Every hex digit holds 4 bits of information. The least common multiple of 3 and 4 is 12, so 4 octal digits and 3 hex digits cover the same 12 bits.
This means you can build a simple lookup table
0000 = 0x000
0001 = 0x001
0002 = 0x002
...
0007 = 0x007
0010 = 0x008
0011 = 0x009
0012 = 0x00A
...
0017 = 0x00F
0020 = 0x010
...
5274 = 0xABC
...
Now that the idea is there, you have several choices:
Build a Map (lookup table)
The routine here would add leading zeros to the octal (string) number until it was 4 digits long, and then lookup the hexadecimal value from the table. Two variations are typing out the table statically, or populating it dynamically.
Use math to replace the lookup table
Instead of typing out each solution, you could calculate them
hexdigit1 = ((01 & octaldigit8) << 03) + octaldigit1;
hexdigit16 = ((03 & octaldigit64) << 02) + ((06 & octaldigit8) >> 01);
hexdigit256 = (octaldigit512 << 01) + ((04 & octaldigit64) >> 02);
where the octaldigit1 / hexdigit16 / octaldigit8 means "octal 1's place", "hexadecimal 16's place", "octal 8's place" respectively.
Note that in either of these cases you don't "use binary" or "use decimal" but as these numbers can be represented in either of those two systems, it's not possible to avoid someone coming along behind and analyzing the correctness of the (or any) solution in decimal or binary terms.
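As a rough sketch of the "use math" variant, the function below turns one group of four octal characters into three hex characters using only shifts and masks. The name oct4_to_hex3 and its return convention are made up for this illustration; padding a shorter octal string with leading zeros is left to the caller.

```c
#include <stddef.h>

/* Hypothetical helper: convert exactly four octal characters ('0'..'7')
   into three uppercase hex characters (no terminating null is written).
   Returns 0 on success, -1 if a character is not an octal digit. */
int oct4_to_hex3(const char oct[4], char hex[3]) {
    static const char digits[] = "0123456789ABCDEF";
    unsigned o[4];   /* o[0] is the 512's place, o[3] is the 1's place */

    for (size_t i = 0; i < 4; i++) {
        if (oct[i] < '0' || oct[i] > '7') return -1;
        o[i] = (unsigned)(oct[i] - '0');
    }
    /* 256's place: all of o[0] plus the top bit of o[1] */
    hex[0] = digits[(o[0] << 1) | (o[1] >> 2)];
    /* 16's place: low two bits of o[1] plus the top two bits of o[2] */
    hex[1] = digits[((o[1] & 3u) << 2) | (o[2] >> 1)];
    /* 1's place: low bit of o[2] plus all of o[3] */
    hex[2] = digits[((o[2] & 1u) << 3) | o[3]];
    return 0;
}
```

With the table entry from above, the input "5274" should give the three characters A, B and C. A full converter would call this once per group of four octal digits, which is the same work the lookup table would do.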

Here is an easy JavaScript function that writes the characters of a string as octal escapes. It is valid for alert() or for your pages, for character codes up to 65535 (0xFFFF). The problem you usually run into is text containing codes above 127. The safest representation is octal. Avoid the parseXX functions.
Thank you for your likes (^ _ ^). It's free to enjoy.
function enjoyOCTALJS (valuestr){
    var arrstr = valuestr.split('');
    arrstr = arrstr.map(f => (!isNaN(f.charCodeAt(0)))? (f.charCodeAt(0)>127)? '\\'+f.charCodeAt(0).toString(8):f:f);
    return arrstr.join('');
}
If you just want the octal value of a character, do this (max = 65535, or 0xFFFF):
var mchar = "à";
var result = mchar.charCodeAt(0).toString(8);
Or, to cover code points above that:
var mchar = 'à';
var result = mchar.codePointAt(0).toString(8);
charCodeAt(x) cannot return a value above 65535; codePointAt(x) can. The parameter x selects the position in the string, and codePointAt(x) returns undefined when there is no character at that position.
Your computer considers everything as 0 to 255.
We do not need big functions to convert characters, it's very easy.
CHAR TO UNICODE:
var mchar = 'à';
var result = mchar.codePointAt(0); // or mchar.charCodeAt(0)
UNICODE TO OCTAL:
var mcode = 220;
var result = mcode.toString(8);
etc... :)

Related

How to move uint32_t number to char[]?

I have to copy uint32_t number into the middle of the char[] buffer.
The situation is like this:
char buf[100];
uint8_t position = 52; // position in buffer to which I want to copy my uint32_t number
uint32_t seconds = 23456; // the actual number
I tried to use memcpy like this:
memcpy(&buf[position], &seconds, sizeof(seconds));
But in buffer I'm getting some strange characters, not the number i want
I also tried using byte-shifting
int shiftby = 32;
for (int i = 0; i < 8; i++)
{
buf[position++] = (seconds >> (shiftby -= 4)) & 0xF;
}
Is there any other option how to solve this problem?
What you're doing in your memcpy code is to put the binary value 23456 in buf, starting at byte 52 (so bytes 52-55, since the size of seconds is 4 bytes). What you want to do (if I understand you correctly) is to put the string "23456" in buf, starting at byte 52. In this second case, each character takes one byte, and each byte would hold the ASCII value of its character.
Probably the best way to do that is to use snprintf:
int snprintf(char *buffer, size_t n, const char *format-string,
argument-list);
In your example:
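snprintf(&buf[position], 6, "%lu", (unsigned long)seconds)
Note that the n argument is the maximum number of bytes to write, including the terminating null character, so five digits need n = 6. It is not the size of the variable: you take one byte per digit/character.
Obviously you should calculate the number of digits in seconds rather than hard-code it if it can change (or simply pass the space remaining in the buffer as n), and you should also check the return value of snprintf. It returns the number of characters that would have been written, so a result of n or more means the output was truncated.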
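A minimal sketch of that advice, assuming the goal is decimal text: the helper below (put_u32_decimal is a made-up name) passes the space remaining in the buffer as n and uses the return value of snprintf to detect truncation. Note that snprintf also writes a terminating null right after the digits.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write the decimal text of value into buf, starting
   at offset pos. buf_size is the total size of buf. Returns the number of
   characters written (not counting the terminating null), or -1 if the
   text does not fit. */
int put_u32_decimal(char *buf, size_t buf_size, size_t pos, uint32_t value) {
    if (pos >= buf_size) return -1;
    size_t room = buf_size - pos;                 /* bytes left, incl. the null */
    int n = snprintf(buf + pos, room, "%" PRIu32, value);
    if (n < 0 || (size_t)n >= room) return -1;    /* error or truncated output */
    return n;
}
```

With the buffer from the question, put_u32_decimal(buf, sizeof buf, 52, seconds) should return 5 and leave "23456" at buf[52]. If the byte after the digits must keep its old contents, save it before the call and restore it afterwards, since snprintf overwrites it with the null.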
It is unclear how you are intending to represent this uint32_t, but your code fragment suggest that you are expecting hexadecimal (or perhaps BCD). In that case:
for( int shiftby = 28; shiftby >= 0 ; shiftby -= 4 )
{
char hexdigit = (seconds >> shiftby) & 0xF ;
buf[position++] = hexdigit < 10 ? hexdigit + '0' : hexdigit + 'A' - 10 ;
}
Note that the only real difference between this and your code is the conversion to hex-digit characters by adding conditionally either '0' or 'A' - 10. The use of shiftby as the loop control variable is just a simplification of your algorithm.
The issue with your code is that it inserted integer values 0 to 15 into buf and the characters associated with these values are all ASCII control characters, nominally non-printing. How or whether they render as a glyph on any particular display depends on what you are using to present them. In Windows console for example, printing characters 0 to 15 results in the following:
00 = <no glyph>
01 = '☺'
02 = '☻'
03 = '♥'
04 = '♦'
05 = '♣'
06 = '♠'
07 = <bell> (emits a sound, no glyph)
08 = <backspace>
09 = <tab>
10 = <linefeed>
11 = '♂'
12 = '♀'
13 = <carriage return>
14 = '♫'
15 = '☼'
The change above transforms the values 0 to 15 to ASCII '0'-'9' or 'A'-'F'.
If a hexadecimal presentation is not what you were intending then you need to clarify the question.
Note that if the encoding is BCD (Binary Coded Decimal) where each decimal digit is coded into a 4 bit nibble, then the conversion can be simplified because the range of values is reduced to 0 to 9:
char bcddigit = (seconds >> shiftby) & 0xF ;
buf[position++] = bcddigit + '0' ;
but the hex conversion will work for BCD also.

Get bits from number string

If I have a number string (char array), one digit is one char, resulting in that the space for a four digit number is 5 bytes, including the null termination.
unsigned char num[] ="1024";
printf("%zu", sizeof(num)); // 5
However, 1024 can be written as
unsigned char binaryNum[2];
binaryNum[0] = 0b00000100;
binaryNum[1] = 0b00000000;
How can the conversion from string to binary be made effectively?
In my program i would work with ≈30 digit numbers, so the space gain would be big.
My goal is to create datapackets to be sent over UDP/TCP.
I would prefer not to use libraries for this task, since the available space the code can take up is small.
EDIT:
Thanks for quick response.
char num = 0b0000 0100 // "4"
--------------------------
char num = 0b0001 1000 // "24"
-----------------------------
char num[2];
num[0] = 0b00000100;
num[1] = 0b00000000;
// num now contains 1024
I would need ≈ 10 bytes to contain my number in binary form. So, if I as suggested parse the digits one by one, starting from the back, how would that build up to the final big binary number?
In general, converting a number in string representation to decimal is easy because each character can be parsed separately. E.g. to convert "1024" to 1024 you can just look at the '1', convert it to 1, multiply by 10, then convert the 0 and add it, multiply by 10 again, and so on until you have parsed the whole string.
For binary it is not so easy, e.g. you can convert 4 to 100 and 2 to 010 but 42 is not 100 010 or 110 or something like that. So, your best bet is to convert the whole thing to a number and then convert that number to binary using mathematical operations (bit shifts and such). This will work fine for numbers that fit in one of the C++ number types, but if you want to handle arbitrarily large numbers you will need a BigInteger class which seems to be a problem for you since the code has to be small.
From your question I gather that you want to compress the string representation in order to transmit the number over a network, so I am offering a solution that does not strictly convert to binary but will still use fewer bytes than the string representation and is easy to use. It is based on the fact that you can store a number 0..9 in 4 bits, and so you can fit two of those numbers in a byte. Hence you can store an n-digit number in n/2 bytes. The algorithm could be as follows:
Take the last character, '4'
Subtract '0' to get 4 (i.e. an int with value 4).
Strip the last character.
Repeat to get 2
Concatenate into a single byte: digits[0] = (4 << 4) + 2.
Do the same for the next two digits: digits[1] = (0 << 4) + 1.
Your representation in memory will now look like
   4    2      0    1
0100 0010   0000 0001
digits[0]   digits[1]
i.e.
digits = { 66, 1 }
This is not quite the binary representation of 1024, but it is shorter and it allows you to easily recover the original number by reversing the algorithm.
You even have 6 values left that you don't use for storing digits (i.e. 1010 through 1111), which you can use for other things like storing the sign, decimal point, byte order or end-of-number delimiter.
I trust that you will be able to implement this, should you choose to use it.
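One possible way to implement the packing, as a sketch only: digits are taken from the end of the string, the first digit of each pair goes into the high nibble, and the second into the low nibble. The names bcd_pack, bcd_unpack and BCD_PAD are invented for this example, and using the spare nibble value 0xF as a "no digit" marker for odd-length numbers is an assumption, not something the answer prescribes.

```c
#include <stddef.h>
#include <string.h>

#define BCD_PAD 0xFu   /* assumption: spare nibble value meaning "no digit" */

/* Pack a decimal string into out, two digits per byte, starting from the
   last character. Returns the number of bytes used, or 0 if the input is
   empty, contains a non-digit, or does not fit in out_size bytes. */
size_t bcd_pack(const char *s, unsigned char *out, size_t out_size) {
    size_t len = strlen(s);
    size_t need = (len + 1) / 2;
    if (len == 0 || need > out_size) return 0;
    for (size_t i = 0; i < need; i++) {
        char hi_c = s[len - 1 - 2 * i];
        if (hi_c < '0' || hi_c > '9') return 0;
        unsigned hi = (unsigned)(hi_c - '0');
        unsigned lo = BCD_PAD;                 /* stays PAD if no digit is left */
        if (2 * i + 1 < len) {
            char lo_c = s[len - 2 - 2 * i];
            if (lo_c < '0' || lo_c > '9') return 0;
            lo = (unsigned)(lo_c - '0');
        }
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return need;
}

/* Reverse of bcd_pack: rebuild the decimal string from n packed bytes.
   dst must have room for 2 * n + 1 characters. */
void bcd_unpack(const unsigned char *in, size_t n, char *dst) {
    size_t k = 0;
    for (size_t i = n; i-- > 0; ) {
        unsigned lo = in[i] & 0xFu;
        unsigned hi = (unsigned)in[i] >> 4;
        if (lo != BCD_PAD) dst[k++] = (char)('0' + lo);
        dst[k++] = (char)('0' + hi);
    }
    dst[k] = '\0';
}
```

A 30-digit number packs into 15 bytes this way, compared with 31 bytes for the string including its terminator. Unpacking simply walks the bytes in reverse order, so the receiver gets the original text back.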
If I understand your question correctly, you would want to do this:
Convert your string representation into an integer.
Convert the integer into binary representation.
For step 1:
You could loop through the string
Subtract '0' from the char
Multiply by 10^n (depending on the position) and add to a sum.
For step 2 (for int x), in general:
x%2 gives you the least-significant-bit (LSB).
x /= 2 "removes" the LSB.
For example, take x = 6.
x%2 = 0 (LSB), x /= 2 -> x becomes 3
x%2 = 1, x /= 2 -> x becomes 1
x%2 = 1 (MSB), x /= 2 -> x becomes 0.
So we see that (6)decimal == (110)bin.
On to the implementation (for N=2, where N is maximum number of bytes):
int x = 1024;
int n=-1, p=0, p_=0, i=0, ex=1; //you can use smaller types of int for this if you are strict on memory usage
unsigned char num[N] = {0};
for (p=0; p<(N*8); p++,p_++) {
if (p%8 == 0) { n++; p_=0; } //for every 8bits, 1) store the new result in the next element in the array. 2) reset the placing (start at 2^0 again).
for (i=0; i<p_; i++) ex *= 2; //ex = pow(2,p_); without using math.h library
num[n] += ex * (x%2); //add (2^p_ x LSB) to num[n]
x /= 2; // "remove" the last bit to check for the next.
ex = 1; // reset the exponent
}
We can check the result for x = 1024:
for (i=0; i<N; i++)
printf("num[%d] = %d\n", i, num[i]); //num[0] = 0 (0b00000000), num[1] = 4 (0b00000100)
To convert an up-to-30-digit decimal number, represented as a string, into a series of bytes, effectively a base-256 representation, takes up to 13 bytes (ceiling of 30/log10(256)).
Simple algorithm
dest = 0
for each digit of the string (starting with most significant)
dest *= 10
dest += digit
As C code
#include <ctype.h>
#include <string.h>
#define STR_DEC_TO_BIN_N 13
unsigned char *str_dec_to_bin(unsigned char dest[STR_DEC_TO_BIN_N], const char *src) {
    // dest[] = 0
    memset(dest, 0, STR_DEC_TO_BIN_N);
    // for each digit ...
    while (isdigit((unsigned char) *src)) {
        // dest[] = 10*dest[] + *src
        // with dest[0] as the most significant digit
        int sum = *src - '0';
        for (int i = STR_DEC_TO_BIN_N - 1; i >= 0; i--) {
            sum += dest[i]*10;
            dest[i] = sum % 256;
            sum /= 256;
        }
        // If sum is non-zero, it means dest[] overflowed
        if (sum) {
            return NULL;
        }
        src++; // advance to the next digit, otherwise the loop never ends
    }
    // If stopped on something other than the null character ....
    if (*src) {
        return NULL;
    }
    return dest;
}

divide uint64_t data containing N decimal digits into two uint32_t and trim the output in C

I need to split a number into two parts: the high 8 decimal digits and the low 8 decimal digits.
I will not write all the details of the code (e.g. memory allocation etc.).
The input is N decimal digits held in a uint64_t*. It should be divided into two groups of 8 decimal digits, the first 8 digits and the next 8 digits (data type uint32_t). If there are more digits, they are simply trimmed.
(Please understand I am a newbie, so if something in the code is wrong I would appreciate an explanation; criticism without explanation does not help newbies.)
The code is expected to work like this:
Assumption: uint64_t* data_64bits as input already contains N decimal digits.
e.g.1)
1234567803456789 as input (16 decimal digits) in uint64_t* data_64bits
Then the output will be 12345678 or 03456789, based on the id.
e.g.2)
The input data has 17 decimal digits: 98765432155555553
Then I assume that the last digit, '3', should be thrown out,
and the output will be 8 digits each, either 98765432 or 15555555, based on the id.
The code:
// #Input: uint8_t id input identifier
// #input: uint64_t* data_64bits input parameter , assume it contains N decimal digits number
// #output: uint32_t* data_32bits output parameter to return the divided each 8 decimal digits
void dataFrom64To32bits( uint8_t id, uint64_t* data_64bits, uint32_t* data_32bits )
{
#if 0
/* [A] My original thoughts, but I have told this is not necessary.
because the data contains decimal digits
[Q1] really don't need this? */
uint32_t 4BytesLSB;
uint32_t 4BytesMSB;
4BytesLSB = data_64bits & 0xFFFFFFFF; /* obtains the 4 LSB */
4BytesMSB = (data_64bits & 0xFFFFFFFF00000000) >> 32; /* obtains the 4 MSB */
#endif
#if 1
/* [B] I have got advice like this,
and have told that it will be enough doing like this
for 8 decimal digits for high part, low part to divide.
[Q2] Does this really make sense? and will give the result? */
if(id == id_highpart)
{
*data_32bits = ( *data_64bits / 100000000 ) % 100000000;
}
else if(id == id_lowpart)
{
*data_32bits = *data_64bits % 100000000;
}
#endif
}
Does it make sense?
I was told that part A is not necessary.
Part B was advised as the way to get the expected result.
Please look at [Q1], [Q2] in the code.
Additional,
which is the right for throwing out the rest from the certain part in the code? trim or something else?
Thanks in advance
Additional, which is the right for throwing out the rest from the
certain part in the code? trim or something else?
To discard digits after the 16th, divide by 10 until there are no more (before extracting the parts):
while (*data_64bits > 9999999999999999ll) *data_64bits /= 10;
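Putting the trimming step and part [B] together could look like the sketch below. The identifier values ID_HIGHPART and ID_LOWPART are placeholders, since the question does not show how id_highpart and id_lowpart are defined, and the function name is adapted for the example. Unlike the one-liner above, it works on a copy so the caller's value is not changed.

```c
#include <stdint.h>

/* Placeholder identifiers; the real values are not shown in the question. */
enum { ID_HIGHPART = 0, ID_LOWPART = 1 };

/* Split the first 16 decimal digits of *data_64bits into two 8-digit halves
   and store the half selected by id in *data_32bits.
   Returns 0 on success, -1 for an unknown id. */
int data_from_64_to_32(uint8_t id, const uint64_t *data_64bits,
                       uint32_t *data_32bits) {
    uint64_t v = *data_64bits;

    /* Drop digits after the 16th by dividing by 10 until 16 or fewer remain. */
    while (v > UINT64_C(9999999999999999)) {
        v /= 10;
    }

    if (id == ID_HIGHPART) {
        *data_32bits = (uint32_t)(v / 100000000u);   /* first 8 digits */
    } else if (id == ID_LOWPART) {
        *data_32bits = (uint32_t)(v % 100000000u);   /* next 8 digits */
    } else {
        return -1;
    }
    return 0;
}
```

Once the value has at most 16 digits, the quotient is already below 100000000, so the extra % 100000000 from part [B] is no longer needed. A low part such as 03456789 is stored as the number 3456789; a format like "%08" PRIu32 (from inttypes.h) shows the leading zero again when printing.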

C: Finding closest number to the first without repeating digits from the second number?

I read two numbers, both int.
What I need to do is print the number higher than but closest to the first number, such as 378, but which doesn't contain any of the digits from the second number, for example, 78.
Input: 378 78, output: 390 because that's the lowest number above 378 that doesn't contain any of the digits of 78.
Input: 3454 54, output: 3600 because 3600 is the first closest that doesn't contain 5 or 4, the digits of 54.
I am trying to do this by getting the latest modulus digits of the first number, from the length of the second. For example:
378 78
378 mod 100 == 78, then compare 78 and 78 digits, and if there is same digit move on to 379, then check for 379 mod 100 == 79. When comparing 79 and 78, 7 is the same digit.
And so on until we get 390 for example. This should work for all N-size numbers.
Here is what I've done so far...and that's almost nothing.
#include <stdio.h>
int main()
{
int number1, number2;
int closest = number1 + 1;
scanf("%d %d",&number1,&number2);
int count_modulus = 1;
while(number2)
{
count_modulus = count_modulus * 10;
number2 = number2 / 10;
}
int mod = count_modulus;
int n2 = number2;
while(number1)
{
int remain = number1 % mod;
while(remain)
{
if((remain % 10 == number2 % 10))
{
}
else
{
}
}
}
printf("%d",closest);
return 0;
}
I'm not entirely convinced that your modulo method is going to work since, if you start with 7823 and 78, then 7823 mod 100 gives you 23 which has no digits in common with 78 even though 7823 does.
But, even if I've misunderstood the specification and it did work, I think there's a better way. First things first, how to score a number based on what digits it contains.
Provided your integers have at least ten bits (and they will, since the standard mandates a range that requires sixteen), you can just use a bit mask with each bit representing whether the number holds a digit. We can use a normal int for this since ten bits will get us nowhere near the sign bit that may cause us issues.
The code for scoring a number is:
// Return a bitmask with the bottom ten bits populated,
// based on whether the input number has a given digit.
// So, 64096 as input will give you the binary value
//    000000 1001010001
//  <-unused ^  ^ ^   ^
//           9876543210 (digits)
int getMask (int val) {
    int mask = 0; // All start at 0
    while (val > 0) { // While more digits
        mask = mask | (1 << (val % 10)); // Set bit of digit
        val = val / 10; // Move to next digit
    }
    return mask;
}
The "tricky" bit there is the statement:
mask = mask | (1 << (val % 10));
What it does is get the last digit of the number, with val % 10 giving you the remainder when dividing val by ten. So 123 % 10 gives 3, 314159 % 10 gives 9 and so on.
The next step is to shift the binary 1 that many bits to the left, with 1 << (val % 10). Shifting a 1-bit four places left would give you the binary value 10000 so this is simply a way to get the 1-bit in the correct position.
Finally, you bitwise-OR the mask with that calculated value, effectively setting the equivalent bit in the mask, keeping in mind that for each bit position a | b gives you 1 if either or both of a and b are 1.
Or you can check out one of my other answers here for more detail.
So, I hear you ask, how does that bitmask help us to find numbers with no common digits? Well, that's where the AND bitwise operator & comes in - this only gives you a 1 bit if both input bits are 1 (again, see the link provided earlier for more detail on the bitwise operators).
Once you have the bitmask for your two numbers, you can use & with them as per the following example, where there are no common digits:
Number     Bitmask
                  9876543210
------     -----------------
314159     000000 1000111010
   720     000000 0010000101
           ----------------- AND(&)
           000000 0000000000
You can see that, unless the bit position has a 1 for both numbers, the resultant bit will be 0. If any digits are in common at all, it will be some non-zero value:
Number     Bitmask      v
                  9876543210
------     -----------------
314159     000000 1000111010
   320     000000 0000001101
           ----------------- AND(&)
           000000 0000001000
                        ^
           The bit representing the common digit 3.
And that leads us to the actual checking code, something relatively easy to build on top of the scoring function:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[]) {
    // Default input numbers to both zero, then try
    // to get them from arguments.
    int check = 0, exclude = 0, excludeMask;
    if (argc > 1)
        check = atoi(argv[1]);
    if (argc > 2)
        exclude = atoi(argv[2]);
    // Get the mask for the exclusion number, only done once.
    excludeMask = getMask (exclude);
    // Then we loop, looking for a mask that has no
    // common bits.
    printf ("%d -> ", check);
    //check++;
    while ((excludeMask & getMask (check)) != 0)
        check++;
    printf ("%d\n", check);
    return 0;
}
The flow is basically to get the numbers from the arguments, work out the bitmask for the exclusion number (the one that has the digits you don't want in the result), then start looking from the check number, until you find one.
I've commented out the initial check++ since I'm not sure whether you really wanted a higher number than the one given, or whether 123 with an exclusion of 98 should give you the actual starting number of 123. If not, just uncomment the line.
And there you have it, as shown in the following transcript, which includes your test data amongst other things:
$ ./myprog 378 78
378 -> 390
$ ./myprog 3454 54
3454 -> 3600
$ ./myprog 123 98 # would give 124 if 'check++' uncommented
123 -> 123
$ ./myprog 314159 6413
314159 -> 500000
It does have one potentially fatal flaw but one that's easy enough to fix if you check the exclusion bitmask before you start looking. I'll leave that as an exercise for the reader, but just think about what might happen with the following command:
$ ./myprog 1 154862397
And, of course, if you want to go the other way (lower numbers), it's a matter of decrementing check rather than incrementing it. You may also need to get a bit smarter about what you want to happen if you go negative, such as with:
$ ./myprog 1 102
The code as it currently stands may not handle that so well.
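For what it's worth, one way the exclusion check might be added is sketched below. The names digit_mask and next_without_digits are invented for this example (digit_mask repeats the scoring idea so the block compiles on its own), and the sketch assumes a positive starting number and searches upwards only.

```c
#include <limits.h>

/* Same scoring idea as above: bit d is set if digit d occurs in val. */
static int digit_mask(int val) {
    int mask = 0;
    while (val > 0) {
        mask |= 1 << (val % 10);
        val /= 10;
    }
    return mask;
}

/* Hypothetical helper: smallest value >= start that shares no digit with
   exclude, or -1 if there is none within the range of int. */
int next_without_digits(int start, int exclude) {
    int exclude_mask = digit_mask(exclude);

    /* If digits 1 to 9 are all excluded, no positive number can qualify,
       because every positive number has a non-zero leading digit. */
    if ((exclude_mask & 0x3FE) == 0x3FE) {
        return -1;
    }

    int check = start;
    while ((exclude_mask & digit_mask(check)) != 0) {
        if (check == INT_MAX) {
            return -1;     /* ran out of int values, avoid overflow */
        }
        check++;
    }
    return check;
}
```

With the inputs from the transcript, next_without_digits(378, 78) should give 390 and next_without_digits(1, 154862397) should return -1 instead of looping until the counter overflows.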

c: bit reversal logic

I was looking at the below bit reversal code and just wondering how does one come up with these kind of things. (source : http://www.cl.cam.ac.uk/~am21/hakmemc.html)
/* reverse 8 bits (Schroeppel) */
unsigned reverse_8bits(unsigned41 a) {
return ((a * 0x000202020202) /* 5 copies in 40 bits */
& 0x010884422010) /* where bits coincide with reverse repeated base 2^10 */
/* PDP-10: 041(6 bits):020420420020(35 bits) */
% 1023; /* casting out 2^10 - 1's */
}
Can someone explain what the comment "where bits coincide with reverse repeated base 2^10" means?
Also, how does "% 1023" pull out the relevant bits? Is there any general idea in this?
It is a very broad question you are asking.
Here is an explanation of what % 1023 might be about: you know how computing n % 9 is like summing the digits of the base-10 representation of n? For instance, 52 % 9 = 7 = 5 + 2.
The code in your question is doing the same thing with 1023 = 1024 - 1 instead of 9 = 10 - 1. It is using the operation % 1023 to gather multiple results that have been computed “independently” as 10-bit slices of a large number.
And this is the beginning of a clue as to how the constants 0x000202020202 and 0x010884422010 are chosen: they make wide integer operations operate as independent simpler operations on 10-bit slices of a large number.
Expanding on Pascal Cuoq's idea, here is an explanation.
The general idea is that, in any base, a number and the sum of its digits leave the same remainder when divided by (base-1). When the digit sum is itself smaller than (base-1), the remainder is exactly the digit sum.
For example, 34 when divided by 9 leaves 7 as remainder. This is because 34 can be written as 3 * 10 + 4
i.e. 34 = 3 * 10 + 4
= 3 * (9 +1) + 4
= 3 * 9 + (3 +4)
Now, 9 divides 3 * 9, leaving remainder (3 + 4). This process can be extended to any base 'b', since (b^n - 1) is always divisible by (b-1).
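A quick way to convince yourself of this is to compute both sides for a few numbers. The two helpers below (digit_sum and same_remainder are names made up for this check) compare n modulo (base - 1) with the digit sum modulo (base - 1).

```c
#include <stdint.h>

/* Sum of the digits of n when written in the given base (base >= 2). */
uint64_t digit_sum(uint64_t n, uint64_t base) {
    uint64_t sum = 0;
    while (n > 0) {
        sum += n % base;   /* lowest digit */
        n /= base;         /* drop it */
    }
    return sum;
}

/* Returns 1 if n and its digit sum leave the same remainder modulo base - 1. */
int same_remainder(uint64_t n, uint64_t base) {
    return n % (base - 1) == digit_sum(n, base) % (base - 1);
}
```

For 34 in base 10 both sides are 7. For a number like 99 the digit sum is 18 while the remainder is 0, which is why the comparison reduces the digit sum modulo (base - 1) as well.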
Now, coming to the problem, if a number is represented in base 1024, and if the number is divided by 1023, the remainder will be the sum of its digits (as long as that sum is smaller than 1023, which it is here).
To convert a binary number to base 1024, we can group bits of 10 from the right side into single number
For example, to convert binary number 0x010884422010(0b10000100010000100010000100010000000010000) to base 1024, we can group it into 10 bits number as follows
(1) (0000100010) (0001000100) (0010001000) (0000010000) =
(0b0000000001)*1024^4 + (0b0000100010)*1024^3 + (0b0001000100)*1024^2 + (0b0010001000)*1024^1 + (0b0000010000)*1024^0
So, when this number is divided by 1023, the remainder will be the sum of
0b0000000001
+ 0b0000100010
+ 0b0001000100
+ 0b0010001000
+ 0b0000010000
--------------------
0b0011111111
If you observe the above digits closely, the '1' bits in each above digit occupy complementary positions. So, when added together, it should pull all the 8 bits in the original number.
So, in the above code, "a * 0x000202020202", creates 5 copies of the byte "a". When the result is ANDed with 0x010884422010, we selectively choose 8 bits in the 5 copies of "a". When "% 1023" is applied, we pull all the 8 bits.
So, how does it actually reverse bits? That is a bit clever. The idea is, the "1" bit in the digit 0b0000000001 is actually aligned with the MSB of the original byte. So, when you AND, you are actually ANDing the MSB of the original byte with the LSB of the magic number digit. Similarly, the digit 0b0000100010 is aligned with the second and sixth bits from the MSB, and so on.
So, when you add all the digits of the magic number, the resulting number will be reverse of the original byte.
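To try this on a current compiler, the routine can be written with uint64_t standing in for the 41-bit type of the original listing. That substitution is an assumption of this sketch; the constants are the ones from the question.

```c
#include <stdint.h>

/* Reverse the bits of one byte with the multiply / mask / modulus trick.
   uint64_t replaces the PDP-10 era 41-bit type from the original code. */
uint8_t reverse_8bits(uint8_t a) {
    /* Five copies of a, at bit offsets 1, 9, 17, 25 and 33. */
    uint64_t copies = (uint64_t)a * UINT64_C(0x0202020202);
    /* Keep exactly one bit of a per 10-bit digit position. */
    uint64_t picked = copies & UINT64_C(0x010884422010);
    /* Summing the base-1024 digits gathers the eight selected bits. */
    return (uint8_t)(picked % 1023);
}
```

For example, reverse_8bits(0x01) should return 0x80 and reverse_8bits(0x12) should return 0x48. The digit sum can never exceed 255 here, so the remainder modulo 1023 is exactly that sum.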
