Round Constants in Keccak - c

Recently, just for the heck of it, I've been playing around with an attempt at implementing Keccak, the cryptographic primitive behind SHA-3. I've run into some issues however, specifically with calculating the round constants used in the "Iota" step of the permutation.
Just to get it out of the way: Yes. I know they are round constants. I know I could hard code them as constants. But where's the fun in that?
I've specifically been referencing the FIPS 202 specification document on SHA-3 as well as the Keccak team's own Keccak reference. However, despite my efforts, I can't seem to end up with the correct constants. I've never dealt with bit manipulation before, so if I'm doing something the complete wrong way, feel free to let me know.
rc is a function defined in the FIPS 202 standard of Keccak that is a linear feedback shift register with a feedback polynomial of x^8 + x^6 + x^5 + x^4 + 1.
The values of t (specific to SHA-3) are defined as the set of integers that includes j + 7 * i_r, where i_r = {0, 1, ..., 22, 23} and j = {0, 1, ..., 4, 5}.
The expected outputs (the round constants) are defined as follows: 0x0000000000000001, 0x0000000000008082, 0x800000000000808a,
0x8000000080008000, 0x000000000000808b, 0x0000000080000001,
0x8000000080008081, 0x8000000000008009, 0x000000000000008a,
0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
0x000000008000808b, 0x800000000000008b, 0x8000000000008089,
0x8000000000008003, 0x8000000000008002, 0x8000000000000080,
0x000000000000800a, 0x800000008000000a, 0x8000000080008081,
0x8000000000008080, 0x0000000080000001, and 0x8000000080008008.
rc Function Implementation
uint64_t rc(int t)
{
if(t % 255 == 0)
{
return 0x1;
}
uint64_t R = 0x1;
for(int i = 1; i <= t % 255; i++)
{
R = R << 0x1;
R |= (((R >> 0x0) & 0x1) ^ ((R >> 0x8) & 0x1)) << 0x0;
R |= (((R >> 0x4) & 0x1) ^ ((R >> 0x8) & 0x1)) << 0x4;
R |= (((R >> 0x5) & 0x1) ^ ((R >> 0x8) & 0x1)) << 0x5;
R |= (((R >> 0x6) & 0x1) ^ ((R >> 0x8) & 0x1)) << 0x6;
R &= 0xFF;
}
return R & 0x1;
}
rc Function Call
for(int i_r = 0; i_r < 24; i_r++)
{
uint64_t RC = 0x0;
// TODO: Fix so the limit is not constant
for(int j = 0; j < 6; j++)
{
RC ^= (rc(j + 7 * i_r) << ((int) pow(2, j) - 1));
}
printf("%llu\n", RC);
}
Any help on this matter is much appreciated.

I made some random changes to the code and now it works. Here are the highlights:
The j loop needs to count from 0 to 6. That's because 2^6-1 = 63. So if j is never 6, then the output can never have the MSB set, i.e. an output of 0x8... is not possible.
Using the pow function is generally a bad idea for this type of application. double values have a nasty habit of being slightly lower than desired, e.g. 4 is actually 3.99999999999, which gets truncated to 3 when you convert it to an int. Doubtful that was happening in this case, but why risk it, since it's easy to just multiply variable shift by 2 on each pass through the loop.
The maximum value for t is 7*23+6 = 167, so the % 255 does nothing (at least with the value of i and t in this code). Also, there's no need to treat t == 0 as a special case. The loop won't run when t is 0, so the result is 0x1 by default.
Implementing a linear feedback shift register is quite simple in C. Each term in the polynomial corresponds to a single bit. x^8 is just 2^8 which is 0x100 and x^6 + x^5 + x^4 + 1 is 0x71. So whenever bit 0x100 is set, you XOR the result by 0x71.
Here's the updated code:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
uint64_t rc(int t)
{
uint64_t result = 0x1;
for (int i = 1; i <= t; i++)
{
result <<= 1;
if (result & 0x100)
result ^= 0x71;
}
return result & 0x1;
}
int main(void)
{
for (int i = 0; i < 24; i++)
{
uint64_t result = 0x0;
uint64_t shift = 1;
for (int j = 0; j < 7; j++)
{
uint64_t value = rc(7*i + j);
result |= value << (shift - 1);
shift *= 2;
}
printf("0x%016" PRIx64 "\n", result);
}
}
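As a cross-check on the points above, here is a minimal sketch that keeps the LFSR state truncated to 8 bits, as FIPS 202 does with Trunc8, and builds one round constant at a time. The names rc_bit and keccak_rc are made up for this illustration and do not come from the specification. XORing with 0x171 instead of 0x71 is just one way to apply the taps and clear bit 8 in the same step.

```c
#include <stdint.h>

/* One output bit of the FIPS 202 rc(t) LFSR.
   XORing with 0x171 applies the taps (0x71) and clears bit 8,
   so the state never grows beyond 8 bits. */
static unsigned rc_bit(unsigned t)
{
    unsigned r = 0x01;
    for (unsigned i = 0; i < t % 255; i++) {
        r <<= 1;
        if (r & 0x100)
            r ^= 0x171;
    }
    return r & 1u;
}

/* Round constant for round i_r (0..23) of Keccak-f[1600].
   Bit j of the LFSR output lands at position 2^j - 1, for j = 0..6. */
static uint64_t keccak_rc(unsigned i_r)
{
    uint64_t rc = 0;
    for (unsigned j = 0; j <= 6; j++)
        rc |= (uint64_t)rc_bit(7 * i_r + j) << ((1u << j) - 1);
    return rc;
}
```

Calling keccak_rc(i) for i from 0 to 23 should reproduce the table in the question; the shift amount is computed with integer shifts, so pow is not needed.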

Related

Reversing Endianness C

I'm lost on bit-shifting operations. I'm trying to reverse the byte order of 32-bit ints. From what I've managed to look up online I only got this far, but I can't find why it's not working:
int32_t swapped = 0; // Assign num to the tmp
for(int i = 0; i < 32; i++)
{
swapped |= num & 1; // putting the set bits of num
swapped >>= 1; //shift the swapped Right side
num <<= 1; //shift the swapped left side
}
And I'm printing like this
num = swapped;
for (size_t i = 0; i < 32; i++)
{
printf("%d",(num >> i));
}
Your code looks like it's attempting to swap bits, not bytes. If you want to swap bytes, then the 'complete' method would be:
int32_t swapped = ((num >> 24) & 0x000000FF) |
((num >> 8) & 0x0000FF00) |
((num << 8) & 0x00FF0000) |
((num << 24) & 0xFF000000);
I say 'complete', because the last bitwise-and can be omitted, and the first bitwise-and can be omitted if num is unsigned.
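To illustrate the remark about omitting masks, here is a small sketch of the same byte swap on an unsigned type. The function name swap_bytes32 is only an illustrative choice. With uint32_t the right shift by 24 cannot sign-extend, so only the two middle terms need a mask.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value.
   On an unsigned type, >> 24 leaves only the top byte and << 24
   leaves only the bottom byte, so the outer terms need no mask. */
static uint32_t swap_bytes32(uint32_t num)
{
    return (num >> 24)
         | ((num >> 8) & 0x0000FF00u)
         | ((num << 8) & 0x00FF0000u)
         | (num << 24);
}
```

For example, swap_bytes32(0x11223344) gives 0x44332211, and applying it twice returns the original value.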
If you want to swap the bits in a 32bit number, your loop should probably max out at 16 (if it's 32, the first 16 steps will swap the bits, the next 16 steps will swap them back again).
int32_t swapped = 0;
for(int i = 0; i < 16; ++i)
{
// the masks for the two bits (hi and lo) we will be swapping
// shift a '1' to the correct bit location based on the index 'i'
uint32_t hi_mask = 1u << (31 - i); // 1u: shifting a signed 1 into bit 31 is undefined
uint32_t lo_mask = 1u << i;
// use bitwise and to mask out the original bits in the number
uint32_t hi_bit = num & hi_mask;
uint32_t lo_bit = num & lo_mask;
// shift the bits so they switch places
uint32_t new_lo_bit = hi_bit >> (31 - i);
uint32_t new_hi_bit = lo_bit << (31 - i);
// use bitwise-or to combine back into an int
swapped |= new_lo_bit;
swapped |= new_hi_bit;
}
Code written for readability - there are faster ways to reverse the bits in a 32bit number. As for printing:
for (size_t i = 0; i < 32; i++)
{
bool bit = (num >> (31 - i)) & 0x1;
printf(bit ? "1" : "0");
}
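The answer mentions that faster ways to reverse the bits exist. One commonly used approach, sketched here under the assumption of a 32-bit unsigned input, swaps progressively larger groups (single bits, pairs, nibbles, bytes, half-words) with masks. The name reverse_bits32 is made up for this example.

```c
#include <stdint.h>

/* Reverse all 32 bits by swapping neighbouring groups of
   1, 2, 4, 8 and finally 16 bits. */
static uint32_t reverse_bits32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    x = (x >> 16) | (x << 16);
    return x;
}
```

This takes five steps regardless of the input, compared with 16 iterations for the readable loop.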

Correctness of Fletcher32 checksum algorithm

I'm having a hard time figuring out which implementation of the 32-bit variation of the Fletcher checksum algorithm is correct. Wikipedia provides the following optimized implementation:
uint32_t fletcher32( uint16_t const *data, size_t words ) {
uint32_t sum1 = 0xffff, sum2 = 0xffff;
size_t tlen;
while (words) {
tlen = words >= 359 ? 359 : words;
words -= tlen;
do {
sum2 += sum1 += *data++;
} while (--tlen);
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
}
/* Second reduction step to reduce sums to 16 bits */
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
return sum2 << 16 | sum1;
}
In addition, I've adapted the non-optimized 16-bit example from the Wikipedia article to compute a 32-bit checksum:
uint32_t naive_fletcher32(uint16_t *data, int words) {
uint32_t sum1 = 0;
uint32_t sum2 = 0;
int index;
for( index = 0; index < words; ++index ) {
sum1 = (sum1 + data[index]) % 0xffff;
sum2 = (sum2 + sum1) % 0xffff;
}
return (sum2 << 16) | sum1;
}
Both these implementations yield the same results, e.g. 0x56502d2a for the string abcdef. To verify that this is indeed correct, I tried to find other implementations of the algorithm:
An online checksum/hash generator
C++ implementation in the srecord project
There's also a JavaScript implementation
All of these seem to agree that the checksum for abcdef is 0x8180255 instead of the value given by the implementation on Wikipedia. I've narrowed this down to how the implementation reads the data buffer. All of the non-Wikipedia implementations above operate one byte at a time, whereas the Wikipedia implementation computes the checksum using 16-bit words. If I modify the above "naive" Wikipedia implementation to operate per byte instead, it reads like this:
uint32_t naive_fletcher32_per_byte(uint8_t *data, int words) {
uint32_t sum1 = 0;
uint32_t sum2 = 0;
int index;
for( index = 0; index < words; ++index ) {
sum1 = (sum1 + data[index]) % 0xffff;
sum2 = (sum2 + sum1) % 0xffff;
}
return (sum2 << 16) | sum1;
}
The only thing that changes is the signature, really. So this modified naive implementation and the above mentioned implementations (except Wikipedia) agree that the checksum of abcdef is indeed 0x8180255.
My problem now is: which one is correct?
According to the standard, the right method is the one that Wikipedia provides, except for the name:
Note that the 8-bit Fletcher algorithm gives a 16-bit checksum and the 16-bit algorithm gives a 32-bit checksum.
In the standard quoted in the answer of HideFromKGB, the algorithm is trivial: the 8-bit version uses only 8 bit accumulators ("ints"), producing 8 bit results A and B, and the 16-bit version uses 16 bit "ints", producing 16 bit results A and B.
It should be noted that what Wikipedia calls the "32 bit Fletcher" is actually the "16 bit Fletcher". The number of bits in the name refers in the standard to the number of bits in each D[i] and in each of A and B, but on Wikipedia it refers to the number of bits in the "stacked result", i.e. in A<<16 | B for the 32 bit result.
I did not implement this, but maybe this can explain the difference. I am inclined to say that your interpretation (implementation) is correct.
N.b.: also note that it is necessary to pad data with zeroes to the appropriate number of bytes.
These are test vectors, which are cross checked with two different implementations for 16-bit and for 32-bit check sums:
8-bit implementation (16-bit checksum)
"abcde" -> 51440 (0xC8F0)
"abcdef" -> 8279 (0x2057)
"abcdefgh" -> 1575 (0x0627)
16-bit implementation (32-bit checksum)
"abcde" -> 4031760169 (0xF04FC729)
"abcdef" -> 1448095018 (0x56502D2A)
"abcdefgh" -> 3957429649 (0xEBE19591)
TCP Alternate Checksum Options describes the Fletcher checksum algorithm for use with TCP: RFC 1146 dated March 1990.
The 8-bit Fletcher algorithm which gives a 16-bit checksum and the 16-bit algorithm which gives a 32-bit checksum are discussed.
The 8-bit Fletcher Checksum Algorithm is calculated over a sequence
of data octets (call them D[1] through D[N]) by maintaining 2
unsigned 1's-complement 8-bit accumulators A and B whose contents are
initially zero, and performing the following loop where i ranges from
1 to N:
A := A + D[i]
B := B + A
The 16-bit Fletcher Checksum algorithm proceeds in precisely the same
manner as the 8-bit checksum algorithm, except that A, B and the
D[i] are 16-bit quantities. It is necessary (as it is with the
standard TCP checksum algorithm) to pad a datagram containing an odd
number of octets with a zero octet.
That agrees with the Wikipedia algorithms. A simple test program confirms the quoted results:
#include <stdio.h>
#include <string.h>
#include <stdint.h> // for uint32_t
uint32_t fletcher32_1(const uint16_t *data, size_t len)
{
uint32_t c0, c1;
unsigned int i;
for (c0 = c1 = 0; len >= 360; len -= 360) {
for (i = 0; i < 360; ++i) {
c0 = c0 + *data++;
c1 = c1 + c0;
}
c0 = c0 % 65535;
c1 = c1 % 65535;
}
for (i = 0; i < len; ++i) {
c0 = c0 + *data++;
c1 = c1 + c0;
}
c0 = c0 % 65535;
c1 = c1 % 65535;
return (c1 << 16 | c0);
}
uint32_t fletcher32_2(const uint16_t *data, size_t l)
{
uint32_t sum1 = 0xffff, sum2 = 0xffff;
while (l) {
unsigned tlen = l > 359 ? 359 : l;
l -= tlen;
do {
sum2 += sum1 += *data++;
} while (--tlen);
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
}
/* Second reduction step to reduce sums to 16 bits */
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
return (sum2 << 16) | sum1;
}
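int main(void)
{
const char *str1 = "abcde";
const char *str2 = "abcdef";
uint16_t buf1[3] = {0}; // zero-initialized, so '\0' pads an odd-length string
uint16_t buf2[3] = {0};
memcpy(buf1, str1, strlen(str1)); // copy instead of casting char* to uint16_t*
memcpy(buf2, str2, strlen(str2));
// words are read in host byte order; the output below is from a little-endian machine
size_t len1 = (strlen(str1)+1) / 2; // number of 16-bit words, rounded up
size_t len2 = (strlen(str2)+1) / 2;
uint32_t f1 = fletcher32_1(buf1, len1);
uint32_t f2 = fletcher32_2(buf1, len1);
printf("%u %X \n", f1,f1);
printf("%u %X \n\n", f2,f2);
f1 = fletcher32_1(buf2, len2);
f2 = fletcher32_2(buf2, len2);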
printf("%u %X \n",f1,f1);
printf("%u %X \n",f2,f2);
return 0;
}
Output:
4031760169 F04FC729
4031760169 F04FC729
1448095018 56502D2A
1448095018 56502D2A
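The difference between the two camps can be reproduced without depending on the host byte order. The following is a rough sketch, assuming the 16-bit words are assembled from bytes in little-endian order (which is what the output above corresponds to) and that an odd trailing byte is padded with zero. The function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher over 16-bit words (32-bit result). Words are built from
   bytes in little-endian order; a trailing odd byte is zero-padded. */
static uint32_t fletcher32_words(const uint8_t *data, size_t len)
{
    uint32_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i += 2) {
        uint32_t word = data[i];
        if (i + 1 < len)
            word |= (uint32_t)data[i + 1] << 8;
        sum1 = (sum1 + word) % 65535u;
        sum2 = (sum2 + sum1) % 65535u;
    }
    return (sum2 << 16) | sum1;
}

/* The same loop fed one byte at a time, as the per-byte
   implementations mentioned in the question do. */
static uint32_t fletcher32_bytes(const uint8_t *data, size_t len)
{
    uint32_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (sum1 + data[i]) % 65535u;
        sum2 = (sum2 + sum1) % 65535u;
    }
    return (sum2 << 16) | sum1;
}
```

For "abcdef" the word-based version gives 0x56502D2A and the per-byte version gives 0x08180255, matching the two values from the question.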
My answer focuses on the correctness of s = (s & 0xffff) + (s >> 16);.
This is obviously supposed to replace the modulo operation. Now the big issue with the modulo operation is the division that needs to be performed. The trick is to not do the division and to estimate floor(s / 65535). So instead of computing s - floor(s/65535)*65535, which would be the same as modulo, we compute s - floor(s/65536)*65535. This will obviously not be equivalent to doing modulo. But it's good enough to quickly reduce the size of s.
Now we have
s - floor(s / 65536) * 65535
= s - (s >> 16) * 65535
= s - (s >> 16) * (65536 - 1)
= s - (s >> 16) * 65536 + (s >> 16)
= (s & 0xffff) + (s >> 16)
Since the (s & 0xffff) + (s >> 16) is not equivalent to doing the modulo, it does not suffice to use this formula. If s == 65535 then s % 65535 would yield zero. However, the former formula yields 65535. So the optimized Wikipedia implementation posted here is obviously false! The last 3 lines need to be changed to
/* Second reduction step to reduce sums to 16 bits */
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
if (sum1 >= 65535) { sum1 -= 65535; }
if (sum2 >= 65535) { sum2 -= 65535; }
return (sum2 << 16) | sum1;
Note that I can no longer find the optimized implementation on the Wikipedia page (as of February 2020).
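The gap between the folding step and a true modulo can be checked directly. This is a small sketch with illustrative names (fold16, mod65535); it assumes a 32-bit unsigned input and folds twice before the single conditional subtraction, so that the value is at most 65535 when the correction runs.

```c
#include <stdint.h>

/* One folding step. The result is congruent to s modulo 65535,
   but it is not necessarily below 65535. */
static uint32_t fold16(uint32_t s)
{
    return (s & 0xFFFFu) + (s >> 16);
}

/* Full reduction into the range 0..65534. */
static uint32_t mod65535(uint32_t s)
{
    s = fold16(fold16(s));   /* after two folds, s <= 65535 */
    if (s >= 65535u)
        s -= 65535u;         /* one correction is now enough */
    return s;
}
```

For s == 65535, fold16 returns 65535 while s % 65535 is 0, which is the mismatch described above; mod65535 returns 0.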
Addendum:
Imagine s were equal to the maximum unsigned 32-bit value, 0xFFFF_FFFF. Then the formula (s & 0xffff) + (s >> 16) yields 0x1FFFE, which is exactly two times 65535. In that case the correction step if (s >= 65535) { s -= 65535; } does not work, since it subtracts 65535 at most once. So we want to keep sum1 and sum2 in the loops strictly smaller than 0xFFFF_FFFF. Then the formula yields at most 2*65535-1 and the correction step works. The following simple Python program determines that sum2 would become too big after 360 iterations, so processing at most 359 16-bit words at a time is exactly right.
s1 = 0x1FFFD
s2 = 0x1FFFD
for i in range(1,1000):
    s1 += 0xFFFF
    s2 += s1
    if s2 >= 0xFFFFFFFF:
        print(i)
        break

How to interleave 2 booleans using bitwise operators?

Suppose I have two 4-bit values, ABCD and abcd. How to interleave it, so it becomes AaBbCcDd, using bitwise operators? Example in pseudo-C:
nibble a = 0b1001;
nibble b = 0b1100;
char c = foo(a,b);
print_bits(c);
// output: 0b11010010
Note: 4 bits is just for illustration, I want to do this with two 32bit ints.
This is called the perfect shuffle operation, and it's discussed at length in the Bible Of Bit Bashing, Hacker's Delight by Henry Warren, section 7-2 "Shuffling Bits."
Assuming x is a 32-bit integer with a in its high-order 16 bits and b in its low-order 16 bits:
unsigned int x = (a << 16) | b; /* put a and b in place */
the following straightforward C-like code accomplishes the perfect shuffle:
x = (x & 0x0000FF00) << 8 | (x >> 8) & 0x0000FF00 | x & 0xFF0000FF;
x = (x & 0x00F000F0) << 4 | (x >> 4) & 0x00F000F0 | x & 0xF00FF00F;
x = (x & 0x0C0C0C0C) << 2 | (x >> 2) & 0x0C0C0C0C | x & 0xC3C3C3C3;
x = (x & 0x22222222) << 1 | (x >> 1) & 0x22222222 | x & 0x99999999;
He also gives an alternative form which is faster on some CPUs, and (I think) a little more clear and extensible:
unsigned int t; /* an intermediate, temporary variable */
t = (x ^ (x >> 8)) & 0x0000FF00; x = x ^ t ^ (t << 8);
t = (x ^ (x >> 4)) & 0x00F000F0; x = x ^ t ^ (t << 4);
t = (x ^ (x >> 2)) & 0x0C0C0C0C; x = x ^ t ^ (t << 2);
t = (x ^ (x >> 1)) & 0x22222222; x = x ^ t ^ (t << 1);
I see you have edited your question to ask for a 64-bit result from two 32-bit inputs. I'd have to think about how to extend Warren's technique. I think it wouldn't be too hard, but I'd have to give it some thought. If someone else wanted to start here and give a 64-bit version, I'd be happy to upvote them.
EDITED FOR 64 BITS
I extended the second solution to 64 bits in a straightforward way. First I doubled the length of each of the constants. Then I added a line at the beginning to swap adjacent double-bytes and intermix them. In the following 4 lines, which are pretty much the same as the 32-bit version, the first line swaps adjacent bytes and intermixes, the second line drops down to nibbles, the third line to double-bits, and the last line to single bits.
unsigned long long int t; /* an intermediate, temporary variable */
t = (x ^ (x >> 16)) & 0x00000000FFFF0000ull; x = x ^ t ^ (t << 16);
t = (x ^ (x >> 8)) & 0x0000FF000000FF00ull; x = x ^ t ^ (t << 8);
t = (x ^ (x >> 4)) & 0x00F000F000F000F0ull; x = x ^ t ^ (t << 4);
t = (x ^ (x >> 2)) & 0x0C0C0C0C0C0C0C0Cull; x = x ^ t ^ (t << 2);
t = (x ^ (x >> 1)) & 0x2222222222222222ull; x = x ^ t ^ (t << 1);
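Wrapped into a function, the 64-bit version might look like the sketch below. The name interleave32 and the choice to place a in the upper half of x are assumptions of this example; they follow the convention of the 32-bit code above, so a's bits end up in the odd (higher) positions of each pair.

```c
#include <stdint.h>

/* Perfect shuffle of two 32-bit values into one 64-bit value.
   Bit i of a goes to bit 2*i + 1, bit i of b goes to bit 2*i. */
static uint64_t interleave32(uint32_t a, uint32_t b)
{
    uint64_t x = ((uint64_t)a << 32) | b;  /* put a and b in place */
    uint64_t t;                            /* intermediate, temporary variable */
    t = (x ^ (x >> 16)) & 0x00000000FFFF0000ull; x = x ^ t ^ (t << 16);
    t = (x ^ (x >> 8))  & 0x0000FF000000FF00ull; x = x ^ t ^ (t << 8);
    t = (x ^ (x >> 4))  & 0x00F000F000F000F0ull; x = x ^ t ^ (t << 4);
    t = (x ^ (x >> 2))  & 0x0C0C0C0C0C0C0C0Cull; x = x ^ t ^ (t << 2);
    t = (x ^ (x >> 1))  & 0x2222222222222222ull; x = x ^ t ^ (t << 1);
    return x;
}
```

With the values from the question, interleave32(0x9, 0xC) gives 0xD2, i.e. 0b11010010.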
From Stanford "Bit Twiddling Hacks" page:
https://graphics.stanford.edu/~seander/bithacks.html#InterleaveTableObvious
uint32_t x = /*...*/, y = /*...*/;
uint64_t z = 0;
for (unsigned i = 0; i < sizeof(x) * CHAR_BIT; i++) // unroll for more speed...
{
// widen to 64 bits before shifting, otherwise the upper bits are lost
z |= (uint64_t)(x & 1U << i) << i | (uint64_t)(y & 1U << i) << (i + 1);
}
Look at the page; it proposes different and faster algorithms that achieve the same result.
Like so:
#include <limits.h>
typedef unsigned int half;
typedef unsigned long long full;
full mix_bits(half a,half b)
{
full result = 0;
for (unsigned i=0; i<sizeof(half)*CHAR_BIT; i++)
result |= ((full)((a>>i)&1)<<(2*i+1))|((full)((b>>i)&1)<<(2*i+0)); // cast first: shifts past bit 31 need the wider type
return result;
}
Here is a loop-based solution that is hopefully more readable than some of the others already here.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
uint64_t interleave(uint32_t a, uint32_t b) {
uint64_t result = 0;
int i;
for (i = 0; i < 31; i++) {
result |= (a >> (31 - i)) & 1;
result <<= 1;
result |= (b >> (31 - i)) & 1;
result <<= 1;
}
// Skip the last left shift.
result |= (a >> (31 - i)) & 1;
result <<= 1;
result |= (b >> (31 - i)) & 1;
return result;
}
void printBits(uint64_t a) {
int i;
for (i = 0; i < 64; i++)
printf("%d", (int)((a >> (63 - i)) & 1)); // cast: uint64_t does not portably match %lu
puts("");
}
int main(){
uint32_t a = 0x9;
uint32_t b = 0x6;
uint64_t c = interleave(a,b);
printBits(a);
printBits(b);
printBits(c);
}
I have used two operations from the post "How do you set, clear, and toggle a single bit?": setting a bit at a particular index and checking the bit at a particular index.
The following code is implemented using these 2 operations only.
unsigned int a = 0b1001;
unsigned int b = 0b1100;
unsigned long long c=0; //Needs 64 bits to hold two interleaved 32-bit values
int index; //To specify index of c
unsigned int bit;
int i;
//Set bits in c from left to right, starting at the highest bit (31).
for(i=31;i>=0;i--)
{
index=2*i+1; //We have to add the bit in c at this index
//Check a
bit=a&(1u<<i); //Checking whether the i-th bit is set in a
if(bit)
c|=1ULL<<index; //Setting bit in c at index
index--;
//Check b
bit=b&(1u<<i); //Checking whether the i-th bit is set in b
if(bit)
c|=1ULL<<index; //Setting bit in c at index
}
printf("%llu",c);
Output: 210 which is 0b11010010

What's the best way to toggle the MSB?

So I want to toggle the most significant bit of my number. Here is an example:
x = 100101 then answer should be 00101
I have a 64 bit machine and hence I am not expecting the answer to be 100000..<51 0's>..100101
One way I thought of was to count the number of bits in my number and then toggle the MSB, but not sure on how to count.
The cheat is to pawn it off to the compiler: There are instructions in most CPUs for doing work like this.
The following should do what you want.
i ^ (1 << (sizeof i * CHAR_BIT - clz(i) - 1))
This will translate into the CLZ instruction, which counts the leading zeros.
For GCC, see: http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Other-Builtins.html
One thing to be careful of is that this results in undefined behavior if i == 0.
You should replace clz() with the correct intrinsic for your compiler. In GCC this is __builtin_clz; in Visual C++ the closest match is _BitScanReverse, which reports the index of the highest set bit directly instead of a leading-zero count.
@jleahy has already posted a good option for GCC. I would only leave here a generic implementation of clz which does not use any compiler intrinsics. However, it is not the optimal choice for CPUs which already have native instructions for counting bits (such as x86).
#define __bit_msb_mask(n) (~(~0x0ul >> (n))) /* n leftmost bits. */
/* Count leading zeroes. */
int clz(unsigned long x) {
int nr = 0;
int sh;
assert(x);
/* Hope that compiler optimizes out the sizeof check. */
if (sizeof(x) == 8) {
/* Suppress "shift count >= width of type" error in case
* when sizeof(x) is NOT 8, i.e. when it is a dead code anyway. */
sh = !(x & __bit_msb_mask(sizeof(x)*8/2)) << 5;
nr += sh; x <<= sh;
}
sh = !(x & __bit_msb_mask(1 << 4)) << 4; nr += sh; x <<= sh;
sh = !(x & __bit_msb_mask(1 << 3)) << 3; nr += sh; x <<= sh;
sh = !(x & __bit_msb_mask(1 << 2)) << 2; nr += sh; x <<= sh;
sh = !(x & __bit_msb_mask(1 << 1)) << 1; nr += sh; x <<= sh;
sh = !(x & __bit_msb_mask(1 << 0)) << 0; nr += sh;
return nr;
}
Using this function one can toggle the most significant set bit (assuming there is such one) as follows:
x ^= 1ul << (sizeof(x)*8 - clz(x) - 1)
Here's an approach using a lookup table, assuming CHAR_BIT == 8:
uint32_t toggle_msb(uint32_t n)
{
static unsigned char const lookup[] =
{ 1, 0, 0, 1, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7 };
for (unsigned int i = 0; i != sizeof n; ++i)
{
// omit the last bit for big-endian machines: ---VVVVVVVVVVVVVVVVVV
unsigned char * p
= reinterpret_cast<unsigned char *>(&n) + sizeof n - i - 1;
if (*p / 16 != 0) { *p = *p % 16 + (lookup[*p / 16] * 16); return n; }
if (*p % 16 != 0) { *p = 16 * (*p / 16) + lookup[*p % 16]; return n; }
}
return 1;
}
And to just put it all together in some sample code for GCC:
#include <stdio.h>
#define clz(x) __builtin_clz(x)
int main()
{
int i = 411; /* 110011011 */
if( i != 0 )
i ^= (1 << (sizeof(i)*8 - clz(i)-1));
/* i is now 10011011 */
printf("i = %d\n", i);
return(0);
}
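If no count-leading-zeros intrinsic is available, the highest set bit can also be isolated with plain shifts. This is a different technique from the clz-based answers above, shown as a sketch for 64-bit values; the name toggle_msb_portable is made up here. It smears the highest set bit into every lower position and then keeps only the top one.

```c
#include <stdint.h>

/* Clear the most significant set bit of x without intrinsics.
   Returns 0 for x == 0, so the zero case needs no special handling
   by the caller. */
static uint64_t toggle_msb_portable(uint64_t x)
{
    uint64_t mask = x;
    mask |= mask >> 1;      /* copy the highest set bit downwards ... */
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;     /* ... until every lower bit is set */
    mask ^= mask >> 1;      /* keep only the highest bit */
    return x ^ mask;
}
```

For the example from the question, 100101 (0x25) becomes 00101 (0x05), and 411 becomes 155 as in the GCC sample.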

how to make a bit-set/byte-array conversion in c

Given an array,
unsigned char q[32]="1100111...",
how can I generate a 4-byte bit-set, unsigned char p[4], such that each bit of the bit-set equals the corresponding value in the array, e.g., the first byte p[0] = "q[0] ... q[7]", the second byte p[1] = "q[8] ... q[15]", etc.?
And how do I do the opposite, i.e., given the bit-set, generate the array?
my own trial out for the first part.
unsigned char p[4]={0};
for (int j=0; j<N; j++)
{
if (q[j] == '1')
{
p [j / 8] |= 1 << (7-(j % 8));
}
}
Is the above right? any conditions to check? Is there any better way?
EDIT - 1
I wonder if above is efficient way? As the array size could be upto 4096 or even more.
First, use strtoul to get a 32-bit value. Then convert the byte order to big-endian with htonl. Finally, store the result in your array:
#include <arpa/inet.h>
#include <stdlib.h>
/* ... */
unsigned char q[32] = "1100111...";
unsigned char result[4] = {0};
*(unsigned long*)result = htonl(strtoul(q, NULL, 2));
There are other ways as well.
But I lack <arpa/inet.h>!
Then you need to know what byte order your platform is. If it's big endian, then htonl does nothing and can be omitted. If it's little-endian, then htonl is just:
unsigned long htonl(unsigned long x)
{
x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
x = ((x & 0xFFFF0000) >> 16) | ((x & 0x0000FFFF) << 16);
return x;
}
If you're lucky, your optimizer might see what you're doing and make it into efficient code. If not, well, at least it's all implementable in registers and O(log N).
If you don't know what byte order your platform is, then you need to detect it:
typedef union {
char c[sizeof(int) / sizeof(char)];
int i;
} OrderTest;
unsigned long htonl(unsigned long x)
{
OrderTest test;
test.i = 1;
if(!test.c[0])
return x;
x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
x = ((x & 0xFFFF0000) >> 16) | ((x & 0x0000FFFF) << 16);
return x;
}
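Byte-order detection can be avoided altogether by writing the bytes out with shifts, since shifts work on values rather than on memory layout. The sketch below is an alternative to the htonl-plus-cast approach, with made-up helper names (store_be32, bits_to_bytes32); it assumes the input is a null-terminated string of at most 32 '0'/'1' characters.

```c
#include <stdint.h>
#include <stdlib.h>

/* Store v into out[0..3], most significant byte first.
   This gives the same bytes on little- and big-endian hosts. */
static void store_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Parse up to 32 binary digits and store them as 4 bytes. */
static void bits_to_bytes32(const char *bits, unsigned char out[4])
{
    store_be32((uint32_t)strtoul(bits, NULL, 2), out);
}
```

This also sidesteps the pointer cast, so it does not matter whether unsigned long is 4 or 8 bytes wide.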
Maybe long is 8 bytes!
Well, the OP implied 4-byte inputs with their array size, but 8-byte long is doable:
#define kCharsPerLong (sizeof(long) / sizeof(char))
unsigned char q[8 * kCharsPerLong] = "1100111...";
unsigned char result[kCharsPerLong] = {0};
*(unsigned long*)result = htonl(strtoul(q, NULL, 2));
#include <limits.h> /* ULONG_MAX */
unsigned long htonl(unsigned long x)
{
/* sizeof cannot be evaluated by the preprocessor, so test ULONG_MAX instead. */
#if ULONG_MAX == 0xFFFFFFFFUL
x = ((x & 0xFF00FF00UL) >> 8) | ((x & 0x00FF00FFUL) << 8);
x = ((x & 0xFFFF0000UL) >> 16) | ((x & 0x0000FFFFUL) << 16);
#elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
x = ((x & 0xFF00FF00FF00FF00UL) >> 8) | ((x & 0x00FF00FF00FF00FFUL) << 8);
x = ((x & 0xFFFF0000FFFF0000UL) >> 16) | ((x & 0x0000FFFF0000FFFFUL) << 16);
x = ((x & 0xFFFFFFFF00000000UL) >> 32) | ((x & 0x00000000FFFFFFFFUL) << 32);
#else
#error Unsupported word size.
#endif
return x;
}
For char that isn't 8 bits (DSPs like to do this), you're on your own. (This is why it was a Big Deal when the SHARC series of DSPs had 8-bit bytes; it made it a LOT easier to port existing code because, face it, C does a horrible job of portability support.)
What about arbitrary length buffers? No funny pointer typecasts, please.
The main thing that can be improved with the OP's version is to rethink the loop's internals. Instead of thinking of the output bytes as a fixed data register, think of it as a shift register, where each successive bit is shifted into the right (LSB) end. This will save you from all those divisions and mods (which, hopefully, are optimized away to bit shifts).
For sanity, I'm ditching unsigned char for uint8_t.
#include <stdint.h>
unsigned StringToBits(const char* inChars, uint8_t* outBytes, size_t numBytes,
size_t* bytesRead)
/* Converts the string of '1' and '0' characters in `inChars` to a buffer of
* bytes in `outBytes`. `numBytes` is the number of available bytes in the
* `outBytes` buffer. On exit, if `bytesRead` is not NULL, the value it points
* to is set to the number of bytes read (rounding up to the nearest full
* byte). If a multiple of 8 bits is not read, the last byte written will be
* padded with 0 bits to reach a multiple of 8 bits. This function returns the
* number of padding bits that were added. For example, an input of 11 bits
* will result `bytesRead` being set to 2 and the function will return 5. This
* means that if a nonzero value is returned, then a partial byte was read,
* which may be an error.
*/
{ size_t bytes = 0;
unsigned bits = 0;
uint8_t x = 0;
while(bytes < numBytes)
{ /* Parse a character. */
switch(*inChars++)
{ case '0': x <<= 1; ++bits; break;
case '1': x = (x << 1) | 1; ++bits; break;
default: numBytes = 0; /* any other character ends the loop */
}
/* See if we filled a byte. */
if(bits == 8)
{ outBytes[bytes++] = x;
x = 0;
bits = 0;
}
}
/* Padding, if needed. */
if(bits)
{ bits = 8 - bits;
outBytes[bytes++] = x << bits;
}
/* Finish up. */
if(bytesRead)
*bytesRead = bytes;
return bits;
}
It's your responsibility to make sure inChars is null-terminated. The function will return on the first non-'0' or '1' character it sees or if it runs out of output buffer. Some example usage:
unsigned char q[32] = "1100111...";
uint8_t buf[4];
size_t bytesRead = 5;
if(StringToBits(q, buf, 4, &bytesRead) || bytesRead != 4)
{
/* Partial read; handle error here. */
}
This just reads 4 bytes, and traps the error if it can't.
unsigned char q[4096] = "1100111...";
uint8_t buf[512];
StringToBits(q, buf, 512, NULL);
This just converts what it can and sets the rest to 0 bits.
This function could be done better if C had the ability to break out of more than one level of loop or switch; as it stands, I'd have to add a flag value to get the same effect, which is clutter, or I'd have to add a goto, which I simply refuse.
I don't think that will quite work. You are comparing each "bit" to 1 when it should really be '1'. You can also make it a bit more efficient by getting rid of the if:
unsigned char p[4]={0};
for (int j=0; j<32; j++)
{
p [j / 8] |= (q[j] == '1') << (7-(j % 8));
}
Going in reverse is pretty simple too. Just mask for each "bit" that you set earlier.
unsigned char q[32]={0};
for (int j=0; j<32; j++) {
q[j] = !!(p[j / 8] & ( 1 << (7-(j % 8)) )) + '0';
}
You'll notice the creative use of (boolean) + '0' to convert between 1/0 and '1'/'0'.
Judging by your example, you are not going for readability. After a (late) refresh, my solution looks very similar to Chriszuma's, except for the missing parentheses (operator precedence makes them unnecessary) and the addition of !! to force the result to 0 or 1.
enum { N = 32 }; //N must be a multiple of 8; a constant expression, so the arrays below can be initialized in C
unsigned char q[N+1] = "11011101001001101001111110000111";
unsigned char p[N/8] = {0};
unsigned char r[N+1] = {0}; //reversed
for(size_t i = 0; i < N; ++i)
p[i / 8] |= (q[i] == '1') << 7 - i % 8;
for(size_t i = 0; i < N; ++i)
r[i] = '0' + !!(p[i / 8] & 1 << 7 - i % 8);
printf("%x %x %x %x\n", p[0], p[1], p[2], p[3]);
printf("%s\n%s\n", q,r);
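Putting both directions into functions, a round trip might look like the following sketch. The names pack_bits and unpack_bits are invented for this example, and it assumes nbits is a multiple of 8, the input contains only '0' and '1', and the output string has room for nbits + 1 characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a string of '0'/'1' characters into bytes, first character
   into the most significant bit of out[0]. */
static void pack_bits(const char *bits, size_t nbits, uint8_t *out)
{
    for (size_t i = 0; i < nbits / 8; i++) {
        uint8_t byte = 0;
        for (size_t j = 0; j < 8; j++)
            byte = (uint8_t)((byte << 1) | (bits[8 * i + j] == '1'));
        out[i] = byte;
    }
}

/* Expand bytes back into a null-terminated string of '0'/'1'. */
static void unpack_bits(const uint8_t *in, size_t nbits, char *bits)
{
    for (size_t i = 0; i < nbits; i++)
        bits[i] = (char)('0' + ((in[i / 8] >> (7 - i % 8)) & 1));
    bits[nbits] = '\0';
}
```

The packing loop shifts each new bit in from the right, so it needs no division or modulo per bit; for the 32-character string used above it produces the bytes dd 26 9f 87.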
If you are looking for extreme efficiency, try to use the following techniques:
Replace if by subtraction of '0' (seems like you can assume your input symbols can be only 0 or 1).
Also process the input from lower indices to higher ones.
for (int c = 0; c < N; c += 8)
{
int y = 0;
for (int b = 0; b < 8; ++b)
y = y * 2 + q[c + b] - '0';
p[c / 8] = y;
}
Replace array indices by auto-incrementing pointers:
const char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
int y = 0;
for (int b = 0; b < 8; ++b)
y = y * 2 + *qptr++ - '0';
*pptr++ = y;
}
Unroll the inner loop:
const char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
*pptr++ =
qptr[0] - '0' << 7 |
qptr[1] - '0' << 6 |
qptr[2] - '0' << 5 |
qptr[3] - '0' << 4 |
qptr[4] - '0' << 3 |
qptr[5] - '0' << 2 |
qptr[6] - '0' << 1 |
qptr[7] - '0' << 0;
qptr += 8;
}
Process several input characters simultaneously (using bit twiddling hacks or MMX instructions) - this has great speedup potential!
