convert number to slash divided hex path - c

I need to generate a path string from a number (in C)
e.g:
53431453 -> 0003/2F4/C9D
what I have so far is this:
char *id_to_path(long long int id, char *d)
{
char t[MAX_PATH_LEN];
sprintf(t, "%010llX", id);
memcpy(d, t, 4);
memcpy(d+5, t+4, 3);
memcpy(d+9, t+7, 4);
d[4] = d[8] = '/';
return d;
}
I'm wondering if there's a better way, e.g to generate the final string in one step instead of doing sprintf and then moving the bytes around.
Thanks
Edit:
I benchmarked the given solutions
results in operations per second (higher is better):
(1) sprintf + memcpy : 3383005
(2) single sprintf : 2219253
(3) not using sprintf : 10917996
when compiling with -O3 the difference is even greater:
(1) 4422101
(2) 2207157
(3) 178756551
Since this function will be called a lot, I'll use the fastest solution even though the single sprintf is the shortest and most readable.
Thanks for your answers!

Not tested, but you can split the int into three then print it:
char *id_to_path(long long int id, char *d)
{
sprintf(d, "%04llX/%03llX/%03llX", ( id >> 24 ) & 0xffff, ( id >> 12 ) & 0xfff, id & 0xfff);
return d;
}
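A bounds-checked variant of the same single-call idea might look like the sketch below. The name id_to_path_checked and the dlen parameter are made up for illustration; it assumes the id fits in 40 bits, as the 10-digit format in the question implies.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical variant of id_to_path: formats id as "HHHH/HHH/HHH" into d.
 * Returns 0 on success, -1 if the buffer (dlen bytes) is too small. */
int id_to_path_checked(unsigned long long id, char *d, size_t dlen)
{
    /* The masks guarantee exactly 4 + 3 + 3 hex digits, so 12 characters in total. */
    int n = snprintf(d, dlen, "%04llX/%03llX/%03llX",
                     (id >> 24) & 0xFFFFULL,
                     (id >> 12) & 0xFFFULL,
                     id & 0xFFFULL);
    return (n == 12 && (size_t)n < dlen) ? 0 : -1;
}
```

Using unsigned long long keeps the arguments in line with what %llX expects, and snprintf avoids writing past the end of a short buffer.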

Since the string uses hex, it can be quite easily done using shift and bit operators.
Getting the 4 highest bits from the value can be done like this:
id >> 28
Converting this to a digit simply means adding the character '0' to it, like this:
'0' + (id >> 28)
However, since A, B, C, ... don't immediately follow the character 9, we have to perform an additional check, something like:
if (c > '9') c = c - '9' - 1 + 'A';
If we want the next 4 bits, we should only shift 24 bits, but then we still have the highest 4 bits left, so we should mask them out, like this:
(id >> 24) & 0xf
If we pour this into your function, we get this:
char convert (int value)
{
char c = value + '0';
if (c > '9') c = c - '9' - 1 + 'A';
return c;
}
int main(void)
{
long id = 53431453;
char buffer[20];
buffer[0] = convert(id >> 28);
buffer[1] = convert((id >> 24) & 0xf);
buffer[2] = convert((id >> 20) & 0xf);
buffer[3] = convert((id >> 16) & 0xf);
buffer[4] = convert((id >> 12) & 0xf);
buffer[5] = convert((id >> 8) & 0xf);
buffer[6] = convert((id >> 4) & 0xf);
buffer[7] = convert((id >> 0) & 0xf);
buffer[8] = '\0';
}
Now adjust this to add the slashes in between, the extra zeroes in the beginning, ...
EDIT:
I know this is not in one step, but it is better extensible if you later want to change the places of the slashes, ...
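Putting the pieces together, the nibble-by-nibble approach with the slashes and leading zeroes could be sketched like this. The function name id_to_path_nibbles and the lookup string are my own choices, not from the question; the output layout (4 digits, slash, 3 digits, slash, 3 digits) follows the example given there.

```c
#include <stdint.h>

/* Hypothetical helper: writes "HHHH/HHH/HHH" plus a terminating NUL
 * (13 bytes in total) into d and returns d. Only the low 40 bits of id are used. */
char *id_to_path_nibbles(uint64_t id, char *d)
{
    static const char hex[] = "0123456789ABCDEF";
    char *p = d;
    int shift = 36;                 /* 10 hex digits cover bits 39..0 */

    for (int i = 0; i < 10; i++, shift -= 4) {
        if (i == 4 || i == 7)       /* slash before the 5th and 8th digit */
            *p++ = '/';
        *p++ = hex[(id >> shift) & 0xF];
    }
    *p = '\0';
    return d;
}
```

Changing where the slashes go only means changing the two indices in the if, which is the extensibility the answer mentions.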

Did you try this option yet?
typedef struct {
unsigned f7 : 4;
unsigned f6 : 4;
unsigned f5 : 4;
unsigned f4 : 4;
unsigned f3 : 4;
unsigned f2 : 4;
unsigned f1 : 4;
unsigned f0 : 4;
} lubf;
#define convert(a) ( (a) > 9 ? (a) + 'A' - 10 : (a) + '0' )
int main()
{
lubf bf;
unsigned int a = 0xABCDE123; /* assumed to be 32 bits wide, matching sizeof(lubf) */
memcpy(&bf, &a, sizeof(bf)); /* bit-field order is implementation-defined */
char arr[9];
arr[0] = convert(bf.f0);
arr[1] = convert(bf.f1);
arr[2] = convert(bf.f2);
arr[3] = convert(bf.f3);
arr[4] = convert(bf.f4);
arr[5] = convert(bf.f5);
arr[6] = convert(bf.f6);
arr[7] = convert(bf.f7);
arr[8] = '\0';
printf("%X : %s\n", a, arr);
return 0;
}

Related

Word every 2 bits to symbol

I have a function that reads a word bit by bit and converts each bit to a symbol.
I need help changing it to read every 2 bits and convert them to a symbol.
I don't have any idea how to do it and I need your help, guys.
void PrintWeirdBits(word w , char* buf){
word mask = 1<<(BITS_IN_WORD-1);
int i;
for(i=0;i<BITS_IN_WORD;i++){
if(mask & w)
buf[i]='/';
else
buf[i]='.';
mask>>=1;
}
buf[i] = '\0';
}
Needed symbols:
00 - *
01 - #
10 - %
11 - !
Here is my proposal for your issue.
Using a lookup table for the symbol decoding will eliminate the need in if statements.
(I assumed word is an unsigned 16 bits data type)
#define BITS_PER_SIGN 2
#define BITS_PER_SIGN_MSK 3 // decimal 3 is 0b11 in binary --> two bits set
// General define could be:
// ((1u << BITS_PER_SIGN) - 1)
#define INIT_MASK (BITS_PER_SIGN_MSK << (BITS_IN_WORD - BITS_PER_SIGN))
void PrintWeirdBits(word w , char* buf)
{
static const char signs[] = {'*', '#', '%', '!'};
unsigned mask = INIT_MASK;
int i;
int sign_idx;
for(i=0; i < BITS_IN_WORD / BITS_PER_SIGN; i++)
{
// the bits of the sign represent the index in the signs array
// just need to align these bits to start from bit 0
sign_idx = (w & mask) >> (BITS_IN_WORD - (i + 1)*BITS_PER_SIGN);
// store the decoded sign in the buffer
buf[i] = signs[sign_idx];
// update the mask for the next symbol
mask >>= BITS_PER_SIGN;
}
buf[i] = '\0';
}
Here it seems to be working.
With a small effort it can be updated to generic code for any symbol bit width, as long as the width is a power of two (1, 2, 4, 8) and smaller than BITS_IN_WORD.
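A generic version along those lines might look like the following sketch. The typedef, the BITS_IN_WORD value and the function name PrintSymbols are assumptions for illustration (the question does not show how word is defined); the caller supplies a table with one character per possible symbol value.

```c
#include <stdint.h>

/* Assumption: word is a 16-bit unsigned type, as the answer assumes. */
typedef uint16_t word;
#define BITS_IN_WORD 16

/* Hypothetical generalisation: bits_per_sign must divide BITS_IN_WORD and
 * signs must provide (1 << bits_per_sign) characters.
 * buf needs BITS_IN_WORD / bits_per_sign + 1 bytes. */
void PrintSymbols(word w, char *buf, unsigned bits_per_sign, const char *signs)
{
    unsigned mask = (1u << bits_per_sign) - 1u;
    unsigned count = BITS_IN_WORD / bits_per_sign;
    unsigned i;

    for (i = 0; i < count; i++) {
        /* move the i-th group (counted from the top) down to bit 0 */
        unsigned shift = BITS_IN_WORD - (i + 1) * bits_per_sign;
        buf[i] = signs[(w >> shift) & mask];
    }
    buf[i] = '\0';
}
```

With bits_per_sign = 2 and the table "*#%!" this gives the mapping asked for; with bits_per_sign = 1 and "./" it reproduces the original bit-by-bit function.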
Assuming word is unsigned int or an unsigned integer type.
void PrintWeirdBits(word w , char* buf){
word mask = 3 << (BITS_IN_WORD -2);
int i;
word cmp;
for(i=0;i<BITS_IN_WORD/2;i++){
cmp = (mask & w) >> (BITS_IN_WORD - 2 - 2*i);
if(cmp == 0x00)
{
buf[i]='*';
}
else if (cmp == 0x01)
{
buf[i]='#';
}
else if (cmp == 0x02)
{
buf[i]='%';
}
else
{
buf[i]='!';
}
mask>>=2;
}
buf[i] = '\0';
}
The important part is
cmp = (mask & w) >> (BITS_IN_WORD - 2 - 2*i);
Here mask and the input w is bitwise ANDed and the result is right shifted to get the value in the first two bits. These bits are compared to get the result.

Efficient Conversion of a Binary Number to Hexadecimal String [closed]

I am writing a program that converts a binary value's hexadecimal representation to a regular string. So each byte of the value would convert to two hexadecimal characters in the string. This means the result will be twice the size; a hexadecimal representation of 1 byte would need two bytes in a string.
Hexadecimal Characters
0123456789 ;0x30 - 0x39
ABCDEF ;0x41 - 0x46
Example
0xF05C1E3A ;hex
4032568890 ;dec
would become
0x4630354331453341 ;hex
5057600944242766657 ;dec
Question?
Are there any elegant/alternative(/interesting) methods for converting between these states, other than a lookup table, (bitwise operations, shifts, modulo, etc)?
I'm not looking for a function in a library, but rather how one would/should be implemented. Any ideas?
Here's a solution with nothing but shifts, and/or, and add/subtract. No loops either.
uint64_t x, m;
x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8) | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4) | (x & 0x000f000f000f000fLL);
x += 0x0606060606060606LL;
m = ((x & 0x1010101010101010LL) >> 4) + 0x7f7f7f7f7f7f7f7fLL;
x += (m & 0x2a2a2a2a2a2a2a2aLL) | (~m & 0x3131313131313131LL);
Above is the simplified version I came up with after a little time to reflect. Below is the original answer.
uint64_t x, m;
x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8) | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4) | (x & 0x000f000f000f000fLL);
x += 0x3636363636363636LL;
m = (x & 0x4040404040404040LL) >> 6;
x += m;
m = m ^ 0x0101010101010101LL;
x -= (m << 2) | (m << 1);
See it in action: http://ideone.com/nMhJ2q
Spreading out the nibbles to bytes is easy with pdep:
spread = _pdep_u64(raw, 0x0F0F0F0F0F0F0F0F);
Now we'd have to add 0x30 to bytes in the range 0-9 and 0x37 to higher bytes (so that 10 becomes 0x41, 'A'). This could be done by SWAR-subtracting 10 from every byte and then using the sign to select which number to add, such as (not tested)
H = 0x8080808080808080;
ten = 0x0A0A0A0A0A0A0A0A;
cmp = ((spread | H) - (ten &~H)) ^ ((spread ^~ten) & H); // SWAR subtract
masks = ((cmp & H) >> 7) * 255;
// if x-10 is negative, add 0x30, else add 0x37
add = (masks & 0x3030303030303030) | (~masks & 0x3737373737373737);
asString = spread + add;
That SWAR compare can probably be optimized since you shouldn't need a full subtract to implement it.
There are some different suggestions here, including SIMD: http://0x80.pl/articles/convert-to-hex.html
A slightly simpler version based on Mark Ransom's:
uint64_t x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8) | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4) | (x & 0x000f000f000f000fLL);
x = (x + 0x3030303030303030LL) +
(((x + 0x0606060606060606LL) & 0x1010101010101010LL) >> 4) * 7;
And if you want to avoid the multiplication:
uint64_t m, x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8) | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4) | (x & 0x000f000f000f000fLL);
m = (x + 0x0606060606060606LL) & 0x1010101010101010LL;
x = (x + 0x3030303030303030LL) + (m >> 1) - (m >> 4);
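The snippets above leave the eight ASCII codes packed in a 64-bit integer. To get an actual string out of it, a wrapper along these lines could be used; the function name u32_to_hex_swar is made up, and it uses the multiplication variant shown above. The bytes are extracted with shifts so the result does not depend on the host byte order.

```c
#include <stdint.h>

/* Hypothetical wrapper: converts v to 8 uppercase hex digits.
 * out must have room for 9 bytes (8 digits plus the terminating NUL). */
void u32_to_hex_swar(uint32_t v, char out[9])
{
    uint64_t x = v;

    /* spread the 8 nibbles so that each one sits in its own byte */
    x = ((x & 0x00000000ffff0000ULL) << 16) | (x & 0x000000000000ffffULL);
    x = ((x & 0x0000ff000000ff00ULL) << 8)  | (x & 0x000000ff000000ffULL);
    x = ((x & 0x00f000f000f000f0ULL) << 4)  | (x & 0x000f000f000f000fULL);

    /* add '0' everywhere, plus 7 more where the nibble is 10..15 */
    x = (x + 0x3030303030303030ULL) +
        (((x + 0x0606060606060606ULL) & 0x1010101010101010ULL) >> 4) * 7;

    /* the most significant byte holds the first digit */
    for (int i = 0; i < 8; i++)
        out[i] = (char)(x >> (56 - 8 * i));
    out[8] = '\0';
}
```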
A somewhat more general conversion from an integer to a string, in any base from 2 up to the length of the digits string:
char *reverse(char *);
const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char *convert(long long number, char *buff, int base)
{
char *result = (buff == NULL || base > strlen(digits) || base < 2) ? NULL : buff;
char sign = 0;
if (number < 0)
{
sign = '-';
number = -number;
}
if (result != NULL)
{
do
{
*buff++ = digits[number % base];
number /= base;
} while (number);
if(sign) *buff++ = sign;
*buff = 0;
reverse(result);
}
return result;
}
char *reverse(char *str)
{
char tmp;
int len;
if (str != NULL)
{
len = strlen(str);
for (int i = 0; i < len / 2; i++)
{
tmp = *(str + i);
*(str + i) = *(str + len - i - 1);
*(str + len - i - 1) = tmp;
}
}
return str;
}
example - counting from -50 to 50 decimal in base 23
-24 -23 -22 -21 -20 -1M -1L -1K -1J -1I -1H -1G -1F -1E -1D
-1C -1B -1A -19 -18 -17 -16 -15 -14 -13 -12 -11 -10 -M -L
-K -J -I -H -G -F -E -D -C -B -A -9 -8 -7 -6
-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
A B C D E F G H I J K L M 10 11
12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 1G
1H 1I 1J 1K 1L 1M 20 21 22 23 24
A LUT (lookup table) C++ variant. I didn't check the actual machine code produced, but I believe any modern C++ compiler can catch the idea and compile it well.
static const char nibble2hexChar[] { "0123456789ABCDEF" };
// 17B in total, because I'm lazy to init it per char
void byteToHex(std::ostream & out, const uint8_t value) {
out << nibble2hexChar[value>>4] << nibble2hexChar[value&0xF];
}
// this one is actually written more toward short+simple source, than performance
void dwordToHex(std::ostream & out, uint32_t value) {
int i = 8;
while (i--) {
out << nibble2hexChar[value>>28];
value <<= 4;
}
}
EDIT: For C code you just have to switch from std::ostream to some other means of output. Unfortunately, your question lacks any details about what you are actually trying to achieve and why you don't use the built-in printf family of C functions.
For example, C code like this can write to a char* output buffer, converting an arbitrary number of bytes:
/**
* Writes hexadecimally formatted "n" bytes array "values" into "outputBuffer".
* Make sure there's enough space in output buffer allocated, and add zero
* terminator yourself, if you plan to use it as C-string.
*
* #Returns: pointer after the last character written.
*/
char* dataToHex(char* outputBuffer, const size_t n, const unsigned char* values) {
for (size_t i = 0; i < n; ++i) {
*outputBuffer++ = nibble2hexChar[values[i]>>4];
*outputBuffer++ = nibble2hexChar[values[i]&0xF];
}
return outputBuffer;
}
And finally, I once helped somebody on Code Review who had a performance bottleneck in exactly this kind of hexadecimal formatting. There I wrote a conversion variant without a LUT. The whole process, the other answer, and the performance measurements may be instructive for you: the fastest solution doesn't just blindly convert the result but merges the conversion with the main operation to achieve better overall performance. That's why I wonder what you are trying to solve, as the whole problem may often allow a more optimal solution. If you are just asking about conversion, printf("%x", ...) is a safe bet.
Here is that other approach for "to hex" conversion:
fast C++ XOR Function
Decimal -> Hex
Just iterate through the string and convert every character to int, then you can do
printf("%02x", c);
or use sprintf for saving to another variable
Hex -> Decimal
Code
printf("%c",16 * hexToInt('F') + hexToInt('0'));
int hexToInt(char c)
{
if(c >= 'a' && c <= 'z')
c = c - ('a' - 'A');
int sum;
sum = c / 16 - 3;
sum *= 10;
sum += c % 16;
return (sum > 9) ? sum - 1 : sum;
}
The articles below compare different methods of converting digits to string, hex numbers are not covered but it seems not a big problem to switch from dec to hex
Integers
Fixed and floating point
#EDIT
Thank you for pointing that the answer above is not relevant.
Common way with no LUT is to split integer into nibbles and map them to ASCII
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#define HI_NIBBLE(b) (((b) >> 4) & 0x0F)
#define LO_NIBBLE(b) ((b) & 0x0F)
void int64_to_char(char carr[], int64_t val){
memcpy(carr, &val, 8);
}
uint64_t inp = 0xF05C1E3A;
char tmp_st[8];
int main()
{
int64_to_char(tmp_st,inp);
printf("Sample: %llx\n", (unsigned long long)inp);
printf("Result: 0x");
for (unsigned int k = 8; k; k--){
char tmp_ch = *(tmp_st+k-1);
char hi_nib = HI_NIBBLE(tmp_ch);
char lo_nib = LO_NIBBLE(tmp_ch);
if (hi_nib || lo_nib){
printf("%c%c",hi_nib+((hi_nib>9)?55:48),lo_nib+((lo_nib>9)?55:48));
}
}
printf("\n");
return 0;
}
Another way is to use Allison's Algorithm. I am total noob in ASM, so I post the code in the form I googled it.
Variant 1:
ADD AL,90h
DAA
ADC AL,40h
DAA
Variant 2:
CMP AL, 0Ah
SBB AL, 69h
DAS

Give byte existing byte value (homework)

I need to do an assignment where I reverse the bytes of a certain int. For example: 0xaabbccdd should be turned into 0xddccbbaa.
I've already extracted all of the bytes from the given number and their values are correct.
unsigned int input;
scanf("%i", &input);
unsigned int first_byte = (input >> (8*0)) & 0xff;
unsigned int second_byte = (input >> (8*1)) & 0xff;
unsigned int third_byte = (input >> (8*2)) & 0xff;
unsigned int fourth_byte = (input >> (8*3)) & 0xff;
Now I'm trying to set an empty int variable (aka 00000000 00000000 00000000 00000000) to those byte values, but turned around. So how can I say that the first byte of the empty variable is the fourth byte of the given input? I've been trying different combinations of bitwise operations, but I can't seem to wrap my head around it. I'm pretty sure I should be able to do something like:
answer *first byte* | fourth_byte;
I would appreciate any help, because I've been stuck and searching for an answer for a couple of hours now.
Based on your code :
#include <stdio.h>
int main(void)
{
unsigned int input = 0xaabbccdd;
unsigned int first_byte = (input >> (8*0)) & 0xff;
unsigned int second_byte = (input >> (8*1)) & 0xff;
unsigned int third_byte = (input >> (8*2)) & 0xff;
unsigned int fourth_byte = (input >> (8*3)) & 0xff;
printf(" 1st : %x\n 2nd : %x\n 3rd : %x\n 4th : %x\n",
first_byte,
second_byte,
third_byte,
fourth_byte);
unsigned int combo = first_byte<<8 | second_byte;
combo = combo << 8 | third_byte;
combo = combo << 8 | fourth_byte;
printf(" combo : %x ", combo);
return 0;
}
It will output 0xddccbbaa
Here's a more elegant function to do this :
unsigned int setByte(unsigned int input, unsigned char byte, unsigned int position)
{
if(position > sizeof(unsigned int) - 1)
return input;
unsigned int orbyte = byte;
input |= orbyte << (position * 8); /* shift the unsigned copy, not the promoted int */
return input;
}
Usage :
unsigned int combo = 0;
combo = setByte(combo, first_byte, 3);
combo = setByte(combo, second_byte, 2);
combo = setByte(combo, third_byte, 1);
combo = setByte(combo, fourth_byte, 0);
printf(" combo : %x ", combo);
unsigned int result;
result = ((first_byte <<(8*3)) | (second_byte <<(8*2)) | (third_byte <<(8*1)) | (fourth_byte));
You can extract the bytes and put them back in order as you're trying, that's a perfectly valid approach. But here are some other possibilities:
bswap, if you have access to it. It's an x86 instruction that does exactly this. It doesn't get any simpler. Similar instructions may exist on other platforms. Probably not good for a C assignment though.
Or, swapping adjacent "fields". If you have AABBCCDD and first swap adjacent 8-bit groups (get BBAADDCC), and then swap adjacent 16-bit groups, you get DDCCBBAA as desired. This can be implemented, for example: (not tested)
x = ((x & 0x00FF00FF) << 8) | ((x >> 8) & 0x00FF00FF);
x = ((x & 0x0000FFFF) << 16) | ((x >> 16) & 0x0000FFFF);
Or, a closely related method but with rotates. In AABBCCDD, AA and CC are both rotated to the left by 8 positions, and BB and DD are both rotated right by 8 positions. So you get:
x = rol(x & 0xFF00FF00, 8) | ror(x & 0x00FF00FF, 8);
This requires rotates however, which most high level languages don't provide, and emulating them with two shifts and an OR negates their advantage.
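For completeness, here is how the field-swapping and rotate variants could look as self-contained C functions, with the rotates emulated by two shifts and an OR as described. The function names are mine; uint32_t is used so the width is exactly 32 bits.

```c
#include <stdint.h>

/* Rotates emulated with two shifts and an OR; n must be in 1..31 here. */
static uint32_t rol32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }
static uint32_t ror32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Swap adjacent 8-bit groups, then adjacent 16-bit groups. */
uint32_t bswap_fields(uint32_t x)
{
    x = ((x & 0x00FF00FFu) << 8)  | ((x >> 8)  & 0x00FF00FFu);
    x = ((x & 0x0000FFFFu) << 16) | ((x >> 16) & 0x0000FFFFu);
    return x;
}

/* AA and CC rotate left by 8, BB and DD rotate right by 8. */
uint32_t bswap_rotate(uint32_t x)
{
    return rol32(x & 0xFF00FF00u, 8) | ror32(x & 0x00FF00FFu, 8);
}
```

Many compilers recognise the shift/OR rotate idiom and emit a single rotate instruction for it, but that is an optimisation you would have to verify for your own toolchain.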
#include <stdio.h>
int main(void)
{
unsigned int input = 0xaabbccdd,
byte[4] = {0},
n = 0,
output = 0;
do
{
byte[n] = (input >> (8*n)) & 0xff;
n = n + 1;
}while(n < 4);
n = 0;
do
{
printf(" %u : %x\n", n, byte[n]);
n = n + 1;
}while (n < 4);
n = 0;
do
{
output = output << 8 | byte[n];
n = n + 1;
}while (n < 4);
printf(" output : %x ", output );
return 0;
}
You should try to avoid repeating code.

how to make a bit-set/byte-array conversion in c

Given an array,
unsigned char q[32]="1100111...",
how can I generate a 4-bytes bit-set, unsigned char p[4], such that, the bit of this bit-set, equals to value inside the array, e.g., the first byte p[0]= "q[0] ... q[7]"; 2nd byte p[1]="q[8] ... q[15]", etc.
and also how to do it in opposite, i.e., given bit-set, generate the array?
my own trial out for the first part.
unsigned char p[4]={0};
for (int j=0; j<N; j++)
{
if (q[j] == '1')
{
p [j / 8] |= 1 << (7-(j % 8));
}
}
Is the above right? any conditions to check? Is there any better way?
EDIT - 1
I wonder if above is efficient way? As the array size could be upto 4096 or even more.
First, Use strtoul to get a 32-bit value. Then convert the byte order to big-endian with htonl. Finally, store the result in your array:
#include <arpa/inet.h>
#include <stdlib.h>
/* ... */
unsigned char q[32] = "1100111...";
unsigned char result[4] = {0};
*(unsigned long*)result = htonl(strtoul(q, NULL, 2));
There are other ways as well.
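One of those other ways, sketched below, avoids both the pointer cast and the byte-order question by shifting the parsed value into the output bytes explicitly. The function name bits32_from_string is hypothetical; it assumes exactly 32 input characters, which need not be NUL-terminated.

```c
#include <stdlib.h>

/* Hypothetical helper: parses exactly 32 characters of '0'/'1' from s and
 * stores them big-endian in out (s[0] becomes the top bit of out[0]).
 * Returns 0 on success, -1 if a character other than '0' or '1' is found. */
int bits32_from_string(const char *s, unsigned char out[4])
{
    char tmp[33];

    for (int i = 0; i < 32; i++) {
        if (s[i] != '0' && s[i] != '1')
            return -1;
        tmp[i] = s[i];
    }
    tmp[32] = '\0';                      /* strtoul needs a terminated string */

    unsigned long v = strtoul(tmp, NULL, 2);
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
    return 0;
}
```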
But I lack <arpa/inet.h>!
Then you need to know what byte order your platform is. If it's big endian, then htonl does nothing and can be omitted. If it's little-endian, then htonl is just:
unsigned long htonl(unsigned long x)
{
x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
x = ((x & 0xFFFF0000) >> 16) | ((x & 0x0000FFFF) << 16);
return x;
}
If you're lucky, your optimizer might see what you're doing and make it into efficient code. If not, well, at least it's all implementable in registers and O(log N).
If you don't know what byte order your platform is, then you need to detect it:
typedef union {
char c[sizeof(int) / sizeof(char)];
int i;
} OrderTest;
unsigned long htonl(unsigned long x)
{
OrderTest test;
test.i = 1;
if(!test.c[0])
return x;
x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
x = ((x & 0xFFFF0000) >> 16) | ((x & 0x0000FFFF) << 16);
return x;
}
Maybe long is 8 bytes!
Well, the OP implied 4-byte inputs with their array size, but 8-byte long is doable:
#define kCharsPerLong (sizeof(long) / sizeof(char))
unsigned char q[8 * kCharsPerLong] = "1100111...";
unsigned char result[kCharsPerLong] = {0};
*(unsigned long*)result = htonl(strtoul(q, NULL, 2));
/* needs #include <limits.h>; sizeof cannot be evaluated in #if, so test ULONG_MAX instead */
unsigned long htonl(unsigned long x)
{
#if ULONG_MAX == 0xFFFFFFFFUL
x = ((x & 0xFF00FF00UL) >> 8) | ((x & 0x00FF00FFUL) << 8);
x = ((x & 0xFFFF0000UL) >> 16) | ((x & 0x0000FFFFUL) << 16);
#elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
x = ((x & 0xFF00FF00FF00FF00UL) >> 8) | ((x & 0x00FF00FF00FF00FFUL) << 8);
x = ((x & 0xFFFF0000FFFF0000UL) >> 16) | ((x & 0x0000FFFF0000FFFFUL) << 16);
x = ((x & 0xFFFFFFFF00000000UL) >> 32) | ((x & 0x00000000FFFFFFFFUL) << 32);
#else
#error Unsupported word size.
#endif
return x;
}
For char that isn't 8 bits (DSPs like to do this), you're on your own. (This is why it was a Big Deal when the SHARC series of DSPs had 8-bit bytes; it made it a LOT easier to port existing code because, face it, C does a horrible job of portability support.)
What about arbitrary length buffers? No funny pointer typecasts, please.
The main thing that can be improved with the OP's version is to rethink the loop's internals. Instead of thinking of the output bytes as a fixed data register, think of it as a shift register, where each successive bit is shifted into the right (LSB) end. This will save you from all those divisions and mods (which, hopefully, are optimized away to bit shifts).
For sanity, I'm ditching unsigned char for uint8_t.
#include <stdint.h>
unsigned StringToBits(const char* inChars, uint8_t* outBytes, size_t numBytes,
size_t* bytesRead)
/* Converts the string of '1' and '0' characters in `inChars` to a buffer of
* bytes in `outBytes`. `numBytes` is the number of available bytes in the
* `outBytes` buffer. On exit, if `bytesRead` is not NULL, the value it points
* to is set to the number of bytes read (rounding up to the nearest full
* byte). If a multiple of 8 bits is not read, the last byte written will be
* padded with 0 bits to reach a multiple of 8 bits. This function returns the
* number of padding bits that were added. For example, an input of 11 bits
* will result in `bytesRead` being set to 2 and the function will return 5. This
* means that if a nonzero value is returned, then a partial byte was read,
* which may be an error.
*/
{ size_t bytes = 0;
unsigned bits = 0;
uint8_t x = 0;
while(bytes < numBytes)
{ /* Parse a character. */
switch(*inChars++)
{ case '0': x <<= 1; ++bits; break;
case '1': x = (x << 1) | 1; ++bits; break;
default: numBytes = 0;
}
/* See if we filled a byte. */
if(bits == 8)
{ outBytes[bytes++] = x;
x = 0;
bits = 0;
}
}
/* Padding, if needed. */
if(bits)
{ bits = 8 - bits;
outBytes[bytes++] = x << bits;
}
/* Finish up. */
if(bytesRead)
*bytesRead = bytes;
return bits;
}
It's your responsibility to make sure inChars is null-terminated. The function will return on the first non-'0' or '1' character it sees or if it runs out of output buffer. Some example usage:
unsigned char q[32] = "1100111...";
uint8_t buf[4];
size_t bytesRead = 5;
if(StringToBits(q, buf, 4, &bytesRead) || bytesRead != 4)
{
/* Partial read; handle error here. */
}
This just reads 4 bytes, and traps the error if it can't.
unsigned char q[4096] = "1100111...";
uint8_t buf[512];
StringToBits(q, buf, 512, NULL);
This just converts what it can and sets the rest to 0 bits.
This function could be done better if C had the ability to break out of more than one level of loop or switch; as it stands, I'd have to add a flag value to get the same effect, which is clutter, or I'd have to add a goto, which I simply refuse.
I don't think that will quite work. You are comparing each "bit" to 1 when it should really be '1'. You can also make it a bit more efficient by getting rid of the if:
unsigned char p[4]={0};
for (int j=0; j<32; j++)
{
p [j / 8] |= (q[j] == '1') << (7-(j % 8));
}
Going in reverse is pretty simple too. Just mask for each "bit" that you set earlier.
unsigned char q[32]={0};
for (int j=0; j<32; j++) {
q[j] = !!(p[j / 8] & ( 1 << (7-(j % 8)) )) + '0';
}
You'll notice the creative use of (boolean) + '0' to convert between 1/0 and '1'/'0'.
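As a sanity check, both directions can be wrapped in small functions and run as a round trip. This is only a sketch; the function names and the nbits parameter are my own, and nbits is assumed to be a multiple of 8.

```c
#include <stddef.h>

/* Hypothetical helper: packs nbits characters ('0'/'1') from q into p,
 * most significant bit first. p needs nbits / 8 bytes. */
void bits_from_string(const char *q, unsigned char *p, size_t nbits)
{
    for (size_t j = 0; j < nbits; j++) {
        if (j % 8 == 0)
            p[j / 8] = 0;                       /* start each byte from zero */
        p[j / 8] |= (unsigned char)((q[j] == '1') << (7 - j % 8));
    }
}

/* Hypothetical helper: the reverse direction. q needs nbits + 1 bytes
 * because a terminating NUL is appended. */
void string_from_bits(const unsigned char *p, char *q, size_t nbits)
{
    for (size_t j = 0; j < nbits; j++)
        q[j] = (char)('0' + ((p[j / 8] >> (7 - j % 8)) & 1));
    q[nbits] = '\0';
}
```

Shifting the byte right and masking with 1 gives the same 0/1 value as the boolean trick, without relying on operator precedence.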
According to your example it does not look like you are going for readability, and after a (late) refresh my solution looks very similar to Chriszuma except for the lack of parenthesis due to order of operations and the addition of the !! to enforce a 0 or 1.
const size_t N = 32; //N must be a multiple of 8
unsigned char q[N+1] = "11011101001001101001111110000111";
unsigned char p[N/8] = {0};
unsigned char r[N+1] = {0}; //reversed
for(size_t i = 0; i < N; ++i)
p[i / 8] |= (q[i] == '1') << 7 - i % 8;
for(size_t i = 0; i < N; ++i)
r[i] = '0' + !!(p[i / 8] & 1 << 7 - i % 8);
printf("%x %x %x %x\n", p[0], p[1], p[2], p[3]);
printf("%s\n%s\n", q,r);
If you are looking for extreme efficiency, try to use the following techniques:
Replace if by subtraction of '0' (seems like you can assume your input symbols can be only 0 or 1).
Also process the input from lower indices to higher ones.
for (int c = 0; c < N; c += 8)
{
int y = 0;
for (int b = 0; b < 8; ++b)
y = y * 2 + q[c + b] - '0';
p[c / 8] = y;
}
Replace array indices by auto-incrementing pointers:
const char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
int y = 0;
for (int b = 0; b < 8; ++b)
y = y * 2 + *qptr++ - '0';
*pptr++ = y;
}
Unroll the inner loop:
const char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
*pptr++ =
qptr[0] - '0' << 7 |
qptr[1] - '0' << 6 |
qptr[2] - '0' << 5 |
qptr[3] - '0' << 4 |
qptr[4] - '0' << 3 |
qptr[5] - '0' << 2 |
qptr[6] - '0' << 1 |
qptr[7] - '0' << 0;
qptr += 8;
}
Process several input characters simultaneously (using bit twiddling hacks or MMX instructions) - this has great speedup potential!
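As a rough illustration of that last point, eight input characters can be turned into one byte with a single mask and multiply. This is a sketch of one well-known bit-twiddling trick, not necessarily what the answer had in mind; the function names are made up, and the fast path only applies on little-endian hosts, so a plain loop is kept as a fallback.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 when the least significant byte is stored first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Hypothetical helper: packs 8 characters, each '0' or '1', into one byte,
 * with q[0] becoming the most significant bit. */
unsigned char pack8_swar(const char *q)
{
    if (host_is_little_endian()) {
        uint64_t x;
        memcpy(&x, q, 8);                 /* q[k] lands in bits 8k..8k+7 */
        x &= 0x0101010101010101ULL;       /* low bit of '0' (0x30) is 0, of '1' (0x31) is 1 */
        /* the multiply moves bit 8k to bit 63-k; the top byte is the result */
        return (unsigned char)((x * 0x8040201008040201ULL) >> 56);
    }

    /* Portable fallback: the scalar loop from the first step. */
    unsigned y = 0;
    for (int b = 0; b < 8; b++)
        y = y * 2 + (unsigned)(q[b] - '0');
    return (unsigned char)y;
}
```

Whether this actually beats the unrolled version depends on the compiler and CPU, so it is worth benchmarking before adopting it.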

What is a better method for packing 4 bytes into 3 than this?

I have an array of values all well within the range 0 - 63, and decided I could pack every 4 bytes into 3 because the values only require 6 bits and I could use the extra 2bits to store the first 2 bits of the next value and so on.
Having never done this before I used the switch statement and a nextbit variable (a state machine like device) to do the packing and keep track of the starting bit. I'm convinced however, there must be a better way.
Suggestions/clues please, but don't ruin my fun ;-)
Any portability problems regarding big/little endian?
btw: I have verified this code is working, by unpacking it again and comparing with the input. And no it ain't homework, just an exercise I've set myself.
/* build with gcc -std=c99 -Wconversion */
#define ASZ 400
typedef unsigned char uc_;
uc_ data[ASZ];
int i;
for (i = 0; i < ASZ; ++i) {
data[i] = (uc_)(i % 0x40);
}
size_t dl = sizeof(data);
printf("sizeof(data):%zu\n",dl);
float fpl = ((float)dl / 4.0f) * 3.0f;
size_t pl = (size_t)(fpl > (float)((int)fpl) ? fpl + 1 : fpl);
printf("length of packed data:%zu\n",pl);
for (i = 0; i < dl; ++i)
printf("%02d ", data[i]);
printf("\n");
uc_ * packeddata = calloc(pl, sizeof(uc_));
uc_ * byte = packeddata;
uc_ nextbit = 1;
for (int i = 0; i < dl; ++i) {
uc_ m = (uc_)(data[i] & 0x3f);
switch(nextbit) {
case 1:
/* all 6 bits of m into first 6 bits of byte: */
*byte = m;
nextbit = 7;
break;
case 3:
/* all 6 bits of m into last 6 bits of byte: */
*byte++ = (uc_)(*byte | (m << 2));
nextbit = 1;
break;
case 5:
/* 1st 4 bits of m into last 4 bits of byte: */
*byte++ = (uc_)(*byte | ((m & 0x0f) << 4));
/* 5th and 6th bits of m into 1st and 2nd bits of byte: */
*byte = (uc_)(*byte | ((m & 0x30) >> 4));
nextbit = 3;
break;
case 7:
/* 1st 2 bits of m into last 2 bits of byte: */
*byte++ = (uc_)(*byte | ((m & 0x03) << 6));
/* next (last) 4 bits of m into 1st 4 bits of byte: */
*byte = (uc_)((m & 0x3c) >> 2);
nextbit = 5;
break;
}
}
So, this is kinda like code-golf, right?
#include <stdlib.h>
#include <string.h>
static void pack2(unsigned char *r, const unsigned char *n) {
unsigned v = n[0] + (n[1] << 6) + (n[2] << 12) + (n[3] << 18);
*r++ = v;
*r++ = v >> 8;
*r++ = v >> 16;
}
unsigned char *apack(const unsigned char *s, int len) {
const unsigned char *s_end = s + len;
unsigned char *r, *result = malloc(len/4*3+3),
lastones[4] = { 0 };
if (result == NULL)
return NULL;
for(r = result; s + 4 <= s_end; s += 4, r += 3)
pack2(r, s);
memcpy(lastones, s, s_end - s);
pack2(r, lastones);
return result;
}
Check out the IETF RFC 4648 for 'The Base16, Base32 and Base64 Data Encodings'.
Partial code critique:
size_t dl = sizeof(data);
printf("sizeof(data):%d\n",dl);
float fpl = ((float)dl / 4.0f) * 3.0f;
size_t pl = (size_t)(fpl > (float)((int)fpl) ? fpl + 1 : fpl);
printf("length of packed data:%d\n",pl);
Don't use the floating point stuff - just use integers. And use '%zu' to print 'size_t' values - assuming you've got a C99 library.
size_t pl = ((dl + 3) / 4) * 3;
I think your loop could be simplified by dealing with 3-byte input units until you've got a partial unit left over, and then dealing with a remainder of 1 or 2 bytes as special cases. I note that the standard referenced says that you use one or two '=' signs to pad at the end.
I have a Base64 encoder and decoder which does some of that. You are describing the 'decode' part of Base64 -- where the Base64 code has 4 bytes of data that should be stored in just 3 - as your packing code. The Base64 encoder corresponds to the unpacker you will need.
Base-64 Decoder
Note: base_64_inv is an array of 256 values, one for each possible input byte value; it defines the correct decoded value for each encoded byte. In the Base64 encoding, this is a sparse array - 3/4 zeroes. Similarly, base_64_map is the mapping between a value 0..63 and the corresponding storage value.
enum { DC_PAD = -1, DC_ERR = -2 };
static int decode_b64(int c)
{
int b64 = base_64_inv[c];
if (c == base64_pad)
b64 = DC_PAD;
else if (b64 == 0 && c != base_64_map[0])
b64 = DC_ERR;
return(b64);
}
/* Decode 4 bytes into 3 */
static int decode_quad(const char *b64_data, char *bin_data)
{
int b0 = decode_b64(b64_data[0]);
int b1 = decode_b64(b64_data[1]);
int b2 = decode_b64(b64_data[2]);
int b3 = decode_b64(b64_data[3]);
int bytes;
if (b0 < 0 || b1 < 0 || b2 == DC_ERR || b3 == DC_ERR || (b2 == DC_PAD && b3 != DC_PAD))
return(B64_ERR_INVALID_ENCODED_DATA);
if (b2 == DC_PAD && (b1 & 0x0F) != 0)
/* 3rd byte is '='; 2nd byte must end with 4 zero bits */
return(B64_ERR_INVALID_TRAILING_BYTE);
if (b2 >= 0 && b3 == DC_PAD && (b2 & 0x03) != 0)
/* 4th byte is '='; 3rd byte is not '=' and must end with 2 zero bits */
return(B64_ERR_INVALID_TRAILING_BYTE);
bin_data[0] = (b0 << 2) | (b1 >> 4);
bytes = 1;
if (b2 >= 0)
{
bin_data[1] = ((b1 & 0x0F) << 4) | (b2 >> 2);
bytes = 2;
}
if (b3 >= 0)
{
bin_data[2] = ((b2 & 0x03) << 6) | (b3);
bytes = 3;
}
return(bytes);
}
/* Decode input Base-64 string to original data. Output length returned, or negative error */
int base64_decode(const char *data, size_t datalen, char *buffer, size_t buflen)
{
size_t outlen = 0;
if (datalen % 4 != 0)
return(B64_ERR_INVALID_ENCODED_LENGTH);
if (BASE64_DECLENGTH(datalen) > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 4)
{
int nbytes = decode_quad(data, buffer + outlen);
if (nbytes < 0)
return(nbytes);
outlen += nbytes;
data += 4;
datalen -= 4;
}
assert(datalen == 0); /* By virtue of the %4 check earlier */
return(outlen);
}
Base-64 Encoder
/* Encode 3 bytes of data into 4 */
static void encode_triplet(const char *triplet, char *quad)
{
quad[0] = base_64_map[(triplet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((triplet[0] & 0x03) << 4) | ((triplet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((triplet[1] & 0x0F) << 2) | ((triplet[2] >> 6) & 0x03)];
quad[3] = base_64_map[triplet[2] & 0x3F];
}
/* Encode 2 bytes of data into 4 */
static void encode_doublet(const char *doublet, char *quad, char pad)
{
quad[0] = base_64_map[(doublet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((doublet[0] & 0x03) << 4) | ((doublet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((doublet[1] & 0x0F) << 2)];
quad[3] = pad;
}
/* Encode 1 byte of data into 4 */
static void encode_singlet(const char *singlet, char *quad, char pad)
{
quad[0] = base_64_map[(singlet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((singlet[0] & 0x03) << 4)];
quad[2] = pad;
quad[3] = pad;
}
/* Encode input data as Base-64 string. Output length returned, or negative error */
static int base64_encode_internal(const char *data, size_t datalen, char *buffer, size_t buflen, char pad)
{
size_t outlen = BASE64_ENCLENGTH(datalen);
const char *bin_data = (const void *)data;
char *b64_data = (void *)buffer;
if (outlen > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 3)
{
encode_triplet(bin_data, b64_data);
bin_data += 3;
b64_data += 4;
datalen -= 3;
}
b64_data[0] = '\0';
if (datalen == 2)
encode_doublet(bin_data, b64_data, pad);
else if (datalen == 1)
encode_singlet(bin_data, b64_data, pad);
if (datalen > 0)
b64_data[4] = '\0'; /* Terminate only when a final, padded quad was written */
return((b64_data - buffer) + strlen(b64_data));
}
I complicate life by having to deal with a product that uses a variant alphabet for the Base64 encoding, and also manages not to pad data, hence the 'pad' argument (which can be zero for 'null padding' or '=' for standard padding). The 'base_64_map' array contains the alphabet to use for 6-bit values in the range 0..63.
Another, simpler way to do it would be to use bit fields. One of the lesser-known corners of C struct syntax is the bit field. Let's say you have the following structure:
struct packed_bytes {
byte chunk1 : 6;
byte chunk2 : 6;
byte chunk3 : 6;
byte chunk4 : 6;
};
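This declares chunk1, chunk2, chunk3, and chunk4 to have the type byte but to take up only 6 bits each in the structure. The intended result is that sizeof(struct packed_bytes) == 3, but bit-field layout is implementation-defined: many compilers will not let a field straddle a storage-unit boundary and will report a larger size, so check sizeof with your compiler. Now all you need is a little function to take your array and dump it into the structure like so: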
void
dump_to_struct(byte *in, struct packed_bytes *out, int count)
{
int i;
/* Copy every complete group of four values */
for (i = 0; i < (count / 4); ++i) {
out[i].chunk1 = in[i * 4];
out[i].chunk2 = in[i * 4 + 1];
out[i].chunk3 = in[i * 4 + 2];
out[i].chunk4 = in[i * 4 + 3];
}
// Finish up: copy the 1 to 3 leftover values (cases fall through on purpose)
switch(count % 4) {
case 3:
out[count / 4].chunk3 = in[(count / 4) * 4 + 2];
/* fall through */
case 2:
out[count / 4].chunk2 = in[(count / 4) * 4 + 1];
/* fall through */
case 1:
out[count / 4].chunk1 = in[(count / 4) * 4];
}
}
There you go, you now have an array of struct packed_bytes that you can easily read by using the above struct.
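Whether the bit-field structure really occupies three bytes depends on the compiler, so it is worth measuring. The following is only a sketch: it assumes that byte is a typedef for unsigned char (bit-fields of that type are an implementation-defined extension that common compilers accept), and the helper names packed_bytes_size and store_and_load are made up for illustration.

```c
#include <stddef.h>

/* Assumption: "byte" in the text above is a typedef for unsigned char. */
typedef unsigned char byte;

struct packed_bytes {
    byte chunk1 : 6;
    byte chunk2 : 6;
    byte chunk3 : 6;
    byte chunk4 : 6;
};

/* Hypothetical helper: how many bytes the compiler really uses
   for one group of four 6-bit fields. */
size_t packed_bytes_size(void)
{
    return sizeof(struct packed_bytes);
}

/* Hypothetical helper: store a value in a 6-bit field and read it back.
   The value is masked first, so only its low 6 bits are kept. */
unsigned store_and_load(unsigned value)
{
    struct packed_bytes p = {0, 0, 0, 0};
    p.chunk1 = value & 0x3Fu;
    return p.chunk1;
}
```

If packed_bytes_size() returns 4 rather than 3 on your compiler, the fields are not laid out back to back and the structure saves no space over a plain byte array.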
Instead of using a state machine, you can simply keep a counter of how many bits are already used in the current byte. From that counter you can directly derive the shift offsets and whether or not a value overflows into the next byte.
Regarding endianness: as long as you use only a single data type (that is, you don't reinterpret a pointer as a pointer to a type of a different size, e.g. int* a = ...; short* b = (short*) a;), you shouldn't run into endianness problems in most cases.
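The counter idea described above might look roughly like this. It is a sketch rather than anyone's actual code from this thread: the function name pack6_counter is invented, values are assumed to be packed least-significant-bit first, and the caller is assumed to supply an output buffer of at least (count * 6 + 7) / 8 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Pack `count` 6-bit values from `in` into `out`, LSB first.
   Returns the number of bytes written.
   Assumption: `out` has room for (count * 6 + 7) / 8 bytes. */
size_t pack6_counter(const unsigned char *in, size_t count, unsigned char *out)
{
    size_t nbytes = (count * 6 + 7) / 8;
    size_t bitpos = 0;              /* total number of bits written so far */
    size_t i;

    memset(out, 0, nbytes);
    for (i = 0; i < count; ++i) {
        unsigned v = in[i] & 0x3Fu;              /* keep only the low 6 bits */
        size_t idx = bitpos / 8;                 /* byte currently being filled */
        unsigned shift = (unsigned)(bitpos % 8); /* bits already used in it */

        out[idx] |= (unsigned char)(v << shift);
        if (shift > 2)                           /* 6 bits do not fit in 8 - shift */
            out[idx + 1] |= (unsigned char)(v >> (8 - shift));
        bitpos += 6;
    }
    return nbytes;
}
```

Because only unsigned char is read and written, the result does not depend on the byte order of the machine.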
Taking elements of DigitalRoss's compact code, Grizzly's suggestion, and my own code, I have at last written my own answer. Although DigitalRoss provides a usable, working answer, using it without understanding it would not have given me the same satisfaction as learning something. For this reason I have chosen to base my answer on my original code.
I have also chosen to ignore the advice Jonathon Leffler gives to avoid using floating point arithmetic for calculating the packed data length. The recommended method, which DigitalRoss also uses, increases the length of the packed data by as much as three bytes. Granted, this is not much, but it is also avoidable by using floating point math.
Here is the code, criticisms welcome:
/* built with gcc -std=c99 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
unsigned char *
pack(const unsigned char * data, size_t len, size_t * packedlen)
{
float fpl = ((float)len / 4.0f) * 3.0f;
*packedlen = (size_t)(fpl > (float)((int)fpl) ? fpl + 1 : fpl);
unsigned char * packed = malloc(*packedlen);
if (!packed)
return 0;
const unsigned char * in = data;
const unsigned char * in_end = in + len;
unsigned char * out;
for (out = packed; in + 4 <= in_end; in += 4) {
*out++ = in[0] | ((in[1] & 0x03) << 6);
*out++ = ((in[1] & 0x3c) >> 2) | ((in[2] & 0x0f) << 4);
*out++ = ((in[2] & 0x30) >> 4) | (in[3] << 2);
}
size_t lastlen = in_end - in;
if (lastlen > 0) {
*out = in[0];
if (lastlen > 1) {
*out++ |= ((in[1] & 0x03) << 6);
*out = ((in[1] & 0x3c) >> 2);
if (lastlen > 2) {
*out++ |= ((in[2] & 0x0f) << 4);
*out = ((in[2] & 0x30) >> 4);
if (lastlen > 3)
*out |= (in[3] << 2);
}
}
}
return packed;
}
int main()
{
size_t i;
unsigned char data[] = {
12, 15, 40, 18,
26, 32, 50, 3,
7, 19, 46, 10,
25, 37, 2, 39,
60, 59, 0, 17,
9, 29, 13, 54,
5, 6, 47, 32
};
size_t datalen = sizeof(data);
printf("unpacked datalen: %zu\nunpacked data\n", datalen);
for (i = 0; i < datalen; ++i)
printf("%02d ", data[i]);
printf("\n");
size_t packedlen;
unsigned char * packed = pack(data, sizeof(data), &packedlen);
if (!packed) {
fprintf(stderr, "Packing failed!\n");
return EXIT_FAILURE;
}
printf("packedlen: %zu\npacked data\n", packedlen);
for (i = 0; i < packedlen; ++i)
printf("0x%02x ", packed[i]);
printf("\n");
free(packed);
return EXIT_SUCCESS;
}
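For comparison, the rounded-up length that pack() computes with floating point can also be obtained exactly in integer arithmetic, because the number of bytes needed for len 6-bit values is the ceiling of len * 3 / 4. The sketch below uses a made-up helper name, packed_length, and is not part of the code above.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed to hold `len` 6-bit values,
   i.e. ceil(len * 6 / 8) == ceil(len * 3 / 4), with no padding bytes.
   Assumption: len * 3 + 3 does not overflow size_t. */
size_t packed_length(size_t len)
{
    return (len * 3 + 3) / 4;
}
```

For the 28 values in main() this gives 21, and for 1, 2, 3 and 4 values it gives 1, 2, 3 and 3 bytes. The expression len - len / 4 yields the same results and avoids the multiplication, so it cannot overflow.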

Resources