Circular shift 28 bits within 4 bytes in C

I have an unsigned char *Buffer that contains 4 bytes, but only 28 of them are relevant to me.
I am looking to create a function that will do a circular shift of the 28 bits while ignoring the remaining 4 bits.
For example, I have the following within *Buffer
1111000011001100101010100000
Say I want to left circular shift by 1 bit of the 28 bits, making it
1110000110011001010101010000
I have looked around and I can't figure out how to get the shift, ignore the last 4 bits, and have the ability to shift either 1, 2, 3, or 4 bits depending on a variable set earlier in the program.
Any help with this would be smashing! Thanks in advance.

Only 1 bit at a time, but this does a 28 bit circular shift
uint32_t csl28(uint32_t value) {
uint32_t overflow_mask = 0x08000000;
uint32_t value_mask = 0x07FFFFFF;
return ((value & value_mask) << 1) | ((value & overflow_mask) >> 27);
}
uint32_t csr28(uint32_t value) {
uint32_t overflow_mask = 0x00000001;
uint32_t value_mask = 0x0FFFFFFE;
return ((value & value_mask) >> 1) | ((value & overflow_mask) << 27);
}
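Another version, based on this article. This rotates by an arbitrary number of bits (count) within an arbitrarily wide bit field (width, from 1 to 32). To rotate a value left by 5 bits in a 23-bit-wide field: rotl32(value, 5, 23);
uint32_t rotl32 (uint32_t value, uint32_t count, uint32_t width) {
uint32_t value_mask = ((uint32_t)~0) >> (CHAR_BIT * sizeof(value) - width);
value &= value_mask; /* ignore anything above the field */
count %= width; /* width need not be a power of two, so (width-1) cannot serve as a mask */
if (count == 0) return value; /* also avoids a shift by the full width */
return value_mask & ((value << count) | (value >> (width - count)));
}
uint32_t rotr32 (uint32_t value, uint32_t count, uint32_t width) {
uint32_t value_mask = ((uint32_t)~0) >> (CHAR_BIT * sizeof(value) - width);
value &= value_mask; /* ignore anything above the field */
count %= width; /* width need not be a power of two, so (width-1) cannot serve as a mask */
if (count == 0) return value; /* also avoids a shift by the full width */
return value_mask & ((value >> count) | (value << (width - count)));
}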
The above functions assume the value is stored in the low order bits of "value"
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <limits.h> /* CHAR_BIT, used by rotl32/rotr32 */
const char *uint32_to_binary(uint32_t x)
{
static char b[33];
b[0] = '\0';
uint32_t z;
for (z = 0x80000000; z > 0; z >>= 1)
{
strcat(b, ((x & z) == z) ? "1" : "0");
}
return b;
}
uint32_t reverse(uint32_t value)
{
return (value & 0x000000FF) << 24 | (value & 0x0000FF00) << 8 |
(value & 0x00FF0000) >> 8 | (value & 0xFF000000) >> 24;
}
int is_big_endian(void)
{
union {
uint32_t i;
char c[4];
} bint = {0x01020304};
return bint.c[0] == 1;
}
int main(int argc, char** argv) {
unsigned char b[] = { 0x98, 0x02, 0xCA, 0xF0 };
unsigned char *buffer = b;
//uint32_t num = 0x01234567;
uint32_t num;
memcpy(&num, buffer, sizeof num); /* avoids an unaligned, aliasing-violating pointer cast */
if (!is_big_endian()) {
num = reverse(num);
}
num >>= 4;
printf("%x\n", num);
for(int i=0;i<5;i++) {
printf("%s\n", uint32_to_binary(num));
num = rotl32(num, 3, 28);
}
for(int i=0;i<5;i++) {
//printf("%08x\n", num);
printf("%s\n", uint32_to_binary(num));
num = rotr32(num, 3, 28);
}
unsigned char out[4];
memset(out, 0, sizeof(unsigned char) * 4);
num <<= 4;
if (!is_big_endian()) {
num = reverse(num);
}
memcpy(out, &num, sizeof num); /* avoids an unaligned, aliasing-violating pointer cast */
printf("[ ");
for (int i=0;i<4;i++) {
printf("%s0x%02x", i?", ":"", out[i] );
}
printf(" ]\n");
}

First you mask the top four most significant bits
*(buffer + 3) &= 0x0F;
Then you can perform the circular shift of the remaining 28 bits by x bits.
Note: This works on little-endian architectures (x86 PCs and most microcontrollers), where buffer[3] holds the most significant byte.

[...] that contains 4 bytes, but only 28 of them [...]
We got it, but...
I guess that you mistyped the second number of your example. Or do you "ignore" 4 bits on both the left and the right, so that you're actually interested in 24 bits? Anyway:
Use the same principle as in Circular shift in C.
You need to convert your Buffer to a 32-bit arithmetic type first. Maybe uint32_t is what you need?
Where does Buffer get its value? You may need to think about endianness.
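A minimal sketch of that advice, under stated assumptions: buf[0] is taken to be the most significant byte and the low nibble of buf[3] is taken to be the 4 ignored bits (the question does not say which end they sit on). The name rotl28_buffer is made up for illustration. The bytes are assembled with shifts, so the host's endianness does not matter.

```c
#include <stdint.h>

/* Rotate the upper 28 bits of a 4-byte buffer left by `count` bits.
   Assumptions: buf[0] is the most significant byte, and the low nibble
   of buf[3] holds the 4 ignored bits, which are preserved unchanged. */
void rotl28_buffer(unsigned char *buf, unsigned count)
{
    /* Assemble the word with shifts so host endianness is irrelevant. */
    uint32_t word = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                    ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    uint32_t pad = word & 0xFu;   /* the 4 ignored bits */
    uint32_t v   = word >> 4;     /* the 28 relevant bits */

    count %= 28;
    if (count != 0)               /* skip count == 0 to avoid a 28-bit shift of nothing */
        v = ((v << count) | (v >> (28 - count))) & 0x0FFFFFFFu;

    word = (v << 4) | pad;
    buf[0] = (unsigned char)(word >> 24);
    buf[1] = (unsigned char)(word >> 16);
    buf[2] = (unsigned char)(word >> 8);
    buf[3] = (unsigned char)word;
}
```

With the question's pattern 1111000011001100101010100000 stored as { 0xF0, 0xCC, 0xAA, 0x00 }, a rotation by 1 gives { 0xE1, 0x99, 0x54, 0x10 }, that is 1110000110011001010101000001 followed by the untouched padding nibble.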

Related

Interleave 4 byte ints to 8 byte int

I'm currently working to create a function which accepts two 4 byte unsigned integers, and returns an 8 byte unsigned long. I've tried to base my work off of the methods depicted by this research but all my attempts have been unsuccessful. The specific inputs I am working with are: 0x12345678 and 0xdeadbeef, and the result I'm looking for is 0x12de34ad56be78ef. This is my work so far:
unsigned long interleave(uint32_t x, uint32_t y){
uint64_t result = 0;
int shift = 33;
for(int i = 64; i > 0; i-=16){
shift -= 8;
//printf("%d\n", i);
//printf("%d\n", shift);
result |= (x & i) << shift;
result |= (y & i) << (shift-1);
}
}
However, this function keeps returning 0xfffffffe which is incorrect. I am printing and verifying these values using:
printf("0x%x\n", z);
and the input is initialized like so:
uint32_t x = 0x12345678;
uint32_t y = 0xdeadbeef;
Any help on this topic would be greatly appreciated, C has been a very difficult language for me, and bitwise operations even more so.
This can be done based on interleaving bits, but skipping some steps so it only interleaves bytes. Same idea: first spread out the bytes in a couple of steps, then combine them.
Here is the plan, illustrated with my amazing freehand drawing skills:
In C (not tested):
// step 1, moving the top two bytes
uint64_t a = (((uint64_t)x & 0xFFFF0000) << 16) | (x & 0xFFFF);
// step 2, moving bytes 1 and 5 (counting from the least significant byte) into positions 2 and 6
a = ((a & 0x0000FF000000FF00) << 8) | (a & 0x000000FF000000FF);
// same thing with y
uint64_t b = (((uint64_t)y & 0xFFFF0000) << 16) | (y & 0xFFFF);
b = ((b & 0x0000FF000000FF00) << 8) | (b & 0x000000FF000000FF);
// merge them
uint64_t result = (a << 8) | b;
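To make the spreading idea easy to check, here is a small sketch that wraps the two steps in a helper and merges the results. The names spread_bytes and interleave_bytes are invented for this example. After step 1 the value 0x12345678 becomes 0x0000123400005678, and after step 2 it becomes 0x0012003400560078.

```c
#include <stdint.h>

/* Spread the four bytes of v into the even byte positions of a 64-bit
   word, leaving a zero byte above each one. */
static uint64_t spread_bytes(uint32_t v)
{
    uint64_t a = v;
    /* step 1: move the top two bytes up by 16 bits */
    a = ((a & 0xFFFF0000ull) << 16) | (a & 0x0000FFFFull);
    /* step 2: move the upper byte of each 16-bit pair up by 8 bits */
    a = ((a & 0x0000FF000000FF00ull) << 8) | (a & 0x000000FF000000FFull);
    return a;
}

/* x supplies the odd bytes, y the even bytes (y's low byte ends up lowest). */
uint64_t interleave_bytes(uint32_t x, uint32_t y)
{
    return (spread_bytes(x) << 8) | spread_bytes(y);
}
```

For the inputs in the question, interleave_bytes(0x12345678, 0xdeadbeef) yields 0x12de34ad56be78ef.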
Using SSSE3 PSHUFB has been suggested. It would work, but there is an instruction that can do a byte-wise interleave in one go: punpcklbw. So all we really need to do is get the values into and out of vector registers, and that single instruction will take care of the rest.
Not tested (needs <emmintrin.h>; _mm_cvtsi128_si64 is only available when compiling for x86-64):
uint64_t interleave(uint32_t x, uint32_t y) {
__m128i xvec = _mm_cvtsi32_si128(x);
__m128i yvec = _mm_cvtsi32_si128(y);
__m128i interleaved = _mm_unpacklo_epi8(yvec, xvec);
return _mm_cvtsi128_si64(interleaved);
}
With bit-shifting and bitwise operations (endianness independent):
uint64_t interleave(uint32_t x, uint32_t y){
uint64_t result = 0;
for(uint8_t i = 0; i < 4; i ++){
result |= ((x & (0xFFull << (8*i))) << (8*(i+1)));
result |= ((y & (0xFFull << (8*i))) << (8*i));
}
return result;
}
With pointers (endianness dependent):
uint64_t interleave(uint32_t x, uint32_t y){
uint64_t result = 0;
uint8_t * x_ptr = (uint8_t *)&x;
uint8_t * y_ptr = (uint8_t *)&y;
uint8_t * r_ptr = (uint8_t *)&result;
for(uint8_t i = 0; i < 4; i++){
*(r_ptr++) = y_ptr[i];
*(r_ptr++) = x_ptr[i];
}
return result;
}
Note: this solution assumes little-endian byte order
You could do it like this:
uint64_t interleave(uint32_t x, uint32_t y)
{
uint64_t z;
unsigned char *a = (unsigned char *)&x; // 1
unsigned char *b = (unsigned char *)&y; // 1
unsigned char *c = (unsigned char *)&z;
c[0] = a[0];
c[1] = b[0];
c[2] = a[1];
c[3] = b[1];
c[4] = a[2];
c[5] = b[2];
c[6] = a[3];
c[7] = b[3];
return z;
}
Interchange a and b on the lines marked 1 depending on ordering requirement.
A version with shifts, where the LSB of y is always the LSB of the output as in your example, is:
uint64_t interleave(uint32_t x, uint32_t y)
{
return
(y & 0xFFull)
| (x & 0xFFull) << 8
| (y & 0xFF00ull) << 8
| (x & 0xFF00ull) << 16
| (y & 0xFF0000ull) << 16
| (x & 0xFF0000ull) << 24
| (y & 0xFF000000ull) << 24
| (x & 0xFF000000ull) << 32;
}
The compilers I tried don't seem to do a good job of optimizing either version so if this is a performance critical situation then maybe the inline assembly suggestion from comments is the way to go.
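Use union punning. Easy for the compiler to optimize. Note that this relies on little-endian byte order and on anonymous structs and unions (standard since C11).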
#include <stdio.h>
#include <stdint.h>
#include <string.h>
typedef union
{
uint64_t u64;
struct
{
union
{
uint32_t a32;
uint8_t a8[4];
};
union
{
uint32_t b32;
uint8_t b8[4];
};
};
uint8_t u8[8];
}data_64;
uint64_t interleave(uint32_t a, uint32_t b)
{
data_64 in , out;
in.a32 = a;
in.b32 = b;
for(size_t index = 0; index < sizeof(a); index ++)
{
out.u8[index * 2 + 1] = in.a8[index];
out.u8[index * 2 ] = in.b8[index];
}
return out.u64;
}
int main(void)
{
printf("%llx\n", (unsigned long long)interleave(0x12345678U, 0xdeadbeefU));
}

Swap alternate bytes in an integer

Problem: swap alternate bytes as below:
Input: uint8_t buf[4] = {0xab,0xcd,0xef,0xba};
Output: 0xcdabbaef
I have the below code for doing that but I am wondering if there is any better way to shorten the code.
#include <stdint.h>
#define SWAP_16(buf) (((buf & 0xFF00) >> 8) | ((buf & 0x00FF) << 8))
int main()
{
unsigned int value;
int i, j=0;
uint8_t buf[4] = {0,4,0,0};
unsigned int mask = 0xFFFF;
unsigned int tmp_value = 0; /* must start at zero because the loop below ORs into it */
unsigned int size = 4;
for (i = size - 1 ;i >= 0; i--) {
tmp_value |= (buf[j] << 8*i);
j++;
}
value = SWAP_16((tmp_value & (mask << 16)) >> 16) << 16 |
SWAP_16(tmp_value & mask);
return 0;
}
Assuming unsigned int is 32-bits, you can simply use:
value = ((value & 0xff00ff00) >> 8) | ((value & 0x00ff00ff) << 8);
to swap the bytes in each pair of bytes in value. It's similar to your SWAP_16() macro except that it does both halves of the value at once.
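A short sketch of that one-liner applied to the buffer from the question. The helper names load_be32 and swap_byte_pairs are hypothetical, and the buffer is assumed to be read with buf[0] as the most significant byte.

```c
#include <stdint.h>

/* Read 4 bytes as a 32-bit value, buf[0] being the most significant. */
uint32_t load_be32(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Swap the two bytes inside each 16-bit half of value. */
uint32_t swap_byte_pairs(uint32_t value)
{
    return ((value & 0xff00ff00u) >> 8) | ((value & 0x00ff00ffu) << 8);
}
```

For buf = {0xab, 0xcd, 0xef, 0xba}, load_be32 gives 0xabcdefba and swap_byte_pairs turns that into 0xcdabbaef.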
unsigned int forward = 0x12345678;
unsigned int reverse;
unsigned char *f = (unsigned char *)&forward;
unsigned char *r = (unsigned char *)&reverse;
r[0]=f[3];
r[1]=f[2];
r[2]=f[1];
r[3]=f[0];
now reverse will be 0x78563412
Here is one way:
#include <stdio.h>
#include <stdint.h>
int main(void)
{
uint8_t buf[4] = {0xab,0xcd,0xef,0xba};
unsigned int out = buf[1] * 0x1000000u + buf[0] * 0x10000u + buf[3] * 0x100u + buf[2];
printf("%x\n", out);
}
It's not immediately clear from your question if it's not an option, but you could merely just swap the bytes in the array if you know the size won't change:
#include <stdio.h>
#include <stdint.h>
#define SWAPPED(b) { b[1], b[0], b[3], b[2] }
#define PRINT(b) printf("0x0%x\n", *((uint32_t*)b));
int main()
{
uint8_t buf[4] = {8,4,6,1};
uint8_t swapped[4] = SWAPPED(buf);
PRINT(buf);
PRINT(swapped);
return 0;
}
The output for this on my machine is:
0x01060408
0x06010804
This is because of endian-ness and printing an array casted to an integer type, but the bytes are swapped as you ask in your question.
Hope that helps.
Use a union
#include <stdint.h>
#define SWAP_VAR(T, v1, v2) do { \
T v = (v1); \
(v1) = (v2); \
(v2) = v; \
} while (0)
union U32
{
uint32_t u;
unsigned char a[4];
};
uint32_t swap32(uint32_t u)
{
union U32 u32 = {u};
SWAP_VAR(unsigned char, u32.a[0], u32.a[1]);
SWAP_VAR(unsigned char, u32.a[2], u32.a[3]);
return u32.u;
}
Use it like this:
#include <stdint.h>
uint32_t swap32(uint32_t u);
int main(void)
{
uint32_t u = 0x12345678;
u = swap32(u);
}
unsigned int n = ((unsigned int)buf[0] << 16) |
((unsigned int)buf[1] << 24) |
((unsigned int)buf[2] << 0) |
((unsigned int)buf[3] << 8);

Bit hack: Expanding bits

I am trying to convert a uint16_t input to a uint32_t bit mask. One bit in the input toggles two bits in the output bit mask. Here is an example converting a 4-bit input to an 8-bit bit mask:
Input Output
ABCDb -> AABB CCDDb
A,B,C,D are individual bits
Example outputs:
0000b -> 0000 0000b
0001b -> 0000 0011b
0010b -> 0000 1100b
0011b -> 0000 1111b
....
1100b -> 1111 0000b
1101b -> 1111 0011b
1110b -> 1111 1100b
1111b -> 1111 1111b
Is there a bithack-y way to achieve this behavior?
Interleaving bits by Binary Magic Numbers contained the clue:
uint32_t expand_bits(uint16_t bits)
{
uint32_t x = bits;
x = (x | (x << 8)) & 0x00FF00FF;
x = (x | (x << 4)) & 0x0F0F0F0F;
x = (x | (x << 2)) & 0x33333333;
x = (x | (x << 1)) & 0x55555555;
return x | (x << 1);
}
The first four steps consecutively interleave the source bits in groups of 8, 4, 2, 1 bits with zero bits, resulting in 00AB00CD after the first step, 0A0B0C0D after the second step, and so on. The last step then duplicates each even bit (containing an original source bit) into the neighboring odd bit, thereby achieving the desired bit arrangement.
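To see the steps on a value small enough to follow by hand, here is a sketch of the same technique cut down to the 4-bit to 8-bit case from the question. The name expand4 is made up for this illustration; at this width only the last two spreading steps are needed.

```c
#include <stdint.h>

/* Expand the low 4 bits ABCD of n into AABBCCDD. */
uint8_t expand4(uint8_t n)
{
    unsigned x = n & 0x0Fu;        /* 0000ABCD */
    x = (x | (x << 2)) & 0x33u;    /* 00AB00CD */
    x = (x | (x << 1)) & 0x55u;    /* 0A0B0C0D */
    return (uint8_t)(x | (x << 1)); /* copy each bit into its odd neighbour */
}
```

Tracing 1101b: the two spreading steps give 00110001b and then 01010001b, and the final duplication gives 11110011b, matching the table in the question.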
A number of variants are possible. The last step can also be coded as x + (x << 1) or 3 * x. The | operators in the first four steps can be replaced by ^ operators. The masks can also be modified as some bits are naturally zero and don't need to be cleared. On some processors short masks may be incorporated into machine instructions as immediates, reducing the effort for constructing and / or loading the mask constants. It may also be advantageous to increase instruction-level parallelism for out-of-order processors and optimize for those with shift-add or integer-multiply-add instructions. One code variant incorporating various of these ideas is:
uint32_t expand_bits (uint16_t bits)
{
uint32_t x = bits;
x = (x ^ (x << 8)) & ~0x0000FF00;
x = (x ^ (x << 4)) & ~0x00F000F0;
x = x ^ (x << 2);
x = ((x & 0x22222222) << 1) + (x & 0x11111111);
x = (x << 1) + x;
return x;
}
The easiest way to map a 4-bit input to an 8-bit output is with a 16 entry table. So then it's just a matter of extracting 4 bits at a time from the uint16_t, doing a table lookup, and inserting the 8-bit value into the output.
uint32_t expandBits( uint16_t input )
{
uint32_t table[16] = {
0x00, 0x03, 0x0c, 0x0f,
0x30, 0x33, 0x3c, 0x3f,
0xc0, 0xc3, 0xcc, 0xcf,
0xf0, 0xf3, 0xfc, 0xff
};
uint32_t output;
output = table[(input >> 12) & 0xf] << 24;
output |= table[(input >> 8) & 0xf] << 16;
output |= table[(input >> 4) & 0xf] << 8;
output |= table[ input & 0xf];
return output;
}
This provides a decent compromise between performance and readability. It doesn't have quite the performance of cmaster's over-the-top lookup solution, but it's certainly more understandable than thndrwrks' magical mystery solution. As such, it provides a technique that can be applied to a much larger variety of problems, i.e. use a small lookup table to solve a larger problem.
In case you want to get some estimate of relative speeds, some community wiki test code. Adjust as needed.
void f_cmp(uint32_t (*f1)(uint16_t x), uint32_t (*f2)(uint16_t x)) {
uint16_t x = 0;
do {
uint32_t y1 = (*f1)(x);
uint32_t y2 = (*f2)(x);
if (y1 != y2) {
printf("%4x %8lX %8lX\n", x, (unsigned long) y1, (unsigned long) y2);
}
} while (x++ != 0xFFFF);
}
void f_time(uint32_t (*f1)(uint16_t x)) {
f_cmp(expand_bits, f1);
clock_t t1 = clock();
volatile uint32_t y1 = 0;
unsigned n = 1000;
for (unsigned i = 0; i < n; i++) {
uint16_t x = 0;
do {
y1 += (*f1)(x);
} while (x++ != 0xFFFF);
}
clock_t t2 = clock();
printf("%6llu %6llu: %.6f %lX\n", (unsigned long long) t1,
(unsigned long long) t2, 1.0 * (t2 - t1) / CLOCKS_PER_SEC / n,
(unsigned long) y1);
fflush(stdout);
}
int main(void) {
f_time(expand_bits);
f_time(expandBits);
f_time(remask);
f_time(javey);
f_time(thndrwrks_expand);
// now in the other order
f_time(thndrwrks_expand);
f_time(javey);
f_time(remask);
f_time(expandBits);
f_time(expand_bits);
return 0;
}
Results
0 280: 0.000280 FE0C0000 // fast
280 702: 0.000422 FE0C0000
702 1872: 0.001170 FE0C0000
1872 3026: 0.001154 FE0C0000
3026 4399: 0.001373 FE0C0000 // slow
4399 5740: 0.001341 FE0C0000
5740 6879: 0.001139 FE0C0000
6879 8034: 0.001155 FE0C0000
8034 8470: 0.000436 FE0C0000
8486 8751: 0.000265 FE0C0000
Here's a working implementation:
uint32_t remask(uint16_t x)
{
uint32_t i;
uint32_t result = 0;
for (i=0;i<16;i++) {
uint32_t mask = (uint32_t)x & (1U << i);
result |= mask << (i);
result |= mask << (i+1);
}
return result;
}
On each iteration of the loop, the bit in question from the uint16_t is masked out and stored.
That bit is then shifted by its bit position and ORed into the result, then shifted again by its bit position plus 1 and ORed into the result.
If your concern is performance and simplicity, you are likely best off with a big lookup table (64k entries of 4 bytes each). With that, you can use pretty much any algorithm you like to generate the table; the lookup will just be a single memory access.
If that table is too big for your liking, you can split it. For instance, you can use a 8 bit lookup table with 256 entries of 2 bytes each. With that you can perform the entire operation with just two lookups. Bonus is, that this approach allows for type-punning tricks to avoid the hassle of splitting the address with bit operations:
//Implementation defined behavior ahead:
//Works correctly for both little and big endian machines,
//however, results will be wrong on a PDP11...
uint32_t getMask(uint16_t input) {
assert(sizeof(uint16_t) == 2);
assert(sizeof(uint32_t) == 4);
static const uint16_t lookupTable[256] = { 0x0000, 0x0003, 0x000c, 0x000f, ... };
unsigned char* inputBytes = (unsigned char*)&input; //legal because we type-pun to char, but the order of the bytes is implementation defined
uint16_t outputShorts[2]; //filled in the same order as the input bytes are read
outputShorts[0] = lookupTable[inputBytes[0]];
outputShorts[1] = lookupTable[inputBytes[1]];
uint32_t output;
memcpy(&output, outputShorts, 4); //can't type-pun directly from uint16_t to uint32_t due to strict aliasing rules
return output;
}
The code above stays within the strict aliasing rules: the input is read through unsigned char, which is an explicit exception to those rules, and the output is assembled with memcpy. (Writing into a char array through a uint16_t pointer is not covered by that exception and could also be misaligned.) It also works around the effects of little/big-endian byte order by building the result in the same order as the input was split. However, it still exposes implementation-defined behavior: a machine with a byte order of 1, 0, 3, 2, or some other middle-endian order, will silently produce wrong results (there have actually been such CPUs, like the PDP-11).
Of course, you can split the lookup table even further, but I doubt that would do you any good.
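Since the 256-entry table is elided above, here is one possible way to build it at startup and use it with plain shifts instead of type punning. This is only a sketch: expand_table, init_expand_table and expand_with_table are names invented for the example, and the table is filled with a simple loop rather than written out as constants.

```c
#include <stdint.h>

static uint16_t expand_table[256];

/* Fill the table: entry v has every bit of v doubled. Call once before use. */
void init_expand_table(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint16_t r = 0;
        for (unsigned b = 0; b < 8; b++)
            if ((v >> b) & 1u)
                r |= (uint16_t)(3u << (2 * b));
        expand_table[v] = r;
    }
}

/* Two lookups: high byte goes to the upper half, low byte to the lower half. */
uint32_t expand_with_table(uint16_t input)
{
    return ((uint32_t)expand_table[input >> 8] << 16) |
            (uint32_t)expand_table[input & 0xFFu];
}
```

Splitting the input with a shift and a mask costs two cheap operations but removes any dependence on byte order.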
A simple loop. Maybe not bit-hacky enough?
uint32_t thndrwrks_expand(uint16_t x) {
uint32_t mask = 3;
uint32_t y = 0;
while (x) {
if (x&1) y |= mask;
x >>= 1;
mask <<= 2;
}
return y;
}
Tried another that is twice as fast. Still 655/272 as slow as expand_bits(). Appears to be fastest 16 loop iteration solution.
uint32_t thndrwrks_expand(uint16_t x) {
uint32_t y = 0;
for (uint16_t mask = 0x8000; mask; mask >>= 1) {
y <<= 1;
y |= x&mask;
}
y *= 3;
return y;
}
Try this, where input16 is the uint16_t input mask:
uint32_t input32 = (uint32_t) input16;
uint32_t result = 0;
uint32_t i;
for(i=0; i<16; i++)
{
uint32_t bit_at_i = (input32 & (((uint32_t)1) << i)) >> i;
result |= ((bit_at_i << (i*2)) | (bit_at_i << ((i*2)+1)));
}
// result is now the 32 bit expanded mask
My solution is meant to run on mainstream x86 PCs and be simple and generic. I did not write this to compete for the fastest and/or shortest implementation. It is just another way to solve the problem submitted by OP.
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#define BITS_TO_EXPAND (4U)
#define SIZE_MAX (256U)
static bool expand_uint(unsigned int *toexpand,unsigned int *expanded);
int main(void)
{
unsigned int in = 12;
unsigned int out = 0;
bool success;
char buff[SIZE_MAX];
success = expand_uint(&in,&out);
if(false == success)
{
(void) puts("Error: expand_uint failed");
return EXIT_FAILURE;
}
(void) snprintf(buff, (size_t) SIZE_MAX,"%u expanded is %u\n",in,out);
(void) fputs(buff,stdout);
return EXIT_SUCCESS;
}
/*
** It expands an unsigned int so that every bit in a nibble is copied twice
** in the resultant number. It returns true on success, false otherwise.
*/
static bool expand_uint(unsigned int *toexpand,unsigned int *expanded)
{
unsigned int i;
unsigned int shifts = 0;
unsigned int mask;
if(NULL == toexpand || NULL == expanded)
{
return false;
}
*expanded = 0;
for(i = 0; i < BITS_TO_EXPAND; i++)
{
mask = (*toexpand >> i) & 1;
*expanded |= (mask << shifts);
++shifts;
*expanded |= (mask << shifts);
++shifts;
}
return true;
}

Extract 14-bit values from an array of bytes in C

In an arbitrary-sized array of bytes in C, I want to store 14-bit numbers (0-16,383) tightly packed. In other words, in the sequence:
0000000000000100000000000001
there are two numbers that I wish to be able to arbitrarily store and retrieve into a 16-bit integer. (in this case, both of them are 1, but could be anything in the given range) If I were to have the functions uint16_t 14bitarr_get(unsigned char* arr, unsigned int index) and void 14bitarr_set(unsigned char* arr, unsigned int index, uint16_t value), how would I implement those functions?
This is not for a homework project, merely my own curiosity. I have a specific project that this would be used for, and it is the key/center of the entire project.
I do not want an array of structs that have 14-bit values in them, as that generates waste bits for every struct that is stored. I want to be able to tightly pack as many 14-bit values as I possibly can into an array of bytes. (e.g.: in a comment I made, putting as many 14-bit values into a chunk of 64 bytes is desirable, with no waste bits. the way those 64 bytes work is completely tightly packed for a specific use case, such that even a single bit of waste would take away the ability to store another 14 bit value)
Well, this is bit fiddling at its best. Doing it with an array of bytes makes it more complicated than it would be with larger elements because a single 14 bit quantity can span 3 bytes, where uint16_t or anything bigger would require no more than two. But I'll take you at your word that this is what you want (no pun intended). This code will actually work with the constant set to anything 8 or larger (but not over the size of an int; for that, additional type casts are needed). Of course the value type must be adjusted if larger than 16.
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#define W 14
uint16_t arr_get(unsigned char* arr, size_t index) {
size_t bit_index = W * index;
size_t byte_index = bit_index / 8;
unsigned bit_in_byte_index = bit_index % 8;
uint16_t result = arr[byte_index] >> bit_in_byte_index;
for (unsigned n_bits = 8 - bit_in_byte_index; n_bits < W; n_bits += 8)
result |= arr[++byte_index] << n_bits;
return result & ~(~0u << W);
}
void arr_set(unsigned char* arr, size_t index, uint16_t value) {
size_t bit_index = W * index;
size_t byte_index = bit_index / 8;
unsigned bit_in_byte_index = bit_index % 8;
arr[byte_index] &= ~(0xff << bit_in_byte_index);
arr[byte_index++] |= value << bit_in_byte_index;
unsigned n_bits = 8 - bit_in_byte_index;
value >>= n_bits;
while (n_bits < W - 8) {
arr[byte_index++] = value;
value >>= 8;
n_bits += 8;
}
arr[byte_index] &= 0xff << (W - n_bits);
arr[byte_index] |= value;
}
int main(void) {
int mod = 1 << W;
int n = 50000;
unsigned x[n];
unsigned char b[2 * n];
for (int tries = 0; tries < 10000; tries++) {
for (int i = 0; i < n; i++) {
x[i] = rand() % mod;
arr_set(b, i, x[i]);
}
for (int i = 0; i < n; i++)
if (arr_get(b, i) != x[i])
printf("Err #%d: %d should be %d\n", i, arr_get(b, i), x[i]);
}
return 0;
}
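The remark that a 14-bit value can span three bytes is easy to verify with a little arithmetic. The sketch below (helper names are hypothetical) computes where value number index starts inside its first byte and how many bytes it touches. The offsets cycle through 0, 6, 4, 2, so the layout repeats every 4 values, that is every 7 bytes.

```c
#include <stddef.h>

/* Bit offset (0..7) of value `index` inside its first byte. */
unsigned packed14_bit_offset(size_t index)
{
    return (unsigned)((index * 14) % 8);
}

/* Number of bytes (2 or 3) that value `index` touches. */
unsigned packed14_bytes_spanned(size_t index)
{
    /* round (offset + 14) bits up to whole bytes */
    return (packed14_bit_offset(index) + 14 + 7) / 8;
}
```

Offsets 0 and 2 fit in two bytes; offsets 4 and 6 spill into a third.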
Faster versions
Since you said in comments that performance is an issue: open coding the loops gives a roughly 10% speed improvement on my machine on the little test driver included in the original. This includes random number generation and testing, so perhaps the primitives are 20% faster. I'm confident that 16- or 32-bit array elements would give further improvements because byte access is expensive:
uint16_t arr_get(unsigned char* a, size_t i) {
size_t ib = 14 * i;
size_t iy = ib / 8;
switch (ib % 8) {
case 0:
return (a[iy] | (a[iy+1] << 8)) & 0x3fff;
case 2:
return ((a[iy] >> 2) | (a[iy+1] << 6)) & 0x3fff;
case 4:
return ((a[iy] >> 4) | (a[iy+1] << 4) | (a[iy+2] << 12)) & 0x3fff;
}
return ((a[iy] >> 6) | (a[iy+1] << 2) | (a[iy+2] << 10)) & 0x3fff;
}
#define M(IB) (~0u << (IB))
#define SETLO(IY, IB, V) a[IY] = (a[IY] & M(IB)) | ((V) >> (14 - (IB)))
#define SETHI(IY, IB, V) a[IY] = (a[IY] & ~M(IB)) | ((V) << (IB))
void arr_set(unsigned char* a, size_t i, uint16_t val) {
size_t ib = 14 * i;
size_t iy = ib / 8;
switch (ib % 8) {
case 0:
a[iy] = val;
SETLO(iy+1, 6, val);
return;
case 2:
SETHI(iy, 2, val);
a[iy+1] = val >> 6;
return;
case 4:
SETHI(iy, 4, val);
a[iy+1] = val >> 4;
SETLO(iy+2, 2, val);
return;
}
SETHI(iy, 6, val);
a[iy+1] = val >> 2;
SETLO(iy+2, 4, val);
}
Another variation
This is quite a bit faster yet on my machine, about 20% better than above:
uint16_t arr_get2(unsigned char* a, size_t i) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
return (buf >> (ib % 8)) & 0x3fff;
}
void arr_set2(unsigned char* a, size_t i, unsigned val) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
unsigned io = ib % 8;
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
a[iy+2] = buf >> 16;
}
Note that for this code to be safe you should allocate one extra byte at the end of the packed array. It always reads and writes 3 bytes even when the desired 14 bits are in the first 2.
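A small sketch of the sizing arithmetic this implies. The function names are made up for the example: the first gives the exact number of bytes needed for count packed values, the second adds the one spare byte that the always-three-bytes accessors require.

```c
#include <stddef.h>

/* Exact number of bytes needed to hold `count` packed 14-bit values. */
size_t packed14_bytes(size_t count)
{
    return (count * 14 + 7) / 8;   /* round up to whole bytes */
}

/* Allocation size for accessors that always read and write 3 bytes. */
size_t packed14_bytes_padded(size_t count)
{
    return packed14_bytes(count) + 1;
}
```

For example, 36 values need 63 bytes, so with the spare byte they occupy exactly 64.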
One more variation
Finally, this runs just a bit slower than the one above (again on my machine; YMMV), but you don't need the extra byte. It uses one comparison per operation:
uint16_t arr_get2(unsigned char* a, size_t i) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned io = ib % 8;
unsigned buf = ib % 8 <= 2
? a[iy] | (a[iy+1] << 8)
: a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
return (buf >> io) & 0x3fff;
}
void arr_set2(unsigned char* a, size_t i, unsigned val) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned io = ib % 8;
if (io <= 2) {
unsigned buf = a[iy] | (a[iy+1] << 8);
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
} else {
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
a[iy+2] = buf >> 16;
}
}
The easiest solution is to use a struct of eight bitfields:
typedef struct __attribute__((__packed__)) EightValues {
uint16_t v0 : 14,
v1 : 14,
v2 : 14,
v3 : 14,
v4 : 14,
v5 : 14,
v6 : 14,
v7 : 14;
} EightValues;
This struct has a size of 14*8 = 112 bits, which is 14 bytes (seven uint16_t). Now, all you need is to use the last three bits of the array index to select the right bitfield:
uint16_t bitarr14_get(unsigned char* arr, unsigned int index) { //renamed: a C identifier cannot start with a digit
EightValues* accessPointer = (EightValues*)arr;
accessPointer += index >> 3; //select the right structure in the array
switch(index & 7) { //use the last three bits of the index to access the right bitfield
case 0: return accessPointer->v0;
case 1: return accessPointer->v1;
case 2: return accessPointer->v2;
case 3: return accessPointer->v3;
case 4: return accessPointer->v4;
case 5: return accessPointer->v5;
case 6: return accessPointer->v6;
case 7: return accessPointer->v7;
}
}
Your compiler will do the bit-fiddling for you.
The Basis for Storage Issue
The biggest issue you are facing is the fundamental question of "What is my basis for storage going to be?" You know the basics, what you have available to you is char, short, int, etc... The smallest being 8-bits. No matter how you slice your storage scheme, it will ultimately have to rest in memory in a unit of memory based on this 8 bit per byte layout.
The only optimal, no-bits-wasted memory allocation is an array whose size is a common multiple of 14 bits and the element size. For char elements the smallest such size is 56 bits (7 chars, holding 4 values); for short elements it is the full 112 bits (7 shorts, or equivalently 14 chars, holding 8 values). This may be the best option. Of course if you have no need for 8 of them, then it wouldn't be of much use anyway, as it would waste more than the 4 bits lost on a single unsigned value.
Let me know if this is something you would like to further explore. If it is, I'm happy to help with the implementation.
Bitfield Struct
The comments regarding bitfield packing or bit packing are exactly what you need to do. This can involve a structure alone or in combination with a union, or by manually right/left shifting values directly as needed.
A short example applicable to your situation (if I understood correctly you want 2 14-bit areas in memory) would be:
#include <stdio.h>
typedef struct bitarr14 {
unsigned n1 : 14,
n2 : 14;
} bitarr14;
char *binstr (unsigned long n, size_t sz);
int main (void) {
bitarr14 mybitfield;
mybitfield.n1 = 1;
mybitfield.n2 = 1;
printf ("\n mybitfield in memory : %s\n\n",
binstr (*(unsigned *)&mybitfield, 28));
return 0;
}
char *binstr (unsigned long n, size_t sz)
{
static char s[64 + 1] = {0};
char *p = s + 64;
register size_t i = 0;
for (i = 0; i < sz; i++) {
p--;
*p = (n >> i & 1) ? '1' : '0';
}
return p;
}
Output
$ ./bin/bitfield14
mybitfield in memory : 0000000000000100000000000001
Note: the dereference of mybitfield for purposes of printing the value in memory breaks strict aliasing and it is intentional just for the purpose of the output example.
The beauty, and purpose for using a struct in the manner provided is it will allow direct access to each 14-bit part of the struct directly, without having to manually shift, etc.
Update - assuming you want big endian bit packing. This is code meant for a fixed size code word. It's based on code I've used for data compression algorithms. The switch case and fixed logic helps with performance.
typedef unsigned short uint16_t;
void bit14arr_set(unsigned char* arr, unsigned int index, uint16_t value)
{
unsigned int bitofs = (index*14)%8;
arr += (index*14)/8;
switch(bitofs){
case 0: /* bit offset == 0 */
*arr++ = (unsigned char)(value >> 6);
*arr &= 0x03;
*arr |= (unsigned char)(value << 2);
break;
case 2: /* bit offset == 2 */
*arr &= 0xc0;
*arr++ |= (unsigned char)(value >> 8);
*arr = (unsigned char)(value << 0);
break;
case 4: /* bit offset == 4 */
*arr &= 0xf0;
*arr++ |= (unsigned char)(value >> 10);
*arr++ = (unsigned char)(value >> 2);
*arr &= 0x3f;
*arr |= (unsigned char)(value << 6);
break;
case 6: /* bit offset == 6 */
*arr &= 0xfc;
*arr++ |= (unsigned char)(value >> 12);
*arr++ = (unsigned char)(value >> 4);
*arr &= 0x0f;
*arr |= (unsigned char)(value << 4);
break;
}
}
uint16_t bit14arr_get(unsigned char* arr, unsigned int index)
{
unsigned int bitofs = (index*14)%8;
unsigned short value;
arr += (index*14)/8;
switch(bitofs){
case 0: /* bit offset == 0 */
value = ((unsigned int)(*arr++) ) << 6;
value |= ((unsigned int)(*arr ) ) >> 2;
break;
case 2: /* bit offset == 2 */
value = ((unsigned int)(*arr++)&0x3f) << 8;
value |= ((unsigned int)(*arr ) ) >> 0;
break;
case 4: /* bit offset == 4 */
value = ((unsigned int)(*arr++)&0x0f) << 10;
value |= ((unsigned int)(*arr++) ) << 2;
value |= ((unsigned int)(*arr ) ) >> 6;
break;
case 6: /* bit offset == 6 */
value = ((unsigned int)(*arr++)&0x03) << 12;
value |= ((unsigned int)(*arr++) ) << 4;
value |= ((unsigned int)(*arr ) ) >> 4;
break;
}
return value;
}
Here's my version (updated to fix bugs):
#define PACKWID 14 // number of bits in packed number
#define PACKMSK ((1 << PACKWID) - 1)
#ifndef ARCHBYTEALIGN
#define ARCHBYTEALIGN 1 // align to 1=bytes, 2=words
#endif
#define ARCHBITALIGN (ARCHBYTEALIGN * 8)
typedef unsigned char byte;
typedef unsigned short u16;
typedef unsigned int u32;
typedef long long s64;
typedef u16 pcknum_t; // container for packed number
typedef u32 acc_t; // working accumulator
#ifndef ARYOFF
#define ARYOFF long
#endif
#define PRT(_val) ((unsigned long) _val)
typedef unsigned ARYOFF aryoff_t; // bit offset
// packary -- access array of packed numbers
// RETURNS: old value
extern inline pcknum_t
packary(byte *ary,aryoff_t idx,int setflg,pcknum_t newval)
// ary -- byte array pointer
// idx -- index into array (packed number relative)
// setflg -- 1=set new value, 0=just get old value
// newval -- new value to set (if setflg set)
{
aryoff_t absbitoff;
aryoff_t bytoff;
aryoff_t absbitlhs;
acc_t acc;
acc_t nval;
int shf;
acc_t curmsk;
pcknum_t oldval;
// get the absolute bit number for the given array index
absbitoff = idx * PACKWID;
// get the byte offset of the lowest byte containing the number
bytoff = absbitoff / ARCHBITALIGN;
// get absolute bit offset of first containing byte
absbitlhs = bytoff * ARCHBITALIGN;
// get amount we need to shift things by:
// (1) our accumulator
// (2) values to set/get
shf = absbitoff - absbitlhs;
#ifdef MODSHOW
do {
static int modshow;
if (modshow > 50)
break;
++modshow;
printf("packary: MODSHOW idx=%ld shf=%d bytoff=%ld absbitlhs=%ld absbitoff=%ld\n",
PRT(idx),shf,PRT(bytoff),PRT(absbitlhs),PRT(absbitoff));
} while (0);
#endif
// adjust array pointer to the portion we want (guaranteed to span)
ary += bytoff * ARCHBYTEALIGN;
// fetch the number + some other bits
acc = *(acc_t *) ary;
// get the old value
oldval = (acc >> shf) & PACKMSK;
// set the new value
if (setflg) {
// get shifted mask for packed number
curmsk = PACKMSK << shf;
// remove the old value
acc &= ~curmsk;
// ensure caller doesn't pass us a bad value
nval = newval;
#if 0
nval &= PACKMSK;
#endif
nval <<= shf;
// add in the value
acc |= nval;
*(acc_t *) ary = acc;
}
return oldval;
}
pcknum_t
int_get(byte *ary,aryoff_t idx)
{
return packary(ary,idx,0,0);
}
void
int_set(byte *ary,aryoff_t idx,pcknum_t newval)
{
packary(ary,idx,1,newval);
}
Here are benchmarks:
set: 354740751 7.095 -- gene
set: 203407176 4.068 -- rcgldr
set: 298946533 5.979 -- craig
get: 268574627 5.371 -- gene
get: 166839767 3.337 -- rcgldr
get: 207764612 4.155 -- craig

How to circular shift an array of 4 chars?

I have an array of four unsigned chars. I want to treat it like a 32-bit number (assume the upper bits of the char are don't care. I only care about the lower 8-bits). Then, I want to circularly shift it by an arbitrary number of places. I've got a few different shift sizes, all determined at compile-time.
E.g.
unsigned char a[4] = {0x81, 0x1, 0x1, 0x2};
circular_left_shift(a, 1);
/* a is now { 0x2, 0x2, 0x2, 0x5 } */
Edit: To everyone wondering why I didn't mention CHAR_BIT != 8, because this is standard C. I didn't specify a platform, so why are you assuming one?
static void rotate_left(uint8_t *d, uint8_t *s, uint8_t bits)
{
const uint8_t octetshifts = bits / 8;
const uint8_t bitshift = bits % 8;
const uint8_t bitsleft = (8 - bitshift);
const uint8_t lm = (1 << bitshift) - 1;
const uint8_t um = ~lm;
int i;
for (i = 0; i < 4; i++)
{
d[(i + 4 - octetshifts) % 4] =
((s[i] << bitshift) & um) |
((s[(i + 1) % 4] >> bitsleft) & lm);
}
}
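For comparison, here is a sketch that packs the four bytes into one uint32_t, rotates, and unpacks again. It assumes a[0] is the most significant byte, as in the question's example, and masks each element to its low 8 bits so it also behaves when char is wider than 8 bits. The name circular_left_shift32 is invented for this example.

```c
#include <stdint.h>

/* Rotate the 32-bit number stored in a[0..3] (a[0] most significant)
   left by `shift` bits, using only the low 8 bits of each element. */
void circular_left_shift32(unsigned char a[4], unsigned shift)
{
    uint32_t v = ((uint32_t)(a[0] & 0xFFu) << 24) |
                 ((uint32_t)(a[1] & 0xFFu) << 16) |
                 ((uint32_t)(a[2] & 0xFFu) << 8)  |
                  (uint32_t)(a[3] & 0xFFu);

    shift %= 32;
    if (shift != 0)   /* a shift by 32 would be undefined */
        v = (v << shift) | (v >> (32 - shift));

    a[0] = (unsigned char)((v >> 24) & 0xFFu);
    a[1] = (unsigned char)((v >> 16) & 0xFFu);
    a[2] = (unsigned char)((v >> 8) & 0xFFu);
    a[3] = (unsigned char)(v & 0xFFu);
}
```

Applied to {0x81, 0x1, 0x1, 0x2} with a shift of 1, this produces {0x2, 0x2, 0x2, 0x5}, the result the question asks for.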
Obviously
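While keeping in mind plain C, the best way is (needs <stdint.h>; assumes 8-bit chars, a suitably aligned buffer and 0 < shift < 32):
inline void circular_left_shift(char *chars, short shift) {
uint32_t *dword = (uint32_t *)chars; /* unsigned, so >> does not sign-extend */
*dword = (*dword << shift) | (*dword >> (32 - shift));
}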
Uhmm, char is 16 bits long, was not clear for me. I presume int is still 32 bit.
inline void circular_left_shift(char *chars, short shift) {
int i, part;
part = chars[0] >> (16 - shift);
for (i = 0; i < 3; ++i)
chars[i] = (chars[i] << shift) | (chars[i + 1] >> (16 - shift));
chars[3] = (chars[3] << shift) | part;
}
Or you could just unwind this cycle.
You could dig further into the asm instruction ror. On x86 it can rotate a 32-bit register by up to 31 bits, and rotating right by 31 is the same as rotating left by 1. Something like:
MOV CL, 31
ROR EAX, CL
Use union:
typedef union chr_int{
unsigned int i;
unsigned char c[4];
} chr_int;
It's safer (because of pointer aliasing) and easier to manipulate.
EDIT: you should have mentioned earlier that your char isn't 8 bits. However, this should do the trick:
#define ORIG_MASK 0x81010102
#define LS_CNT 1
unsigned char a[4] = {
((ORIG_MASK << LS_CNT ) | (ORIG_MASK >> (32 - LS_CNT))) & 0xff,
((ORIG_MASK << (LS_CNT + 8)) | (ORIG_MASK >> (24 - LS_CNT))) & 0xff,
((ORIG_MASK << (LS_CNT + 16)) | (ORIG_MASK >> (16 - LS_CNT))) & 0xff,
((ORIG_MASK << (LS_CNT + 24)) | (ORIG_MASK >> ( 8 - LS_CNT))) & 0xff
};

Resources