Finding position of '1's efficiently in an bit array

Finding position of '1's efficiently in an bit array - c

I'm wiring a program that tests a set of wires for open or short circuits. The program, which runs on an AVR, drives a test vector (a walking '1') onto the wires and receives the result back. It compares this resultant vector with the expected data which is already stored on an SD Card or external EEPROM.
Here's an example, assume we have a set of 8 wires all of which are straight through i.e. they have no junctions. So if we drive 0b00000010 we should receive 0b00000010.
Suppose we receive 0b11000010. This implies there is a short circuit between wire 7,8 and wire 2. I can detect which bits I'm interested in by 0b00000010 ^ 0b11000010 = 0b11000000. This tells me clearly wire 7 and 8 are at fault but how do I find the position of these '1's efficiently in an large bit-array. It's easy to do this for just 8 wires using bit masks but the system I'm developing must handle up to 300 wires (bits). Before I started using macros like the following and testing each bit in an array of 300*300-bits I wanted to ask here if there was a more elegant solution.
#define BITMASK(b) (1 << ((b) % 8))
#define BITSLOT(b) ((b / 8))
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITCLEAR(a,b) ((a)[BITSLOT(b)] &= ~BITMASK(b))
#define BITTEST(a,b) ((a)[BITSLOT(b)] & BITMASK(b))
#define BITNSLOTS(nb) ((nb + 8 - 1) / 8)
Just to further show how to detect an open circuit. Expected data: 0b00000010, received data: 0b00000000 (the wire isn't pulled high). 0b00000010 ^ 0b00000000 = 0b0b00000010 - wire 2 is open.
NOTE: I know testing 300 wires is not something the tiny RAM inside an AVR Mega 1281 can handle, that is why I'll split this into groups i.e. test 50 wires, compare, display result and then move forward.

Many architectures provide specific instructions for locating the first set bit in a word, or for counting the number of set bits. Compilers usually provide intrinsics for these operations, so that you don't have to write inline assembly. GCC, for example, provides __builtin_ffs, __builtin_ctz, __builtin_popcount, etc., each of which should map to the appropriate instruction on the target architecture, exploiting bit-level parallelism.
If the target architecture doesn't support these, an efficient software implementation is emitted by the compiler. The naive approach of testing the vector bit by bit in software is not very efficient.
If your compiler doesn't implement these, you can still code your own implementation using a de Bruijn sequence.

How often do you expect faults? If you don't expect them that often, then it seems pointless to optimize the "fault exists" case -- the only part that will really matter for speed is the "no fault" case.
To optimize the no-fault case, simply XOR the actual result with the expected result and a input ^ expected == 0 test to see if any bits are set.
You can use a similar strategy to optimize the "few faults" case, if you further expect the number of faults to typically be small when they do exist -- mask the input ^ expected value to get just the first 8 bits, just the second 8 bits, and so on, and compare each of those results to zero. Then, you just need to search for the set bits within the ones that are not equal to zero, which should narrow the search space to something that can be done pretty quickly.

You can use a lookup table. For example log-base-2 lookup table of 255 bytes can be used to find the most-significant 1-bit in a byte:
uint8_t bit1 = log2[bit_mask];
where log2 is defined as follows:
uint8_t const log2[] = {
0, /* not used log2[0] */
0, /* log2[0x01] */
1, 1 /* log2[0x02], log2[0x03] */
2, 2, 2, 2, /* log2[0x04],..,log2[0x07] */
3, 3, 3, 3, 3, 3, 3, 3, /* log2[0x08],..,log2[0x0F */
...
}
On most processors a lookup table like this will go to ROM. But AVR is a Harvard machine and to place data in code space (ROM) requires special non-standard extension, which depends on the compiler. For example the IAR AVR compiler would need use the extended keyword __flash. In WinAVR (GNU AVR) you would need to use the PROGMEM attribute, but it's more complex than that, because you would also need to use special macros to to read from the program space.

I think there is only one way to do this:
Create an array out "outdata". Each item of the array can for example correspond an 8-bit port register.
Send the outdata on the wires.
Read back this data as "indata".
Store the indata in an array mapped exactly as the outdata.
In a loop, XOR each byte of outdata with each byte of indata.
I would strongly recommend inline functions instead of those macros.
Why can't your MCU handle 300 wires?
300/8 = 37.5 bytes. Rounded to 38. It needs to be stored twice, outdata and indata, 38*2 = 76 bytes.
You can't spare 76 bytes of RAM?

I think you're missing the forest through the trees. Seems like a bed of nails test. First test some assumptions:
1) You know which pins should be live for each pin tested/energized.
2) you have a netlist translated for step 1 into a file on sd
If you operate on a byte level as well as bit, it simplifies the issue. If you energize a pin, there is an expected pattern out stored in your file. First find the mismatched bytes; identify mismatched pins in the byte; finally store the energized pin with the faulty pin numbers.
You don't need an array for searching, or results. general idea:
numwires=300;
numbytes=numwires/8 + (numwires%8)?1:0;
for(unsigned char currbyte=0; currbyte<numbytes; currbyte++)
{
unsigned char testbyte=inchar(baseaddr+currbyte)
unsigned char goodbyte=getgoodbyte(testpin,currbyte/*byte offset*/);
if( testbyte ^ goodbyte){
// have a mismatch report the pins
for(j=0, mask=0x01; mask<0x80;mask<<=1, j++){
if( (mask & testbyte) != (mask & goodbyte)) // for clarity
logbadpin(testpin, currbyte*8+j/*pin/wirevalue*/, mask & testbyte /*bad value*/);
}
}

Related

Correct way to unpack a 32 bit vector in Perl to read a uint32 written in C

I am parsing a Photoshop raw, 16 bit/channel, RGB file in C and trying to keep a log of exceptional data points. I need a very fast C analysis of up to 36 MPix images with 16 bit quanta or 216 MB Photoshop .RAW files.
<1% of the points have weird skin tones and I want to graph them with PerlMagick or Perl GD to see where they are coming from.
The first 4 bytes of the C data file contain the unsigned image width as a uint32_t. In Perl, I read the whole file in binary mode and extract the first 32 bits:
Xres=1779105792l = 0x6a0b0000
It looks a lot like the C log file:
DA: Color anomalies=14177=0.229%:
DA: II=1) raw PIDX=0x10000b25, XCols=[0]=0x00000b6a
Dec(0x00000b6a) = 2922, the Exact X_Columns_Width of a small test file.
Clearly a case of intel's 1972 8008 NUXI architecture. How hard could it possibly be to translate 0x6a0b0000 to 0x6a0b0000; swap 2 bytes and 2 nibbles and you're done. Slicing the 8 characters and rearranging them could be done but that is the kind of ugly hack I am trying to avoid.
Grab the same 32 bit vector from file offset zero and unpack it as "VAX" unsigned long.
$xres = vec($bdat, 0, 32); # vec EXPR,OFFSET,BITS
$vul = unpack("V", vec($bdat, 0, 32));
printf("Length (\$bdat)=%d, xres=0x%08x, Vax ulong=%ul=0x%08x\n",
length($bdat), $xres, $vul, $vul);
Length ($bdat) = 56712, xres=0x6a0b0000, Vax ulong=959919921l=0x39373731
Every single hex character is mangled. Obviously wrong Endian, it is not VAX. The "Other" one is Network Big-endian
http://perldoc.perl.org/functions/pack.html
N An unsigned long (32-bit) in "network" (big-endian) order.
V An unsigned long (32-bit) in "VAX" (little-endian) order.
$nul = unpack("N", vec($bdat, 0, 32)); # Network Unsigned Long 32b
printf("Xres=0x%08x, NET ulong=%ul=0x%08x\n", $xres, $nul, $nul);
Xres=0x6a0b0000, NET ulong=825702201l=0x31373739
The $XRES still shows the right hex in the wrong order. The "NETWORK" long 32 bit uint extracted from the same bits is unrecognizable. Try Binary
$bits = unpack("b*", vec($bdat, 0, 32));
printf("bits=$bits, len=%d\n", length $bits);
bits=10001100111011001110110010011100100011000000110010101100111011001001110001001100, len=80
I clearly asked for 32 bits and got 80 bits. What gives?
Try for 4, unsigned, 8bit bytes which can NOT be swapped:
for($ii = 0; $ii < 4; $ii++) {
$bit_off=$ii*8; # Bit offset
$uc = unpack("C", vec($bdat, $bit_off, 8)); # C An unsigned char
printf("II $ii, bo $bit_off, d=%d, u=%u, x=0x%x\n",
$uc,$uc, $uc);
}
II 0, bo 0, d=49, u=49, x=0x31
II 1, bo 8, d=51, u=51, x=0x33
II 2, bo 16, d=49, u=49, x=0x31
II 3, bo 24, d=49, u=49, x=0x31
I am looking for hex 0, 6, a or b. There are no "3"s or "1"s in the right answer. Try pirating from a C file:
http://cpansearch.perl.org/src/MHX/Convert-Binary-C-0.76/tests/include/include/bits/byteswap.h
$x = $xres;
$x= (((($x) & 0xff000000) >> 24) | ((($x) & 0x00ff0000) >> 8) | ((($x) & 0x0000ff00) << 8) | ((($x) & 0x000000ff) << 24));
printf("\$xres=0x%08x -> \$x=0x%08x = %u\n", $xres, $x, $x);
$xres=0x6a0b0000 -> $x=0x00000b6a = 2922
It WORKS! But, this is uglier than converting the original, wrong order hex number to a string to untangle it:
$stupid_str = sprintf("%08x", $xres);
$stupid_num = join('', reverse ($stupid_str =~ m/../g));
printf("Stupid_num '%s'->0x%08x=%d\n", $stupid_num, $dec=hex $stupid_num, $dec);
Stupid_num '00000b6a'->0x00000b6a=2922
It's like judging the Ugliest Dog contest, but I would still rather have to maintain the text version than the even more abominable C version.
I know there are ways to do this in Java/Python/Go/Ruby/.....
I know there are command line utilities that do exactly this.
I must figure out how I am misusing either VEC or Unpack, both of which I have used a zillion times. It is the Brain Teasing aspect which is driving me nuts! EndianNess == EndianMess!!!
TYVM!
=================================================
Borodin,
Thanks for lookin' at this.
My intel processor is little-endian. When I read it back, it was trans-mutilated by vec to the "correct" big-endian, network format.
I just tried reading it VERBATIM from a BINARY file read and it works fine:
($b4 = $bdat) =~ s/^(....).*$/$1/msg; # Give me my 4 bytes back without mutilation!
printf("B4='%s'=>0x%08x=<0x%08x\n", $b4, unpack("L>", $b4), unpack("L<", $b4));
B4='j...' = >0x6a0b0000 = <0x00000b6a <<< THE RIGHT ANSWER!!!
If you try unpack 'V', $bdat then you will find that it works
That was my first attempt:
$vul = unpack("V", vec($bdat, 0, 32)); # UNPACK V!
printf("Length (\$bdat)=%d, xres=0x%08x, Vax ulong=%ul=0x%08x\n",
length($bdat), $xres, $vul, $vul);
Length ($bdat) = 56712, xres=0x6a0b0000, Vax ulong=959919921l=0x39373731 <<<< TOTALLY WRONG!
I had already verified that the $BDAT info was the right data in the wrong format. It just needed some rearrangement.
I just used vec() to generate 1 bit and 4 bit graphics files and it worked faithfully, returning the exact bits I wrote. It must have mistaken my Intel i7 for my IBM System/370. I7/37??? Easy mistake to make. :)
I read the [confusing] part about "converted to a number as with pack ...". That's why my number was backward. The >>unpack("V", vec($bdat"<< ... was my ill-fated attempt to byte-swap the backward number in $BDAT from the WRONG VEC()-preferred FORMAT to the native format supported by my architecture.
Now I understand why I saw so many examples of people extracting by the byte, to avoid Big Brother's helping hand!
Data::BitStream::Vec "uses a Perl vec to store the data. The vector is accessed in 1-bit units"
Thanks 1E6,
B

You are confusing things by combining vec with unpack
The correct way is simply
unpack 'V', $bdat
which returns a value of 0x00000B6A as you expect
vec($bdat, 0, 32) is equivalent to unpack 'N', $bdat as you can see from the value of $xres in your first code block, and the documentation for vec confirms this with
If BITS is 16 or more, bytes of the input string are grouped into chunks of size BITS/8, and each group is converted to a number as with pack()/unpack() with big-endian formats n/N
The line
$vul = unpack("V", vec($bdat, 0, 32))
is very wrong, because the decimal value of vec($bdat, 0, 32) is 1779105792, so you are then calling unpack on the string "1779105792" which doesn't do anything useful at all

How can I deal with given situtaion related to Hardware change

I am maintaining a Production code related to FPGA device .Earlier resisters on FPGA are of 32 bits and read/write to these registers are working fine.But Hardware is changed and so did the FPGA device and with latest version of FPGA device we have trouble in read and write to FPGA register .After some R&D we came to know FPGA registers are no longer 32 bit ,it is now 31 bit registers and same has been claimed by FPGA device vendor.
So there is need to change small code as well.Earlier we were checking that address of registers are 4 byte aligned or not(because registers are of 32 bits)now with current scenario we have to check address are 31 bit aligned.So for the same we are going to check
if the most significant bit of the address is set (which means it is not a valid 31 bit).
I guess we are ok here.
Now second scenario is bit tricky for me.
if read/write for multiple registers that is going to go over the 0x7fff-fffc (which is the maximum address in 31 bit scheme) boundary, then have to handle request carefully.
Reading and Writing for multiple register takes length as an argument which is nothing but number of register to be read or write.
For example, if the read starts with 0x7fff-fff8, and length for the read is 5. Then actually, we can only read 2 registers (which is 0x7fff-fff8, and 0x7fff-fffc).
Now could somebody suggest me some kind of pseudo code to handle this scenario
Some think like below
while(lenght>1)
{
if(!(address<<(lenght*31) <= 0x7fff-fffc))
{
length--;
}
}
I know it is not good enough but something in same line which I can use.
EDIT
I have come up with a piece of code which may fulfill my requirement
int count;
Index_addr=addr;
while(Index_add <= 7ffffffc)
{
/*Wanted to move register address to next register address,each register is 31 bit wide and are at consecutive location. like 0x0,0x4 and 0x8 etc.*/
Index_add=addr<<1; // Guess I am doing wrong here ,would anyone correct it.
count++;
}
length=count;

The root problem seems to be that the program is not properly treating the FPGA registers.
Data encapsulation would help, and, instead of treating the 31-bit FPGA registers as memory locations, they should be abstracted.
The FPGA should be treated as a vector (a one-dimensional array) of registers.
The vector of N FPGA registers should be addressable by an register index in the range of 0x0000 through N-1.
The FPGA registers are memory mapped at base addr.
So the memory address = 4 * FPGA register index + base addr.
Access to the FPGA registers should be encapsulated by read and write procedures:
int read_fpga_reg(int reg_index, uint32_t *reg_valp)
{
if (reg_index < 0 || reg_index >= MAX_REG_INDEX)
return -1; /* error return */
*reg_valp = *(uint32_t *)(reg_index << 2 + fpga_base_addr);
return 0;
}
As long as MAX_REG_INDEX and fpga_base_addr are properly defined, then this code will never generate an invalid memory access.

I'm not absolutely sure I'm interpreting the given scenario correctly. But here's a shot at it:
// Assuming "address" starts 4-byte aligned and is just defined as an integer
unsigned uint32_t address; // (Assuming 32-bit unsigned longs)
while ( length > 0 ) // length is in bytes
{
// READ 4-byte value at "address"
// Mask the read value with 0x7FFFFFFF since there are 31 valid bits
// 32 bits (4 bytes) have been read
if ( (--length > 0) && (address < 0x7ffffffc) )
address += 4;
}

Emulation Implementing CPU instructions?

I'm trying to learn emulation programming. I've done a CHIP-8 emulator, Under 40 instructions, and lived because of my music. I'm now hoping to do something A bit more complex, like an SNES. The problem I'm encountering is the sheer number of CPU instructions. Looking through the wiki.SuperFamicom.org 65c816 instruction listing, It look's like a pain in the rear. And I've seen notes here and there on various internet pages that the CPU is the easyest part of an emulator to impliment.
Under the assumption that it was so hard because I was doing it wrong, I looked around and found a simple implimentation: SNES Emulator in 15 minutes which is about 900 lines of code. Easy enough to work through.
So then, from the SNES Emulator in 15 minutes Source, I found where the CPU instructions are. It look's a lot simpler than what I was thinking. I dont really understand it, but it's a few lines of code as opposed to a large mass of code. First thing I notice is that the instructions only have 1 implimentation each. If you look at the table in SuperFamicom then you see that it has
ADC #const
ADC (_db_),X
ADC (_db_,X)
ADC addr
ADC long
...
And The emulator source for (I think) ALL of those is:
// Note: op 0x100 means "NMI", 0x101 means "Reset", 0x102 means "IRQ". They are implemented in terms of "BRK".
// User is responsible for ensuring that WB() will not store into memory while Reset is being processed.
unsigned addr=0, d=0, t=0xFF, c=0, sb=0, pbits = op<0x100 ? 0x30 : 0x20;
// Define the opcode decoding matrix, which decides which micro-operations constitute
// any particular opcode. (Note: The PLA of 6502 works on a slightly different principle.)
const unsigned o8 = op / 32, o8m = 1u << (op%32);
// Fetch op'th item from a bitstring encoded in a data-specific variant of base64,
// where each character transmits 8 bits of information rather than 6.
// This peculiar encoding was chosen to reduce the source code size.
// Enum temporaries are used in order to ensure compile-time evaluation.
#define t(w8,w7,w6,w5,w4,w3,w2,w1,w0) if( \
(o8<1?w0##u : o8<2?w1##u : o8<3?w2##u : o8<4?w3##u : \
o8<5?w4##u : o8<6?w5##u : o8<7?w6##u : o8<8?w7##u : w8##u) & o8m)
t(0,0xAAAAAAAA,0x00000000,0x00000000,0x00000000,0xAAAAA2AA,0x00000000,0x00000000,0x00000000) { c = t; t += A + P.C; P.V = (c^t) & (A^t) & 0x80; P.C = t & 0x100; }
In short, my General question:
Condensing the phenomenal cosmic power of CPU instructions into an itty bitty piece of code
Questions specific to the SNES emulator in 15 minutes source (portion posted above):
How does t(0, 0xAAAAAAAA, 0x00000000, ....) parse the instruction? I see the if statment, but I dont know where the number's for any of the arguments come from, or what they mean to the overall code.
Why o8 = op / 32 and o8m = 1u << (op%32)?
The opcodes for ADC has ADC #const which has a 2 byte operand, or ADC addr which has a 3 byte operand. And the code t(0, 0xAAAAAAAA, ...) impliments both cases?
While I'm asking:
what do the dp, _dp_ and sr that appear in ADC dp, ADC (_dp_) and ADC sr,S mean?
what is the difference between ADC (_dp_,X) and ADC dp,X? (probably redundand given the question above.)

I can't answer all of this, but dp stands for Direct Page, meaning that the instruction takes a single-byte operand which is a memory address within the Direct Page. Direct Page addressing is an extension of the Zero Page addressing mode of the 6502, where the single-byte addresses referred to memory locations $00 through $FF. The 16-bit derivatives of the 6502 have a configuration register which basically relocates the Zero Page to an alternate location.f
In the wiki page you linked to, some of the dp in the table have underscores on them, and the others are in italics. I assume that they are all intended to be italic, and the wiki markup isn't working. A quick check of the Edit link supports this assumption (in the wiki source, they all have underscores). So don't read anything into that.
In 6502 assembly and derivatives of it, ADC dp,X means... let's take a concrete example instead... ADC $10,X means to add $10 to the value in register X to obtain an address, then load a value from that address and add it to the accumulator. ADC ($10,X) adds an extra level of indirection: add $10 to X to obtain an address, load a value from that address, interpret the loaded value as another address, and load the value from that address and add it to the accumulator. Parenthesized operands always add a level of indirection.
Note that the available modes include (dp,X) and (dp),Y and the placement of the parentheses relative to the comma and register is significant. With (dp),Y the value of Y is added to the first loaded value to get the address to use in the second load.
As for that emulator... code golf doesn't lead to enhanced readability! I don't think the portion you've posted is actually understandable by itself, and I don't feel like tracking down and reading the rest of it. But the key concept in the t macro is bitstring. Its arguments are a series of 9 bitmasks, each 32 bits long, for a total of 288 bits. Every possible opcode (256 of them), plus the 3 pseudo-opcodes mentioned in the first comment, is therefore represented by a single bit in this 288-bit-long bitstring, with 29 bits left over.
That explains the construction of o8 and o8m. The 8-bit value is split into a 3-bit portion (to select an argument from the 8 arguments supplied to t) and a 5-bit portion (to select a single bit from the selected argument). The big ?: chain does the first selection and the combination of & and 1 << ... does the select selection.
And then, oh look we have a variable called t too. It's not related to the macro. Giving them the same name was just cruel.
Maybe I can figure out what that bitstring is doing. When the opcode is a low number, o8 (the high bits) will be 0, so the ?: chain will use w0, which is the last argument to the macro. As the opcode increases, the selected argument moves leftward through the argument list to w1, then w2... and the o8m selector likewise starts at the right and moves left (& (1<<0) is the rightmost bit, & (1<<1) is the next one, etc.) and the if condition will be true when the selected bit is 1. Values are:
0, # opcodes $100 and up
0xAAAAAAAA, # opcodes $E0 to $FF
0x00000000, # opcodes $C0 to $DF
0x00000000, # opcodes $A0 to $BF
0x00000000, # opcodes $80 to $9F
0xAAAAA2AA, # opcodes $60 to $7F
0x00000000, # opcodes $40 to $5F
0x00000000, # opcodes $20 to $3F
0x00000000 # opcodes $00 to $1F
or in binary
0, # opcodes $100 and up
0b10101010101010101010101010101010, # opcodes $E0 to $FF
0b00000000000000000000000000000000, # opcodes $C0 to $DF
0b00000000000000000000000000000000, # opcodes $A0 to $BF
0b00000000000000000000000000000000, # opcodes $80 to $9F
0x10101010101010101010001010101010, # opcodes $60 to $7F
0b00000000000000000000000000000000, # opcodes $40 to $5F
0b00000000000000000000000000000000, # opcodes $20 to $3F
0b00000000000000000000000000000000 # opcodes $00 to $1F
Reading each line from right to left, the 1's are in positions corresponding to these opcodes: $61 $63 $65 $67 $69 $6D $6F $71 $73 $75 $77 $79 $7B $7D $7F $E1 $E3 $E5 $E7 $E9 $EB $ED $EF $F1 $F3 $F5 $F7 $F9 $FB $FD $FF
Hmm... that sort of resembles the list of ADC and SBC opcodes, but some of them are wrong.
Oh (I finally gave up and looked at some more of the emulator code) that's a NES emulator, not a SNES emulator, so it only has 6502 opcodes.

How do I convert a G.726 ADPCM signal into a PCM signal?

I usually look to SoX or Window's built in audio libraries for this stuff, but it appears that neither have G.726 codecs.
So I have a sequence of bytes that I know are encoded as G.726 although the bit-rate and whether it is mu-law or A-law is not known at this time (experimentation will determine those parameters), and I need to decode them into a normal PCM signal.
So I downloaded the reference implementation from the ITU-T (ITU-T Recommendation G.191) but I'm kind of confused on how to use the G726_decode function. According to the documentation inp_buf and out_buf need to have the same length smpno and both buffers are 16-bit buffers. This seems to me like a step is missing; otherwise no compression is accomplished by using G.726. According to the Wikipedia page on G.726 sample size depends on bit rate (from 2 to 5 bits). Am I supposed to do the decompression into samples myself? So if I assume maximum compression (2 bit samples) then each byte will produce 4 samples.
Example:
char b = /* read the code from input */
short inp[4], output[4];
inp[0] = b & 0x0003;
inp[1] = b & 0x000C >> 2;
inp[2] = (b & 0x0030) >> 4;
inp[3] = (b & 0x00C0) >> 6;
G726_state state;
memset(&state, 0, sizeof(G726_state));
G726_decode(inp, output, 4, "u", 2, 1, &state);
/* ouput now contains 4 PCM samples */
Or am I missing something completely?

Looks like ffmpeg actually isn't able to do this, as I thought it surely would be able to... however, while I was googling I did find this post to the ffmpeg mailing list which offers a solution.
Basically, there is a separate program called g72x++ which seems to be able to decode the audio to raw PCM for you.

Bit manipulation library for ANSI C

Does anyone knows a good bit manipulation library for ANSI C?
What I basically need, is ability, like in Jovial to set specific bits in a variable, something like
// I assume LSB has index of 0
int a = 0x123;
setBits(&a,2,5, 0xFF);
printf("0x%x"); // should be 0x13F
int a = 0x123;
printf("0x%x",getBits(&a,2,5)); // should be 0x4
char a[] = {0xCC, 0xBB};
char b[] = {0x11, 0x12};
copyBits(a,/*to=*/4,b,/*from=*/,4,/*lengthToCopy=*/8);
// Now a == {0x1C, 0xB2}
There's a similar library called bitfile, but it doesn't seem to support direct memory manipulation. It only supports feeding bits to file streams.
It's not hard to write, but if there's something tested - I won't reinvent the wheel.
Maybe this library exists as a part of bigger library (bzip2, gzip are the usual suspects)?

I think is considered "too simple" for a library; most functions would only be a statement or two, which would make the overhead of calling a library function a bit more than typical C programmers tolerate. :)
That said, the always-excellent glib has two of the more complicated bit-oriented functions: g_bit_nth_lsf() and g_bit_nth_msf(). These are used to find the index of the first bit set, searching from the lowest or the highest bit, respectively.

This seems to be the problem I was tackling in my question
Algorithm for copying N bits at arbitrary position from one int to another
There are several different alternatives provided, with the fastest being the assembly solution by fnieto.

You will come a long way with the following macros:
#define SETBITS(mem, bits) (mem) |= (bits)
#define CLEARBITS(mem, bits) (mem) &= ~(bits)
#define BIN(b7,b6,b5,b4, b3,b2,b1,b0) \
(unsigned char)( \
((b7)<<7) + ((b6)<<6) + ((b5)<<5) + ((b4)<<4) + \
((b3)<<3) + ((b2)<<2) + ((b1)<<1) + ((b0)<<0) \
)
Then you can write
int a = 0x123;
SETBITS(a, BIN(0,0,0,1, 1,1,1,0));
printf("0x%x", a); // should be 0x13F

Maybe the algorithms from the "FXT" book (link at the bottom of the page) will be useful.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight