Reading characters on a bit level - c

I would like to be able to enter a character from the keyboard and display the binary code for that key in the format 00000001, for example.
Furthermore, I would also like to read the bits in a way that allows me to output whether they are true or false.
e.g.
01010101 = false,true,false,true,false,true,false,true
I would post an idea of how I have tried to do it myself, but I have absolutely no idea; I'm still experimenting with C and this is my first taste of programming at such a low level.
Thank you

For bit tweaking, it is often safer to use unsigned types, because shifts of signed negative values have an implementation-defined (or, for left shifts, undefined) effect. The plain char can be either signed or unsigned (traditionally, it is unsigned on Macintosh platforms, but signed on PC). Hence, first cast your character into the unsigned char type.
Then, your friends are the bitwise boolean operators (&, |, ^ and ~) and the shift operators (<< and >>). For instance, if your character is in variable x, then to get the 5th bit you simply use: ((x >> 5) & 1). The shift operator moves the value towards the right, dropping the five lower bits and moving the bit you are interested in to the lowest position (aka "rightmost"). The bitwise AND with 1 simply sets all other bits to 0, so the resulting value is either 0 or 1, which is your bit. Note here that I number bits from least significant (rightmost) to most significant (leftmost) and I begin with zero, not one.
If you assume that your characters are 8-bits, you could write your code as:
unsigned char x = (unsigned char)your_character;
int i;
for (i = 7; i >= 0; i --) {
if (i != 7)
printf(",");
printf("%s", ((x >> i) & 1) ? "true" : "false");
}
You may note that since I number bits from right to left, but you want output from left to right, the loop index must be decreasing.
Note that according to the C standard, unsigned char has at least eight bits but may have more (nowadays, only a handful of embedded DSPs have characters which are not 8-bit). To be extra safe, add this near the beginning of your code (as a top-level declaration):
#include <limits.h>
#if CHAR_BIT != 8
#error I need 8-bit bytes!
#endif
This will prevent successful compilation if the target system happens to be one of those special embedded DSPs. As a note on the note, the term "byte" in the C standard means "the elementary memory unit which corresponds to an unsigned char", so that, in C-speak, a byte may have more than eight bits (a byte is not always an octet). This is a traditional source of confusion.
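To tie the pieces together, here is a minimal sketch that turns one character into both output formats from the question. The function names char_to_binary and char_to_bools are hypothetical; the loop is the same decreasing-index shift-and-mask as above, but it uses CHAR_BIT instead of hard-coding 8 and writes into caller-supplied buffers instead of printing.

```c
#include <limits.h>  /* CHAR_BIT */
#include <string.h>  /* strcat */

/* Write the bits of c into out as '0'/'1' characters, most significant
   bit first. out must have room for CHAR_BIT + 1 chars. */
void char_to_binary(unsigned char c, char *out)
{
    int i;
    for (i = CHAR_BIT - 1; i >= 0; i--)
        *out++ = ((c >> i) & 1) ? '1' : '0';
    *out = '\0';
}

/* Write "true"/"false" for each bit of c, most significant first,
   separated by commas. out must have room for CHAR_BIT * 6 chars
   (at most 5 letters plus one comma or terminator per bit). */
void char_to_bools(unsigned char c, char *out)
{
    int i;
    out[0] = '\0';
    for (i = CHAR_BIT - 1; i >= 0; i--) {
        strcat(out, ((c >> i) & 1) ? "true" : "false");
        if (i != 0)
            strcat(out, ",");
    }
}
```

Reading the key is then a matter of calling getchar(), checking for EOF, and passing (unsigned char)ch to either function before printing the buffer with puts. A line-buffered terminal will usually wait for Return before getchar() sees anything.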

This is probably not the safest way - no sanity/size/type checks - but it should still work.
unsigned char myBools[8];
char myChar;
// get your character - this is not safe and you should
// use a better method to obtain input...
// cin >> myChar; <- C++
scanf("%c", &myChar);
// binary AND against each bit in the char. The parentheses matter:
// > and != bind more tightly than &, so without them the test would
// be myChar & ((1 << i) > 0), i.e. always just bit 0.
// Any non-zero result means the bit is set ('true'), 0 means 'false'.
for(int i = 0; i < 8; ++i)
{
myBools[i] = ( ((myChar & (1 << i)) != 0) ? 1 : 0 );
}
This will give you an array of unsigned chars, each either 0 or 1 (false or true), for the character; myBools[0] holds the least significant bit.
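If C99's bool type is available, the same idea can store real true/false values. A minimal sketch, assuming <stdbool.h> and a hypothetical function name; as in the answer, index 0 is the least significant bit.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdbool.h>  /* bool, true, false */

/* Fill bits[0..CHAR_BIT-1] so that bits[i] holds bit i of c
   (bits[0] is the least significant bit). */
void char_to_bool_array(unsigned char c, bool bits[CHAR_BIT])
{
    for (int i = 0; i < CHAR_BIT; ++i)
        bits[i] = (c & (1u << i)) != 0;  /* inner parentheses keep & ahead of != */
}
```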

This code is C89:
/* we need this to use exit */
#include <stdlib.h>
/* we need this to use CHAR_BIT */
#include <limits.h>
/* we need this to use fgetc and printf */
#include <stdio.h>
int main() {
/* Declare everything we need */
int input, index;
unsigned int mask;
char inputchar;
/* an array to store integers telling us the values of the individual bits.
There are (almost) always 8 bits in a char, but it doesn't hurt to get into
good habits early, and in C, the sizes of the basic types are different
on different platforms. CHAR_BIT tells us the number of bits in a byte.
*/
int bits[CHAR_BIT];
/* the simplest way to read a single character is fgetc, but note that
the user will probably have to press "return", since input is generally
buffered */
input = fgetc(stdin);
printf("%d\n", input);
/* Check for errors. In C, we must always check for errors */
if (input == EOF) {
printf("No character read\n");
exit(1);
}
/* convert the value read from type int to type char. Not strictly needed,
we can examine the bits of an int or a char, but here's how it's done.
*/
inputchar = input;
/* the most common way to examine individual bits in a value is to use a
"mask" - in this case we have just 1 bit set, the most significant bit
of a char. */
mask = 1 << (CHAR_BIT - 1);
/* this is a loop, index takes each value from 0 to CHAR_BIT-1 in turn,
and we will read the bits from most significant to least significant. */
for (index = 0; index < CHAR_BIT; ++index) {
/* the bitwise-and operator & is how we use the mask.
"inputchar & mask" will be 0 if the bit corresponding to the mask
is 0, and non-zero if the bit is 1. ?: is the ternary conditional
operator, and in C when you use an integer value in a boolean context,
non-zero values are true. So we're converting any non-zero value to 1.
*/
bits[index] = (inputchar & mask) ? 1 : 0;
/* output what we've done */
printf("index %d, value %u\n", index, inputchar & mask);
/* we need a new mask for the next bit */
mask = mask >> 1;
}
/* output each bit as 0 or 1 */
for (index = 0; index < CHAR_BIT; ++index) {
printf("%d", bits[index]);
}
printf("\n");
/* output each bit as "true" or "false" */
for (index = 0; index < CHAR_BIT; ++index) {
printf("%s", bits[index] ? "true" : "false");
/* fiddly part - we want a comma between each bit, but not at the end */
if (index != CHAR_BIT - 1) printf(",");
}
printf("\n");
return 0;
}
You don't necessarily need three loops - you could combine them together if you wanted, and if you're only doing one of the two kinds of output, then you wouldn't need the array, you could just use each bit value as you mask it off. But I think this keeps things separate and hopefully easier to understand.

Related

Find number of bits in a data type

I need to write a macro named CountBitsM. This macro has one parameter and produces a value of type int. The parameter is any expression with an object data type or the literal name of any object data type, so I used int. This macro determines the number of bits of storage used for the data type on any machine on which it is run, and I can use a macro from limits.h. Here is what I wrote; does this look right?
#ifndef COUNTBITSM_H
#define COUNTBITSM_H
#include <limits.h>
#define CountBitsM(int) ((int)*(CHAR_BIT))
#endif
Second question was to create a function CountIntBitsF that counts the number of bits used to represent a type int value on any machine. However, I can NOT USE any #define, #include header files, or any macro. I also cannot use any multiplications or divisions. The hint that was given was to start with a value of 1 in a type unsigned int variable and left-shift it one bit at a time, keeping count of the number of shifts, until the variable's value becomes 0. Here is what I have so far:
int CountIntBitsF(void)
{
int IntgMax = 8;
unsigned int count = 1;
while (IntgMax = IntgMax>>2) count++;
return count;
}
First off, I am not supposed to use division or multiplication, so am I doing the shift properly? And I can't assume a char/byte contains 8 or any other specific number of bits, so what should I set my IntgMax to? Thanks for any help. I am new to C.
Macro for Bits in a Type
A macro to produce the number of bits used to represent a type in storage is:
#define CountBitsM(x) (sizeof (x) * CHAR_BIT)
However, this produces a result with type size_t (usually). If you really need an int result as stated in the question, convert it (but be aware overflow becomes possible):
#define CountBitsM(x) ((int) (sizeof (x) * CHAR_BIT))
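Because the macro parenthesises its argument, it accepts both forms the assignment mentions: a type name and an expression. A small sketch; the helper functions are hypothetical and exist only to exercise both forms.

```c
#include <limits.h>  /* CHAR_BIT */

/* Bits of storage used by a type name or an expression. sizeof accepts
   both once the argument is parenthesised: sizeof (int), sizeof (d + 1). */
#define CountBitsM(x) ((int) (sizeof (x) * CHAR_BIT))

/* Hypothetical helpers that exercise both argument forms. */
int bits_in_int_type(void)
{
    return CountBitsM(int);
}

int bits_in_expression(void)
{
    double d = 0.0;
    return CountBitsM(d + 1);  /* the operand of sizeof is not evaluated */
}
```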
Counting Bits
The second question asks to count the number of bits “to represent a type int value” by shifting bits in an unsigned value. There are two theoretical problems here. One is that the number of bits used to represent a value may include padding bits, and counting the bits by shifting a 1 through them only counts the value bits, not the padding bits. The second is that an int may have more padding bits than an unsigned; it may use fewer bits for the sign and value. Overwhelmingly, modern systems will not have these issues; the number of used bits in an int will be the same as the total number of bits used to store it and the number of bits used in an unsigned.
That said, you can count the number of bits in an unsigned object with:
int count = 0;
for (unsigned u = 1; 0 != u; u <<= 1)
++count;
This repeatedly shifts the bit in u left until it is shifted out, while counting the number of iterations required to do this. Note that the bits in an int cannot properly be counted this way, because the behavior of left shift is not defined by the C standard when it overflows an int.
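Wrapped up as the function the assignment asks for, that loop needs no #include, no macro, and no multiplication or division. A minimal sketch keeping the question's function name; as explained above, it counts the value bits of unsigned int, which on mainstream systems equals the number of bits in an int.

```c
/* Count how many single-bit left shifts it takes to push a 1 out of an
   unsigned int. No headers, macros, multiplication or division needed. */
int CountIntBitsF(void)
{
    unsigned int bit = 1;
    int count = 0;
    while (bit != 0) {
        bit <<= 1;   /* well defined for unsigned: the top bit just drops off */
        ++count;
    }
    return count;
}
```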
Question one
#define NBITS(type_or_object) (sizeof(type_or_object) * CHAR_BIT)
or without multiplication
#define NBITS(type_or_object) (sizeof(type_or_object) << (CHAR_BIT == 8 ? 3 : CHAR_BIT == 16 ? 4 : CHAR_BIT == 32 ? 5 : 0))
Second question:
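This is for signed types and assumes the most popular representation, two's complement (I think it also works for sign-magnitude, since the negative zero produced there is not greater than 0). Strictly speaking, shifting a 1 into the sign bit of an int is undefined behaviour in C, so this relies on the compiler doing the obvious thing. Unsigned types are easy.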
int CountIntBits(void)
{
int IntgMax = 1;
int count = 1;
while (IntgMax > 0 )
{
count++;
IntgMax <<= 1;
}
return count;
}
#include <stdio.h>
int main(void)
{
printf("%d\n", CountIntBits());
}
or (also no multiplication :) )
int CountIntBits(void)
{
int shift = CHAR_BIT == 8 ? 3 : CHAR_BIT == 16 ? 4 : CHAR_BIT == 32 ? 5 : 0;
return sizeof(int) << shift;
}
for unsigned types:
int CountIntBits(void)
{
unsigned IntgMax = 1;
int count = 0;
while (IntgMax)
{
count++;
IntgMax <<= 1;
}
return count;
}

Function for binary conversion

I am trying to convert a decimal value to binary using the function I wrote in C below. I cannot figure out the reason why it is printing 32 zeroes rather than the binary value of 2.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>
int binaryConversion(int num){
int bin_buffer[32];
int mask = INT_MIN;
for(int i = 0; i < 32; i++){
if(num & mask){
bin_buffer[i] = 1;
mask >> 1;
}
else{
bin_buffer[i] = 0;
mask >> 1;
}
}
for(int j = 0; j < 32; j++){
printf("%d", bin_buffer[j]);
}
}
int main(){
binaryConversion(2);
}
Thanks
Two mistakes:
You use >> instead of >>=, so you're not actually ever changing mask.
You didn't declare mask as unsigned, so when you shift, it'll get sign-extended, which you don't want.
If you put a:
printf("%d %d\n", num, mask);
immediately inside your for loop, you'll see why:
2 -2147483648
2 -2147483648
2 -2147483648
2 -2147483648
:
2 -2147483648
The expression mask >> 1 does right shift the value of mask but doesn't actually assign it back to mask. I think you meant to use:
mask >>= 1;
On top of that (once you fix that problem), you'll see that the values in the mask are a bit strange because right-shifting a negative value can preserve the sign, meaning you will end up with multiple bits set.
You'd be better off using unsigned integers since the >> operator will act on them more in line with your expectations.
Additionally, there's little point in writing all those bits into a buffer just so you can print them out later. Unless you need to do some manipulation on the bits (and this appears to not be the case here), you can just output them directly as they're calculated (and get rid of the now unnecessary i variable).
So, taking all those points into account, you can greatly simplify your code such as with the following complete program:
#include <stdio.h>
#include <limits.h>
void binaryConversion(unsigned num) {
for (unsigned mask = (unsigned)INT_MIN; mask != 0; mask >>= 1)
putchar((num & mask) ? '1' : '0');
}
int main(void) {
binaryConversion(2);
putchar('\n');
}
And just one more note: the value of INT_MIN is not actually required to have just the top bit set. Because C has historically allowed ones' complement and sign-magnitude (as well as two's complement) for negative numbers, it is possible for INT_MIN to have a value with multiple bits set (such as -32767 in sign-magnitude).
C++20 already requires two's complement and C23 has since done the same for C but, for maximum portability to older implementations, you could opt instead for the following function:
void binaryConversion(unsigned int num) {
// Done once to set topBit.
static unsigned topBit = 0;
if (topBit == 0) {
topBit = 1;
while (topBit << 1 != 0) topBit <<= 1;
}
// Loop to process all bits.
for (unsigned mask = topBit; mask != 0; mask >>= 1)
putchar(num & mask ? '1' : '0');
}
This calculates the value with the top bit set the first time you call the function, irrespective of the vagaries of negative encodings. Just watch out if you call it concurrently in a threaded program.
But, as mentioned, this probably isn't necessary, the number of environments that use the other two encodings would be countable on the fingers of a very careless/unlucky industrial machine operator.
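A shorter alternative, assuming only standard unsigned arithmetic: ~(~0u >> 1) is the unsigned value with just its top bit set, which sidesteps both INT_MIN and the static variable (and with it the threading caveat). A minimal sketch; top_bit is a hypothetical helper name.

```c
#include <stdio.h>  /* putchar */

/* The unsigned value with only its most significant bit set: ~0u has
   every bit set, >> 1 clears the top one, and ~ flips the result so only
   the top bit remains. Unsigned arithmetic never depends on how negative
   ints are encoded. */
unsigned top_bit(void)
{
    return ~(~0u >> 1);
}

/* Print num in binary, most significant bit first. */
void binaryConversion(unsigned num)
{
    for (unsigned mask = top_bit(); mask != 0; mask >>= 1)
        putchar((num & mask) ? '1' : '0');
}
```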
You already have your primary question answered regarding the use of >> rather than >>=. However, from a fundamental standpoint, there is no need to buffer the 1s and 0s in an array of int (e.g. int bin_buffer[32];) and there is no need to use the variadic printf function to display int values if all you are doing is outputting the binary representation of the number.
Instead, all you need is putchar() to output '1' or '0' depending on whether any bit is set or clear. You can also make your output function a bit more useful by providing the size of the representation you want, e.g. a byte (8-bits), a word (16-bits), and so on.
For example, you could do:
#include <stdio.h>
#include <limits.h>
/** binary representation of 'v' padded to 'sz' bits.
* the padding amount is limited to the number of
* bits in 'v'. valid range: 0 - sizeof v * CHAR_BIT.
*/
void binaryConversion (const unsigned long v, size_t sz)
{
if (!sz) { fprintf (stderr, "error: invalid sz.\n"); return; }
if (!v) { while (sz--) putchar ('0'); return; }
if (sz > sizeof v * CHAR_BIT)
sz = sizeof v * CHAR_BIT;
while (sz--)
putchar ((v >> sz & 1) ? '1' : '0');
}
int main(){
fputs ("byte : ", stdout);
binaryConversion (2, 8);
fputs ("\nword : ", stdout);
binaryConversion (2, 16);
putchar ('\n');
}
Which allows you to set the number of bits you want displayed, e.g.
Example Use/Output
$ ./bin/binaryconversion
byte : 00000010
word : 0000000000000010
There is nothing wrong with your approach, but there may be a simpler way to arrive at the same output.
Let me know if you have further questions.
INT_MIN is a negative number, so when you shift it right with >>, the most significant bit typically stays 1 (the sign is extended) instead of becoming 0, and you end up with mask = 11111...111, i.e. all bits set. Also, the mask value never changes; use >>= instead. Alternatively, you can mask with 0x1 and shift the actual value of num instead of the mask, like this:
void binaryConversion(int num) {
char bin_buffer[32 + 1]; //+1 for string terminator.
int shifted = num;
for (int i = 31; i >= 0; --i, shifted >>= 1) { //loop 32x
bin_buffer[i] = '0' + (shifted & 0x1);
}
bin_buffer[32] = 0; //terminate the string.
printf("%s", bin_buffer);
}
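The same shift-the-value idea can avoid both the hard-coded 32 and the sign-extension question by working on an unsigned and deriving the width. A minimal sketch; the names UINT_BITS and to_binary_string are hypothetical, and filling a caller's buffer (instead of printing) makes the result easy to reuse.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */

/* Number of bits in an unsigned int, derived instead of assumed. */
#define UINT_BITS (sizeof(unsigned) * CHAR_BIT)

/* Fill out (room for UINT_BITS + 1 chars) with the binary digits of num,
   most significant first, by testing bit 0 and shifting num right.
   Shifting an unsigned right always brings in zeros. */
void to_binary_string(unsigned num, char *out)
{
    for (size_t i = UINT_BITS; i-- > 0; num >>= 1)
        out[i] = (char)('0' + (num & 1u));
    out[UINT_BITS] = '\0';
}
```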

Fast strlen with bit operations

I found this code
int strlen_my(const char *s)
{
int len = 0;
for(;;)
{
unsigned x = *(unsigned*)s;
if((x & 0xFF) == 0) return len;
if((x & 0xFF00) == 0) return len + 1;
if((x & 0xFF0000) == 0) return len + 2;
if((x & 0xFF000000) == 0) return len + 3;
s += 4, len += 4;
}
}
I'm very interested in knowing how it works. Can anyone explain how it works?
A bitwise AND with ones retrieves the bit pattern of the other operand: 10101 & 11111 = 10101. If the result of that bitwise AND is 0, then we know the other operand was 0. So a result of 0 when ANDing a single byte with 0xFF (all ones) indicates a null byte.
The code itself checks each byte of the char array in four-byte partitions. NOTE: This code isn't portable; on another machine or compiler, an unsigned int could be more (or fewer) than 4 bytes. It would probably be better to use the uint32_t data type to ensure 32-bit unsigned integers.
The first thing to note is that on a little-endian machine, the bytes making up the character array will be read into an unsigned data type in reverse order; that is, if the four bytes at the current address are the bit pattern corresponding to abcd, then the unsigned variable will contain the bit pattern corresponding to dcba.
The second is that a hexadecimal constant in C yields an int-sized value with the specified bytes at the low-order end of the bit pattern. Meaning, 0xFF is actually 0x000000FF when compiling with 4-byte ints. 0xFF00 is 0x0000FF00. And so on.
So the program is basically looking for the NULL character in the four possible positions. If there is no NULL character in the current partition, it advances to the next four-byte slot.
Take the char array abcdef for an example. In C, string constants will always have null terminators at the end, so there's a 0x00 byte at the end of that string.
It'll work as follows:
Read "abcd" into unsigned int x:
x: 0x64636261 [ASCII representations for "dcba"]
Check each byte for a null terminator:
0x64636261
& 0x000000FF
0x00000061 != 0,
0x64636261
& 0x0000FF00
0x00006200 != 0,
And check the other two positions; there are no null terminators in this 4-byte partition, so advance to the next partition.
Read "ef" into unsigned int x:
x: 0xBF006665 [ASCII representations for "fe"]
Note the 0xBF byte: it lies past the end of the string, so we're reading whatever happens to follow the string in memory. It could be anything, and reading beyond the array is undefined behaviour. If there were just one character left in the string, we'd be reading two bytes beyond the terminator. Separately, on a machine that doesn't allow unaligned accesses, the 4-byte read itself will fault whenever s is not suitably aligned for an unsigned.
Check each byte for a null terminator:
0xBF006665
& 0x000000FF
0x00000065 != 0,
0xBF006665
& 0x0000FF00
0x00006600 != 0,
0xBF006665
& 0x00FF0000
0x00000000 == 0 !!!
So we return len + 2; len was 4 since we incremented it once by 4, so we return 6, which is indeed the length of the string.
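The byte-order reasoning above can be checked directly. A minimal sketch with hypothetical helper names, assuming an ASCII execution character set and 8-bit bytes: it loads four bytes the way the loop does, but through memcpy, so the load has no alignment problem.

```c
#include <stdint.h>  /* uint16_t, uint32_t */
#include <string.h>  /* memcpy */

/* Read 4 bytes starting at p as a uint32_t. memcpy sidesteps the
   alignment and aliasing problems of *(unsigned *)p. */
uint32_t load32(const char *p)
{
    uint32_t x;
    memcpy(&x, p, sizeof x);
    return x;
}

/* 1 on a little-endian machine: the low byte of 1 is stored first. */
int is_little_endian(void)
{
    uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

On a little-endian machine load32("abcd") gives 0x64636261, matching the walkthrough; on a big-endian machine it gives 0x61626364, which is why the original masks only find the right byte on little-endian systems.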
Code "works" by attempting to read 4 bytes at a time, assuming the string is laid out and accessible like an array of int. It reads an int and then tests each of its bytes in turn to see if it is the null character. In theory, code working with int will run faster than 4 individual char operations.
But there are problems:
Alignment is an issue: e.g. *(unsigned*)s may seg-fault.
Endianness is an issue: on a big-endian machine, if((x & 0xFF) == 0) does not test the byte at address s.
s += 4 is a problem as sizeof(int) may differ from 4.
Array types may exceed int range, better to use size_t.
An attempt to right these difficulties.
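#include <stddef.h> // max_align_t (C11)
#include <stdint.h> // uintptr_t
#include <limits.h> // UINT_MAX, UCHAR_MAX, CHAR_BIT
#include <stdio.h>
static inline int aligned_as_int(const char *s) {
max_align_t mat; // C11
uintptr_t i = (uintptr_t) s;
return i % sizeof mat == 0;
}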
size_t strlen_my(const char *s) {
size_t len = 0;
// align
while (!aligned_as_int(s)) {
if (*s == 0) return len;
s++;
len++;
}
for (;;) {
unsigned x = *(unsigned*) s;
#if UINT_MAX >> CHAR_BIT == UCHAR_MAX
if(!(x & 0xFF) || !(x & 0xFF00)) break;
s += 2, len += 2;
#elif UINT_MAX >> CHAR_BIT*3 == UCHAR_MAX
if (!(x & 0xFF) || !(x & 0xFF00) || !(x & 0xFF0000) || !(x & 0xFF000000)) break;
s += 4, len += 4;
#elif UINT_MAX >> CHAR_BIT*7 == UCHAR_MAX
if ( !(x & 0xFF) || !(x & 0xFF00)
|| !(x & 0xFF0000) || !(x & 0xFF000000)
|| !(x & 0xFF00000000) || !(x & 0xFF0000000000)
|| !(x & 0xFF000000000000) || !(x & 0xFF00000000000000)) break;
s += 8, len += 8;
#else
#error TBD code
#endif
}
while (*s++) {
len++;
}
return len;
}
It trades undefined behaviour (unaligned accesses, 75% probability to access beyond the end of the array) for a very questionable speedup (it is very possibly even slower). And is not standard-compliant, because it returns int instead of size_t. Even if unaligned accesses are allowed on the platform, they can be much slower than aligned accesses.
It also does not work on big-endian systems, or if unsigned is not 32 bits. Not to mention the multiple mask and conditional operations.
That said:
It tests four 8-bit bytes at a time by loading an unsigned (which is not even guaranteed to have more than 16 bits). Once any of the bytes contains the '\0' terminator, it returns the sum of the current length plus the position of that byte. Otherwise, it increments the current length by the number of bytes tested in parallel (4) and gets the next unsigned.
My advice: bad example of optimization plus too many uncertainties/pitfalls. It's likely not even faster — just profile it against the standard version:
size_t strlen_simple(const char *s) // not named strlen: that identifier is reserved for the library
{
size_t l = 0;
while ( *s++ )
l++;
return l;
}
There might be a way to use special vector-instructions, but unless you can prove this is a critical function, you should leave this to the compiler — some may unroll/speedup such loops much better.
All these proposals are slower than a simple strlen().
The reason is that they do not reduce the number of comparisons and only one deals with alignment.
Look up the strlen() proposal from Torbjorn Granlund (tege#sics.se) and Dan Sahlin (dan#sics.se) online. If you are on a 64-bit platform, it really helps to speed things up.
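The core of that style of word-at-a-time strlen() is a test that asks "does this word contain a zero byte?" without inspecting each byte separately. The sketch below uses the widely published expression (v - 0x01010101) & ~v & 0x80808080; it has not been checked against the Granlund/Sahlin code itself, so treat it as the general technique rather than their exact implementation. It assumes 8-bit bytes and a 32-bit word; the function name is hypothetical.

```c
#include <stdbool.h>  /* bool */
#include <stdint.h>   /* uint32_t */

/* Nonzero result exactly when at least one byte of x is 0x00.
   Subtracting 1 from a zero byte borrows and sets its top bit; ~x keeps
   only bytes whose top bit was clear to begin with; 0x80808080 then
   looks at the top bit of each byte. */
bool has_zero_byte(uint32_t x)
{
    return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
}
```

With it, the loop body needs one test per word instead of four, falling back to a byte-by-byte scan only for the word that contains the terminator.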
It detects whether any bits are set in a specific byte on a little-endian machine. Each mask checks exactly one byte (the FF pair of hex digits covers one whole byte and the 0 digits mask out everything else), and because the machine is little-endian, the byte order in the register is reversed relative to memory, so the first mask that yields 0 immediately tells us which byte contains the NUL.
The loop takes 4 bytes of the char array on each iteration. The four if statements determine whether the string has ended, using a bitmask with the AND operator to read the status of each byte of the selected 4-byte chunk.

Iterate through bits in C

I have a big char *str where the first 8 chars (which equals 64 bits if I'm not wrong) represent a bitmap. Is there any way to iterate through these 8 chars and see which bits are 0? I'm having a lot of trouble understanding the concept of bits, as you can't "see" them in the code, so I can't think of any way to do this.
Imagine you have only one byte, a single char my_char. You can test for individual bits using bitwise operators and bit shifts.
unsigned char my_char = 0xAA;
int what_bit_i_am_testing = 0;
while (what_bit_i_am_testing < 8) {
if (my_char & 0x01) {
printf("bit %d is 1\n", what_bit_i_am_testing);
}
else {
printf("bit %d is 0\n", what_bit_i_am_testing);
}
what_bit_i_am_testing++;
my_char = my_char >> 1;
}
The part that is probably new to you is the >> operator. This operator will "insert a zero on the left and push every bit to the right, and the rightmost will be thrown away".
That was not a very technical description for a right bit shift of 1.
Here is a way to iterate over each of the set bits of an unsigned integer (use unsigned rather than signed integers for well-defined behaviour; unsigned of any width should be fine), one bit at a time.
Define the following macros:
#define LSBIT(X) ((X) & (-(X)))
#define CLEARLSBIT(X) ((X) & ((X) - 1))
Then you can use the following idiom to iterate over the set bits, LSbit first:
unsigned temp_bits;
unsigned one_bit;
temp_bits = some_value;
for ( ; temp_bits; temp_bits = CLEARLSBIT(temp_bits) ) {
one_bit = LSBIT(temp_bits);
/* Do something with one_bit */
}
I'm not sure whether this suits your needs. You said you want to check for 0 bits, rather than 1 bits — maybe you could bitwise-invert the initial value. Also for multi-byte values, you could put it in another for loop to process one byte/word at a time.
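As a concrete sketch of the inversion idea, the hypothetical function below lists the positions of the 0 bits in one byte using the two macros above. It assumes 8-bit bytes; the inner while loop turns the isolated bit back into an index.

```c
/* The two macros from the answer above. */
#define LSBIT(X)      ((X) & (-(X)))
#define CLEARLSBIT(X) ((X) & ((X) - 1))

/* Store the positions (0 = least significant) of the 0 bits of byte in
   pos and return how many there are. Inverting and masking to 8 bits
   turns the zero bits into set bits, which the idiom visits lowest first. */
int zero_bit_positions(unsigned char byte, int pos[8])
{
    unsigned bits = ~(unsigned)byte & 0xFFu;
    int n = 0;
    for (; bits; bits = CLEARLSBIT(bits)) {
        unsigned one_bit = LSBIT(bits);
        int p = 0;
        while (one_bit >>= 1)   /* turn the isolated bit into its index */
            ++p;
        pos[n++] = p;
    }
    return n;
}
```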
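This numbers the bits LSB-first within each byte. Because it accesses the bitmap one byte at a time, the machine's byte order doesn't actually affect it:
enum { cBitmapSize = 8 }; // a const int size would make the array a VLA in C, which cannot be initialized
const int cBitsCount = cBitmapSize * 8;
const unsigned char cBitmap[cBitmapSize] = /* some data */;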
for(int n = 0; n < cBitsCount; n++)
{
unsigned char Mask = 1 << (n % 8);
if(cBitmap[n / 8] & Mask)
{
// if n'th bit is 1...
}
}
In the C language, chars are almost always 8-bit bytes (CHAR_BIT bits in general), and in general in computer science, data is organized around bytes as the fundamental unit.
In some cases, such as your problem, data is stored as boolean values in individual bits, so we need a way to determine whether a particular bit in a particular byte is on or off. There is already an SO solution for this explaining how to do bit manipulations in C.
To check a bit, the usual method is to AND it with the bit you want to check:
int isBitSet = bitmap & (1 << bit_position);
If the variable isBitSet is 0 after this operation, then the bit is not set. Any other value indicates that the bit is on.
For one char b you can simply iterate like this:
for (int i=0; i<8; i++) {
printf("This is the %d-th bit : %d\n",i,(b>>i)&1);
}
You can then iterate through the chars as needed.
What you should understand is that you cannot manipulate the bits directly; you can only use some arithmetic properties of numbers in base 2 to compute numbers that represent the bits you want to know about.
How does it work, for example? A char has 8 bits, so it can be seen as a number written with 8 digits in base 2. If the number in b is b7b6b5b4b3b2b1b0 (each being a digit), then b>>i is b shifted to the right by i positions (0's are pushed in on the left). So 10110111 >> 2 is 00101101, and then the operation &1 isolates the last bit (bitwise AND operator).
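Putting this together for the question's 8-byte bitmap: a minimal sketch (hypothetical function name) that walks every bit of the first nbytes bytes and counts the 0 bits. It numbers bits least significant first within each byte, which is a choice you should match to however the bitmap was written.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */

/* Count the 0 bits in the first nbytes bytes of str. Bit k lives in
   byte k / CHAR_BIT at position k % CHAR_BIT (least significant first). */
size_t count_zero_bits(const char *str, size_t nbytes)
{
    size_t zeros = 0;
    for (size_t k = 0; k < nbytes * CHAR_BIT; ++k) {
        unsigned char byte = (unsigned char)str[k / CHAR_BIT];
        if (((byte >> (k % CHAR_BIT)) & 1u) == 0)
            ++zeros;   /* a real program might record k here instead */
    }
    return zeros;
}
```

Converting each char to unsigned char first keeps the shift well defined even if plain char is signed on your platform.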
If you want to iterate through all the chars:
#include <string.h> // strlen
const char *str = "MNO"; // M=01001101, N=01001110, O=01001111
int bit = 0;
for (int x = (int)strlen(str)-1; x > -1; x--){ // Start from O, N, M
printf("Char %c \n", str[x]);
for(int y=0; y<8; y++){ // Iterate through every bit
// Shift right by y steps and mask the last position
if( (str[x]>>y) & 1 ){
printf("bit %d = 1\n", bit);
}else{
printf("bit %d = 0\n", bit);
}
bit++;
}
}
output
Char O
bit 0 = 1
bit 1 = 1
bit 2 = 1
bit 3 = 1
bit 4 = 0
bit 5 = 0
bit 6 = 1
bit 7 = 0
Char N
bit 8 = 0
bit 9 = 1
bit 10 = 1
...

How to define and work with an array of bits in C?

I want to create a very large array on which I write '0's and '1's. I'm trying to simulate a physical process called random sequential adsorption, where units of length 2, dimers, are deposited onto an n-dimensional lattice at a random location, without overlapping each other. The process stops when there is no more room left on the lattice for depositing more dimers (lattice is jammed).
Initially I start with a lattice of zeroes, and the dimers are represented by a pair of '1's. As each dimer is deposited, the site on the left of the dimer is blocked, due to the fact that the dimers cannot overlap. So I simulate this process by depositing a triple of '1's on the lattice. I need to repeat the entire simulation a large number of times and then work out the average coverage %.
I've already done this using an array of chars for 1D and 2D lattices. At the moment I'm trying to make the code as efficient as possible, before working on the 3D problem and more complicated generalisations.
This is basically what the code looks like in 1D, simplified:
int main()
{
/* Define lattice */
array = (char*)malloc(N * sizeof(char));
total_c = 0;
/* Carry out RSA multiple times */
for (i = 0; i < 1000; i++)
rand_seq_ads();
/* Calculate average coverage efficiency at jamming */
printf("coverage efficiency = %lf", total_c/1000);
return 0;
}
void rand_seq_ads()
{
/* Initialise array, initial conditions */
memset(array, 0, N * sizeof(char));
available_sites = N;
count = 0;
/* While the lattice still has enough room... */
while(available_sites != 0)
{
/* Generate random site location */
x = rand();
/* Deposit dimer (if site is available) */
if(array[x] == 0)
{
array[x] = 1;
array[x+1] = 1;
count += 1;
available_sites += -2;
}
/* Mark site left of dimer as unavailable (if its empty) */
if(array[x-1] == 0)
{
array[x-1] = 1;
available_sites += -1;
}
}
/* Calculate coverage %, and add to total */
c = count/N;
total_c += c;
}
For the actual project I'm doing, it involves not just dimers but trimers, quadrimers, and all sorts of shapes and sizes (for 2D and 3D).
I was hoping that I would be able to work with individual bits instead of bytes, but I've been reading around and as far as I can tell you can only change 1 byte at a time, so either I need to do some complicated indexing or there is a simpler way to do it?
Thanks for your answers
If I am not too late, this page gives an awesome explanation with examples.
An array of int can be used to deal with an array of bits. Assuming the size of int to be 4 bytes, when we talk about an int we are dealing with 32 bits. Say we have int A[10]: that means we are working on 10*4*8 = 320 bits. Each element of the array holds 4 bytes, and each byte holds 8 bits.
So, to set the kth bit in array A:
// NOTE: if using "uint8_t A[]" instead of "int A[]" then divide by 8, not 32
void SetBit( int A[], int k )
{
int i = k/32; //gives the corresponding index in the array A
int pos = k%32; //gives the corresponding bit position in A[i]
unsigned int flag = 1; // flag = 0000.....00001
flag = flag << pos; // flag = 0000...010...000 (shifted pos positions)
A[i] = A[i] | flag; // Set the bit at the k-th position in A[i]
}
or in the shortened version
void SetBit( int A[], int k )
{
A[k/32] |= 1u << (k%32); // Set the bit at the k-th position (1u: 1 << 31 overflows an int)
}
similarly to clear kth bit:
void ClearBit( int A[], int k )
{
A[k/32] &= ~(1u << (k%32));
}
and to test if the kth bit:
int TestBit( int A[], int k )
{
return ( (A[k/32] & (1u << (k%32) )) != 0 ) ;
}
As said above, these manipulations can be written as macros too:
// Due to operator precedence, wrap 'k' in parentheses in case it
// is passed as an expression, e.g. i + 1; otherwise the first
// part evaluates to "A[i + (1/32)]" not "A[(i + 1)/32]"
#define SetBit(A,k) ( A[(k)/32] |= (1u << ((k)%32)) )
#define ClearBit(A,k) ( A[(k)/32] &= ~(1u << ((k)%32)) )
#define TestBit(A,k) ( A[(k)/32] & (1u << ((k)%32)) )
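The same three operations can be written without assuming a 32-bit int. This is a minimal sketch, not the linked page's code: it stores bits in unsigned (so shifting 1u by 31 is well defined), derives the word width from sizeof and CHAR_BIT, and the names WORD_BITS, WORDS_FOR, set_bit, clear_bit and test_bit are hypothetical.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */

/* Bits per array element, derived rather than assumed to be 32. */
#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)
/* Number of unsigned words needed to hold n bits, rounded up. */
#define WORDS_FOR(n) (((n) + WORD_BITS - 1) / WORD_BITS)

void set_bit(unsigned a[], size_t k)
{
    a[k / WORD_BITS] |= 1u << (k % WORD_BITS);
}

void clear_bit(unsigned a[], size_t k)
{
    a[k / WORD_BITS] &= ~(1u << (k % WORD_BITS));
}

/* Returns 0 or 1. */
int test_bit(const unsigned a[], size_t k)
{
    return (int)((a[k / WORD_BITS] >> (k % WORD_BITS)) & 1u);
}
```

In the question's 1D lattice, array[x] == 0 would become test_bit(lattice, x) == 0 and array[x] = 1 would become set_bit(lattice, x), with lattice declared as unsigned lattice[WORDS_FOR(N)].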
typedef unsigned long bfield_t[ size_needed/sizeof(long) ];
// long because that's probably what your cpu is best at
// The size_needed should be evenly divisible by sizeof(long) or
// you could use (sizeof(long)-1+size_needed)/sizeof(long) to force it to round up
Now, each long in a bfield_t can hold sizeof(long)*8 bits.
You can calculate the index of a needed bit by:
bindex = index / (8 * sizeof(long) );
and your bit number by
b = index % (8 * sizeof(long) );
You can then look up the long you need and then mask out the bit you need from it.
result = my_field[bindex] & (1UL<<b); // 1UL, not 1: b can exceed the width of int
or
result = 1 & (my_field[bindex]>>b); // if you prefer them to be in bit0
The first one may be faster on some CPUs, or may save you shifting back up if you need to perform operations between the same bit in multiple bit arrays. It also mirrors the setting and clearing of a bit in the field more closely than the second implementation.
set:
my_field[bindex] |= 1UL<<b;
clear:
my_field[bindex] &= ~(1UL<<b);
You should remember that you can use bitwise operations on the longs that hold the fields, and that's the same as the operations on the individual bits.
You'll probably also want to look into the ffs, fls, ffc, and flc functions if available. ffs is specified by POSIX and declared in <strings.h>; it's there for just this purpose, treating an int as a string of bits.
Anyway, it is "find first set", and essentially:
int ffs(int x) {
int c = 0;
while (!(x&1) ) {
c++;
x>>=1;
}
return c; // the real ffs returns c + 1 (1-based) and returns 0 for x == 0, where this loop would never end
}
This is a common operation for processors to have an instruction for and your compiler will probably generate that instruction rather than calling a function like the one I wrote. x86 has an instruction for this, by the way. Oh, and ffsl and ffsll are the same function except take long and long long, respectively.
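For illustration, here is a version of that loop that matches the documented POSIX ffs() conventions (1-based result, 0 when no bit is set). It is a hypothetical portable stand-in named my_ffs, not the library function, and it works on an unsigned copy so the right shift is well defined for negative inputs.

```c
/* 1-based position of the least significant set bit of x, or 0 if x is 0,
   mirroring the POSIX ffs() return convention. */
int my_ffs(int x)
{
    unsigned u = (unsigned)x;
    int pos = 1;
    if (u == 0)
        return 0;
    while ((u & 1u) == 0) {
        u >>= 1;
        ++pos;
    }
    return pos;
}
```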
You can use | (bitwise OR) to set bits, & (bitwise AND) to test them, and << (left shift) to build the masks.
For example, (1 << 3) results in "00001000" in binary. So your code could look like:
unsigned char eightBits = 0;
//Set the 5th and 6th bits from the right to 1
eightBits |= (1 << 4);
eightBits |= (1 << 5);
//eightBits now looks like "00110000".
Then just scale it up with an array of chars, first working out which byte to modify.
For more efficiency, you could define a list of bit masks in advance and put them in an array:
#define BIT8 0x01
#define BIT7 0x02
#define BIT6 0x04
#define BIT5 0x08
#define BIT4 0x10
#define BIT3 0x20
#define BIT2 0x40
#define BIT1 0x80
unsigned char bits[8] = {BIT1, BIT2, BIT3, BIT4, BIT5, BIT6, BIT7, BIT8};
Then you avoid the overhead of the bit shifting and you can index your bits (bits[0] is the leftmost bit here), turning the previous code into:
eightBits |= (bits[2] | bits[3]); // 0x20 | 0x10, the same two bits as above
Alternatively, if you can use C++, you could just use an std::vector<bool> which is internally defined as a vector of bits, complete with direct indexing.
bitarray.h:
#include <inttypes.h> // defines uint32_t
//typedef unsigned int bitarray_t; // if you know that int is 32 bits
typedef uint32_t bitarray_t;
#define RESERVE_BITS(n) (((n)+0x1f)>>5)
#define DW_INDEX(x) ((x)>>5)
#define BIT_INDEX(x) ((x)&0x1f)
#define getbit(array,index) (((array)[DW_INDEX(index)]>>BIT_INDEX(index))&1)
#define putbit(array, index, bit) \
((bit)&1 ? ((array)[DW_INDEX(index)] |= (bitarray_t)1<<BIT_INDEX(index)) \
: ((array)[DW_INDEX(index)] &= ~((bitarray_t)1<<BIT_INDEX(index))) \
, 0 \
)
Use:
bitarray_t arr[RESERVE_BITS(130)] = {0, 0x12345678,0xabcdef0,0xffff0000,0};
int i = getbit(arr,5);
putbit(arr,6,1);
int x=2; // the least significant bit is 0
putbit(arr,6,x); // sets bit 6 to 0 because 2&1 is 0
putbit(arr,6,!!x); // sets bit 6 to 1 because !!2 is 1
EDIT: the docs:
"dword" = "double word" = 32-bit value (unsigned, but that's not really important)
RESERVE_BITS: number_of_bits --> number_of_dwords
RESERVE_BITS(n) is the number of 32-bit integers enough to store n bits
DW_INDEX: bit_index_in_array --> dword_index_in_array
DW_INDEX(i) is the index of dword where the i-th bit is stored.
Both bit and dword indexes start from 0.
BIT_INDEX: bit_index_in_array --> bit_index_in_dword
If i is the number of some bit in the array, BIT_INDEX(i) is the number
of that bit in the dword where the bit is stored.
And the dword is known via DW_INDEX().
getbit: bit_array, bit_index_in_array --> bit_value
putbit: bit_array, bit_index_in_array, bit_value --> 0
getbit(array,i) fetches the dword containing the bit i and shifts the dword right, so that the bit i becomes the least significant bit. Then, a bitwise and with 1 clears all other bits.
putbit(array, i, v) first of all checks the least significant bit of v; if it is 0, we have to clear the bit, and if it is 1, we have to set it.
To set the bit, we do a bitwise or of the dword that contains the bit and the value of 1 shifted left by bit_index_in_dword: that bit is set, and other bits do not change.
To clear the bit, we do a bitwise and of the dword that contains the bit and the bitwise complement of 1 shifted left by bit_index_in_dword: that value has all bits set to one except the only zero bit in the position that we want to clear.
The macro ends with , 0 because otherwise it would return the value of dword where the bit i is stored, and that value is not meaningful. One could also use ((void)0).
It's a trade-off:
(1) use 1 byte for each 2 bit value - simple, fast, but uses 4x memory
(2) pack bits into bytes - more complex, some performance overhead, uses minimum memory
If you have enough memory available then go for (1), otherwise consider (2).

Resources