sprintf - producing char array from an int in C - c

I'm doing an assignment for school to swap the bytes in an unsigned long, and return the swapped unsigned long. ex. 0x12345678 -> 0x34127856.
I figured I'll make a char array, use sprintf to insert the long into a char array, and then do the swapping, stepping through the array. I'm pretty familiar with C++, but C seems a little more low level. I researched a few topics on sprintf, and I tried to make an array, but I'm not sure why it's not working.
unsigned long swap_bytes(unsigned long n) {
char new[64];
sprintf(new, "%l", n);
printf("Char array is now: %s\n", new);
}

TL;DR: The correct approach is at the bottom.
Preamble
Issues with what you're doing
First off, using sprintf for byte swapping is the wrong approach because:
it is a MUCH MUCH slower process than using the mathematical properties of bit operations to perform the byte swapping.
A byte is not a digit in a number (a wrong assumption that you've made in your approach).
It's even more painful when you don't know the size of your integer (is it 32 bits, 64 bits, or what?).
The correct approach
Use bit manipulation to swap the bytes (see way way below)
The absolutely incorrect implementation with wrong output (because we're ignoring issue #2 above)
There are many technical reasons why sprintf is much slower, but suffice it to say that turning a number into text and moving that text around in memory is far more work than operating on the number directly, and the more data you move around, the slower it gets:
In your case, by changing a number (which sits in one manipulatable 'word'; think of it as a cell) into its human-readable string equivalent, you are doing three things:
You are converting (let's assume a 64-bit CPU) a single number, represented by 8 bytes in a single CPU cell (officially a register), into a human-readable string and putting it in RAM (memory). Each character in the string takes up a byte, so a 16-digit number takes up 16 bytes (plus the terminating null) rather than 8.
You are then moving these characters around using memory operations, which are typically much slower than doing the same work on values held directly in CPU registers.
Then you're converting the characters back to an integer, which is another long and tedious operation.
However, since that's the solution that you came up with let's first look at it.
The really wrong code with a really wrong answer
Starting (somewhat) with your code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
unsigned long swap_bytes(unsigned long n) {
int i, l;
char new[64]; /* the fact that you used 64 here told me that you made assumption 2 */
sprintf(new, "%lu", n); /* you forgot the `u` here */
printf("The number is: %s\n", new); /* well it shows up :) */
l = strlen(new);
for(i = 0; i < l; i+=4) {
char tmp[2];
tmp[0] = new[i+2]; /* get next two characters */
tmp[1] = new[i+3];
new[i+2] = new[i];
new[i+3] = new[i+1];
new[i] = tmp[0];
new[i+1] = tmp[1];
}
return strtoul(new, NULL, 10); /* convert new back */
}
/* testing swap byte */
int main() {
/* seems to work: */
printf("Swapping 12345678: %lu\n", swap_bytes(12345678));
/* how about 432? (err not) */
printf("Swapping 432: %lu\n", swap_bytes(432));
}
As you can see, the above is not really byte swapping but character swapping, and any attempt to "fix" the above code is nonsensical. For example, how do we deal with an odd number of digits?
Well, I suppose we can pad odd digit counts with a zero:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
unsigned long swap_bytes(unsigned long n) {
int i, l;
char new[64]; /* the fact that you used 64 here told me that you made assumption 2 */
sprintf(new, "%lu", n); /* you forgot the `u` here */
printf("The number is: %s\n", new); /* well it shows up :) */
l = strlen(new);
if(l % 2 == 1) { /* check if l is odd */
printf("adding a pad to make n even digit count");
sprintf(new, "0%lu", n);
l++; /* length has increased */
}
for(i = 0; i < l; i+=4) {
char tmp[2];
tmp[0] = new[i+2]; /* get next two characters */
tmp[1] = new[i+3];
new[i+2] = new[i];
new[i+3] = new[i+1];
new[i] = tmp[0];
new[i+1] = tmp[1];
}
return strtoul(new, NULL, 10); /* convert new back */
}
/* testing swap byte */
int main() {
/* seems to work: */
printf("Swapping 12345678: %lu\n", swap_bytes(12345678));
printf("Swapping 432: %lu\n", swap_bytes(432));
/* how about 432516? (err not) */
printf("Swapping 432516: %lu\n", swap_bytes(432516));
}
Now we run into an issue with numbers whose digit count is not divisible by 4... Do we pad them with zeros on the right, the left, or the middle? err NOT REALLY.
In any event this entire approach is wrong because we're not swapping bytes anyhow, we're swapping characters.
Now what?
So you may be asking
what the heck is my assignment talking about?
Well numbers are represented as bytes in memory, and what the assignment is asking for is for you to get that representation and swap it.
So for example, if we took a number like 12345678, it's actually stored as some sequence of bytes (1 byte == 8 bits). So let's look at the normal math way of representing 12345678 (base 10) in bits (base 2) and then in hexadecimal (base 16), where every two hex digits make up one byte:
$(12345678)_{10} = (101111000110000101001110)_2$
Splitting the binary bits into groups of 4 for visual ease gives:
$(12345678)_{10} = (1011\;1100\;0110\;0001\;0100\;1110)_2$
But 4 bits are equal to 1 hex digit (0, 1, 2, 3... 9, A, B... F), so we can convert each group of bits into a nibble (a 4-bit hex digit) easily:
1011 | 1100 | 0110 | 0001 | 0100 | 1110
B    | C    | 6    | 1    | 4    | E
But each byte (8 bits) is two nibbles (4 bits), so if we squish this a bit:
$(12345678)_{10} = (\mathrm{BC}\;61\;4\mathrm{E})_{16}$
So 12345678 actually fits in 3 bytes.
However, CPUs have specific sizes for integers, usually a power-of-two number of bits. This is so for a variety of reasons that are beyond the scope of this discussion; suffice it to say that you will get things like 16-bit, 32-bit, 64-bit, 128-bit etc. And most often a CPU of a particular bit size (say a 64-bit CPU) will be able to manipulate unsigned integers of that size directly, without having to store parts of the number in RAM.
Slight Digression
So let's say we have a 32-bit CPU and the number is stored starting at byte address α in RAM. The CPU could store the number 12345678 as:
Address:  α     α+1   α+2   α+3
Byte:     00    BC    61    4E
(Figure 1)
Here the most significant part of the number is sitting at the lowest memory address, α.
Or the CPU could store it differently, with the least significant part of the number sitting at the lowest memory address:
Address:  α     α+1   α+2   α+3
Byte:     4E    61    BC    00
(Figure 2)
The way a CPU stores a number is called the Endianness of the CPU. If the most significant part is stored at the lowest address (on the left in the figures), it's called a Big-Endian CPU (Figure 1); if the least significant part comes first, as in Figure 2, it's called Little-Endian.
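To check which of the two layouts your own machine uses, you can look at the bytes of a value through an unsigned char pointer. This is a minimal sketch; the function names are hypothetical and it assumes 8-bit bytes:

```c
#include <stddef.h>

/* Returns 1 if the lowest-addressed byte of an unsigned long holds the
   least significant byte (little-endian, as in Figure 2), 0 otherwise. */
int is_little_endian(void)
{
    unsigned long probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Copies the bytes of n into out[] in the order they sit in memory
   (out[0] is the byte at the lowest address). */
void bytes_in_memory_order(unsigned long n, unsigned char out[sizeof(unsigned long)])
{
    const unsigned char *p = (const unsigned char *)&n;
    for (size_t i = 0; i < sizeof n; i++)
        out[i] = p[i];
}
```

On a typical x86-64 machine is_little_endian() returns 1, and bytes_in_memory_order(12345678, b) leaves 4E 61 BC 00 at the start of b, matching Figure 2.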
Getting the correct answer (the wrong way)
Now that we have an idea of how things may be stored, let's try and pull this out still using sprintf.
We're going to use a couple of tricks here:
we'll convert the number to hexadecimal and zero-pad it to a fixed width of sizeof(n) hex digits (8 digits when unsigned long is 8 bytes)
we'll use printf's (and therefore sprintf's) ability to take a field width from an argument: if we put a * after the % sign, the width is read from the next argument, which must be an int, like so:
printf("%*d", width, num);
If we set our format string to %0*lx, we get a zero-padded hex number automatically, so:
sprintf(new, "%0*lx", (int) sizeof(n), n);
Our program then becomes:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
unsigned long swap_bytes(unsigned long n) {
int i, l;
char new[64] = "";
sprintf(new, "%0*lx", (int) sizeof(n), n); /* the * width must be an int; %lx matches unsigned long */
printf("The number is: %s\n", new);
l = strlen(new);
for(i = 0; i < l; i+=4) {
char tmp[2];
tmp[0] = new[i+2]; /* get next two characters */
tmp[1] = new[i+3];
new[i+2] = new[i];
new[i+3] = new[i+1];
new[i] = tmp[0];
new[i+1] = tmp[1];
}
return strtoul(new, NULL, 16); /* convert new back */
}
/* testing swap byte */
int main() {
printf("size of unsigned long is %zu\n", sizeof(unsigned long));
printf("Swapping 12345678: %lx\n", swap_bytes(12345678));
/* how about 123456? */
printf("Swapping 123456: %lx\n", swap_bytes(123456));
printf("Swapping 98899: %lx\n", swap_bytes(98899));
}
The output would look something like:
size of unsigned long is 8
The number is: 00bc614e
Swapping 12345678: bc004e61
The number is: 0001e240
Swapping 123456: 10040e2
The number is: 00018253
Swapping 98899: 1005382
Obviously we can change our outputs by using %lu and print the base 10 versions of the numbers, rather than base 16 as is happening above. I'll leave that to you.
Now let's do it the right way
This is however rather terrible, since byte swapping can be done much faster without ever doing the integer to string and string to integer conversion.
Let's see how that's done:
The rather explicit way
Before we go on, just a bit on bit shifting in C:
If I have a number, say $6 = (110)_2$, and I shift all the bits to the left by 1, I get $12 = (1100)_2$ (we simply shifted everything to the left, adding zeros on the right as needed).
This is written in C as 6 << 1.
A right shift is similar and is expressed in C with >>, so if I have a number, say $240 = (11110000)_2$, and I right-shift it 4 times, I get $15 = (1111)_2$; this is expressed as 240 >> 4.
Now we have unsigned long integers which are (in my case at least) 64 bits long, or 8 bytes long.
Let's say my number is 12345678, which is 0x0000000000bc614e in hex at 8 bytes long. Counting bytes from 0 at the least significant end, byte 0 is 4e, byte 1 is 61 and byte 2 is bc. If I want to get the value of byte number 2, I can take the number 0xFF (1111 1111, all bits of a byte set to 1), left shift it until it lines up with byte 2 (so left shift it by 2*8 = 16 bits), perform a bitwise AND with the number, and then right shift the result to get rid of the trailing zeros. This is what it looks like:
(0xFF << (2 * 8)) & 0x0000000000bc614e = 0x0000000000ff0000 & 0x0000000000bc614e = 0x0000000000bc0000
Now right shift:
0x0000000000bc0000 >> (2 * 8) = 0xbc
Another (better) way to do it would be to right shift first and then perform a bitwise AND with 0xFF to drop all higher bits:
(0x0000000000bc614e >> 16) & 0xFF = 0x00000000000000bc & 0xFF = 0xbc
We will use the second way and make a macro using #define. Then we can put the bytes back at the right location: for each even k, byte k is left shifted by 8*(k+1) bits (moving it up one byte) and byte k+1 is left shifted by 8*k bits (moving it down one byte).
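Both extraction orders can be written as small functions. This is a sketch (the names byte_at_mask_first and byte_at_shift_first are hypothetical), with byte 0 being the least significant byte:

```c
/* Mask first: line 0xFF up with byte idx, AND, then shift the result down.
   idx must be smaller than sizeof(unsigned long), or the shift is undefined. */
unsigned long byte_at_mask_first(unsigned long n, unsigned idx)
{
    return (n & (0xFFUL << (8 * idx))) >> (8 * idx);
}

/* Shift first: move byte idx to the bottom, then keep only the low 8 bits. */
unsigned long byte_at_shift_first(unsigned long n, unsigned idx)
{
    return (n >> (8 * idx)) & 0xFFUL;
}
```

The shift-first form uses a constant mask instead of a shifted one, which is why it reads more simply.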
Here is a sample implementation of this:
#include <stdio.h>
/* GET_BYTE(N, B): byte B of N, counting from 0 at the least significant byte */
#define GET_BYTE(N, B) (((N) >> (8 * (B))) & 0xFFUL)
unsigned long swap_bytes(unsigned long n)
{
unsigned long rv = 0UL;
size_t k;
printf("number is %016lx\n", n);
for(k = 0; k < sizeof(n); k+=2) {
printf("swapping bytes %zu[%016lx] and %zu[%016lx]\n", k, GET_BYTE(n, k),
k+1, GET_BYTE(n, k+1));
rv += GET_BYTE(n, k) << 8*(k+1); /* byte k moves up one byte */
rv += GET_BYTE(n, k+1) << 8*k; /* byte k+1 moves down one byte */
}
return rv;
}
/* testing swap byte */
int main() {
printf("size of unsigned long is: %zu\n", sizeof(unsigned long));
printf("Swapping 12345678: %lx\n", swap_bytes(12345678));
/* how about 123456? */
printf("Swapping 123456: %lx\n", swap_bytes(123456));
printf("Swapping 98899: %lx\n", swap_bytes(98899));
}
But this can be done so much more efficiently. I leave it here for now. We'll come back to using bit blitting and xor swapping later.
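For comparison, here is one loop-free way to do the whole swap with masks and shifts. It is a sketch of a common mask-and-shift technique, not necessarily what the author means by bit blitting, and it assumes unsigned long consists of an even number of 8-bit bytes (swap_adjacent_bytes is a hypothetical name):

```c
/* Swaps every pair of adjacent bytes of n (byte 0 <-> 1, 2 <-> 3, ...). */
unsigned long swap_adjacent_bytes(unsigned long n)
{
    /* ~0UL / 0xFFFF is 0x0001...0001 (a 1 at the bottom of every 16-bit
       group); multiplying by 0xFF turns that into the mask 0x00FF...00FF. */
    const unsigned long low_bytes = ~0UL / 0xFFFFUL * 0xFFUL;

    /* Even-numbered bytes move up 8 bits, odd-numbered bytes move down 8 bits. */
    return ((n & low_bytes) << 8) | ((n >> 8) & low_bytes);
}
```

swap_adjacent_bytes(0x12345678) gives 0x34127856, the example from the question.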
Update with GET_BYTE as a function instead of a macro:
Just for fun, we also use a shift operator for multiplying by 8. Left shifting a number by 1 is like multiplying it by 2 (this makes sense: in binary 2 is 10, and multiplying by 10 adds a zero to the end, which is the same as shifting everything left by one place). So multiplying by $8 = (1000)_2$ is like shifting three places over, or basically tacking on 3 zeros (overflow notwithstanding):
static inline unsigned long get_byte(const unsigned long n, const unsigned char idx) {
return ((n >> (idx << 3)) & 0xFFUL); /* idx << 3 == idx * 8 */
}
Now the really really fun and correct way to do this
Okay, so a classic trick for swapping two integers without a temporary variable is to use the properties of the xor function. If we have two integers X and Y, the basic algorithm is this:
X := X XOR Y
Y := Y XOR X
X := X XOR Y
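The three steps translate directly into C. A minimal sketch on two unsigned ints (xor_swap is a hypothetical name), including a guard for the case where both arguments are the same object:

```c
/* Swaps *x and *y using the three XOR steps, with no temporary variable.
   If x and y point to the same object the swap is skipped, because the
   first step would set that object to 0. */
void xor_swap(unsigned *x, unsigned *y)
{
    if (x == y)
        return;
    *x ^= *y;   /* *x holds x ^ y                       */
    *y ^= *x;   /* *y holds y ^ (x ^ y), the original x */
    *x ^= *y;   /* *x holds (x ^ y) ^ x, the original y */
}
```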
Now, we know that a char is one byte in C. So we can force the compiler to treat the 8-byte integer as a sequence of 1-byte chars (hehe, it's a bit of a mind bender considering everything I said about not doing it with sprintf), but this is different: here we are looking at the actual bytes the number is stored in, not at its decimal digits.
We'll take the memory address of our integer, cast it to a char pointer (char *) and treat the result as an array of chars. Then we'll use the xor function property above to swap the two consecutive array values.
To do this I am going to use a macro; a function would work too, but it would make the code uglier.
One thing you'll note is that there is the use of ?: in XORSWAP below. That's like an if-then-else in C but with expressions rather than statements, so basically (conditional_expression) ? (value_if_true) : (value_if_false) means if conditional_expression is non-zero the result will be value_if_true, otherwise it will be value_if_false. AND it's important not to xor a value with itself because you will always get 0 as a result and clobber the content. So we use the conditional to check if the addresses of the values we are changing are DIFFERENT from each other. If the addresses are the same (&a == &b) we simply return the value at the address (&a == &b) ? a : (otherwise_do_xor)
So let's do it:
#include <stdio.h>
/* this macro swaps any two non floating C values that are at
* DIFFERENT memory addresses. That's the entire &a == &b ? a : ... business
*/
#define XORSWAP(a, b) ((&(a) == &(b)) ? (a) : ((a)^=(b),(b)^=(a),(a)^=(b)))
unsigned long swap_bytes(const unsigned long n) {
unsigned long rv = n; /* we are not messing with original value */
int k;
for(k = 0; k < sizeof(rv); k+=2) {
/* swap k'th byte with k+1st byte */
XORSWAP(((char *)&rv)[k], ((char *)&rv)[k+1]);
}
return rv;
}
int main()
{
printf("swapped: %lx", swap_bytes(12345678));
return 0;
}
Here endeth the lesson. I hope that you will go through all the examples. If you have any more questions just ask in comments and I'll try to elaborate.

unsigned long swap_bytes(unsigned long n) {
char new[64];
sprintf(new, "%lu", n);
printf("Char array is now: %s\n", new);
}
You need to use %lu (long unsigned) as the format in sprintf(); the compiler should also have given you a "conversion lacks type" warning because of this.

To get it to print you need to use %lu (for unsigned)
It doesn't seem like you attempted the swap, could I see your try?

Related

Get bits from number string

If I have a number string (char array), one digit is one char, so a four-digit number takes 5 bytes, including the null terminator.
unsigned char num[] ="1024";
printf("%zu", sizeof(num)); // 5
However, 1024 can be written as
unsigned char binaryNum[2];
binaryNum[0] = 0b00000100;
binaryNum[1] = 0b00000000;
How can the conversion from string to binary be made effectively?
In my program I would work with ≈30-digit numbers, so the space gain would be big.
My goal is to create datapackets to be sent over UDP/TCP.
I would prefer not to use libraries for this task, since the available space the code can take up is small.
EDIT:
Thanks for quick response.
char num = 0b0000 0100 // "4"
--------------------------
char num = 0b0001 1000 // "24"
-----------------------------
char num[2];
num[0] = 0b00000100;
num[1] = 0b00000000;
// num now contains 1024
I would need ≈ 10 bytes to contain my number in binary form. So, if I as suggested parse the digits one by one, starting from the back, how would that build up to the final big binary number?
In general, converting a number in string representation to an integer is easy because each character can be parsed separately. E.g. to convert "1024" to 1024 you can look at the '1', convert it to 1, multiply by 10, then convert the '0' and add it, multiply by 10, and so on until you have parsed the whole string.
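The digit-by-digit conversion (start at the most significant digit, multiply the running value by 10, add the next digit) can be sketched like this. parse_decimal is a hypothetical name, and it does no overflow checking, so it assumes the number fits in an unsigned long:

```c
#include <ctype.h>

/* Parses a run of decimal digits, most significant digit first:
   for "1024" the running value goes 1, 10, 102, 1024.
   Stops at the first non-digit character. */
unsigned long parse_decimal(const char *s)
{
    unsigned long value = 0;
    while (isdigit((unsigned char)*s)) {
        value = value * 10 + (unsigned long)(*s - '0');
        s++;
    }
    return value;
}
```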
For binary it is not so easy, e.g. you can convert 4 to 100 and 2 to 010 but 42 is not 100 010 or 110 or something like that. So, your best bet is to convert the whole thing to a number and then convert that number to binary using mathematical operations (bit shifts and such). This will work fine for numbers that fit in one of the C integer types, but if you want to handle arbitrarily large numbers you will need arbitrary-precision (bignum) arithmetic, which seems to be a problem for you since the code has to be small.
From your question I gather that you want to compress the string representation in order to transmit the number over a network, so I am offering a solution that does not strictly convert to binary but will still use fewer bytes than the string representation and is easy to use. It is based on the fact that you can store a number 0..9 in 4 bits, and so you can fit two of those numbers in a byte. Hence you can store an n-digit number in n/2 bytes (rounded up). The algorithm could be as follows (for "1024"):
Take the last character, '4'.
Subtract '0' to get 4 (i.e. an int with value 4).
Strip the last character.
Repeat to get 2.
Combine them into a single byte: digits[0] = (4 << 4) + 2.
Do the same for the next two digits: digits[1] = (0 << 4) + 1.
Your representation in memory will now look like
4 2 0 1
0100 0010 0000 0001
digits[0] digits[1]
i.e.
digits = { 66, 1 }
This is not quite the binary representation of 1024, but it is shorter and it allows you to easily recover the original number by reversing the algorithm.
You even have 6 values left that you don't use for storing digits (1010 through 1111), which you can use for other things like storing the sign, decimal point, byte order or an end-of-number delimiter.
I trust that you will be able to implement this, should you choose to use it.
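A sketch of that packing, reading digits from the end of the string and storing two per byte with the first-read digit in the high nibble. pack_digits is a hypothetical name; it assumes the string contains only digits, using the unused nibble value 0xF to fill the last byte when the digit count is odd is an assumption of this sketch, and the caller must provide (strlen(s) + 1) / 2 bytes:

```c
#include <stddef.h>
#include <string.h>

/* Packs the decimal digits of s two per byte, starting from the end of
   the string. Returns the number of bytes written to out. */
size_t pack_digits(const char *s, unsigned char *out)
{
    size_t len = strlen(s);   /* digits still to be packed */
    size_t n = 0;             /* bytes written so far */
    while (len > 0) {
        unsigned hi = (unsigned)(s[--len] - '0');   /* last remaining digit */
        unsigned lo = 0xFu;                         /* filler if none is left */
        if (len > 0)
            lo = (unsigned)(s[--len] - '0');        /* next-to-last digit */
        out[n++] = (unsigned char)((hi << 4) | lo);
    }
    return n;
}
```

For "1024" this produces the two bytes 0x42 and 0x01.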
If I understand your question correctly, you would want to do this:
Convert your string representation into an integer.
Convert the integer into binary representation.
For step 1:
You could loop through the string
Subtract '0' from the char
Multiply by 10^n (depending on the position) and add to a sum.
For step 2 (for int x), in general:
x%2 gives you the least-significant-bit (LSB).
x /= 2 "removes" the LSB.
For example, take x = 6.
x%2 = 0 (LSB), x /= 2 -> x becomes 3
x%2 = 1, x /= 2 -> x becomes 1
x%2 = 1 (MSB), x /= 2 -> x becomes 0.
So we see that (6)decimal == (110)bin.
On to the implementation (for N=2, where N is maximum number of bytes):
#define N 2 // maximum number of bytes
int x = 1024;
int n=-1, p=0, p_=0, i=0, ex=1; //you can use smaller types of int for this if you are strict on memory usage
unsigned char num[N] = {0};
for (p=0; p<(N*8); p++,p_++) {
if (p%8 == 0) { n++; p_=0; } //for every 8bits, 1) store the new result in the next element in the array. 2) reset the placing (start at 2^0 again).
for (i=0; i<p_; i++) ex *= 2; //ex = pow(2,p_); without using math.h library
num[n] += ex * (x%2); //add (2^p_ x LSB) to num[n]
x /= 2; // "remove" the last bit to check for the next.
ex = 1; // reset the exponent
}
We can check the result for x = 1024:
for (i=0; i<N; i++)
printf("num[%d] = %d\n", i, num[i]); //num[0] = 0 (0b00000000), num[1] = 4 (0b00000100)
To convert an up-to-30-digit decimal number, represented as a string, into a series of bytes (effectively a base-256 representation) takes up to 13 bytes (the ceiling of 30/log10(256) ≈ 12.46).
Simple algorithm
dest = 0
for each digit of the string (starting with most significant)
dest *= 10
dest += digit
As C code
#include <ctype.h>  // isdigit
#include <string.h> // memset
#define STR_DEC_TO_BIN_N 13
unsigned char *str_dec_to_bin(unsigned char dest[STR_DEC_TO_BIN_N], const char *src) {
// dest[] = 0
memset(dest, 0, STR_DEC_TO_BIN_N);
// for each digit ...
while (isdigit((unsigned char) *src)) {
// dest[] = 10*dest[] + *src
// with dest[0] as the most significant digit
int sum = *src - '0';
for (int i = STR_DEC_TO_BIN_N - 1; i >= 0; i--) {
sum += dest[i]*10;
dest[i] = sum % 256;
sum /= 256;
}
// If sum is non-zero, it means dest[] overflowed
if (sum) {
return NULL;
}
src++; // move on to the next digit
}
// If stopped on something other than the null character ....
if (*src) {
return NULL;
}
return dest;
}

C - Method for setting all even-numbered bits to 1

I was charged with the task of writing a method that "returns the word with all even-numbered bits set to 1." Being completely new to C this seems really confusing and unclear. I don't understand how I can change the bits of a number with C. That seems like a very low level instruction, and I don't even know how I would do that in Java (my first language)! Can someone please help me! This is the method signature.
int evenBits(void){
return 0;
}
Any instruction on how to do this or even guidance on how to begin doing this would be greatly appreciated. Thank you so much!
Break it down into two problems.
(1) Given a variable, how do I set particular bits?
Hint: use a bitwise operator.
(2) How do I find out the representation of "all even-numbered bits" so I can use a bitwise operator to set them?
Hint: Use math. ;-) You could make a table (or find one) such as:
Decimal | Binary
--------+-------
0 | 0
1 | 1
2 | 10
3 | 11
... | ...
Once you know what operation to use to set particular bits, and you know a decimal (or hexadecimal) integer literal to use that with in C, you've solved the problem.
You must give a precise definition of all even numbered bits. Bits are numbered in different ways on different architectures. Hardware people like to number them from 1 to 32 from the least significant to the most significant bit, or sometimes the other way, from the most significant to the least significant bit... while software guys like to number bits in increasing order starting at 0, because bit 0 represents the number $2^0$, i.e. 1.
With this latter numbering system, the bit pattern would be 0101...0101, thus a value in hex 0x555...555. If you number bits starting at 1 for the least significant bit, the pattern would be 1010...1010, in hex 0xAAA...AAA. But that value has the sign bit set, so it does not fit in a (signed) int and would represent a negative value on current architectures.
I shall assume for the rest of this answer that even numbered bits are those representing even powers of 2: 1 ($2^0$), 4 ($2^2$), 16 ($2^4$)...
The short answer for this problem is:
int evenBits(void) {
return 0x55555555;
}
But what if int has 64 bits?
int evenBits(void) {
return 0x5555555555555555;
}
Would handle 64 bit int but would have implementation defined behavior on systems where int is smaller.
Using macros from <limits.h>, you could mask off the extra bits to handle 16, 32 and 64 bit ints:
#include <limits.h>
int evenBits(void) {
return 0x5555555555555555 & INT_MAX;
}
But this code still makes some assumptions:
int has at most 64 bits.
int has an even number of bits.
INT_MAX is a power of 2 minus 1.
These assumptions are valid for most current systems, but the C Standard allows for implementations where one or more are invalid.
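One way to drop the at-most-64-bits assumption is to build the pattern from ~0u instead of writing a literal: an all-ones unsigned value divided by 3 is 0x55...55. This sketch still assumes that unsigned int has an even number of bits and that int has one value bit fewer than unsigned int (both true on common systems), and, like the code above, it leaves the sign bit clear:

```c
#include <limits.h>

int evenBits(void)
{
    /* With an even number of bits, 0xFF...FF == 3 * 0x55...55 exactly,
       so this is 0x55...55 however wide unsigned int is. */
    unsigned pattern = ~0u / 3u;

    /* Keep only the bits that fit in a non-negative int. */
    return (int)(pattern & (unsigned)INT_MAX);
}
```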
So basically every other bit has to be set to one? That's what bitwise operations in C are for. Imagine a regular bit array. Counting bits from 1, start with the rightmost even-numbered bit and set it to 1 (this is the number 2) by using the OR operator (|) to modify the existing number. Then shift the mask 2 places to the left (<< 2), which changes it from 0010 to 1000, and OR again. Repeat until the mask runs off the end. The code below shows it better.
#include <stdio.h>
unsigned char SetAllEvenBitsToOne(unsigned char x);
int IsAllEvenBitsOne(unsigned char x);
int main()
{
unsigned char x = 0; //char is one byte data type ie. 8 bits.
x = SetAllEvenBitsToOne(x);
int check = IsAllEvenBitsOne(x);
if(check==1)
{
printf("It works\n");
}
return 0;
}
unsigned char SetAllEvenBitsToOne(unsigned char x)
{
int i=0;
unsigned char y = 2;
for(i=0; i < sizeof(char)*8/2; i++)
{
x = x | y;
y = y << 2;
}
return x;
}
int IsAllEvenBitsOne(unsigned char x)
{
unsigned char y;
for(int i=0; i<(sizeof(char)*8/2); i++)
{
y = x >> 7;
if(y > 0)
{
printf("x before: %d\t", x);
x = x << 2;
printf("x after: %d\n", x);
continue;
}
else
{
printf("Not all even bits are 1\n");
return 0;
}
}
printf("All even bits are 1\n");
return 1;
}
Here is a link to Bitwise Operations in C

Iterate through bits in C

I have a big char *str where the first 8 chars (which equal 64 bits, if I'm not wrong) represent a bitmap. Is there any way to iterate through these 8 chars and see which bits are 0? I'm having a lot of trouble understanding the concept of bits, as you can't "see" them in the code, so I can't think of any way to do this.
Imagine you have only one byte, a single char my_char. You can test for individual bits using bitwise operators and bit shifts.
unsigned char my_char = 0xAA;
int what_bit_i_am_testing = 0;
while (what_bit_i_am_testing < 8) {
if (my_char & 0x01) {
printf("bit %d is 1\n", what_bit_i_am_testing);
}
else {
printf("bit %d is 0\n", what_bit_i_am_testing);
}
what_bit_i_am_testing++;
my_char = my_char >> 1;
}
The part that must be new to you is the >> operator. This operator will "insert a zero on the left and push every bit to the right, and the rightmost will be thrown away".
That was not a very technical description for a right bit shift of 1.
Here is a way to iterate over each of the set bits of an unsigned integer (use unsigned rather than signed integers for well-defined behaviour; unsigned of any width should be fine), one bit at a time.
Define the following macros:
#define LSBIT(X) ((X) & (-(X)))
#define CLEARLSBIT(X) ((X) & ((X) - 1))
Then you can use the following idiom to iterate over the set bits, LSbit first:
unsigned temp_bits;
unsigned one_bit;
temp_bits = some_value;
for ( ; temp_bits; temp_bits = CLEARLSBIT(temp_bits) ) {
one_bit = LSBIT(temp_bits);
/* Do something with one_bit */
}
I'm not sure whether this suits your needs. You said you want to check for 0 bits, rather than 1 bits — maybe you could bitwise-invert the initial value. Also for multi-byte values, you could put it in another for loop to process one byte/word at a time.
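This reads the bitmap one byte at a time, with bit n stored in byte n / 8 at position n % 8 (counting from the least significant bit), so the byte order of larger integers does not matter: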
enum { cBitmapSize = 8, cBitsCount = cBitmapSize * 8 }; /* in C, a const int is not a constant expression */
const unsigned char cBitmap[cBitmapSize] = /* some data */;
for(int n = 0; n < cBitsCount; n++)
{
unsigned char Mask = 1 << (n % 8);
if(cBitmap[n / 8] & Mask)
{
// if n'th bit is 1...
}
}
In the C language, chars are bytes (in practice 8 bits wide; the exact width is CHAR_BIT from <limits.h>), and in general in computer science, data is organized around bytes as the fundamental unit.
In some cases, such as your problem, data is stored as boolean values in individual bits, so we need a way to determine whether a particular bit in a particular byte is on or off. There is already an SO solution for this explaining how to do bit manipulations in C.
To check a bit, the usual method is to AND it with the bit you want to check:
int isBitSet = bitmap & (1 << bit_position);
If the variable isBitSet is 0 after this operation, then the bit is not set. Any other value indicates that the bit is on.
For one char b you can simply iterate like this :
for (int i=0; i<8; i++) {
printf("This is the %d-th bit : %d\n",i,(b>>i)&1);
}
You can then iterate through the chars as needed.
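Applied to the question's bitmap, the same shift-and-mask loop runs over each of the first 8 chars. A sketch; find_zero_bits is a hypothetical helper, and numbering bit 0 as the least significant bit of str[0] is an assumption:

```c
#include <stddef.h>

/* Writes the positions of all 0 bits in the 8-byte bitmap at str into
   zeros[] and returns how many there were. Bit k lives in byte k / 8,
   at position k % 8 counted from that byte's least significant bit. */
size_t find_zero_bits(const char *str, int zeros[64])
{
    size_t count = 0;
    for (int byte = 0; byte < 8; byte++) {
        unsigned char b = (unsigned char)str[byte];   /* avoid sign issues */
        for (int bit = 0; bit < 8; bit++) {
            if (((b >> bit) & 1u) == 0)
                zeros[count++] = byte * 8 + bit;
        }
    }
    return count;
}
```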
What you should understand is that you cannot manipulate bits directly; you can only use arithmetic properties of numbers in base 2 to compute numbers that represent the bits you want to know about.
How does it work? A char has 8 bits, so a char can be seen as a number written with 8 digits in base 2. If the number in b is b7b6b5b4b3b2b1b0 (each being a binary digit), then b>>i is b shifted to the right by i positions (0s are pushed in on the left). So 10110111 >> 2 is 00101101; then the operation &1 isolates the last bit (bitwise AND operator).
If you want to iterate through all the chars:
const char *str = "MNO"; // M=01001101, N=01001110, O=01001111
int bit = 0;
for (int x = strlen(str)-1; x > -1; x--){ // Start from O, N, M
printf("Char %c \n", str[x]);
for(int y=0; y<8; y++){ // Iterate through every bit
// Shift right by y steps and mask the lowest bit
if( str[x]>>y & 0b00000001 ){
printf("bit %d = 1\n", bit);
}else{
printf("bit %d = 0\n", bit);
}
bit++;
}
}
output
Char O
bit 0 = 1
bit 1 = 1
bit 2 = 1
bit 3 = 1
bit 4 = 0
bit 5 = 0
bit 6 = 1
bit 7 = 0
Char N
bit 8 = 0
bit 9 = 1
bit 10 = 1
...

How to define and work with an array of bits in C?

I want to create a very large array on which I write '0's and '1's. I'm trying to simulate a physical process called random sequential adsorption, where units of length 2, dimers, are deposited onto an n-dimensional lattice at a random location, without overlapping each other. The process stops when there is no more room left on the lattice for depositing more dimers (lattice is jammed).
Initially I start with a lattice of zeroes, and the dimers are represented by a pair of '1's. As each dimer is deposited, the site on the left of the dimer is blocked, due to the fact that the dimers cannot overlap. So I simulate this process by depositing a triple of '1's on the lattice. I need to repeat the entire simulation a large number of times and then work out the average coverage %.
I've already done this using an array of chars for 1D and 2D lattices. At the moment I'm trying to make the code as efficient as possible, before working on the 3D problem and more complicated generalisations.
This is basically what the code looks like in 1D, simplified:
int main()
{
/* Define lattice */
array = (char*)malloc(N * sizeof(char));
total_c = 0;
/* Carry out RSA multiple times */
for (i = 0; i < 1000; i++)
rand_seq_ads();
/* Calculate average coverage efficiency at jamming */
printf("coverage efficiency = %lf", total_c/1000);
return 0;
}
void rand_seq_ads()
{
/* Initialise array, initial conditions */
memset(array, 0, N * sizeof(char));
available_sites = N;
count = 0;
/* While the lattice still has enough room... */
while(available_sites != 0)
{
/* Generate random site location */
x = rand();
/* Deposit dimer (if site is available) */
if(array[x] == 0)
{
array[x] = 1;
array[x+1] = 1;
count += 1;
available_sites += -2;
}
/* Mark site left of dimer as unavailable (if its empty) */
if(array[x-1] == 0)
{
array[x-1] = 1;
available_sites += -1;
}
}
/* Calculate coverage %, and add to total */
c = (double) count / N;
total_c += c;
}
For the actual project I'm doing, it involves not just dimers but trimers, quadrimers, and all sorts of shapes and sizes (for 2D and 3D).
I was hoping that I would be able to work with individual bits instead of bytes, but I've been reading around and as far as I can tell you can only change 1 byte at a time, so either I need to do some complicated indexing or there is a simpler way to do it?
Thanks for your answers
If I am not too late, this page gives awesome explanation with examples.
An array of int can be used to deal with an array of bits. Assuming the size of int to be 4 bytes, when we talk about an int we are dealing with 32 bits. Say we have int A[10]; that means we are working on 10*4*8 = 320 bits, and the following figure shows it: (each element of the array has 4 big blocks, each of which represents a byte, and each of the smaller blocks represents a bit)
So, to set the kth bit in array A:
// NOTE: if using "uint8_t A[]" instead of "int A[]" then divide by 8, not 32
void SetBit( int A[], int k )
{
int i = k/32; //gives the corresponding index in the array A
int pos = k%32; //gives the corresponding bit position in A[i]
unsigned int flag = 1; // flag = 0000.....00001
flag = flag << pos; // flag = 0000...010...000 (shifted pos positions)
A[i] = A[i] | flag; // Set the bit at the k-th position in A[i]
}
or in the shortened version
void SetBit( int A[], int k )
{
A[k/32] |= 1U << (k%32); // Set the bit at the k-th position in A[i] (1U: shifting a signed 1 into bit 31 is undefined)
}
similarly to clear kth bit:
void ClearBit( int A[], int k )
{
A[k/32] &= ~(1U << (k%32));
}
and to test the kth bit:
int TestBit( int A[], int k )
{
return ( (A[k/32] & (1U << (k%32) )) != 0 ) ;
}
These manipulations can also be written as macros:
// Because of operator precedence, wrap 'k' in parentheses in case it
// is passed as an expression, e.g. i + 1; otherwise the first
// part evaluates to "A[i + (1/32)]" instead of "A[(i + 1)/32]".
// k is still evaluated twice, so don't pass something like i++.
#define SetBit(A,k) ( (A)[(k)/32] |= (1u << ((k)%32)) )
#define ClearBit(A,k) ( (A)[(k)/32] &= ~(1u << ((k)%32)) )
#define TestBit(A,k) ( (A)[(k)/32] & (1u << ((k)%32)) )
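To see why the parentheses around k matter, here is a small sketch that passes an expression (i + start) to a set-bit macro. It assumes uint32_t elements; the macro names SET_BIT, CLEAR_BIT, TEST_BIT and the helper set_every_other are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names. Same idea as the macros above, but the array argument
   is parenthesized too, and the mask is built from (uint32_t)1 so that
   shifting into bit 31 is well defined. Assumes uint32_t elements. */
#define SET_BIT(A, k)   ((A)[(k) / 32] |= ((uint32_t)1 << ((k) % 32)))
#define CLEAR_BIT(A, k) ((A)[(k) / 32] &= ~((uint32_t)1 << ((k) % 32)))
#define TEST_BIT(A, k)  (((A)[(k) / 32] >> ((k) % 32)) & 1u)

/* Hypothetical helper: sets bits start, start + 2, start + 4, ... below
   limit. The argument i + start is an expression, which is exactly the
   case the (k) parentheses exist for. */
void set_every_other(uint32_t A[], int start, int limit)
{
    int i;
    for (i = 0; i + start < limit; i += 2)
        SET_BIT(A, i + start);
}
```

Without the parentheses, SET_BIT(A, i + start) would index A[i + start / 32] and shift by i + start % 32, touching the wrong bit.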
typedef unsigned long bfield_t[ size_needed/sizeof(long) ];
// long because that's probably what your CPU is best at.
// size_needed (in bytes) should be evenly divisible by sizeof(long), or
// you could use (sizeof(long)-1+size_needed)/sizeof(long) to force it to round up.
Now, each long in a bfield_t can hold sizeof(long)*8 bits.
You can calculate the index of the long that holds the bit you need with:
bindex = index / (8 * sizeof(long) );
and your bit number by
b = index % (8 * sizeof(long) );
You can then look up the long you need and then mask out the bit you need from it.
result = my_field[bindex] & (1UL<<b); // 1UL, not 1: b can exceed 31 when long is 64 bits
or
result = 1 & (my_field[bindex]>>b); // if you prefer the result in bit 0
The first one may be faster on some CPUs, or may save you from shifting back up if you need
to perform operations between the same bit in multiple bit arrays. It also mirrors
the setting and clearing of a bit in the field more closely than the second implementation.
set:
my_field[bindex] |= 1UL<<b;
clear:
my_field[bindex] &= ~(1UL<<b);
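The index formulas and the two lookup styles can be wrapped into small functions. This is a sketch that sizes the field in bits (rounding up) rather than in bytes; LONG_BITS, BFIELD_LONGS and the bfield_* functions are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

/* Bits per unsigned long: usually 32 or 64 depending on the platform. */
#define LONG_BITS (CHAR_BIT * sizeof(unsigned long))
/* Hypothetical: number of unsigned longs needed for nbits, rounded up. */
#define BFIELD_LONGS(nbits) (((nbits) + LONG_BITS - 1) / LONG_BITS)

/* First form: mask the bit in place; result is 0 or a non-zero power of 2. */
unsigned long bfield_mask(const unsigned long *field, size_t index)
{
    size_t bindex = index / LONG_BITS;  /* which long */
    size_t b = index % LONG_BITS;       /* which bit inside that long */
    return field[bindex] & (1UL << b);
}

/* Second form: shift the bit down to bit 0; result is exactly 0 or 1. */
int bfield_get(const unsigned long *field, size_t index)
{
    return (int)((field[index / LONG_BITS] >> (index % LONG_BITS)) & 1UL);
}

void bfield_set(unsigned long *field, size_t index)
{
    field[index / LONG_BITS] |= 1UL << (index % LONG_BITS);
}

void bfield_clear(unsigned long *field, size_t index)
{
    field[index / LONG_BITS] &= ~(1UL << (index % LONG_BITS));
}
```

For example, unsigned long f[BFIELD_LONGS(200)] = {0}; declares enough longs for 200 bits on both 32-bit and 64-bit longs.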
You should remember that you can use bitwise operations on the longs that hold the fields
and that's the same as the operations on the individual bits.
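As a sketch of that point: combining two bit fields word by word performs the same operation on every bit at once. The helpers bfield_and and bfield_or are hypothetical names, assuming both fields have the same number of unsigned longs.

```c
#include <stddef.h>

/* Hypothetical helper: out = a AND b, one unsigned long at a time.
   A bit ends up set only if it is set in both a and b (intersection). */
void bfield_and(unsigned long *out, const unsigned long *a,
                const unsigned long *b, size_t nwords)
{
    size_t i;
    for (i = 0; i < nwords; i++)
        out[i] = a[i] & b[i];
}

/* Hypothetical helper: out = a OR b; a bit is set if it is set in either. */
void bfield_or(unsigned long *out, const unsigned long *a,
               const unsigned long *b, size_t nwords)
{
    size_t i;
    for (i = 0; i < nwords; i++)
        out[i] = a[i] | b[i];
}
```

Each loop iteration handles 32 or 64 bits, which is much cheaper than testing and setting the bits one by one.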
You'll probably also want to look into the ffs, fls, ffc, and flc functions, if available. ffs should always be available in strings.h on POSIX systems. It's there for exactly this purpose: treating a value as a string of bits.
ffs stands for "find first set", and it is essentially:
int my_ffs(int x) {     /* renamed so it doesn't clash with the real ffs() */
    unsigned int u = (unsigned int)x;
    int c = 1;          /* ffs() numbers bits starting from 1 */
    if (u == 0)
        return 0;       /* no bit is set */
    while (!(u & 1)) {
        c++;
        u >>= 1;
    }
    return c;
}
Many processors have an instruction for this, and your compiler will probably generate that instruction rather than calling a function like the one I wrote. x86 has one, by the way. Oh, and ffsl and ffsll are the same function except that they take a long and a long long, respectively.
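A common use of ffs is to visit only the set bits of a word: find the lowest set bit, record it, clear it with x &= x - 1, and repeat. The following is a sketch that assumes a POSIX system where ffs() is declared in <strings.h>. Under a strict -std=c99 build, a feature-test macro may be needed, hence the guarded define. set_bit_positions is a hypothetical helper, and x is assumed to be non-negative so that x - 1 cannot overflow.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose ffs() under strict C modes */
#endif
#include <strings.h>             /* POSIX ffs() */

/* Hypothetical helper: writes the 0-based positions of the set bits of x
   into pos[] (lowest first) and returns how many there were.
   ffs() returns a 1-based position, or 0 when no bit is set.
   Assumes x >= 0. */
int set_bit_positions(int x, int pos[])
{
    int n = 0;
    int p;
    while ((p = ffs(x)) != 0) {
        pos[n++] = p - 1;
        x &= x - 1;              /* clear the lowest set bit */
    }
    return n;
}
```

The loop runs once per set bit rather than once per bit position, which helps when the bit array is sparse.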
You can use | (bitwise or) and << (left shift).
For example, (1 << 3) results in "00001000" in binary. So your code could look like:
unsigned char eightBits = 0;
//Set the 5th and 6th bits from the right to 1
eightBits |= (1 << 4);
eightBits |= (1 << 5);
//eightBits now looks like "00110000".
Then just scale it up with an array of chars and figure out the appropriate byte to modify first.
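Scaling this up to an array of bytes means picking the byte first (n / CHAR_BIT) and then the bit inside it (n % CHAR_BIT). A minimal sketch, assuming bit 0 is the least significant bit of bytes[0]. The names byte_array_set and byte_array_get are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: set bit n of a byte array.
   Byte n / CHAR_BIT holds it, at position n % CHAR_BIT. */
void byte_array_set(unsigned char bytes[], size_t n)
{
    bytes[n / CHAR_BIT] |= (unsigned char)(1u << (n % CHAR_BIT));
}

/* Hypothetical helper: read bit n back as 0 or 1. */
int byte_array_get(const unsigned char bytes[], size_t n)
{
    return (bytes[n / CHAR_BIT] >> (n % CHAR_BIT)) & 1;
}
```

Setting bits 4 and 5 reproduces the "00110000" byte above, and bit 13 lands in bytes[1] at position 5.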
Alternatively, you could define a list of bit masks in advance and put them in an array:
#define BIT8 0x01
#define BIT7 0x02
#define BIT6 0x04
#define BIT5 0x08
#define BIT4 0x10
#define BIT3 0x20
#define BIT2 0x40
#define BIT1 0x80
unsigned char bits[8] = {BIT1, BIT2, BIT3, BIT4, BIT5, BIT6, BIT7, BIT8};
Then you can index your masks instead of shifting (though a shift is usually a single cheap instruction, so don't expect a big speedup), turning the previous code into:
eightBits |= (bits[2] | bits[3]); // 0x20 | 0x10, i.e. "00110000" again
Alternatively, if you can use C++, you could just use std::vector<bool>, which is specialized to store its elements as individual bits, complete with direct indexing.
bitarray.h:
#include <inttypes.h> // defines uint32_t
//typedef unsigned int bitarray_t; // if you know that int is 32 bits
typedef uint32_t bitarray_t;
#define RESERVE_BITS(n) (((n)+0x1f)>>5)
#define DW_INDEX(x) ((x)>>5)
#define BIT_INDEX(x) ((x)&0x1f)
#define getbit(array,index) (((array)[DW_INDEX(index)]>>BIT_INDEX(index))&1)
#define putbit(array, index, bit) \
((bit)&1 ? ((array)[DW_INDEX(index)] |= (bitarray_t)1<<BIT_INDEX(index)) \
: ((array)[DW_INDEX(index)] &= ~((bitarray_t)1<<BIT_INDEX(index))) \
, 0 \
)
Use:
bitarray_t arr[RESERVE_BITS(130)] = {0, 0x12345678,0xabcdef0,0xffff0000,0};
int i = getbit(arr,5);
putbit(arr,6,1);
int x=2; // the least significant bit is 0
putbit(arr,6,x); // sets bit 6 to 0 because 2&1 is 0
putbit(arr,6,!!x); // sets bit 6 to 1 because !!2 is 1
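A self-contained check of these macros, copied here with (bitarray_t)1 as the shifted value so that bit 31 is handled safely. The put_bits helper is hypothetical: it writes several bits of a value across a dword boundary, to show that DW_INDEX and BIT_INDEX carry over correctly from one dword to the next.

```c
#include <stdint.h>

typedef uint32_t bitarray_t;
#define RESERVE_BITS(n) (((n)+0x1f)>>5)
#define DW_INDEX(x) ((x)>>5)
#define BIT_INDEX(x) ((x)&0x1f)
#define getbit(array,index) (((array)[DW_INDEX(index)]>>BIT_INDEX(index))&1)
#define putbit(array, index, bit) \
    ((bit)&1 ? ((array)[DW_INDEX(index)] |= (bitarray_t)1<<BIT_INDEX(index)) \
             : ((array)[DW_INDEX(index)] &= ~((bitarray_t)1<<BIT_INDEX(index))) \
    , 0)

/* Hypothetical helper: copy the lowest `count` bits of `value` into arr,
   starting at bit `start` (bit i of value goes to bit start + i). */
void put_bits(bitarray_t arr[], int start, int count, unsigned long value)
{
    int i;
    for (i = 0; i < count; i++)
        (void)putbit(arr, start + i, (int)((value >> i) & 1UL));
}
```

Writing four 1 bits starting at bit 30 sets bits 30 and 31 of arr[0] and bits 0 and 1 of arr[1].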
EDIT: the docs:
"dword" = "double word" = 32-bit value (unsigned, but that's not really important)
RESERVE_BITS: number_of_bits --> number_of_dwords
RESERVE_BITS(n) is the number of 32-bit integers enough to store n bits
DW_INDEX: bit_index_in_array --> dword_index_in_array
DW_INDEX(i) is the index of dword where the i-th bit is stored.
Both bit and dword indexes start from 0.
BIT_INDEX: bit_index_in_array --> bit_index_in_dword
If i is the number of some bit in the array, BIT_INDEX(i) is the number
of that bit in the dword where the bit is stored.
And the dword is known via DW_INDEX().
getbit: bit_array, bit_index_in_array --> bit_value
putbit: bit_array, bit_index_in_array, bit_value --> 0
getbit(array,i) fetches the dword containing the bit i and shifts the dword right, so that the bit i becomes the least significant bit. Then, a bitwise and with 1 clears all other bits.
putbit(array, i, v) first of all checks the least significant bit of v; if it is 0, we have to clear the bit, and if it is 1, we have to set it.
To set the bit, we do a bitwise or of the dword that contains the bit and the value of 1 shifted left by bit_index_in_dword: that bit is set, and other bits do not change.
To clear the bit, we do a bitwise and of the dword that contains the bit and the bitwise complement of 1 shifted left by bit_index_in_dword: that value has all bits set to one except the only zero bit in the position that we want to clear.
The macro ends with , 0 because otherwise it would return the value of dword where the bit i is stored, and that value is not meaningful. One could also use ((void)0).
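Because getbit and putbit are macros, the index argument appears twice in the expansion, so something like putbit(arr, i++, 1) modifies i twice and is undefined. If that matters, inline functions avoid the problem. This is a sketch assuming C99 (for static inline) and the same 32-bit bitarray_t. The _fn names are hypothetical.

```c
#include <stdint.h>

typedef uint32_t bitarray_t;

/* Hypothetical function version of getbit: each argument is evaluated once. */
static inline int getbit_fn(const bitarray_t *array, unsigned index)
{
    return (int)((array[index >> 5] >> (index & 0x1f)) & 1u);
}

/* Hypothetical function version of putbit: looks only at the lowest bit of
   `bit`, like the macro, and returns nothing instead of a dummy 0. */
static inline void putbit_fn(bitarray_t *array, unsigned index, int bit)
{
    bitarray_t mask = (bitarray_t)1 << (index & 0x1f);
    if (bit & 1)
        array[index >> 5] |= mask;   /* set the bit */
    else
        array[index >> 5] &= ~mask;  /* clear the bit */
}
```

With optimization enabled, compilers generally inline such functions, so the macro form is mostly a matter of style (or of supporting C89).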
It's a trade-off:
(1) use 1 byte for each 2 bit value - simple, fast, but uses 4x memory
(2) pack bits into bytes - more complex, some performance overhead, uses minimum memory
If you have enough memory available then go for (1), otherwise consider (2).
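For option (2), a sketch that packs four 2-bit values into each byte, assuming 8-bit bytes. Value i lives in byte i / 4, at bit offset 2 * (i % 4). The functions get2 and put2 are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical helper: read the i-th 2-bit value (0..3). */
unsigned get2(const unsigned char *packed, size_t i)
{
    return (packed[i / 4] >> (2 * (i % 4))) & 0x3u;
}

/* Hypothetical helper: store a 2-bit value, leaving its neighbours intact.
   First clear the two target bits, then OR in the new value. */
void put2(unsigned char *packed, size_t i, unsigned value)
{
    unsigned shift = (unsigned)(2 * (i % 4));
    unsigned char cleared = (unsigned char)(packed[i / 4] & ~(0x3u << shift));
    packed[i / 4] = (unsigned char)(cleared | ((value & 0x3u) << shift));
}
```

The clear-then-OR step is the "some performance overhead" of option (2): every write is a read-modify-write of the whole byte.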

Reading characters on a bit level

I would like to be able to enter a character from the keyboard and display the binary code for that key, in the format 00000001 for example.
Furthermore, I would also like to read the bits in a way that allows me to output whether they are true or false.
e.g.
01010101 = false,true,false,true,false,true,false,true
I would post an idea of how I have tried to do it myself, but I have absolutely no idea; I'm still experimenting with C and this is my first taste of programming at such a low level.
Thank you
For bit tweaking, it is often safer to use unsigned types, because right-shifting a negative signed value is implementation-defined and left-shifting one is undefined. Plain char can be either signed or unsigned (traditionally, it is unsigned on Macintosh platforms but signed on PCs). Hence, first cast your character to the unsigned char type.
Then, your friends are the bitwise boolean operators (&, |, ^ and ~) and the shift operators (<< and >>). For instance, if your character is in variable x, then to get bit number 5 you simply use ((x >> 5) & 1). The shift operator moves the value to the right, dropping the five lower bits and moving the bit you are interested in to the lowest ("rightmost") position. The bitwise AND with 1 then sets all other bits to 0, so the resulting value is either 0 or 1, which is your bit. Note that I number bits from least significant (rightmost) to most significant (leftmost), and I begin with zero, not one.
If you assume that your characters are 8-bits, you could write your code as:
unsigned char x = (unsigned char)your_character;
int i;
for (i = 7; i >= 0; i --) {
if (i != 7)
printf(",");
printf("%s", ((x >> i) & 1) ? "true" : "false");
}
You may note that since I number bits from right to left, but you want output from left to right, the loop index must be decreasing.
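The same loop can produce both outputs the question asks for. This is a sketch assuming 8-bit chars (enforced with the CHAR_BIT check discussed next). describe_bits is a hypothetical name, and it writes into caller-supplied buffers instead of printing so that the result is easy to inspect.

```c
#include <limits.h>
#include <string.h>

#if CHAR_BIT != 8
#error I need 8-bit bytes!
#endif

/* Hypothetical helper: writes the 8 bits of c, most significant first,
   as '0'/'1' digits into digits (9 bytes including the terminator) and as
   comma-separated "true"/"false" words into words (48 bytes is enough:
   8 * "false" + 7 commas + terminator). */
void describe_bits(char c, char digits[9], char words[48])
{
    unsigned char x = (unsigned char)c;
    int i;
    words[0] = '\0';
    for (i = 7; i >= 0; i--) {
        int bit = (x >> i) & 1;          /* bit number i, counted from the right */
        digits[7 - i] = bit ? '1' : '0';
        if (i != 7)
            strcat(words, ",");
        strcat(words, bit ? "true" : "false");
    }
    digits[8] = '\0';
}
```

On an ASCII system, 'U' (0x55) gives "01010101" and "false,true,false,true,false,true,false,true", the example from the question.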
Note that according to the C standard, unsigned char has at least eight bits but may have more (nowadays, only a handful of embedded DSPs have characters that are not 8 bits wide). To be extra safe, add this near the beginning of your code (as a top-level declaration):
#include <limits.h>
#if CHAR_BIT != 8
#error I need 8-bit bytes!
#endif
This will prevent successful compilation if the target system happens to be one of those special embedded DSPs. As a note on the note, the term "byte" in the C standard means "the elementary memory unit which corresponds to an unsigned char", so that, in C-speak, a byte may have more than eight bits (a byte is not always an octet). This is a traditional source of confusion.
This is probably not the safest way - no sanity/size/type checks - but it should still work.
#include <stdio.h>

int main(void)
{
    unsigned char myBools[8];
    char myChar;
    // get your character - this is not safe and you should
    // use a better method to obtain input...
    // cin >> myChar; <- C++
    if (scanf("%c", &myChar) != 1)
        return 1;
    // AND the char with a one-bit mask for each position and turn the
    // result into 0 or 1. The inner parentheses matter: '>' and '!='
    // bind tighter than '&'. myBools[0] ends up as the least significant bit.
    for(int i = 0; i < 8; ++i)
    {
        myBools[i] = ((myChar & (1 << i)) != 0) ? 1 : 0;
    }
    return 0;
}
This will give you an array of unsigned chars - either 0 or 1 (true or false) - for the character.
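The array above stores the least significant bit first, which is the reverse of the order in which binary digits are usually written. A sketch that fills the array most significant bit first and uses C99's bool. char_to_bools is a hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: fills bits[0..CHAR_BIT-1] with the bits of c,
   most significant first, so bits[] reads in the same order as the
   printed binary digits. */
void char_to_bools(char c, bool bits[CHAR_BIT])
{
    unsigned char x = (unsigned char)c;  /* avoid shifting a negative char */
    int i;
    for (i = 0; i < CHAR_BIT; i++)
        bits[i] = (x >> (CHAR_BIT - 1 - i)) & 1u;
}
```

Converting to unsigned char first means the result does not depend on whether plain char is signed.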
This code is C89:
/* we need this to use exit */
#include <stdlib.h>
/* we need this to use CHAR_BIT */
#include <limits.h>
/* we need this to use fgetc and printf */
#include <stdio.h>
int main() {
/* Declare everything we need */
int input, index;
unsigned int mask;
char inputchar;
/* an array to store integers telling us the values of the individual bits.
There are (almost) always 8 bits in a char, but it doesn't hurt to get into
good habits early, and in C, the sizes of the basic types are different
on different platforms. CHAR_BIT tells us the number of bits in a byte.
*/
int bits[CHAR_BIT];
/* the simplest way to read a single character is fgetc, but note that
the user will probably have to press "return", since input is generally
buffered */
input = fgetc(stdin);
printf("%d\n", input);
/* Check for errors. In C, we must always check for errors */
if (input == EOF) {
printf("No character read\n");
exit(1);
}
/* convert the value read from type int to type char. Not strictly needed,
we can examine the bits of an int or a char, but here's how it's done.
*/
inputchar = input;
/* the most common way to examine individual bits in a value is to use a
"mask" - in this case we have just 1 bit set, the most significant bit
of a char. */
mask = 1 << (CHAR_BIT - 1);
/* this is a loop, index takes each value from 0 to CHAR_BIT-1 in turn,
and we will read the bits from most significant to least significant. */
for (index = 0; index < CHAR_BIT; ++index) {
/* the bitwise-and operator & is how we use the mask.
"inputchar & mask" will be 0 if the bit corresponding to the mask
is 0, and non-zero if the bit is 1. ?: is the ternary conditional
operator, and in C when you use an integer value in a boolean context,
non-zero values are true. So we're converting any non-zero value to 1.
*/
bits[index] = (inputchar & mask) ? 1 : 0;
/* output what we've done */
printf("index %d, value %u\n", index, inputchar & mask);
/* we need a new mask for the next bit */
mask = mask >> 1;
}
/* output each bit as 0 or 1 */
for (index = 0; index < CHAR_BIT; ++index) {
printf("%d", bits[index]);
}
printf("\n");
/* output each bit as "true" or "false" */
for (index = 0; index < CHAR_BIT; ++index) {
printf("%s", bits[index] ? "true" : "false");
/* fiddly part - we want a comma between each bit, but not at the end */
if (index != CHAR_BIT - 1) printf(",");
}
printf("\n");
return 0;
}
You don't necessarily need three loops - you could combine them together if you wanted, and if you're only doing one of the two kinds of output, then you wouldn't need the array, you could just use each bit value as you mask it off. But I think this keeps things separate and hopefully easier to understand.
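For example, if only the true/false output is needed, the mask loop can print each bit directly without the bits array. This is a sketch of that single-loop variant. print_bits_as_words is a hypothetical name, and it takes a FILE * so it can write to stdout or anywhere else.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: prints the bits of c, most significant first, as
   comma-separated "true"/"false" words followed by a newline, in one pass. */
void print_bits_as_words(FILE *out, int c)
{
    unsigned char x = (unsigned char)c;
    unsigned int mask = 1u << (CHAR_BIT - 1);  /* start at the most significant bit */
    int first = 1;
    for (; mask != 0; mask >>= 1) {            /* stops after CHAR_BIT steps */
        if (!first)
            fputc(',', out);                   /* comma between words only */
        fputs((x & mask) ? "true" : "false", out);
        first = 0;
    }
    fputc('\n', out);
}
```

Calling print_bits_as_words(stdout, input) after the EOF check in the program above would replace its last loop.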

Resources