I'm trying to output the right character in utf8 given the following octal sequence \303\255 and \346\234\254, but I don't get the correct output.
#include <stdio.h>
#include <stdlib.h>
int encode(char *buf, unsigned char ch){
if(ch < 0x80) {
*buf++ = (char)ch;
return 1;
}
if(ch < 0x800) {
*buf++ = (ch >> 6) | 0xC0;
*buf++ = (ch & 0x3F) | 0x80;
return 2;
}
if(ch < 0x10000) {
*buf++ = (ch >> 12) | 0xE0;
*buf++ = ((ch >> 6) & 0x3F) | 0x80;
*buf++ = (ch & 0x3F) | 0x80;
return 3;
}
if(ch < 0x110000) {
*buf++ = (ch >> 18) | 0xF0;
*buf++ = ((ch >> 12) & 0x3F) | 0x80;
*buf++ = ((ch >> 6) & 0x3F) | 0x80;
*buf++ = (ch & 0x3F) | 0x80;
return 4;
}
return 0;
}
void output (char *str) {
char *buffer = calloc(8, sizeof(char));
int n = 0;
while(*str) {
n = encode(buffer + n, *str++);
}
printf("%s\n", buffer);
free (buffer);
}
int main() {
char *str1 = "\303\255";
char *str2 = "\346\234\254";
output(str1);
output(str2);
return 0;
}
Outputs: à & æ¬ instead of í & 本
The problem is that the code sequence you use is already UTF-8
/* Both of these are already UTF-8 chars. */
char *str1 = "\303\255";
char *str2 = "\346\234\254";
So your encode function is trying to encode an already encoded UTF-8 which should not work.
When i print these sequences in my UTF-8 enabled terminal i see what you are expecting to see:
$ printf "%s\n" $'\303\255'
í
$ printf "%s\n" $'\346\234\254'
本
So maybe you need to rethink what you are trying to accomplish and post a new question if you have new problems there.
It's a pity, but you cannot compare a char value (being it signed or unsigned) with values over 0x100. You are missing something if you try to convert one byte (iso-8859-1) values to utf-8. The iso-8859-1 characters have the same code values as their UTF counterparts, so the conversion is fairly straightforward, as will be shown below.
First of all, all the iso-8859-1 characters are the same as their UTF counterparts, so the first transformation is the identity: We convert each value in iso-8859-1 to the same value in UTF (look that when I say UTF y mean the UTF code for that character, without using any codification, as when I say UTF-8, which is actually an encoding of UTF in eight bit bytes)
UTF values in the range 0x80...0xff must be encoded with two bytes, the first byte using bits 7 and 6 with pattern 110000xx being xx the two most significant bits of the input code, and followed by a second byte with 10xxxxxx being xxxxxx the six least significant bits (bits 5 to 0) of the input code. For UTF values in the range 0x00...0x7f you encode them with just the same byte as the UTF code.
The following function does preciselly this:
size_t iso2utf( unsigned char *buf, unsigned char iso )
{
size_t res = 0;
if ( iso & 0x80 ) {
*buf++ = 0xc0 | (iso >> 6); /* the 110000xx part */
*buf++ = 0x80 | (iso & 0x3f); /* ... and the 10xxxxxx part. */
res += 2;
} else {
*buf++ = iso; /* a 0xxxxxxx character, untouched. */
res++;
}
*buf = '\0';
return res;
} /* iso2utf */
If you want a complete UTF into UTF-8 encoder, you can try this (I used a different approach, as there can be as much as seven bytes per UTF char ---actually not so much, as currently only 24 or 25 bit codes are used):
#include <string.h>
#include <stdlib.h>
typedef unsigned int UTF; /* you can use wchar_t if you prefer */
typedef unsigned char BYTE;
/* I will assume that UTF string is also zero terminated */
size_t utf_utf8 (BYTE *out, UTF *in)
{
size_t res = 0;
for (;*in;in++) {
UTF c = *in; /* copy the UTF value */
/* we are constructing the string backwards, so finally
* we have it properly ordered. */
size_t n = 0; /* number of characters for this one */
BYTE aux[7], /* buffer to construct the string */
*p = aux + sizeof aux; /* point one cell past the end */
static UTF limits[] = { 0x80, 0x20, 0x10, 0x08, 0x4, 0x2, 0x01};
static UTF masks[] = { 0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc, 0xfe};
for (;c >= limits[n]; c >>= 6) {
*--p = 0x80 | (c & 0x3f); n++;
} /* for */
*--p = masks[n] | c; n++;
memcpy(out, p, n); out += n; res += n;
} /* for */
*out = '\0'; /* terminate string */
return res;
} /* utf_utf8 */
See that the seven bytes per UTF code is hardwired, as it is the fact of UTF codes being 32bit integer. I don't expect UTF codes to go further past the 32 bit limit, but in that case, both, the UTF typedef, and the sizes and contents of the tables aux, limits and masks might be changed accordingly. There's a maximum limit of 7 or 8 for the number of characters used for the utf-8 encoding also, and it's not specified in the standard in any form how to proceed if the UTF codespace should run out of codes any time, so better not to mesh too much with this.
Useless function parameter: unsigned char ch
/// In the following bad code, `if(ch < 0x10000)` is never true
int encode(char *buf, unsigned char ch){
if(ch < 0x80) {
...
return 1;
if(ch < 0x800) {
...
return 2;
if(ch < 0x10000) {
Sorry, GTG.
Note: Code incorrectly does not detect high and low surrogates.
Hi I was wondering if anyone would be able to explain to me what is the best path to take if I wanted to simulate logic gates in a c program?
Lets say for example I create a program and use command line arguments
AND GATE
[console]% yourProgram 11001010 11110000
<console>% 11000000
If anyone could explain to me what the best route is to start with, I would greatly appreciate it. This is the code I have so far...
#include <stdio.h>
#include <stdlib.h>
int main( int argc, char *argv[] ) {
if( argc >= 3){
int result = atoi(argv[1])&&atoi(argv[2]);
printf("Input 1 is %d\n",atoi(argv[1]));
printf("Input 2 is %d\n",atoi(argv[2]));
printf("Result is %c\n",result);
}
return 0;
In addition to the comment suggesting basic corrections, if you want to make it a bit more useful and flexible, you could calculate the most significant bit and then use that to format a simple binary print routine to examine your bitwise operation.
The primary concepts are taking the input as a string of binary digits and converting them to a number with strtoul (base 2), and then following &'ing the inputs together to obtain result it is just a matter of computing how many bytes to print out and whether to format a single byte into nibbles or simply separate multiple bytes.
#include <stdio.h>
#include <stdlib.h>
/* BUILD_64 */
#if defined(__LP64__) || defined(_LP64)
# define BUILD_64 1
#endif
/* BITS_PER_LONG */
#ifdef BUILD_64
# define BITS_PER_LONG 64
#else
# define BITS_PER_LONG 32
#endif
/* CHAR_BIT */
#ifndef CHAR_BIT
# define CHAR_BIT 8
#endif
char *binstrfmt (unsigned long n, unsigned char sz, unsigned char szs, char sep);
static __always_inline unsigned long msbfls (unsigned long word);
int main (int argc, char **argv) {
if ( argc < 3) {
fprintf (stderr, "error: insufficient input. usage: %s b1 b1\n", argv[0]);
return 1;
}
/* input conversion and bitwise operation */
unsigned long b1 = strtoul (argv[1], NULL, 2);
unsigned long b2 = strtoul (argv[2], NULL, 2);
unsigned long result = b1 & b2;
/* variables to use to set binary print format */
unsigned char msb, msbmax, width, sepwidth;
msb = msbmax = width = sepwidth = 0;
/* find the greatest most significant bit */
msbmax = (msb = msbfls (b1)) > msbmax ? msb : msbmax;
msbmax = (msb = msbfls (b2)) > msbmax ? msb : msbmax;
msbmax = (msb = msbfls (result)) > msbmax ? msb : msbmax;
msbmax = msbmax ? msbmax : 1;
/* set the number of bytes to print and the separator width */
width = (msbmax / CHAR_BIT + 1) * CHAR_BIT;
sepwidth = width > CHAR_BIT ? CHAR_BIT : CHAR_BIT/2;
/* print the output */
printf("\n Input 1 : %s\n", binstrfmt (b1, width, sepwidth, '-'));
printf(" Input 2 : %s\n", binstrfmt (b2, width, sepwidth, '-'));
printf(" Result : %s\n\n", binstrfmt (result, width, sepwidth, '-'));
return 0;
}
/** returns pointer to formatted binary representation of 'n' zero padded to 'sz'.
* returns pointer to string contianing formatted binary representation of
* unsigned 64-bit (or less ) value zero padded to 'sz' digits with char
* 'sep' placed every 'szs' digits. (e.g. 10001010 -> 1000-1010).
*/
char *binstrfmt (unsigned long n, unsigned char sz, unsigned char szs, char sep) {
static char s[2 * BITS_PER_LONG + 1] = {0};
char *p = s + 2 * BITS_PER_LONG;
unsigned char i;
for (i = 0; i < sz; i++) {
p--;
if (i > 0 && szs > 0 && i % szs == 0)
*p-- = sep;
*p = (n >> i & 1) ? '1' : '0';
}
return p;
}
/* return the most significant bit (MSB) for the value supplied. */
static __always_inline unsigned long msbfls(unsigned long word)
{
if (!word) return 0;
int num = BITS_PER_LONG - 1;
#if BITS_PER_LONG == 64
if (!(word & (~0ul << 32))) {
num -= 32;
word <<= 32;
}
#endif
if (!(word & (~0ul << (BITS_PER_LONG-16)))) {
num -= 16;
word <<= 16;
}
if (!(word & (~0ul << (BITS_PER_LONG-8)))) {
num -= 8;
word <<= 8;
}
if (!(word & (~0ul << (BITS_PER_LONG-4)))) {
num -= 4;
word <<= 4;
}
if (!(word & (~0ul << (BITS_PER_LONG-2)))) {
num -= 2;
word <<= 2;
}
if (!(word & (~0ul << (BITS_PER_LONG-1))))
num -= 1;
return num;
}
Example Output
$ ./bin/andargs 11001010 11110000
Input 1 : 1100-1010
Input 2 : 1111-0000
Result : 1100-0000
$ ./bin/andargs 1100101011110000 1111000011001010
Input 1 : 11001010-11110000
Input 2 : 11110000-11001010
Result : 11000000-11000000
Use this code. (for AND operation):
#include <stdio.h>
#include <stdlib.h>
int main( int argc, char *argv[] ) {
if( argc >= 3){
int i=0;
printf("1st i/p = %s\n2nd i/p = %s\n",argv[1],argv[2]);
for (i=0; argv[1][i]!='\0'; i++){ //this assumes there are 2 inputs, of equal size, having bits(1,0) as its digits
argv[1][i] = argv[1][i] & argv[2][i]; //modifies argv[1] to your required answer
}
printf("Answer: %s\n",argv[1]);
}
return 0;
}
I have an array of values all well within the range 0 - 63, and decided I could pack every 4 bytes into 3 because the values only require 6 bits and I could use the extra 2bits to store the first 2 bits of the next value and so on.
Having never done this before I used the switch statement and a nextbit variable (a state machine like device) to do the packing and keep track of the starting bit. I'm convinced however, there must be a better way.
Suggestions/clues please, but don't ruin my fun ;-)
Any portability problems regarding big/little endian?
btw: I have verified this code is working, by unpacking it again and comparing with the input. And no it ain't homework, just an exercise I've set myself.
/* build with gcc -std=c99 -Wconversion */
#define ASZ 400
typedef unsigned char uc_;
uc_ data[ASZ];
int i;
for (i = 0; i < ASZ; ++i) {
data[i] = (uc_)(i % 0x40);
}
size_t dl = sizeof(data);
printf("sizeof(data):%z\n",dl);
float fpl = ((float)dl / 4.0f) * 3.0f;
size_t pl = (size_t)(fpl > (float)((int)fpl) ? fpl + 1 : fpl);
printf("length of packed data:%z\n",pl);
for (i = 0; i < dl; ++i)
printf("%02d ", data[i]);
printf("\n");
uc_ * packeddata = calloc(pl, sizeof(uc_));
uc_ * byte = packeddata;
uc_ nextbit = 1;
for (int i = 0; i < dl; ++i) {
uc_ m = (uc_)(data[i] & 0x3f);
switch(nextbit) {
case 1:
/* all 6 bits of m into first 6 bits of byte: */
*byte = m;
nextbit = 7;
break;
case 3:
/* all 6 bits of m into last 6 bits of byte: */
*byte++ = (uc_)(*byte | (m << 2));
nextbit = 1;
break;
case 5:
/* 1st 4 bits of m into last 4 bits of byte: */
*byte++ = (uc_)(*byte | ((m & 0x0f) << 4));
/* 5th and 6th bits of m into 1st and 2nd bits of byte: */
*byte = (uc_)(*byte | ((m & 0x30) >> 4));
nextbit = 3;
break;
case 7:
/* 1st 2 bits of m into last 2 bits of byte: */
*byte++ = (uc_)(*byte | ((m & 0x03) << 6));
/* next (last) 4 bits of m into 1st 4 bits of byte: */
*byte = (uc_)((m & 0x3c) >> 2);
nextbit = 5;
break;
}
}
So, this is kinda like code-golf, right?
#include <stdlib.h>
#include <string.h>
static void pack2(unsigned char *r, unsigned char *n) {
unsigned v = n[0] + (n[1] << 6) + (n[2] << 12) + (n[3] << 18);
*r++ = v;
*r++ = v >> 8;
*r++ = v >> 16;
}
unsigned char *apack(const unsigned char *s, int len) {
unsigned char *s_end = s + len,
*r, *result = malloc(len/4*3+3),
lastones[4] = { 0 };
if (result == NULL)
return NULL;
for(r = result; s + 4 <= s_end; s += 4, r += 3)
pack2(r, s);
memcpy(lastones, s, s_end - s);
pack2(r, lastones);
return result;
}
Check out the IETF RFC 4648 for 'The Base16, Base32 and Base64 Data Encodings'.
Partial code critique:
size_t dl = sizeof(data);
printf("sizeof(data):%d\n",dl);
float fpl = ((float)dl / 4.0f) * 3.0f;
size_t pl = (size_t)(fpl > (float)((int)fpl) ? fpl + 1 : fpl);
printf("length of packed data:%d\n",pl);
Don't use the floating point stuff - just use integers. And use '%z' to print 'size_t' values - assuming you've got a C99 library.
size_t pl = ((dl + 3) / 4) * 3;
I think your loop could be simplified by dealing with 3-byte input units until you've got a partial unit left over, and then dealing with a remainder of 1 or 2 bytes as special cases. I note that the standard referenced says that you use one or two '=' signs to pad at the end.
I have a Base64 encoder and decode which does some of that. You are describing the 'decode' part of Base64 -- where the Base64 code has 4 bytes of data that should be stored in just 3 - as your packing code. The Base64 encoder corresponds to the unpacker you will need.
Base-64 Decoder
Note: base_64_inv is an array of 256 values, one for each possible input byte value; it defines the correct decoded value for each encoded byte. In the Base64 encoding, this is a sparse array - 3/4 zeroes. Similarly, base_64_map is the mapping between a value 0..63 and the corresponding storage value.
enum { DC_PAD = -1, DC_ERR = -2 };
static int decode_b64(int c)
{
int b64 = base_64_inv[c];
if (c == base64_pad)
b64 = DC_PAD;
else if (b64 == 0 && c != base_64_map[0])
b64 = DC_ERR;
return(b64);
}
/* Decode 4 bytes into 3 */
static int decode_quad(const char *b64_data, char *bin_data)
{
int b0 = decode_b64(b64_data[0]);
int b1 = decode_b64(b64_data[1]);
int b2 = decode_b64(b64_data[2]);
int b3 = decode_b64(b64_data[3]);
int bytes;
if (b0 < 0 || b1 < 0 || b2 == DC_ERR || b3 == DC_ERR || (b2 == DC_PAD && b3 != DC_PAD))
return(B64_ERR_INVALID_ENCODED_DATA);
if (b2 == DC_PAD && (b1 & 0x0F) != 0)
/* 3rd byte is '='; 2nd byte must end with 4 zero bits */
return(B64_ERR_INVALID_TRAILING_BYTE);
if (b2 >= 0 && b3 == DC_PAD && (b2 & 0x03) != 0)
/* 4th byte is '='; 3rd byte is not '=' and must end with 2 zero bits */
return(B64_ERR_INVALID_TRAILING_BYTE);
bin_data[0] = (b0 << 2) | (b1 >> 4);
bytes = 1;
if (b2 >= 0)
{
bin_data[1] = ((b1 & 0x0F) << 4) | (b2 >> 2);
bytes = 2;
}
if (b3 >= 0)
{
bin_data[2] = ((b2 & 0x03) << 6) | (b3);
bytes = 3;
}
return(bytes);
}
/* Decode input Base-64 string to original data. Output length returned, or negative error */
int base64_decode(const char *data, size_t datalen, char *buffer, size_t buflen)
{
size_t outlen = 0;
if (datalen % 4 != 0)
return(B64_ERR_INVALID_ENCODED_LENGTH);
if (BASE64_DECLENGTH(datalen) > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 4)
{
int nbytes = decode_quad(data, buffer + outlen);
if (nbytes < 0)
return(nbytes);
outlen += nbytes;
data += 4;
datalen -= 4;
}
assert(datalen == 0); /* By virtue of the %4 check earlier */
return(outlen);
}
Base-64 Encoder
/* Encode 3 bytes of data into 4 */
static void encode_triplet(const char *triplet, char *quad)
{
quad[0] = base_64_map[(triplet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((triplet[0] & 0x03) << 4) | ((triplet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((triplet[1] & 0x0F) << 2) | ((triplet[2] >> 6) & 0x03)];
quad[3] = base_64_map[triplet[2] & 0x3F];
}
/* Encode 2 bytes of data into 4 */
static void encode_doublet(const char *doublet, char *quad, char pad)
{
quad[0] = base_64_map[(doublet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((doublet[0] & 0x03) << 4) | ((doublet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((doublet[1] & 0x0F) << 2)];
quad[3] = pad;
}
/* Encode 1 byte of data into 4 */
static void encode_singlet(const char *singlet, char *quad, char pad)
{
quad[0] = base_64_map[(singlet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((singlet[0] & 0x03) << 4)];
quad[2] = pad;
quad[3] = pad;
}
/* Encode input data as Base-64 string. Output length returned, or negative error */
static int base64_encode_internal(const char *data, size_t datalen, char *buffer, size_t buflen, char pad)
{
size_t outlen = BASE64_ENCLENGTH(datalen);
const char *bin_data = (const void *)data;
char *b64_data = (void *)buffer;
if (outlen > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 3)
{
encode_triplet(bin_data, b64_data);
bin_data += 3;
b64_data += 4;
datalen -= 3;
}
b64_data[0] = '\0';
if (datalen == 2)
encode_doublet(bin_data, b64_data, pad);
else if (datalen == 1)
encode_singlet(bin_data, b64_data, pad);
b64_data[4] = '\0';
return((b64_data - buffer) + strlen(b64_data));
}
I complicate life by having to deal with a product that uses a variant alphabet for the Base64 encoding, and also manages not to pad data - hence the 'pad' argument (which can be zero for 'null padding' or '=' for standard padding. The 'base_64_map' array contains the alphabet to use for 6-bit values in the range 0..63.
Another simpler way to do it would be to use bit fields. One of the lesser known corners of C struct syntax is the big field. Let's say you have the following structure:
struct packed_bytes {
byte chunk1 : 6;
byte chunk2 : 6;
byte chunk3 : 6;
byte chunk4 : 6;
};
This declares chunk1, chunk2, chunk3, and chunk4 to have the type byte but to only take up 6 bits in the structure. The result is that sizeof(struct packed_bytes) == 3. Now all you need is a little function to take your array and dump it into the structure like so:
void
dump_to_struct(byte *in, struct packed_bytes *out, int count)
{
int i, j;
for (i = 0; i < (count / 4); ++i) {
out[i].chunk1 = in[i * 4];
out[i].chunk2 = in[i * 4 + 1];
out[i].chunk3 = in[i * 4 + 2];
out[i].chunk4 = in[i * 4 + 3];
}
// Finish up
switch(struct % 4) {
case 3:
out[count / 4].chunk3 = in[(count / 4) * 4 + 2];
case 2:
out[count / 4].chunk2 = in[(count / 4) * 4 + 1];
case 1:
out[count / 4].chunk1 = in[(count / 4) * 4];
}
}
There you go, you now have an array of struct packed_bytes that you can easily read by using the above struct.
Instead of using a statemachine you can simply use a counter for how many bits are already used in the current byte, from which you can directly derive the shift-offsets and whether or not you overflow into the next byte.
Regarding the endianess: As long as you use only a single datatype (that is you don't reinterpret pointer to types of different size (e.g. int* a =...;short* b=(short*) a;) you shouldn't get problems with endianess in most cases
Taking elements of DigitalRoss's compact code, Grizzly's suggestion, and my own code, I have written my own answer at last. Although DigitalRoss provides a usable working answer, my usage of it without understanding, would not have provided the same satisfaction as to learning something. For this reason I have chosen to base my answer on my original code.
I have also chosen to ignore the advice Jonathon Leffler gives to avoid using floating point arithmetic for the calculation of the packed data length. Both the recommended method given - the same DigitalRoss also uses, increases the length of the packed data by as much as three bytes. Granted this is not much, but is also avoidable by the use of floating point math.
Here is the code, criticisms welcome:
/* built with gcc -std=c99 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
unsigned char *
pack(const unsigned char * data, size_t len, size_t * packedlen)
{
float fpl = ((float)len / 4.0f) * 3.0f;
*packedlen = (size_t)(fpl > (float)((int)fpl) ? fpl + 1 : fpl);
unsigned char * packed = malloc(*packedlen);
if (!packed)
return 0;
const unsigned char * in = data;
const unsigned char * in_end = in + len;
unsigned char * out;
for (out = packed; in + 4 <= in_end; in += 4) {
*out++ = in[0] | ((in[1] & 0x03) << 6);
*out++ = ((in[1] & 0x3c) >> 2) | ((in[2] & 0x0f) << 4);
*out++ = ((in[2] & 0x30) >> 4) | (in[3] << 2);
}
size_t lastlen = in_end - in;
if (lastlen > 0) {
*out = in[0];
if (lastlen > 1) {
*out++ |= ((in[1] & 0x03) << 6);
*out = ((in[1] & 0x3c) >> 2);
if (lastlen > 2) {
*out++ |= ((in[2] & 0x0f) << 4);
*out = ((in[2] & 0x30) >> 4);
if (lastlen > 3)
*out |= (in[3] << 2);
}
}
}
return packed;
}
int main()
{
size_t i;
unsigned char data[] = {
12, 15, 40, 18,
26, 32, 50, 3,
7, 19, 46, 10,
25, 37, 2, 39,
60, 59, 0, 17,
9, 29, 13, 54,
5, 6, 47, 32
};
size_t datalen = sizeof(data);
printf("unpacked datalen: %td\nunpacked data\n", datalen);
for (i = 0; i < datalen; ++i)
printf("%02d ", data[i]);
printf("\n");
size_t packedlen;
unsigned char * packed = pack(data, sizeof(data), &packedlen);
if (!packed) {
fprintf(stderr, "Packing failed!\n");
return EXIT_FAILURE;
}
printf("packedlen: %td\npacked data\n", packedlen);
for (i = 0; i < packedlen; ++i)
printf("0x%02x ", packed[i]);
printf("\n");
free(packed);
return EXIT_SUCCESS;
}
I can print with printf as a hex or octal number. Is there a format tag to print as binary, or arbitrary base?
I am running gcc.
printf("%d %x %o\n", 10, 10, 10); //prints "10 A 12\n"
printf("%b\n", 10); // prints "%b\n"
Hacky but works for me:
#define BYTE_TO_BINARY_PATTERN "%c%c%c%c%c%c%c%c"
#define BYTE_TO_BINARY(byte) \
(byte & 0x80 ? '1' : '0'), \
(byte & 0x40 ? '1' : '0'), \
(byte & 0x20 ? '1' : '0'), \
(byte & 0x10 ? '1' : '0'), \
(byte & 0x08 ? '1' : '0'), \
(byte & 0x04 ? '1' : '0'), \
(byte & 0x02 ? '1' : '0'), \
(byte & 0x01 ? '1' : '0')
printf("Leading text "BYTE_TO_BINARY_PATTERN, BYTE_TO_BINARY(byte));
For multi-byte types
printf("m: "BYTE_TO_BINARY_PATTERN" "BYTE_TO_BINARY_PATTERN"\n",
BYTE_TO_BINARY(m>>8), BYTE_TO_BINARY(m));
You need all the extra quotes unfortunately. This approach has the efficiency risks of macros (don't pass a function as the argument to BYTE_TO_BINARY) but avoids the memory issues and multiple invocations of strcat in some of the other proposals here.
Print Binary for Any Datatype
// Assumes little endian
void printBits(size_t const size, void const * const ptr)
{
unsigned char *b = (unsigned char*) ptr;
unsigned char byte;
int i, j;
for (i = size-1; i >= 0; i--) {
for (j = 7; j >= 0; j--) {
byte = (b[i] >> j) & 1;
printf("%u", byte);
}
}
puts("");
}
Test:
int main(int argv, char* argc[])
{
int i = 23;
uint ui = UINT_MAX;
float f = 23.45f;
printBits(sizeof(i), &i);
printBits(sizeof(ui), &ui);
printBits(sizeof(f), &f);
return 0;
}
Here is a quick hack to demonstrate techniques to do what you want.
#include <stdio.h> /* printf */
#include <string.h> /* strcat */
#include <stdlib.h> /* strtol */
const char *byte_to_binary
(
int x
)
{
static char b[9];
b[0] = '\0';
int z;
for (z = 128; z > 0; z >>= 1)
{
strcat(b, ((x & z) == z) ? "1" : "0");
}
return b;
}
int main
(
void
)
{
{
/* binary string to int */
char *tmp;
char *b = "0101";
printf("%d\n", strtol(b, &tmp, 2));
}
{
/* byte to binary string */
printf("%s\n", byte_to_binary(5));
}
return 0;
}
There isn't a binary conversion specifier in glibc normally.
It is possible to add custom conversion types to the printf() family of functions in glibc. See register_printf_function for details. You could add a custom %b conversion for your own use, if it simplifies the application code to have it available.
Here is an example of how to implement a custom printf formats in glibc.
You could use a small table to improve speed1. Similar techniques are useful in the embedded world, for example, to invert a byte:
const char *bit_rep[16] = {
[ 0] = "0000", [ 1] = "0001", [ 2] = "0010", [ 3] = "0011",
[ 4] = "0100", [ 5] = "0101", [ 6] = "0110", [ 7] = "0111",
[ 8] = "1000", [ 9] = "1001", [10] = "1010", [11] = "1011",
[12] = "1100", [13] = "1101", [14] = "1110", [15] = "1111",
};
void print_byte(uint8_t byte)
{
printf("%s%s", bit_rep[byte >> 4], bit_rep[byte & 0x0F]);
}
1 I'm mostly referring to embedded applications where optimizers are not so aggressive and the speed difference is visible.
Print the least significant bit and shift it out on the right. Doing this until the integer becomes zero prints the binary representation without leading zeros but in reversed order. Using recursion, the order can be corrected quite easily.
#include <stdio.h>
void print_binary(unsigned int number)
{
if (number >> 1) {
print_binary(number >> 1);
}
putc((number & 1) ? '1' : '0', stdout);
}
To me, this is one of the cleanest solutions to the problem. If you like 0b prefix and a trailing new line character, I suggest wrapping the function.
Online demo
Based on #William Whyte's answer, this is a macro that provides int8,16,32 & 64 versions, reusing the INT8 macro to avoid repetition.
/* --- PRINTF_BYTE_TO_BINARY macro's --- */
#define PRINTF_BINARY_PATTERN_INT8 "%c%c%c%c%c%c%c%c"
#define PRINTF_BYTE_TO_BINARY_INT8(i) \
(((i) & 0x80ll) ? '1' : '0'), \
(((i) & 0x40ll) ? '1' : '0'), \
(((i) & 0x20ll) ? '1' : '0'), \
(((i) & 0x10ll) ? '1' : '0'), \
(((i) & 0x08ll) ? '1' : '0'), \
(((i) & 0x04ll) ? '1' : '0'), \
(((i) & 0x02ll) ? '1' : '0'), \
(((i) & 0x01ll) ? '1' : '0')
#define PRINTF_BINARY_PATTERN_INT16 \
PRINTF_BINARY_PATTERN_INT8 PRINTF_BINARY_PATTERN_INT8
#define PRINTF_BYTE_TO_BINARY_INT16(i) \
PRINTF_BYTE_TO_BINARY_INT8((i) >> 8), PRINTF_BYTE_TO_BINARY_INT8(i)
#define PRINTF_BINARY_PATTERN_INT32 \
PRINTF_BINARY_PATTERN_INT16 PRINTF_BINARY_PATTERN_INT16
#define PRINTF_BYTE_TO_BINARY_INT32(i) \
PRINTF_BYTE_TO_BINARY_INT16((i) >> 16), PRINTF_BYTE_TO_BINARY_INT16(i)
#define PRINTF_BINARY_PATTERN_INT64 \
PRINTF_BINARY_PATTERN_INT32 PRINTF_BINARY_PATTERN_INT32
#define PRINTF_BYTE_TO_BINARY_INT64(i) \
PRINTF_BYTE_TO_BINARY_INT32((i) >> 32), PRINTF_BYTE_TO_BINARY_INT32(i)
/* --- end macros --- */
#include <stdio.h>
int main() {
long long int flag = 1648646756487983144ll;
printf("My Flag "
PRINTF_BINARY_PATTERN_INT64 "\n",
PRINTF_BYTE_TO_BINARY_INT64(flag));
return 0;
}
This outputs:
My Flag 0001011011100001001010110111110101111000100100001111000000101000
For readability you may want to add a separator for eg:
My Flag 00010110,11100001,00101011,01111101,01111000,10010000,11110000,00101000
As of February 3rd, 2022, the GNU C Library been updated to version 2.35. As a result, %b is now supported to output in binary format.
printf-family functions now support the %b format for output of
integers in binary, as specified in draft ISO C2X, and the %B variant
of that format recommended by draft ISO C2X.
Here's a version of the function that does not suffer from reentrancy issues or limits on the size/type of the argument:
#define FMT_BUF_SIZE (CHAR_BIT*sizeof(uintmax_t)+1)
char *binary_fmt(uintmax_t x, char buf[static FMT_BUF_SIZE])
{
char *s = buf + FMT_BUF_SIZE;
*--s = 0;
if (!x) *--s = '0';
for (; x; x /= 2) *--s = '0' + x%2;
return s;
}
Note that this code would work just as well for any base between 2 and 10 if you just replace the 2's by the desired base. Usage is:
char tmp[FMT_BUF_SIZE];
printf("%s\n", binary_fmt(x, tmp));
Where x is any integral expression.
Quick and easy solution:
void printbits(my_integer_type x)
{
for(int i=sizeof(x)<<3; i; i--)
putchar('0'+((x>>(i-1))&1));
}
Works for any size type and for signed and unsigned ints. The '&1' is needed to handle signed ints as the shift may do sign extension.
There are so many ways of doing this. Here's a super simple one for printing 32 bits or n bits from a signed or unsigned 32 bit type (not putting a negative if signed, just printing the actual bits) and no carriage return. Note that i is decremented before the bit shift:
#define printbits_n(x,n) for (int i=n;i;i--,putchar('0'|(x>>i)&1))
#define printbits_32(x) printbits_n(x,32)
What about returning a string with the bits to store or print later? You either can allocate the memory and return it and the user has to free it, or else you return a static string but it will get clobbered if it's called again, or by another thread. Both methods shown:
char *int_to_bitstring_alloc(int x, int count)
{
count = count<1 ? sizeof(x)*8 : count;
char *pstr = malloc(count+1);
for(int i = 0; i<count; i++)
pstr[i] = '0' | ((x>>(count-1-i))&1);
pstr[count]=0;
return pstr;
}
#define BITSIZEOF(x) (sizeof(x)*8)
char *int_to_bitstring_static(int x, int count)
{
static char bitbuf[BITSIZEOF(x)+1];
count = (count<1 || count>BITSIZEOF(x)) ? BITSIZEOF(x) : count;
for(int i = 0; i<count; i++)
bitbuf[i] = '0' | ((x>>(count-1-i))&1);
bitbuf[count]=0;
return bitbuf;
}
Call with:
// memory allocated string returned which needs to be freed
char *pstr = int_to_bitstring_alloc(0x97e50ae6, 17);
printf("bits = 0b%s\n", pstr);
free(pstr);
// no free needed but you need to copy the string to save it somewhere else
char *pstr2 = int_to_bitstring_static(0x97e50ae6, 17);
printf("bits = 0b%s\n", pstr2);
Is there a printf converter to print in binary format?
The printf() family is only able to print integers in base 8, 10, and 16 using the standard specifiers directly. I suggest creating a function that converts the number to a string per code's particular needs.
[Edit 2022] This is expected to change with the next version of C which implements "%b".
Binary constants such as 0b10101010, and %b conversion specifier for printf() function family C2x
To print in any base [2-36]
All other answers so far have at least one of these limitations.
Use static memory for the return buffer. This limits the number of times the function may be used as an argument to printf().
Allocate memory requiring the calling code to free pointers.
Require the calling code to explicitly provide a suitable buffer.
Call printf() directly. This obliges a new function for to fprintf(), sprintf(), vsprintf(), etc.
Use a reduced integer range.
The following has none of the above limitation. It does require C99 or later and use of "%s". It uses a compound literal to provide the buffer space. It has no trouble with multiple calls in a printf().
#include <assert.h>
#include <limits.h>
#define TO_BASE_N (sizeof(unsigned)*CHAR_BIT + 1)
// v--compound literal--v
#define TO_BASE(x, b) my_to_base((char [TO_BASE_N]){""}, (x), (b))
// Tailor the details of the conversion function as needed
// This one does not display unneeded leading zeros
// Use return value, not `buf`
char *my_to_base(char buf[TO_BASE_N], unsigned i, int base) {
assert(base >= 2 && base <= 36);
char *s = &buf[TO_BASE_N - 1];
*s = '\0';
do {
s--;
*s = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[i % base];
i /= base;
} while (i);
// Could employ memmove here to move the used buffer to the beginning
// size_t len = &buf[TO_BASE_N] - s;
// memmove(buf, s, len);
return s;
}
#include <stdio.h>
int main(void) {
int ip1 = 0x01020304;
int ip2 = 0x05060708;
printf("%s %s\n", TO_BASE(ip1, 16), TO_BASE(ip2, 16));
printf("%s %s\n", TO_BASE(ip1, 2), TO_BASE(ip2, 2));
puts(TO_BASE(ip1, 8));
puts(TO_BASE(ip1, 36));
return 0;
}
Output
1020304 5060708
1000000100000001100000100 101000001100000011100001000
100401404
A2F44
const char* byte_to_binary(int x)
{
static char b[sizeof(int)*8+1] = {0};
int y;
long long z;
for (z = 1LL<<sizeof(int)*8-1, y = 0; z > 0; z >>= 1, y++) {
b[y] = (((x & z) == z) ? '1' : '0');
}
b[y] = 0;
return b;
}
None of the previously posted answers are exactly what I was looking for, so I wrote one. It is super simple to use %B with the printf!
/*
* File: main.c
* Author: Techplex.Engineer
*
* Created on February 14, 2012, 9:16 PM
*/
#include <stdio.h>
#include <stdlib.h>
#include <printf.h>
#include <math.h>
#include <string.h>
static int printf_arginfo_M(const struct printf_info *info, size_t n, int *argtypes)
{
/* "%M" always takes one argument, a pointer to uint8_t[6]. */
if (n > 0) {
argtypes[0] = PA_POINTER;
}
return 1;
}
static int printf_output_M(FILE *stream, const struct printf_info *info, const void *const *args)
{
int value = 0;
int len;
value = *(int **) (args[0]);
// Beginning of my code ------------------------------------------------------------
char buffer [50] = ""; // Is this bad?
char buffer2 [50] = ""; // Is this bad?
int bits = info->width;
if (bits <= 0)
bits = 8; // Default to 8 bits
int mask = pow(2, bits - 1);
while (mask > 0) {
sprintf(buffer, "%s", ((value & mask) > 0 ? "1" : "0"));
strcat(buffer2, buffer);
mask >>= 1;
}
strcat(buffer2, "\n");
// End of my code --------------------------------------------------------------
len = fprintf(stream, "%s", buffer2);
return len;
}
int main(int argc, char** argv)
{
register_printf_specifier('B', printf_output_M, printf_arginfo_M);
printf("%4B\n", 65);
return EXIT_SUCCESS;
}
This code should handle your needs up to 64 bits.
I created two functions: pBin and pBinFill. Both do the same thing, but pBinFill fills in the leading spaces with the fill character provided by its last argument.
The test function generates some test data, then prints it out using the pBinFill function.
#define kDisplayWidth 64
char* pBin(long int x,char *so)
{
char s[kDisplayWidth+1];
int i = kDisplayWidth;
s[i--] = 0x00; // terminate string
do { // fill in array from right to left
s[i--] = (x & 1) ? '1' : '0'; // determine bit
x >>= 1; // shift right 1 bit
} while (x > 0);
i++; // point to last valid character
sprintf(so, "%s", s+i); // stick it in the temp string string
return so;
}
char* pBinFill(long int x, char *so, char fillChar)
{
// fill in array from right to left
char s[kDisplayWidth+1];
int i = kDisplayWidth;
s[i--] = 0x00; // terminate string
do { // fill in array from right to left
s[i--] = (x & 1) ? '1' : '0';
x >>= 1; // shift right 1 bit
} while (x > 0);
while (i >= 0) s[i--] = fillChar; // fill with fillChar
sprintf(so, "%s", s);
return so;
}
void test()
{
char so[kDisplayWidth+1]; // working buffer for pBin
long int val = 1;
do {
printf("%ld =\t\t%#lx =\t\t0b%s\n", val, val, pBinFill(val, so, '0'));
val *= 11; // generate test data
} while (val < 100000000);
}
Output:
00000001 = 0x000001 = 0b00000000000000000000000000000001
00000011 = 0x00000b = 0b00000000000000000000000000001011
00000121 = 0x000079 = 0b00000000000000000000000001111001
00001331 = 0x000533 = 0b00000000000000000000010100110011
00014641 = 0x003931 = 0b00000000000000000011100100110001
00161051 = 0x02751b = 0b00000000000000100111010100011011
01771561 = 0x1b0829 = 0b00000000000110110000100000101001
19487171 = 0x12959c3 = 0b00000001001010010101100111000011
Some runtimes support "%b" although that is not a standard.
Also see here for an interesting discussion:
http://bytes.com/forum/thread591027.html
HTH
Maybe a bit OT, but if you need this only for debuging to understand or retrace some binary operations you are doing, you might take a look on wcalc (a simple console calculator). With the -b options you get binary output.
e.g.
$ wcalc -b "(256 | 3) & 0xff"
= 0b11
There is no formatting function in the C standard library to output binary like that. All the format operations the printf family supports are towards human readable text.
The following recursive function might be useful:
void bin(int n)
{
/* Step 1 */
if (n > 1)
bin(n/2);
/* Step 2 */
printf("%d", n % 2);
}
I optimized the top solution for size and C++-ness, and got to this solution:
inline std::string format_binary(unsigned int x)
{
static char b[33];
b[32] = '\0';
for (int z = 0; z < 32; z++) {
b[31-z] = ((x>>z) & 0x1) ? '1' : '0';
}
return b;
}
Use:
char buffer [33];
itoa(value, buffer, 2);
printf("\nbinary: %s\n", buffer);
For more ref., see How to print binary number via printf.
void
print_binary(unsigned int n)
{
unsigned int mask = 0;
/* this grotesque hack creates a bit pattern 1000... */
/* regardless of the size of an unsigned int */
mask = ~mask ^ (~mask >> 1);
for(; mask != 0; mask >>= 1) {
putchar((n & mask) ? '1' : '0');
}
}
Print bits from any type using less code and resources
This approach has as attributes:
Works with variables and literals.
Doesn't iterate all bits when not necessary.
Call printf only when complete a byte (not unnecessarily for all bits).
Works for any type.
Works with little and big endianness (uses GCC #defines for checking).
May work with hardware that char isn't a byte (eight bits). (Tks #supercat)
Uses typeof() that isn't C standard but is largely defined.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <limits.h>
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define for_endian(size) for (int i = 0; i < size; ++i)
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define for_endian(size) for (int i = size - 1; i >= 0; --i)
#else
#error "Endianness not detected"
#endif
#define printb(value) \
({ \
typeof(value) _v = value; \
__printb((typeof(_v) *) &_v, sizeof(_v)); \
})
#define MSB_MASK 1 << (CHAR_BIT - 1)
void __printb(void *value, size_t size)
{
unsigned char uc;
unsigned char bits[CHAR_BIT + 1];
bits[CHAR_BIT] = '\0';
for_endian(size) {
uc = ((unsigned char *) value)[i];
memset(bits, '0', CHAR_BIT);
for (int j = 0; uc && j < CHAR_BIT; ++j) {
if (uc & MSB_MASK)
bits[j] = '1';
uc <<= 1;
}
printf("%s ", bits);
}
printf("\n");
}
int main(void)
{
uint8_t c1 = 0xff, c2 = 0x44;
uint8_t c3 = c1 + c2;
printb(c1);
printb((char) 0xff);
printb((short) 0xff);
printb(0xff);
printb(c2);
printb(0x44);
printb(0x4411ff01);
printb((uint16_t) c3);
printb('A');
printf("\n");
return 0;
}
Output
$ ./printb
11111111
11111111
00000000 11111111
00000000 00000000 00000000 11111111
01000100
00000000 00000000 00000000 01000100
01000100 00010001 11111111 00000001
00000000 01000011
00000000 00000000 00000000 01000001
I have used another approach (bitprint.h) to fill a table with all bytes (as bit strings) and print them based on the input/index byte. It's worth taking a look.
Maybe someone will find this solution useful:
void print_binary(int number, int num_digits) {
int digit;
for(digit = num_digits - 1; digit >= 0; digit--) {
printf("%c", number & (1 << digit) ? '1' : '0');
}
}
void print_ulong_bin(const unsigned long * const var, int bits) {
int i;
#if defined(__LP64__) || defined(_LP64)
if( (bits > 64) || (bits <= 0) )
#else
if( (bits > 32) || (bits <= 0) )
#endif
return;
for(i = 0; i < bits; i++) {
printf("%lu", (*var >> (bits - 1 - i)) & 0x01);
}
}
should work - untested.
I liked the code by paniq, the static buffer is a good idea. However it fails if you want multiple binary formats in a single printf() because it always returns the same pointer and overwrites the array.
Here's a C style drop-in that rotates pointer on a split buffer.
char *
format_binary(unsigned int x)
{
#define MAXLEN 8 // width of output format
#define MAXCNT 4 // count per printf statement
static char fmtbuf[(MAXLEN+1)*MAXCNT];
static int count = 0;
char *b;
count = count % MAXCNT + 1;
b = &fmtbuf[(MAXLEN+1)*count];
b[MAXLEN] = '\0';
for (int z = 0; z < MAXLEN; z++) { b[MAXLEN-1-z] = ((x>>z) & 0x1) ? '1' : '0'; }
return b;
}
Here is a small variation of paniq's solution that uses templates to allow printing of 32 and 64 bit integers:
template<class T>
inline std::string format_binary(T x)
{
char b[sizeof(T)*8+1] = {0};
for (size_t z = 0; z < sizeof(T)*8; z++)
b[sizeof(T)*8-1-z] = ((x>>z) & 0x1) ? '1' : '0';
return std::string(b);
}
And can be used like:
unsigned int value32 = 0x1e127ad;
printf( " 0x%x: %s\n", value32, format_binary(value32).c_str() );
unsigned long long value64 = 0x2e0b04ce0;
printf( "0x%llx: %s\n", value64, format_binary(value64).c_str() );
Here is the result:
0x1e127ad: 00000001111000010010011110101101
0x2e0b04ce0: 0000000000000000000000000000001011100000101100000100110011100000
No standard and portable way.
Some implementations provide itoa(), but it's not going to be in most, and it has a somewhat crummy interface. But the code is behind the link and should let you implement your own formatter pretty easily.
I just want to post my solution. It's used to get zeroes and ones of one byte, but calling this function few times can be used for larger data blocks. I use it for 128 bit or larger structs. You can also modify it to use size_t as input parameter and pointer to data you want to print, so it can be size independent. But it works for me quit well as it is.
void print_binary(unsigned char c)
{
unsigned char i1 = (1 << (sizeof(c)*8-1));
for(; i1; i1 >>= 1)
printf("%d",(c&i1)!=0);
}
void get_binary(unsigned char c, unsigned char bin[])
{
unsigned char i1 = (1 << (sizeof(c)*8-1)), i2=0;
for(; i1; i1>>=1, i2++)
bin[i2] = ((c&i1)!=0);
}
Here's how I did it for an unsigned int
void printb(unsigned int v) {
unsigned int i, s = 1<<((sizeof(v)<<3)-1); // s = only most significant bit at 1
for (i = s; i; i>>=1) printf("%d", v & i || 0 );
}
One statement generic conversion of any integral type into the binary string representation using standard library:
#include <bitset>
MyIntegralType num = 10;
print("%s\n",
std::bitset<sizeof(num) * 8>(num).to_string().insert(0, "0b").c_str()
); // prints "0b1010\n"
Or just: std::cout << std::bitset<sizeof(num) * 8>(num);