Number of character cells used by string - c

I have a program that outputs a textual table using UTF-8 strings, and I need to measure the number of monospaced character cells used by a string so I can align it properly. If possible, I'd like to do this with standard functions.

From UTF-8 and Unicode FAQ for Unix/Linux:
The number of characters can be counted in C in a portable way using mbstowcs(NULL,s,0). This works for UTF-8 like for any other supported encoding, as long as the appropriate locale has been selected. A hard-wired technique to count the number of characters in a UTF-8 string is to count all bytes except those in the range 0x80 – 0xBF, because these are just continuation bytes and not characters of their own. However, the need to count characters arises surprisingly rarely in applications.
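As a minimal sketch of the portable approach quoted above, the call can be wrapped so that the error case is handled. The helper name count_chars_in_locale is hypothetical, and the sketch assumes a C library that accepts a NULL destination for mbstowcs, as the FAQ describes.

```c
#include <stdlib.h>

/* Hypothetical helper: number of characters in a multibyte string under the
   current locale, or -1 if the string is not valid in that locale's encoding. */
long count_chars_in_locale(const char *s)
{
    size_t n = mbstowcs(NULL, s, 0);   /* NULL destination: count only */
    if (n == (size_t)-1)
        return -1;                     /* invalid multibyte sequence */
    return (long)n;
}
```

For UTF-8 input, the program would first call setlocale(LC_ALL, "") (from locale.h) so that a UTF-8 locale from the environment is selected. Note that this counts characters, not screen columns.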

You may or may not have a UTF-8 compatible strlen(3) function available. However, there are some simple C functions readily available that do the job quickly.
The efficient C solutions examine the start of the character to skip continuation bytes. The simple code (referenced from the link above) is
int my_strlen_utf8_c(char *s) {
int i = 0, j = 0;
while (s[i]) {
if ((s[i] & 0xc0) != 0x80) j++;
i++;
}
return j;
}
The faster version uses the same technique, but prefetches data and does multi-byte compares, resulting in a substantial speedup. The code is longer and more complex, however.

I'm shocked that no one mentioned this, so here it goes for the record:
If you want to align text in a terminal, you need to use the POSIX functions wcwidth and wcswidth. Here's a correct program to find the on-screen length of a string.
#define _XOPEN_SOURCE
#include <wchar.h>
#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
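int measure(char *string) {
// find out how many wide characters the string converts to
size_t length = mbstowcs(NULL, string, 0);
if (length == (size_t)-1) return -2; // invalid multibyte sequence
// allocate enough memory to hold the wide string and its terminator
size_t needed = length + 1;
wchar_t *wcstring = malloc(needed * sizeof *wcstring);
if (!wcstring) return -1;
// change encodings
if (mbstowcs(wcstring, string, needed) == (size_t)-1) {
free(wcstring); // don't leak the buffer on a conversion error
return -2;
}
// measure width
int width = wcswidth(wcstring, needed);
free(wcstring);
return width;
}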
int main(int argc, char **argv) {
setlocale(LC_ALL, "");
for (int i = 1; i < argc; i++) {
printf("%s: %d\n", argv[i], measure(argv[i]));
}
}
Here's an example of it running:
$ ./measure hello 莊子 cＡb
hello: 5
莊子: 4
cＡb: 4
Note how the two characters "莊子" and the three characters "cＡb" (note the double-width Ａ) are both 4 columns wide.
As utf8everywhere.org puts it,
The size of the string as it appears on the screen is unrelated to the
number of code points in the string. One has to communicate with the
rendering engine for this. Code points do not occupy one column even
in monospace fonts and terminals. POSIX takes this into account.
Windows does not have any built-in wcwidth function for console output; if you want to support multi-column characters in the Windows console, you need to either find a portable implementation of wcwidth or give up, because the Windows console doesn’t support Unicode without crazy hacks.

If you are able to use 3rd party libraries, have a look at the ICU library from IBM:
http://site.icu-project.org/

The following code takes ill-formed byte sequences into consideration. The example string data comes from "Table 3-8. Use of U+FFFD in UTF-8 Conversion" in the Unicode Standard 6.3.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#define is_trail(c) ((c) > 0x7F && (c) < 0xC0)
#define SUCCESS 1
#define FAILURE -1
int utf8_get_next_char(const unsigned char*, size_t, size_t*, int*, unsigned int*);
int utf8_length(unsigned char*, size_t);
void utf8_print_each_char(unsigned char*, size_t);
int main(void)
{
unsigned char *str;
str = (unsigned char *) "\x61\xF1\x80\x80\xE1\x80\xC2\x62\x80\x63\x80\xBF\x64";
size_t str_size = strlen((const char*) str);
puts(10 == utf8_length(str, str_size) ? "true" : "false");
utf8_print_each_char(str, str_size);
return EXIT_SUCCESS;
}
int utf8_length(unsigned char *str, size_t str_size)
{
int length = 0;
size_t pos = 0;
size_t next_pos = 0;
int is_valid = 0;
unsigned int code_point = 0;
while (
utf8_get_next_char(str, str_size, &next_pos, &is_valid, &code_point) == SUCCESS
) {
++length;
}
return length;
}
void utf8_print_each_char(unsigned char *str, size_t str_size)
{
int length = 0;
size_t pos = 0;
size_t next_pos = 0;
int is_valid = 0;
unsigned int code_point = 0;
while (
utf8_get_next_char(str, str_size, &next_pos, &is_valid, &code_point) == SUCCESS
) {
if (is_valid == true) {
printf("%.*s\n", (int) next_pos - (int) pos, str + pos);
} else {
puts("\xEF\xBF\xBD");
}
pos = next_pos;
}
}
int utf8_get_next_char(const unsigned char *str, size_t str_size, size_t *cursor, int *is_valid, unsigned int *code_point)
{
size_t pos = *cursor;
size_t rest_size = str_size - pos;
unsigned char c;
unsigned char min;
unsigned char max;
*code_point = 0;
*is_valid = SUCCESS;
if (*cursor >= str_size) {
return FAILURE;
}
c = str[pos];
if (rest_size < 1) {
*is_valid = false;
pos += 1;
} else if (c < 0x80) {
*code_point = str[pos];
*is_valid = true;
pos += 1;
} else if (c < 0xC2) {
*is_valid = false;
pos += 1;
} else if (c < 0xE0) {
if (rest_size < 2 || !is_trail(str[pos + 1])) {
*is_valid = false;
pos += 1;
} else {
*code_point = ((str[pos] & 0x1F) << 6) | (str[pos + 1] & 0x3F);
*is_valid = true;
pos += 2;
}
} else if (c < 0xF0) {
min = (c == 0xE0) ? 0xA0 : 0x80;
max = (c == 0xED) ? 0x9F : 0xBF;
if (rest_size < 2 || str[pos + 1] < min || max < str[pos + 1]) {
*is_valid = false;
pos += 1;
} else if (rest_size < 3 || !is_trail(str[pos + 2])) {
*is_valid = false;
pos += 2;
} else {
*code_point = ((str[pos] & 0x1F) << 12)
| ((str[pos + 1] & 0x3F) << 6)
| (str[pos + 2] & 0x3F);
*is_valid = true;
pos += 3;
}
} else if (c < 0xF5) {
min = (c == 0xF0) ? 0x90 : 0x80;
max = (c == 0xF4) ? 0x8F : 0xBF;
if (rest_size < 2 || str[pos + 1] < min || max < str[pos + 1]) {
*is_valid = false;
pos += 1;
} else if (rest_size < 3 || !is_trail(str[pos + 2])) {
*is_valid = false;
pos += 2;
} else if (rest_size < 4 || !is_trail(str[pos + 3])) {
*is_valid = false;
pos += 3;
} else {
*code_point = ((str[pos] & 0x7) << 18)
| ((str[pos + 1] & 0x3F) << 12)
| ((str[pos + 2] & 0x3F) << 6)
| (str[pos + 3] & 0x3F);
*is_valid = true;
pos += 4;
}
} else {
*is_valid = false;
pos += 1;
}
*cursor = pos;
return SUCCESS;
}
When I write code for UTF-8, I refer to "Table 3-7. Well-Formed UTF-8 Byte Sequences" in the Unicode Standard 6.3.
Code Points          First Byte  Second Byte  Third Byte  Fourth Byte
U+0000 - U+007F      00 - 7F
U+0080 - U+07FF      C2 - DF     80 - BF
U+0800 - U+0FFF      E0          A0 - BF      80 - BF
U+1000 - U+CFFF      E1 - EC     80 - BF      80 - BF
U+D000 - U+D7FF      ED          80 - 9F      80 - BF
U+E000 - U+FFFF      EE - EF     80 - BF      80 - BF
U+10000 - U+3FFFF    F0          90 - BF      80 - BF     80 - BF
U+40000 - U+FFFFF    F1 - F3     80 - BF      80 - BF     80 - BF
U+100000 - U+10FFFF  F4          80 - 8F      80 - BF     80 - BF
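The "First Byte" column of the table above is enough to tell how long a well-formed sequence should be. A minimal sketch, with the hypothetical helper name utf8_sequence_length; it only classifies the lead byte and does not validate the continuation bytes against the ranges in the other columns:

```c
/* Hypothetical helper: expected length in bytes of a well-formed UTF-8
   sequence, judged from its first byte alone; 0 if the byte cannot start one. */
int utf8_sequence_length(unsigned char first)
{
    if (first <= 0x7F) return 1;                   /* 00 - 7F */
    if (first >= 0xC2 && first <= 0xDF) return 2;  /* C2 - DF */
    if (first >= 0xE0 && first <= 0xEF) return 3;  /* E0 - EF */
    if (first >= 0xF0 && first <= 0xF4) return 4;  /* F0 - F4 */
    return 0;  /* 80 - C1 and F5 - FF never start a well-formed sequence */
}
```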

You can also use glib, which makes your life much easier when dealing with UTF-8. See the glib reference docs.

Hamming code check parity

I am not sure if I am calculating the parity bit correctly in the parity check function I wrote. The codeWord is 11 chars long with 4 parity bits and 7 data bits. Does the implementation look good?
void parityCheck(char* codeWord) {
int parity[4] = {0}, i = 0, diffParity[4] = {0}, twoPower = 0, bitSum = 0;
// Stores # of 1's for each parity bit in array.
parity[0] = (codeWord[2] - 48) + (codeWord[4] - 48) + (codeWord[6] - 48) + (codeWord[8] - 48) + (codeWord[10] - 48);
parity[1] = (codeWord[2] - 48) + (codeWord[5] - 48) + (codeWord[6] - 48) + (codeWord[9] - 48) + (codeWord[10] - 48);
parity[2] = (codeWord[4] - 48) + (codeWord[5] - 48) + (codeWord[6] - 48);
parity[3] = (codeWord[8] - 48) + (codeWord[9] - 48) + (codeWord[10] - 48);
// Determines if sum of bits is even or odd, then tests for difference from actual parity bit.
for (i = 0; i < 4; i++) {
twoPower = (int)pow((double)2, i);
if (parity[i] % 2 == 0)
parity[i] = 0;
else
parity[i] = 1;
if ((codeWord[twoPower-1] - 48) != parity[i])
diffParity[i] = 1;
}
// Calculates the location of the error bit.
for (i = 0; i < 4; i++) {
twoPower = (int)pow((double)2, i);
bitSum += diffParity[i]*twoPower;
}
// Inverts bit at location of error.
if (bitSum <= 11 && bitSum > 0) {
if ((codeWord[bitSum-1] - 48))
codeWord[bitSum-1] = '0';
else
codeWord[bitSum-1] = '1';
}
}
Does the implementation look good?
This very much depends on your measure for “good”. I can confirm that it does get the job done, so at least it is correct. Your code is very verbose, and thus hard to check for correctness. I'd do the following:
int parity_check(int codeWord) {
int parity = 0, codeWordBit, bitPos;
for (bitPos = 1; bitPos <= 11; ++bitPos) {
codeWordBit = ((codeWord >> (bitPos - 1)) & 1);
parity ^= bitPos*codeWordBit;
}
if (parity != 0) {
if (parity > 11)
return -1; // multi-bit error!
codeWord ^= 1 << (parity - 1);
}
return codeWord;
}
Instead of a sequence of digit characters, I treat your whole code word as a single integer, which is a lot more efficient.
Looking at the table at Wikipedia, I see that the columns of that table form binary representations of the sequence 1 … 11. Each code word bit affects exactly those parity bits mentioned in that column, so I take the code word bit (which is zero or one), multiply it by the bit pattern of that column to obtain either that pattern or zero, then XOR this with the current parity bit pattern. The effect of this is that a zero code word bit won't change anything, whereas a non-zero code word bit flips all associated parity bits.
Some care has to be taken because the bit pattern is one-based, whereas the bit position using the right shift trick is zero-based. So I have to subtract one, then shift right by that amount, and then extract the least significant digit in order to obtain the codeWordBit.
Using my implementation for reference, I was able to verify (by complete enumeration) that your code works the same.
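The column-pattern idea described above can also be applied directly to the character-string representation used in the question. This is a minimal sketch, not part of either original program: the name hamming_syndrome is hypothetical, and it assumes an 11-character string of '0' and '1' in which codeWord[0] holds bit position 1.

```c
#include <stddef.h>

/* Hypothetical helper: XOR together the 1-based positions of all '1' bits.
   The result is 0 for a valid code word, otherwise the position of a
   single flipped bit (values above 11 indicate a multi-bit error). */
int hamming_syndrome(const char *codeWord)
{
    int syndrome = 0;
    for (size_t pos = 1; pos <= 11; ++pos) {
        if (codeWord[pos - 1] == '1')
            syndrome ^= (int)pos;
    }
    return syndrome;
}
```

With this layout, the all-zero word gives a syndrome of 0, and flipping only the fifth character gives 5, which is the position to invert.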
Your code works fine AFAIK, as it passed the test cases I conjured up. Some classic simplifications were made for easier viewing, but the OP's functionality was not changed.
void parityCheck(char* cW) {
int parity[4] = { 0 }, i = 0, diffParity[4] = { 0 }, twoPower = 0, bitSum = 0;
// Stores # of 1's for each parity bit in array.
parity[0] = (cW[2] - '0') + (cW[4] - '0') + (cW[6] - '0') + (cW[8] - '0') + (cW[10] - '0');
parity[1] = (cW[2] - '0') + (cW[5] - '0') + (cW[6] - '0') + (cW[9] - '0') + (cW[10] - '0');
parity[2] = (cW[4] - '0') + (cW[5] - '0') + (cW[6] - '0');
parity[3] = (cW[8] - '0') + (cW[9] - '0') + (cW[10] - '0');
// Determines if sum of bits is even or odd, then tests for difference from actual parity bit.
for (i = 0; i < 4; i++) {
//twoPower = (int) pow((double) 2, i);
twoPower = 1 << i;
//if (parity[i] % 2 == 0) parity[i] = 0; else parity[i] = 1;
parity[i] &= 1; // Make 0 even, 1 odd.
if ((cW[twoPower - 1]-'0') != parity[i])
diffParity[i] = 1;
}
// Calculates the location of the error bit.
for (i = 0; i < 4; i++) {
// twoPower = (int) pow((double) 2, i);
twoPower = 1 << i;
bitSum += diffParity[i] * twoPower;
}
// Inverts bit at location of error.
if (bitSum <= 11 && bitSum > 0) {
if ((cW[bitSum - 1]-'0'))
cW[bitSum - 1] = '0';
else
cW[bitSum - 1] = '1';
}
}
void TestP(const char * Test) {
char buf[100];
strcpy(buf, Test);
parityCheck(buf);
printf("'%s' '%s'\n", Test, buf);
}
int main(void) {
TestP("00000000000");
TestP("10011100101");
TestP("10100111001");
}
It would have been useful had the OP posted test patterns.
Here's my implementation. It works. The public is free to use it at no charge.
I used the acronym "secded" as in, "single-error-correcting, double-error-detecting." You can re-wire this as a "triple error detector" if you want that instead. Really, some small part of this is secded and the rest is Hamming 7,4 -- but I named these methods what I did, when I did.
The "strings" here are not NUL-terminated, but counted. This code is excerpted from a Python module written in C. That is the provenance of the string type you see.
A key point here was realizing that there are only 16 Hamming 7,4 codes. I calculated secded_of_nibble() with some Python code, which unfortunately I no longer have.
static const unsigned char secded_of_nibble[] =
{ 0x0, 0xd2, 0x55, 0x87, 0x99, 0x4b, 0xcc, 0x1e, 0xe1, 0x33, 0xb4, 0x66, 0x78, 0xaa, 0x2d, 0xff };
int fec_secded_encode_cch_bits(const char * strIn, const int cchIn, char * strOut, const int cchOut)
{
assert( cchIn * 2 == cchOut);
if( cchIn * 2 != cchOut)
return 0;
if (!strIn || !strOut)
return 0;
int i;
for (i = 0; i < cchIn; i ++)
{
char in_byte = strIn[i];
char hi_byte = secded_of_nibble[(in_byte >> 4) & 0xf];
char lo_byte = secded_of_nibble[in_byte & 0xf];
strOut[i * 2] = hi_byte;
strOut[i * 2 + 1] = lo_byte;
}
return 1;
}
char bv_H[] = {0x9, 0xA, 0xB, 0xC, 0xD, 0xE, 0xF, 0x8};
char val_nibble(char ch)
{
return ((ch & 0x20) >> 2) | ((ch & 0xE) >> 1);
}
char correct_nibble(char ch)
{
char nibble = 0;
int i = 0;
for (i = 0; i < 8; i++)
if (ch & (1 << (7-i)))
nibble ^= bv_H[i];
return nibble;
}
void apply_correct(char nib_correct, char * pbyte, int * pcSec, int *pcDed)
{
if (0 == nib_correct)
return;
if (nib_correct & 0x8)
{
(*pcSec) ++;
int bit = (8 - (nib_correct & 0x7)) & 0x7;
/* fprintf(stderr, "bit %d, %02X\n", bit, 1 << bit);*/
(*pbyte) ^= (1 << bit);
}
else
{
(*pcDed) ++;
}
}
int fec_secded_decode_cch_bits
(
const char * strIn,
const int cchIn,
char * strOut,
const int cchOut,
int * pcSec,
int * pcDed
)
{
assert( cchIn == cchOut *2);
if( cchIn != cchOut * 2)
return 0;
if (!strIn || !strOut)
return 0;
int i;
for (i = 0; i < cchOut; i ++)
{
char hi_byte = strIn[i * 2];
char lo_byte = strIn[i * 2 + 1];
char hi_correct = correct_nibble(hi_byte);
char lo_correct = correct_nibble(lo_byte);
if (hi_correct || lo_correct)
{
apply_correct(hi_correct, &hi_byte, pcSec, pcDed);
apply_correct(lo_correct, &lo_byte, pcSec, pcDed);
/* fprintf(stderr, "Corrections %x %x.\n", hi_correct, lo_correct);*/
}
char hi_nibble = val_nibble(hi_byte);
char lo_nibble = val_nibble(lo_byte);
strOut[i] = (hi_nibble << 4) | lo_nibble;
}
return 1;
}
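Since the script that produced the 16-entry table is lost, here is one way the table could be regenerated. This is a sketch under an assumption: the bit layout (from the most significant bit: p1 p2 d1 p3 d2 d3 d4 p4, with p4 as overall even parity) is inferred from val_nibble and from the table values, not taken from the author's original code. The function name secded_byte_for_nibble is hypothetical.

```c
/* Hypothetical helper: build the 8-bit extended Hamming code word for a
   4-bit value, using the inferred layout p1 p2 d1 p3 d2 d3 d4 p4. */
unsigned char secded_byte_for_nibble(unsigned nibble)
{
    unsigned d1 = (nibble >> 3) & 1, d2 = (nibble >> 2) & 1;
    unsigned d3 = (nibble >> 1) & 1, d4 = nibble & 1;
    unsigned p1 = d1 ^ d2 ^ d4;   /* Hamming(7,4) parity bits */
    unsigned p2 = d1 ^ d3 ^ d4;
    unsigned p3 = d2 ^ d3 ^ d4;
    unsigned p4 = p1 ^ p2 ^ d1 ^ p3 ^ d2 ^ d3 ^ d4;  /* overall even parity */
    unsigned byte = (p1 << 7) | (p2 << 6) | (d1 << 5) | (p3 << 4)
                  | (d2 << 3) | (d3 << 2) | (d4 << 1) | p4;
    return (unsigned char)byte;
}
```

Calling it for the values 0 through 15 yields the same sequence as the table, starting 0x00, 0xd2, 0x55, 0x87.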

Convert hex string (char []) to int?

I have a char[] that contains a value such as "0x1800785" but the function I want to give the value to requires an int, how can I convert this to an int? I have searched around but cannot find an answer. Thanks.
Have you tried strtol()?
strtol - convert string to a long integer
Example:
const char *hexstring = "abcdef0";
int number = (int)strtol(hexstring, NULL, 16);
If the string representation of the number begins with a 0x prefix, one should use 0 as the base:
const char *hexstring = "0xabcdef0";
int number = (int)strtol(hexstring, NULL, 0);
(It's also possible to specify an explicit base such as 16, but I wouldn't recommend introducing redundancy.)
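The strtol calls above ignore errors. A minimal sketch of a wrapper that rejects empty input, trailing garbage, and values that do not fit in an int could look like this; the name parse_hex_int and its return convention are hypothetical, not part of any library.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 0 and stores the value in *out on success,
   -1 if s is not a complete hex number or does not fit in an int. */
int parse_hex_int(const char *s, int *out)
{
    char *end;
    errno = 0;
    long value = strtol(s, &end, 16);   /* base 16 also accepts a 0x prefix */
    if (end == s || *end != '\0')       /* no digits, or trailing junk */
        return -1;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -1;
    *out = (int)value;
    return 0;
}
```

For the string from the question, parse_hex_int("0x1800785", &n) returns 0 and stores 0x1800785 in n.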
Or if you want to have your own implementation, I wrote this quick function as an example:
/**
* hex2int
* take a hex string and convert it to a 32bit number (max 8 hex digits)
*/
uint32_t hex2int(char *hex) {
uint32_t val = 0;
while (*hex) {
// get current character then increment
uint8_t byte = *hex++;
// transform hex character to the 4bit equivalent number, using the ascii table indexes
if (byte >= '0' && byte <= '9') byte = byte - '0';
else if (byte >= 'a' && byte <='f') byte = byte - 'a' + 10;
else if (byte >= 'A' && byte <='F') byte = byte - 'A' + 10;
// shift 4 to make space for new digit, and add the 4 bits of the new digit
val = (val << 4) | (byte & 0xF);
}
return val;
}
Something like this could be useful:
char str[] = "0x1800785";
unsigned int num; // %x expects a pointer to unsigned int
sscanf(str, "%x", &num);
printf("0x%x %u\n", num, num);
Read man sscanf
Assuming you mean it's a string, how about strtol?
Use strtol if you have libc available like the top answer suggests. However if you like custom stuff or are on a microcontroller without libc or so, you may want a slightly optimized version without complex branching.
#include <inttypes.h>
/**
* xtou64
* Take a hex string and convert it to a 64bit number (max 16 hex digits).
* The string must only contain digits and valid hex characters.
*/
uint64_t xtou64(const char *str)
{
uint64_t res = 0;
char c;
while ((c = *str++)) {
char v = (c & 0xF) + (c >> 6) | ((c >> 3) & 0x8);
res = (res << 4) | (uint64_t) v;
}
return res;
}
The bit shifting magic boils down to: just use the last 4 bits, but if the character is a non-digit, then also add 9.
One quick & dirty solution:
// makes a number from two ascii hexa characters
int ahex2int(char a, char b){
a = (a <= '9') ? a - '0' : (a & 0x7) + 9;
b = (b <= '9') ? b - '0' : (b & 0x7) + 9;
return (a << 4) + b;
}
You have to be sure your input is correct, no validation included (one could say it is C). Good thing it is quite compact, it works with both 'A' to 'F' and 'a' to 'f'.
The approach relies on the position of alphabet characters in the ASCII table, let's peek e.g. to Wikipedia (https://en.wikipedia.org/wiki/ASCII#/media/File:USASCII_code_chart.png). Long story short, the numbers are below the characters, so the numeric characters (0 to 9) are easily converted by subtracting the code for zero. The alphabetic characters (A to F) are read by zeroing other than last three bits (effectively making it work with either upper- or lowercase), subtracting one (because after the bit masking, the alphabet starts on position one) and adding ten (because A to F represent 10th to 15th value in hexadecimal code). Finally, we need to combine the two digits that form the lower and upper nibble of the encoded number.
Here we go with same approach (with minor variations):
#include <stdio.h>
// takes a null-terminated string of hexa characters and tries to
// convert it to numbers
long ahex2num(unsigned char *in){
unsigned char *pin = in; // lets use pointer to loop through the string
long out = 0; // here we accumulate the result
while(*pin != 0){
out <<= 4; // we have one more input character, so
// we shift the accumulated interim-result one order up
out += (*pin < 'A') ? *pin & 0xF : (*pin & 0x7) + 9; // add the new nibble
pin++; // go ahead
}
return out;
}
// main function will test our conversion fn
int main(void) {
unsigned char str[] = "1800785"; // no 0x prefix, please
long num;
num = ahex2num(str); // call the function
printf("Input: %s\n",str); // print input string
printf("Output: %lx\n",(unsigned long)num); // print the converted number back as hexa
printf("Check: %ld = %ld \n",num,0x1800785L); // check the numeric values matches
return 0;
}
Try the block of code below; it works for me.
char p[] = "0x820";
unsigned int intVal; // %x expects a pointer to unsigned int, not uint16_t
sscanf(p, "%x", &intVal);
printf("value x: %x - %u", intVal, intVal);
Output is:
value x: 820 - 2080
So, after a while of searching, and finding out that strtol is quite slow, I've coded my own function. It only works for uppercase letters, but adding lowercase functionality ain't a problem.
int hexToInt(PCHAR _hex, int offset = 0, int size = 6)
{
int _result = 0;
DWORD _resultPtr = reinterpret_cast<DWORD>(&_result);
for(int i=0;i<size;i+=2)
{
int _multiplierFirstValue = 0, _addonSecondValue = 0;
char _firstChar = _hex[offset + i];
if(_firstChar >= 0x30 && _firstChar <= 0x39)
_multiplierFirstValue = _firstChar - 0x30;
else if(_firstChar >= 0x41 && _firstChar <= 0x46)
_multiplierFirstValue = 10 + (_firstChar - 0x41);
char _secndChar = _hex[offset + i + 1];
if(_secndChar >= 0x30 && _secndChar <= 0x39)
_addonSecondValue = _secndChar - 0x30;
else if(_secndChar >= 0x41 && _secndChar <= 0x46)
_addonSecondValue = 10 + (_secndChar - 0x41);
*(BYTE *)(_resultPtr + (size / 2) - (i / 2) - 1) = (BYTE)(_multiplierFirstValue * 16 + _addonSecondValue);
}
return _result;
}
Usage:
char *someHex = "#CCFF00FF";
int hexDevalue = hexToInt(someHex, 1, 8);
1 because the hex we want to convert starts at offset 1, and 8 because it's the hex length.
Speedtest (1.000.000 calls):
strtol ~ 0.4400s
hexToInt ~ 0.1100s
This is a function to directly convert hexadecimal containing char array to an integer which needs no extra library:
int hexadecimal2int(char *hdec) {
int finalval = 0;
while (*hdec) {
int onebyte = *hdec++;
if (onebyte >= '0' && onebyte <= '9'){onebyte = onebyte - '0';}
else if (onebyte >= 'a' && onebyte <='f') {onebyte = onebyte - 'a' + 10;}
else if (onebyte >= 'A' && onebyte <='F') {onebyte = onebyte - 'A' + 10;}
finalval = (finalval << 4) | (onebyte & 0xF);
}
return finalval;
}
I have done a similar thing before and I think this might help you.
The following works for me:
int main(void) {
    int co[8] = {0};
    char ch[9]; // 8 hex digits plus the terminating null
    printf("please enter the string:");
    if (scanf("%8s", ch) != 1) return 1;
    // stop at the end of the input instead of always reading 8 chars
    for (int i = 0; ch[i] != '\0'; i++) {
        if ((ch[i]>='A') && (ch[i]<='F')) {
            co[i] = ch[i]-'A'+10;
        } else if ((ch[i]>='0') && (ch[i]<='9')) {
            co[i] = ch[i]-'0';
        }
    }
    return 0;
}
Here, I have only taken a string of 8 characters.
If you want you can add similar logic for 'a' to 'f' to give their equivalent hex values. Though, I haven't done that because I didn't need it.
I made a library to do hexadecimal/decimal conversion without the use of stdio.h. It is very simple to use:
unsigned hexdec (const char *hex, const int s_hex);
Before the first conversion, initialize the array used for conversion with:
void init_hexdec ();
Here is the link on GitHub: https://github.com/kevmuret/libhex/
I like @radhoo's solution; it is very efficient on small systems. One can modify the solution to convert the hex to int32_t (hence, a signed value).
/**
* hex2int
* take a hex string and convert it to a 32bit number (max 8 hex digits)
*/
int32_t hex2int(char *hex) {
uint32_t val = *hex > 56 ? 0xFFFFFFFF : 0;
while (*hex) {
// get current character then increment
uint8_t byte = *hex++;
// transform hex character to the 4bit equivalent number, using the ascii table indexes
if (byte >= '0' && byte <= '9') byte = byte - '0';
else if (byte >= 'a' && byte <='f') byte = byte - 'a' + 10;
else if (byte >= 'A' && byte <='F') byte = byte - 'A' + 10;
// shift 4 to make space for new digit, and add the 4 bits of the new digit
val = (val << 4) | (byte & 0xF);
}
return val;
}
Note the return value is int32_t while val is still uint32_t to not overflow.
The
uint32_t val = *hex > 56 ? 0xFFFFFFFF : 0;
is not protected against malformed string.
Here is a solution building upon "sairam singh"s solution. Where that answer is a one to one solution, this one combines two ASCII nibbles into one byte.
// Assumes input is null terminated string.
//
//        IN                        OUT
// --------------------    --------------------
// Offset  Hex   ASCII     Offset  Hex
// 0       0x31  1         0       0x13
// 1       0x33  3
// 2       0x61  a         1       0xA0
// 3       0x30  0
// 4       0x00  NUL       2       NUL
int convert_ascii_hex_to_hex2(char *szBufOut, char *szBufIn) {
int i = 0; // input buffer index
int j = 0; // output buffer index
char a_byte;
// Two hex digits are combined into one byte
while (0 != szBufIn[i]) {
// zero result
szBufOut[j] = 0;
// First hex digit
if ((szBufIn[i]>='A') && (szBufIn[i]<='F')) {
a_byte = (unsigned int) szBufIn[i]-'A'+10;
} else if ((szBufIn[i]>='a') && (szBufIn[i]<='f')) {
a_byte = (unsigned int) szBufIn[i]-'a'+10;
} else if ((szBufIn[i]>='0') && (szBufIn[i]<='9')) {
a_byte = (unsigned int) szBufIn[i]-'0';
} else {
return -1; // error with first digit
}
szBufOut[j] = a_byte << 4;
// second hex digit
i++;
if ((szBufIn[i]>='A') && (szBufIn[i]<='F')) {
a_byte = (unsigned int) szBufIn[i]-'A'+10;
} else if ((szBufIn[i]>='a') && (szBufIn[i]<='f')) {
a_byte = (unsigned int) szBufIn[i]-'a'+10;
} else if ((szBufIn[i]>='0') && (szBufIn[i]<='9')) {
a_byte = (unsigned int) szBufIn[i]-'0';
} else {
return -2; // error with second digit
}
szBufOut[j] |= a_byte;
i++;
j++;
}
szBufOut[j] = 0;
return 0; // normal exit
}
I know this is really old but I think the solutions looked too complicated. Try this in VB:
Public Function HexToInt(sHEX as String) as long
Dim iLen as Integer
Dim i as Integer
Dim SumValue as Long
Dim iVal as long
Dim AscVal as long
iLen = Len(sHEX)
For i = 1 to Len(sHEX)
AscVal = Asc(UCase(Mid$(sHEX, i, 1)))
If AscVal >= 48 And AscVal <= 57 Then
iVal = AscVal - 48
ElseIf AscVal >= 65 And AscVal <= 70 Then
iVal = AscVal - 55
End If
SumValue = SumValue + iVal * 16 ^ (iLen- i)
Next i
HexToInt = SumValue
End Function

Convert two ASCII Hexadecimal Characters (Two ASCII bytes) in one byte

I want to convert two ASCII bytes to one hexadecimal byte.
eg.
0x30 0x43 => 0x0C , 0x34 0x46 => 0x4F ...
The ASCII bytes are a number between 0 and 9 or a letter between A and F (upper case only), so between 0x30 ... 0x39 and 0x41 ... 0x46
I know how "to construct" 0x4F with the numbers 0x34 and 0x46 : 0x4F = 0x34 * 0x10 + 0x46
So, in fact, I want to convert one ASCII byte to its hexadecimal value.
For that, I can build an array that maps each ASCII char to its hexadecimal value:
0x30 => 0x00
0x31 => 0x01
...
0x46 => 0x0F
But maybe there is a more « proper » solution.
The program will run on an AVR µC and is compiled with avr-gcc, so scanf() / printf() solutions aren't suitable.
Do you have any ideas?
Thanks
I can't make sense of your examples, but if you want to convert a string containing hexadecimal ASCII characters to its byte values (e.g. so the string "56" becomes the byte 0x56), you can use this (which assumes your system is using ASCII):
uint8_t*
hex_decode(const char *in, size_t len,uint8_t *out)
{
unsigned int i, t, hn, ln;
for (t = 0,i = 0; i < len; i+=2,++t) {
hn = in[i] > '9' ? in[i] - 'A' + 10 : in[i] - '0';
ln = in[i+1] > '9' ? in[i+1] - 'A' + 10 : in[i+1] - '0';
out[t] = (hn << 4 ) | ln;
}
return out;
}
You'd use it like e.g.
char x[]="1234";
uint8_t res[2];
hex_decode(x,strlen(x),res);
And res (which must be at least half the length of the in parameter) now contains the 2 bytes 0x12,0x34
Note also that this code needs the hexadecimal letters A-F to be capital, a-f won't do (and it doesn't do any error checking - so you'll have to pass it valid stuff).
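Both limitations noted above (capital letters only, no error checking) can be lifted without pulling in scanf() or printf(). A minimal sketch, where hex_digit_value and hex_decode_checked are hypothetical names and the return convention is an assumption made for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decodes len hex characters from in into len/2 bytes at out.
   Returns the number of bytes written, or -1 on odd length or a bad digit. */
int hex_decode_checked(const char *in, size_t len, uint8_t *out)
{
    if (len % 2 != 0)
        return -1;
    for (size_t i = 0; i < len; i += 2) {
        int hn = hex_digit_value(in[i]);       /* high nibble */
        int ln = hex_digit_value(in[i + 1]);   /* low nibble */
        if (hn < 0 || ln < 0)
            return -1;
        out[i / 2] = (uint8_t)((hn << 4) | ln);
    }
    return (int)(len / 2);
}
```

As before, out must have room for at least len/2 bytes. On failure, bytes decoded before the bad digit have already been written.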
You can use strtol(), which is part of avr-libc, or you can write just your specific case pretty easily:
unsigned char charToHexDigit(char c)
{
if (c >= 'A')
return c - 'A' + 10;
else
return c - '0';
}
unsigned char stringToByte(char c[2])
{
return charToHexDigit(c[0]) * 16 + charToHexDigit(c[1]);
}
The task:
Convert a string containing hexadecimal ASCII characters to its byte values,
so ASCII "FF" becomes 0xFF and ASCII "10" (0x31 0x30 0x00) becomes 0x10.
char asciiString[]="aaAA12fF";// input ascii hex string
char result[4]; // byte equivalent of the asciiString (the size should be half the length of asciiString[])
// the final result should be:
result[0] = 0xAA;
result[1] = 0xAA;
result[2] = 0x12;
result[3] = 0xFF;
//1. First step: convert asciiString so it contains upper case only:
// convert string to upper cases:
stringToUpperCases(asciiString);
use:
void stringToUpperCases(char *p)
{
for(int i=0; *(p+i) !='\0'; i++)
{
*(p+i) = (unsigned char) toupper( *(p+i) );
}
}
//2. Convert a string containing hexadecimal ascii characters to its byte values:
// convert string to bytes:
int nrOfBytes = stringToBytes(asciiString,result);
//use:
unsigned char charToHexDigit(char c)
{
if (c >= 'A')
return (c - 'A' + 10);
else
return (c - '0');
}
unsigned char ascii2HexToByte(char *ptr)
{
return charToHexDigit( *ptr )*16 + charToHexDigit( *(ptr+1) );
}
int stringToBytes(char *string, char *result)
{
int k=0;
int strLen = strlen(string);
for(int i = 0; i < strLen; i = i + 2)
{
result[k] = ascii2HexToByte( &string[i] );
k++;
}
return k; // number of bytes in the result array
}
//3. print result:
printNrOfBytes(nrOfBytes, result);
// use:
void printNrOfBytes(int nr, char *p)
{
for(int i= 0; i < nr; i++)
{
printf( "0x%02X ", (unsigned char)*(p+i) );
}
printf( "\n");
}
//4. The result should be:
0xAA 0xAA 0x12 0xFF
//5. This is the test program:
char asciiString[]="aaAA12fF"; // input ascii hex string
char result[4]; // result
// convert string to upper cases:
stringToUpperCases(asciiString);
// convert string to bytes
int nrOfBytes = stringToBytes(asciiString,result);
// print result:
printNrOfBytes(nrOfBytes, result);
// result:
// 0xAA 0xAA 0x12 0xFF
It works, but could be much more optimized!
inline uint8_t twoAsciiByteToByte(const std::string& s)
{
uint8_t r = 0;
if (s.length() == 4)
{
uint8_t a = asciiToByte(s[0]);
uint8_t b = asciiToByte(s[1]);
uint8_t c = asciiToByte(s[2]);
uint8_t d = asciiToByte(s[3]);
int h = (a * 10 + b);
int l = (c * 10 + d);
if (s[0] == '3')
h -= 30;
else if (s[0] == '4')
h -= 31;
if (s[2] == '3')
l -= 30;
else if (s[2] == '4')
l -= 31;
r = (h << 4) | l;
}
return r;
}
Here's a version that works with both upper and lower-case hex strings:
void hex_decode(const char *in, size_t len, uint8_t *out)
{
unsigned int i, hn, ln;
char hc, lc;
memset(out, 0, len);
for (i = 0; i < 2*len; i += 2) {
hc = in[i];
if ('a' <= hc && hc <= 'f') hc = toupper(hc);
lc = in[i+1];
if ('a' <= lc && lc <= 'f') lc = toupper(lc);
hn = hc > '9' ? hc - 'A' + 10 : hc - '0';
ln = lc > '9' ? lc - 'A' + 10 : lc - '0';
out[i >> 1] = (hn << 4 ) | ln;
}
}
Converting 2 hex chars to a byte is done in two steps:
1. Convert char a and b to their numbers (e.g. 'F' -> 0xF), which is done in two big if-else branches that check whether the char is in the range '0' to '9', 'A' to 'F' or 'a' to 'f'.
2. Join the two numbers by shifting a (largest value is 0xF, i.e. 0b0000_1111) 4 bits to the left (a << 4 -> 0b1111_0000) and then applying the bitwise OR operation to a << 4 and b ((a << 4) | b):
a << 4: 1111_0000
b:      0000_1111
->      1111_1111
#include <stdio.h>
#include <stdint.h>
#define u8 uint8_t
#define u32 uint32_t
u8 to_hex_digit(char a, char b) {
u8 result = 0;
if (a >= 0x30 && a <= 0x39) {
result = (a - 0x30) << 4;
} else if (a >= 0x41 && a <= 0x46) {
result = (a - 0x41 + 10) << 4;
} else if (a >= 0x61 && a <= 0x66) {
result = (a - 0x61 + 10) << 4;
} else {
printf("invalid hex digit: '%c'\n", a);
}
if (b >= 0x30 && b <= 0x39) {
result |= b - 0x30;
} else if (b >= 0x41 && b <= 0x46) {
result |= b - 0x41 + 10;
} else if (b >= 0x61 && b <= 0x66) {
result |= b - 0x61 + 10;
} else {
printf("invalid hex digit: '%c'\n", b);
}
return result;
}
int main(void) {
u8 result = to_hex_digit('F', 'F');
printf("0x%X (%d)\n", result, result);
return 0;
}

What is a better method for packing 4 bytes into 3 than this?

I have an array of values all well within the range 0 - 63, and decided I could pack every 4 bytes into 3 because the values only require 6 bits and I could use the extra 2bits to store the first 2 bits of the next value and so on.
Having never done this before I used the switch statement and a nextbit variable (a state machine like device) to do the packing and keep track of the starting bit. I'm convinced however, there must be a better way.
Suggestions/clues please, but don't ruin my fun ;-)
Any portability problems regarding big/little endian?
btw: I have verified this code is working, by unpacking it again and comparing with the input. And no it ain't homework, just an exercise I've set myself.
/* build with gcc -std=c99 -Wconversion */
#define ASZ 400
typedef unsigned char uc_;
uc_ data[ASZ];
int i;
for (i = 0; i < ASZ; ++i) {
data[i] = (uc_)(i % 0x40);
}
size_t dl = sizeof(data);
printf("sizeof(data):%zu\n",dl);
float fpl = ((float)dl / 4.0f) * 3.0f;
size_t pl = (size_t)(fpl > (float)((int)fpl) ? fpl + 1 : fpl);
printf("length of packed data:%zu\n",pl);
for (i = 0; i < dl; ++i)
printf("%02d ", data[i]);
printf("\n");
uc_ * packeddata = calloc(pl, sizeof(uc_));
uc_ * byte = packeddata;
uc_ nextbit = 1;
for (int i = 0; i < dl; ++i) {
uc_ m = (uc_)(data[i] & 0x3f);
switch(nextbit) {
case 1:
/* all 6 bits of m into first 6 bits of byte: */
*byte = m;
nextbit = 7;
break;
case 3:
/* all 6 bits of m into last 6 bits of byte: */
*byte = (uc_)(*byte | (m << 2));
++byte; /* advance separately: reading *byte while incrementing byte is undefined */
nextbit = 1;
break;
case 5:
/* 1st 4 bits of m into last 4 bits of byte: */
*byte = (uc_)(*byte | ((m & 0x0f) << 4));
++byte; /* advance separately: reading *byte while incrementing byte is undefined */
/* 5th and 6th bits of m into 1st and 2nd bits of byte: */
*byte = (uc_)(*byte | ((m & 0x30) >> 4));
nextbit = 3;
break;
case 7:
/* 1st 2 bits of m into last 2 bits of byte: */
*byte = (uc_)(*byte | ((m & 0x03) << 6));
++byte; /* advance separately: reading *byte while incrementing byte is undefined */
/* next (last) 4 bits of m into 1st 4 bits of byte: */
*byte = (uc_)((m & 0x3c) >> 2);
nextbit = 5;
break;
}
}
So, this is kinda like code-golf, right?
#include <stdlib.h>
#include <string.h>
static void pack2(unsigned char *r, const unsigned char *n) {
unsigned v = n[0] + (n[1] << 6) + (n[2] << 12) + (n[3] << 18);
*r++ = v;
*r++ = v >> 8;
*r++ = v >> 16;
}
unsigned char *apack(const unsigned char *s, int len) {
const unsigned char *s_end = s + len;
unsigned char *r, *result = malloc(len/4*3+3),
lastones[4] = { 0 };
if (result == NULL)
return NULL;
for(r = result; s + 4 <= s_end; s += 4, r += 3)
pack2(r, s);
memcpy(lastones, s, s_end - s);
pack2(r, lastones);
return result;
}
Check out the IETF RFC 4648 for 'The Base16, Base32 and Base64 Data Encodings'.
Partial code critique:
size_t dl = sizeof(data);
printf("sizeof(data):%d\n",dl);
float fpl = ((float)dl / 4.0f) * 3.0f;
size_t pl = (size_t)(fpl > (float)((int)fpl) ? fpl + 1 : fpl);
printf("length of packed data:%d\n",pl);
Don't use the floating point stuff - just use integers. And use '%zu' to print 'size_t' values - assuming you've got a C99 library.
size_t pl = ((dl + 3) / 4) * 3;
I think your loop could be simplified by dealing with 3-byte input units until you've got a partial unit left over, and then dealing with a remainder of 1 or 2 bytes as special cases. I note that the standard referenced says that you use one or two '=' signs to pad at the end.
I have a Base64 encoder and decoder which does some of that. You are describing the 'decode' part of Base64 - where the Base64 code has 4 bytes of data that should be stored in just 3 - as your packing code. The Base64 encoder corresponds to the unpacker you will need.
Base-64 Decoder
Note: base_64_inv is an array of 256 values, one for each possible input byte value; it defines the correct decoded value for each encoded byte. In the Base64 encoding, this is a sparse array - 3/4 zeroes. Similarly, base_64_map is the mapping between a value 0..63 and the corresponding storage value.
enum { DC_PAD = -1, DC_ERR = -2 };
static int decode_b64(int c)
{
int b64 = base_64_inv[c];
if (c == base64_pad)
b64 = DC_PAD;
else if (b64 == 0 && c != base_64_map[0])
b64 = DC_ERR;
return(b64);
}
/* Decode 4 bytes into 3 */
static int decode_quad(const char *b64_data, char *bin_data)
{
int b0 = decode_b64(b64_data[0]);
int b1 = decode_b64(b64_data[1]);
int b2 = decode_b64(b64_data[2]);
int b3 = decode_b64(b64_data[3]);
int bytes;
if (b0 < 0 || b1 < 0 || b2 == DC_ERR || b3 == DC_ERR || (b2 == DC_PAD && b3 != DC_PAD))
return(B64_ERR_INVALID_ENCODED_DATA);
if (b2 == DC_PAD && (b1 & 0x0F) != 0)
/* 3rd byte is '='; 2nd byte must end with 4 zero bits */
return(B64_ERR_INVALID_TRAILING_BYTE);
if (b2 >= 0 && b3 == DC_PAD && (b2 & 0x03) != 0)
/* 4th byte is '='; 3rd byte is not '=' and must end with 2 zero bits */
return(B64_ERR_INVALID_TRAILING_BYTE);
bin_data[0] = (b0 << 2) | (b1 >> 4);
bytes = 1;
if (b2 >= 0)
{
bin_data[1] = ((b1 & 0x0F) << 4) | (b2 >> 2);
bytes = 2;
}
if (b3 >= 0)
{
bin_data[2] = ((b2 & 0x03) << 6) | (b3);
bytes = 3;
}
return(bytes);
}
/* Decode input Base-64 string to original data. Output length returned, or negative error */
int base64_decode(const char *data, size_t datalen, char *buffer, size_t buflen)
{
size_t outlen = 0;
if (datalen % 4 != 0)
return(B64_ERR_INVALID_ENCODED_LENGTH);
if (BASE64_DECLENGTH(datalen) > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 4)
{
int nbytes = decode_quad(data, buffer + outlen);
if (nbytes < 0)
return(nbytes);
outlen += nbytes;
data += 4;
datalen -= 4;
}
assert(datalen == 0); /* By virtue of the %4 check earlier */
return(outlen);
}
Base-64 Encoder
/* Encode 3 bytes of data into 4 */
static void encode_triplet(const char *triplet, char *quad)
{
quad[0] = base_64_map[(triplet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((triplet[0] & 0x03) << 4) | ((triplet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((triplet[1] & 0x0F) << 2) | ((triplet[2] >> 6) & 0x03)];
quad[3] = base_64_map[triplet[2] & 0x3F];
}
/* Encode 2 bytes of data into 4 */
static void encode_doublet(const char *doublet, char *quad, char pad)
{
quad[0] = base_64_map[(doublet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((doublet[0] & 0x03) << 4) | ((doublet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((doublet[1] & 0x0F) << 2)];
quad[3] = pad;
}
/* Encode 1 byte of data into 4 */
static void encode_singlet(const char *singlet, char *quad, char pad)
{
quad[0] = base_64_map[(singlet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((singlet[0] & 0x03) << 4)];
quad[2] = pad;
quad[3] = pad;
}
/* Encode input data as Base-64 string. Output length returned, or negative error */
static int base64_encode_internal(const char *data, size_t datalen, char *buffer, size_t buflen, char pad)
{
size_t outlen = BASE64_ENCLENGTH(datalen);
const char *bin_data = (const void *)data;
char *b64_data = (void *)buffer;
if (outlen > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 3)
{
encode_triplet(bin_data, b64_data);
bin_data += 3;
b64_data += 4;
datalen -= 3;
}
b64_data[0] = '\0';
if (datalen == 2)
encode_doublet(bin_data, b64_data, pad);
else if (datalen == 1)
encode_singlet(bin_data, b64_data, pad);
b64_data[4] = '\0';
return((b64_data - buffer) + strlen(b64_data));
}
I complicate life by having to deal with a product that uses a variant alphabet for the Base64 encoding, and also manages not to pad data - hence the 'pad' argument (which can be zero for 'null padding' or '=' for standard padding). The 'base_64_map' array contains the alphabet to use for 6-bit values in the range 0..63.
Another, simpler way to do it would be to use bit fields. One of the lesser-known corners of C struct syntax is the bit field. Let's say you have the following structure:
struct packed_bytes {
byte chunk1 : 6;
byte chunk2 : 6;
byte chunk3 : 6;
byte chunk4 : 6;
};
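This declares chunk1, chunk2, chunk3, and chunk4 to have the type byte (assumed here to be a typedef for unsigned char) but to take up only 6 bits each in the structure. With typical compilers the result is that sizeof(struct packed_bytes) == 3, although the layout of bit fields is implementation-defined. Now all you need is a little function to take your array and dump it into the structure like so: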
void
dump_to_struct(byte *in, struct packed_bytes *out, int count)
{
int i, j;
for (i = 0; i < (count / 4); ++i) {
out[i].chunk1 = in[i * 4];
out[i].chunk2 = in[i * 4 + 1];
out[i].chunk3 = in[i * 4 + 2];
out[i].chunk4 = in[i * 4 + 3];
}
// Finish up
switch(count % 4) {
case 3:
out[count / 4].chunk3 = in[(count / 4) * 4 + 2];
case 2:
out[count / 4].chunk2 = in[(count / 4) * 4 + 1];
case 1:
out[count / 4].chunk1 = in[(count / 4) * 4];
}
}
There you go, you now have an array of struct packed_bytes that you can easily read by using the above struct.
Instead of using a state machine, you can simply use a counter for how many bits are already used in the current byte, from which you can directly derive the shift offsets and whether or not you overflow into the next byte.
Regarding the endianness: as long as you use only a single datatype (that is, you don't reinterpret a pointer as a type of a different size, e.g. int* a =...;short* b=(short*) a;), you shouldn't get problems with endianness in most cases.
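A minimal sketch of the counter idea described above, assuming the same least-significant-bits-first layout as the question's code. The name pack6 and its interface are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pack `len` 6-bit values from `in` into `out`,
 * least significant bits first (same layout as the question's code).
 * `out` must hold at least (len * 6 + 7) / 8 bytes.
 * Returns the number of bytes written. */
size_t pack6(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t nbytes = (len * 6 + 7) / 8;
    unsigned used = 0;   /* bits already occupied in out[o] */
    size_t o = 0;

    assert(in != NULL && out != NULL);
    memset(out, 0, nbytes);

    for (size_t i = 0; i < len; ++i) {
        unsigned v = in[i] & 0x3Fu;
        /* low part of the value goes into the current byte */
        out[o] |= (unsigned char)(v << used);
        if (used + 6 >= 8) {          /* current byte is now full */
            ++o;
            if (used > 2)             /* remaining high bits spill over */
                out[o] |= (unsigned char)(v >> (8 - used));
        }
        used = (used + 6) % 8;
    }
    return nbytes;
}
```

The counter cycles through 0, 6, 4, 2, which corresponds to the four cases of the switch in the question, so the shift amounts fall out of the counter instead of being written by hand.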
Taking elements of DigitalRoss's compact code, Grizzly's suggestion, and my own code, I have written my own answer at last. Although DigitalRoss provides a usable working answer, using it without understanding it would not have given me the same satisfaction as learning something. For this reason I have chosen to base my answer on my original code.
I have also chosen to ignore the advice Jonathan Leffler gives to avoid floating point arithmetic for calculating the packed data length. The recommended method, which DigitalRoss also uses, increases the length of the packed data by as much as three bytes. Granted, this is not much, but it is avoidable by using floating point math.
Here is the code, criticisms welcome:
/* built with gcc -std=c99 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
unsigned char *
pack(const unsigned char * data, size_t len, size_t * packedlen)
{
float fpl = ((float)len / 4.0f) * 3.0f;
*packedlen = (size_t)(fpl > (float)((int)fpl) ? fpl + 1 : fpl);
unsigned char * packed = malloc(*packedlen);
if (!packed)
return 0;
const unsigned char * in = data;
const unsigned char * in_end = in + len;
unsigned char * out;
for (out = packed; in + 4 <= in_end; in += 4) {
*out++ = in[0] | ((in[1] & 0x03) << 6);
*out++ = ((in[1] & 0x3c) >> 2) | ((in[2] & 0x0f) << 4);
*out++ = ((in[2] & 0x30) >> 4) | (in[3] << 2);
}
size_t lastlen = in_end - in;
if (lastlen > 0) {
*out = in[0];
if (lastlen > 1) {
*out++ |= ((in[1] & 0x03) << 6);
*out = ((in[1] & 0x3c) >> 2);
if (lastlen > 2) {
*out++ |= ((in[2] & 0x0f) << 4);
*out = ((in[2] & 0x30) >> 4);
if (lastlen > 3)
*out |= (in[3] << 2);
}
}
}
return packed;
}
int main()
{
size_t i;
unsigned char data[] = {
12, 15, 40, 18,
26, 32, 50, 3,
7, 19, 46, 10,
25, 37, 2, 39,
60, 59, 0, 17,
9, 29, 13, 54,
5, 6, 47, 32
};
size_t datalen = sizeof(data);
printf("unpacked datalen: %zu\nunpacked data\n", datalen);
for (i = 0; i < datalen; ++i)
printf("%02d ", data[i]);
printf("\n");
size_t packedlen;
unsigned char * packed = pack(data, sizeof(data), &packedlen);
if (!packed) {
fprintf(stderr, "Packing failed!\n");
return EXIT_FAILURE;
}
printf("packedlen: %zu\npacked data\n", packedlen);
for (i = 0; i < packedlen; ++i)
printf("0x%02x ", packed[i]);
printf("\n");
free(packed);
return EXIT_SUCCESS;
}
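For comparison, here is a sketch of two things that go with pack() above: an integer-only formula that, as far as I can tell, gives the same exact length as the floating point calculation, and an unpacker for the same bit layout, to check the result by round trip. The names packed_length and unpack are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Exact packed size: ceil(len * 6 / 8) == ceil(len * 3 / 4), integers only.
 * (Ignores overflow of len * 3 for extremely large len.) */
size_t packed_length(size_t len)
{
    return (len * 3 + 3) / 4;
}

/* Hypothetical inverse of pack(): expands `count` 6-bit values from
 * `packed` into `out`, assuming the same bit layout as pack(). */
void unpack(const unsigned char *packed, size_t count, unsigned char *out)
{
    assert(packed != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i) {
        size_t bit = i * 6;                 /* position of the value's first bit */
        size_t byte = bit / 8;
        unsigned shift = (unsigned)(bit % 8);
        unsigned v = (unsigned)packed[byte] >> shift;
        if (shift > 2)                      /* value continues in the next byte */
            v |= (unsigned)packed[byte + 1] << (8 - shift);
        out[i] = (unsigned char)(v & 0x3Fu);
    }
}
```

Calling unpack(packed, datalen, buffer) on the output of pack() and comparing buffer with the original data is one way to verify the packing.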

Is there a printf converter to print in binary format?

I can print with printf as a hex or octal number. Is there a format tag to print as binary, or arbitrary base?
I am running gcc.
printf("%d %x %o\n", 10, 10, 10); //prints "10 a 12\n"
printf("%b\n", 10); // prints "%b\n"
Hacky but works for me:
#define BYTE_TO_BINARY_PATTERN "%c%c%c%c%c%c%c%c"
#define BYTE_TO_BINARY(byte) \
((byte) & 0x80 ? '1' : '0'), \
((byte) & 0x40 ? '1' : '0'), \
((byte) & 0x20 ? '1' : '0'), \
((byte) & 0x10 ? '1' : '0'), \
((byte) & 0x08 ? '1' : '0'), \
((byte) & 0x04 ? '1' : '0'), \
((byte) & 0x02 ? '1' : '0'), \
((byte) & 0x01 ? '1' : '0')
printf("Leading text "BYTE_TO_BINARY_PATTERN, BYTE_TO_BINARY(byte));
For multi-byte types
printf("m: "BYTE_TO_BINARY_PATTERN" "BYTE_TO_BINARY_PATTERN"\n",
BYTE_TO_BINARY(m>>8), BYTE_TO_BINARY(m));
You need all the extra quotes, unfortunately. This approach carries the usual macro risk that the argument is evaluated several times (so don't pass a function call or an expression with side effects to BYTE_TO_BINARY), but it avoids the memory issues and multiple invocations of strcat in some of the other proposals here.
Print Binary for Any Datatype
// Assumes little endian
void printBits(size_t const size, void const * const ptr)
{
unsigned char *b = (unsigned char*) ptr;
unsigned char byte;
int i, j;
for (i = size-1; i >= 0; i--) {
for (j = 7; j >= 0; j--) {
byte = (b[i] >> j) & 1;
printf("%u", byte);
}
}
puts("");
}
Test:
int main(void)
{
int i = 23;
unsigned int ui = UINT_MAX; /* needs <limits.h> */
float f = 23.45f;
printBits(sizeof(i), &i);
printBits(sizeof(ui), &ui);
printBits(sizeof(f), &f);
return 0;
}
Here is a quick hack to demonstrate techniques to do what you want.
#include <stdio.h> /* printf */
#include <string.h> /* strcat */
#include <stdlib.h> /* strtol */
const char *byte_to_binary
(
int x
)
{
static char b[9];
b[0] = '\0';
int z;
for (z = 128; z > 0; z >>= 1)
{
strcat(b, ((x & z) == z) ? "1" : "0");
}
return b;
}
int main
(
void
)
{
{
/* binary string to int */
char *tmp;
char *b = "0101";
printf("%ld\n", strtol(b, &tmp, 2));
}
{
/* byte to binary string */
printf("%s\n", byte_to_binary(5));
}
return 0;
}
There isn't a binary conversion specifier in glibc normally.
It is possible to add custom conversion types to the printf() family of functions in glibc. See register_printf_function for details. You could add a custom %b conversion for your own use, if it simplifies the application code to have it available.
Here is an example of how to implement custom printf formats in glibc.
You could use a small table to improve speed [1]. Similar techniques are useful in the embedded world, for example, to invert a byte:
const char *bit_rep[16] = {
[ 0] = "0000", [ 1] = "0001", [ 2] = "0010", [ 3] = "0011",
[ 4] = "0100", [ 5] = "0101", [ 6] = "0110", [ 7] = "0111",
[ 8] = "1000", [ 9] = "1001", [10] = "1010", [11] = "1011",
[12] = "1100", [13] = "1101", [14] = "1110", [15] = "1111",
};
void print_byte(uint8_t byte)
{
printf("%s%s", bit_rep[byte >> 4], bit_rep[byte & 0x0F]);
}
[1] I'm mostly referring to embedded applications where optimizers are not so aggressive and the speed difference is visible.
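To illustrate the "invert a byte" remark, here is a sketch of the same nibble-table idea, assuming that "invert" means reversing the bit order. The names rev4, reverse_byte and reverse_bits_in_place are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rev4[n] is the 4-bit value n with its bit order reversed. */
static const uint8_t rev4[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

/* Reverse the bit order of a byte with two table lookups: the reversed
 * low nibble becomes the high nibble and vice versa. */
uint8_t reverse_byte(uint8_t b)
{
    return (uint8_t)((rev4[b & 0x0F] << 4) | rev4[b >> 4]);
}

/* Apply the lookup to every byte of a buffer, in place. */
void reverse_bits_in_place(uint8_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        buf[i] = reverse_byte(buf[i]);
}
```

A 16-entry table keeps the memory footprint small; a full 256-entry table would trade more memory for a single lookup per byte.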
Print the least significant bit and shift it out on the right. Doing this until the integer becomes zero prints the binary representation without leading zeros but in reversed order. Using recursion, the order can be corrected quite easily.
#include <stdio.h>
void print_binary(unsigned int number)
{
if (number >> 1) {
print_binary(number >> 1);
}
putc((number & 1) ? '1' : '0', stdout);
}
To me, this is one of the cleanest solutions to the problem. If you like 0b prefix and a trailing new line character, I suggest wrapping the function.
Online demo
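The wrapper suggested above could look roughly like this. It is only a sketch: put_binary is the same recursion adapted to take a stream, and print_binary_line is a made-up name for the wrapper.

```c
#include <assert.h>
#include <stdio.h>

/* Recursive core, as in the answer above but writing to any stream:
 * emits the higher bits first, then the lowest bit.
 * Returns the number of digits written. */
static int put_binary(unsigned int number, FILE *out)
{
    int n = 0;
    if (number >> 1)
        n = put_binary(number >> 1, out);
    putc((number & 1) ? '1' : '0', out);
    return n + 1;
}

/* Hypothetical wrapper: "0b" prefix, the digits, then a newline.
 * Returns the total number of characters written. */
int print_binary_line(unsigned int number, FILE *out)
{
    assert(out != NULL);
    fputs("0b", out);
    int digits = put_binary(number, out);
    putc('\n', out);
    return 2 + digits + 1;
}
```

Keeping the prefix and newline outside the recursion means the recursive part stays as small as in the original.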
Based on @William Whyte's answer, this is a macro that provides int8, 16, 32 & 64 versions, reusing the INT8 macro to avoid repetition.
/* --- PRINTF_BYTE_TO_BINARY macros --- */
#define PRINTF_BINARY_PATTERN_INT8 "%c%c%c%c%c%c%c%c"
#define PRINTF_BYTE_TO_BINARY_INT8(i) \
(((i) & 0x80ll) ? '1' : '0'), \
(((i) & 0x40ll) ? '1' : '0'), \
(((i) & 0x20ll) ? '1' : '0'), \
(((i) & 0x10ll) ? '1' : '0'), \
(((i) & 0x08ll) ? '1' : '0'), \
(((i) & 0x04ll) ? '1' : '0'), \
(((i) & 0x02ll) ? '1' : '0'), \
(((i) & 0x01ll) ? '1' : '0')
#define PRINTF_BINARY_PATTERN_INT16 \
PRINTF_BINARY_PATTERN_INT8 PRINTF_BINARY_PATTERN_INT8
#define PRINTF_BYTE_TO_BINARY_INT16(i) \
PRINTF_BYTE_TO_BINARY_INT8((i) >> 8), PRINTF_BYTE_TO_BINARY_INT8(i)
#define PRINTF_BINARY_PATTERN_INT32 \
PRINTF_BINARY_PATTERN_INT16 PRINTF_BINARY_PATTERN_INT16
#define PRINTF_BYTE_TO_BINARY_INT32(i) \
PRINTF_BYTE_TO_BINARY_INT16((i) >> 16), PRINTF_BYTE_TO_BINARY_INT16(i)
#define PRINTF_BINARY_PATTERN_INT64 \
PRINTF_BINARY_PATTERN_INT32 PRINTF_BINARY_PATTERN_INT32
#define PRINTF_BYTE_TO_BINARY_INT64(i) \
PRINTF_BYTE_TO_BINARY_INT32((i) >> 32), PRINTF_BYTE_TO_BINARY_INT32(i)
/* --- end macros --- */
#include <stdio.h>
int main() {
long long int flag = 1648646756487983144ll;
printf("My Flag "
PRINTF_BINARY_PATTERN_INT64 "\n",
PRINTF_BYTE_TO_BINARY_INT64(flag));
return 0;
}
This outputs:
My Flag 0001011011100001001010110111110101111000100100001111000000101000
For readability you may want to add a separator, for example:
My Flag 00010110,11100001,00101011,01111101,01111000,10010000,11110000,00101000
As of February 3rd, 2022, the GNU C Library was updated to version 2.35. As a result, %b is now supported to output in binary format.
printf-family functions now support the %b format for output of
integers in binary, as specified in draft ISO C2X, and the %B variant
of that format recommended by draft ISO C2X.
Here's a version of the function that does not suffer from reentrancy issues or limits on the size/type of the argument:
#define FMT_BUF_SIZE (CHAR_BIT*sizeof(uintmax_t)+1)
char *binary_fmt(uintmax_t x, char buf[static FMT_BUF_SIZE])
{
char *s = buf + FMT_BUF_SIZE;
*--s = 0;
if (!x) *--s = '0';
for (; x; x /= 2) *--s = '0' + x%2;
return s;
}
Note that this code would work just as well for any base between 2 and 10 if you just replace the 2's by the desired base. Usage is:
char tmp[FMT_BUF_SIZE];
printf("%s\n", binary_fmt(x, tmp));
Where x is any integral expression.
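As a sketch of the "any base between 2 and 10" remark, the base can be made a parameter instead of editing the 2's. The name base_fmt is made up; the buffer size is the same FMT_BUF_SIZE, which is sized for base 2 and therefore large enough for the shorter output of any larger base:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define FMT_BUF_SIZE (CHAR_BIT*sizeof(uintmax_t)+1)

/* Same right-to-left fill as binary_fmt, with the base as a parameter.
 * Valid for bases 2..10 only, because digits are produced as '0' + remainder. */
char *base_fmt(uintmax_t x, unsigned base, char buf[static FMT_BUF_SIZE])
{
    assert(base >= 2 && base <= 10);
    char *s = buf + FMT_BUF_SIZE;
    *--s = 0;                         /* terminator at the very end */
    if (!x) *--s = '0';               /* zero still needs one digit */
    for (; x; x /= base) *--s = (char)('0' + x % base);
    return s;                         /* points into buf, not at its start */
}
```

As with binary_fmt, use the returned pointer rather than the buffer itself, because the digits are right-aligned in the buffer.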
Quick and easy solution:
void printbits(my_integer_type x)
{
for(int i=sizeof(x)<<3; i; i--)
putchar('0'+((x>>(i-1))&1));
}
Works for any size type and for signed and unsigned ints. The '&1' is needed to handle signed ints as the shift may do sign extension.
There are so many ways of doing this. Here's a super simple one for printing 32 bits or n bits from a signed or unsigned 32 bit type (not putting a negative if signed, just printing the actual bits) and no carriage return. Note that i is decremented before the bit shift:
#define printbits_n(x,n) for (int i=n;i;i--,putchar('0'|(x>>i)&1))
#define printbits_32(x) printbits_n(x,32)
What about returning a string with the bits to store or print later? You either can allocate the memory and return it and the user has to free it, or else you return a static string but it will get clobbered if it's called again, or by another thread. Both methods shown:
char *int_to_bitstring_alloc(int x, int count)
{
count = count<1 ? sizeof(x)*8 : count;
char *pstr = malloc(count+1);
if (!pstr) return NULL; /* allocation failed */
for(int i = 0; i<count; i++)
pstr[i] = '0' | ((x>>(count-1-i))&1);
pstr[count]=0;
return pstr;
}
#define BITSIZEOF(x) (sizeof(x)*8)
char *int_to_bitstring_static(int x, int count)
{
static char bitbuf[BITSIZEOF(x)+1];
count = (count<1 || count>BITSIZEOF(x)) ? BITSIZEOF(x) : count;
for(int i = 0; i<count; i++)
bitbuf[i] = '0' | ((x>>(count-1-i))&1);
bitbuf[count]=0;
return bitbuf;
}
Call with:
// memory allocated string returned which needs to be freed
char *pstr = int_to_bitstring_alloc(0x97e50ae6, 17);
printf("bits = 0b%s\n", pstr);
free(pstr);
// no free needed but you need to copy the string to save it somewhere else
char *pstr2 = int_to_bitstring_static(0x97e50ae6, 17);
printf("bits = 0b%s\n", pstr2);
Is there a printf converter to print in binary format?
The printf() family is only able to print integers in base 8, 10, and 16 using the standard specifiers directly. I suggest creating a function that converts the number to a string per code's particular needs.
[Edit 2022] This is expected to change with the next version of C which implements "%b".
Binary constants such as 0b10101010, and %b conversion specifier for printf() function family C2x
To print in any base [2-36]
All other answers so far have at least one of these limitations.
Use static memory for the return buffer. This limits the number of times the function may be used as an argument to printf().
Allocate memory requiring the calling code to free pointers.
Require the calling code to explicitly provide a suitable buffer.
Call printf() directly. This obliges a new function for fprintf(), sprintf(), vsprintf(), etc.
Use a reduced integer range.
The following has none of the above limitations. It does require C99 or later and use of "%s". It uses a compound literal to provide the buffer space. It has no trouble with multiple calls in a printf().
#include <assert.h>
#include <limits.h>
#define TO_BASE_N (sizeof(unsigned)*CHAR_BIT + 1)
// v--compound literal--v
#define TO_BASE(x, b) my_to_base((char [TO_BASE_N]){""}, (x), (b))
// Tailor the details of the conversion function as needed
// This one does not display unneeded leading zeros
// Use return value, not `buf`
char *my_to_base(char buf[TO_BASE_N], unsigned i, int base) {
assert(base >= 2 && base <= 36);
char *s = &buf[TO_BASE_N - 1];
*s = '\0';
do {
s--;
*s = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[i % base];
i /= base;
} while (i);
// Could employ memmove here to move the used buffer to the beginning
// size_t len = &buf[TO_BASE_N] - s;
// memmove(buf, s, len);
return s;
}
#include <stdio.h>
int main(void) {
int ip1 = 0x01020304;
int ip2 = 0x05060708;
printf("%s %s\n", TO_BASE(ip1, 16), TO_BASE(ip2, 16));
printf("%s %s\n", TO_BASE(ip1, 2), TO_BASE(ip2, 2));
puts(TO_BASE(ip1, 8));
puts(TO_BASE(ip1, 36));
return 0;
}
Output
1020304 5060708
1000000100000001100000100 101000001100000011100001000
100401404
A2F44
const char* byte_to_binary(int x)
{
static char b[sizeof(int)*8+1] = {0};
int y;
long long z;
for (z = 1LL<<sizeof(int)*8-1, y = 0; z > 0; z >>= 1, y++) {
b[y] = (((x & z) == z) ? '1' : '0');
}
b[y] = 0;
return b;
}
None of the previously posted answers are exactly what I was looking for, so I wrote one. It is super simple to use %B with the printf!
/*
* File: main.c
* Author: Techplex.Engineer
*
* Created on February 14, 2012, 9:16 PM
*/
#include <stdio.h>
#include <stdlib.h>
#include <printf.h>
#include <math.h>
#include <string.h>
static int printf_arginfo_M(const struct printf_info *info, size_t n, int *argtypes)
{
/* "%B" takes one argument, an int. */
if (n > 0) {
argtypes[0] = PA_INT;
}
return 1;
}
static int printf_output_M(FILE *stream, const struct printf_info *info, const void *const *args)
{
int value = 0;
int len;
value = *(const int *) (args[0]);
// Beginning of my code ------------------------------------------------------------
char buffer [50] = ""; // Is this bad?
char buffer2 [50] = ""; // Is this bad?
int bits = info->width;
if (bits <= 0)
bits = 8; // Default to 8 bits
int mask = 1 << (bits - 1); /* integer shift instead of pow(); assumes bits is below the width of int */
while (mask > 0) {
sprintf(buffer, "%s", ((value & mask) > 0 ? "1" : "0"));
strcat(buffer2, buffer);
mask >>= 1;
}
strcat(buffer2, "\n");
// End of my code --------------------------------------------------------------
len = fprintf(stream, "%s", buffer2);
return len;
}
int main(int argc, char** argv)
{
register_printf_specifier('B', printf_output_M, printf_arginfo_M);
printf("%4B\n", 65);
return EXIT_SUCCESS;
}
This code should handle your needs up to 64 bits.
I created two functions: pBin and pBinFill. Both do the same thing, but pBinFill fills in the leading spaces with the fill character provided by its last argument.
The test function generates some test data, then prints it out using the pBinFill function.
#define kDisplayWidth 64
char* pBin(long int x,char *so)
{
char s[kDisplayWidth+1];
int i = kDisplayWidth;
s[i--] = 0x00; // terminate string
do { // fill in array from right to left
s[i--] = (x & 1) ? '1' : '0'; // determine bit
x >>= 1; // shift right 1 bit
} while (x > 0);
i++; // point to last valid character
sprintf(so, "%s", s+i); // stick it in the temp string
return so;
}
char* pBinFill(long int x, char *so, char fillChar)
{
// fill in array from right to left
char s[kDisplayWidth+1];
int i = kDisplayWidth;
s[i--] = 0x00; // terminate string
do { // fill in array from right to left
s[i--] = (x & 1) ? '1' : '0';
x >>= 1; // shift right 1 bit
} while (x > 0);
while (i >= 0) s[i--] = fillChar; // fill with fillChar
sprintf(so, "%s", s);
return so;
}
void test()
{
char so[kDisplayWidth+1]; // working buffer for pBin
long int val = 1;
do {
printf("%ld =\t\t%#lx =\t\t0b%s\n", val, val, pBinFill(val, so, '0'));
val *= 11; // generate test data
} while (val < 100000000);
}
Output:
00000001 = 0x000001 = 0b00000000000000000000000000000001
00000011 = 0x00000b = 0b00000000000000000000000000001011
00000121 = 0x000079 = 0b00000000000000000000000001111001
00001331 = 0x000533 = 0b00000000000000000000010100110011
00014641 = 0x003931 = 0b00000000000000000011100100110001
00161051 = 0x02751b = 0b00000000000000100111010100011011
01771561 = 0x1b0829 = 0b00000000000110110000100000101001
19487171 = 0x12959c3 = 0b00000001001010010101100111000011
Some runtimes support "%b" although that is not a standard.
Also see here for an interesting discussion:
http://bytes.com/forum/thread591027.html
HTH
Maybe a bit OT, but if you need this only for debugging, to understand or retrace some binary operations you are doing, you might take a look at wcalc (a simple console calculator). With the -b option you get binary output.
e.g.
$ wcalc -b "(256 | 3) & 0xff"
= 0b11
There is no formatting function in the C standard library to output binary like that. All the format operations the printf family supports are towards human readable text.
The following recursive function might be useful:
void bin(int n)
{
/* Step 1 */
if (n > 1)
bin(n/2);
/* Step 2 */
printf("%d", n % 2);
}
I optimized the top solution for size and C++-ness, and got to this solution:
inline std::string format_binary(unsigned int x)
{
static char b[33];
b[32] = '\0';
for (int z = 0; z < 32; z++) {
b[31-z] = ((x>>z) & 0x1) ? '1' : '0';
}
return b;
}
Use:
char buffer [33];
itoa(value, buffer, 2);
printf("\nbinary: %s\n", buffer);
For more ref., see How to print binary number via printf.
void
print_binary(unsigned int n)
{
unsigned int mask = 0;
/* this grotesque hack creates a bit pattern 1000... */
/* regardless of the size of an unsigned int */
mask = ~mask ^ (~mask >> 1);
for(; mask != 0; mask >>= 1) {
putchar((n & mask) ? '1' : '0');
}
}
Print bits from any type using less code and resources
This approach has the following attributes:
Works with variables and literals.
Doesn't iterate over all bits when not necessary.
Calls printf only after completing a byte (not unnecessarily for every bit).
Works for any type.
Works with little and big endianness (uses GCC #defines for checking).
May work on hardware where char isn't a byte (eight bits). (Thanks, @supercat)
Uses typeof(), which isn't standard C but is widely supported.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <limits.h>
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define for_endian(size) for (int i = 0; i < size; ++i)
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define for_endian(size) for (int i = size - 1; i >= 0; --i)
#else
#error "Endianness not detected"
#endif
#define printb(value) \
({ \
typeof(value) _v = value; \
__printb((typeof(_v) *) &_v, sizeof(_v)); \
})
#define MSB_MASK (1 << (CHAR_BIT - 1))
void __printb(void *value, size_t size)
{
unsigned char uc;
unsigned char bits[CHAR_BIT + 1];
bits[CHAR_BIT] = '\0';
for_endian(size) {
uc = ((unsigned char *) value)[i];
memset(bits, '0', CHAR_BIT);
for (int j = 0; uc && j < CHAR_BIT; ++j) {
if (uc & MSB_MASK)
bits[j] = '1';
uc <<= 1;
}
printf("%s ", bits);
}
printf("\n");
}
int main(void)
{
uint8_t c1 = 0xff, c2 = 0x44;
uint8_t c3 = c1 + c2;
printb(c1);
printb((char) 0xff);
printb((short) 0xff);
printb(0xff);
printb(c2);
printb(0x44);
printb(0x4411ff01);
printb((uint16_t) c3);
printb('A');
printf("\n");
return 0;
}
Output
$ ./printb
11111111
11111111
00000000 11111111
00000000 00000000 00000000 11111111
01000100
00000000 00000000 00000000 01000100
01000100 00010001 11111111 00000001
00000000 01000011
00000000 00000000 00000000 01000001
I have used another approach (bitprint.h) to fill a table with all bytes (as bit strings) and print them based on the input/index byte. It's worth taking a look.
Maybe someone will find this solution useful:
void print_binary(int number, int num_digits) {
int digit;
for(digit = num_digits - 1; digit >= 0; digit--) {
printf("%c", number & (1 << digit) ? '1' : '0');
}
}
void print_ulong_bin(const unsigned long * const var, int bits) {
int i;
#if defined(__LP64__) || defined(_LP64)
if( (bits > 64) || (bits <= 0) )
#else
if( (bits > 32) || (bits <= 0) )
#endif
return;
for(i = 0; i < bits; i++) {
printf("%lu", (*var >> (bits - 1 - i)) & 0x01);
}
}
should work - untested.
I liked the code by paniq, the static buffer is a good idea. However it fails if you want multiple binary formats in a single printf() because it always returns the same pointer and overwrites the array.
Here's a C style drop-in that rotates pointer on a split buffer.
char *
format_binary(unsigned int x)
{
#define MAXLEN 8 // width of output format
#define MAXCNT 4 // count per printf statement
static char fmtbuf[(MAXLEN+1)*MAXCNT];
static int count = 0;
char *b;
count = (count + 1) % MAXCNT; /* stay within 0..MAXCNT-1 so b never points past fmtbuf */
b = &fmtbuf[(MAXLEN+1)*count];
b[MAXLEN] = '\0';
for (int z = 0; z < MAXLEN; z++) { b[MAXLEN-1-z] = ((x>>z) & 0x1) ? '1' : '0'; }
return b;
}
Here is a small variation of paniq's solution that uses templates to allow printing of 32 and 64 bit integers:
template<class T>
inline std::string format_binary(T x)
{
char b[sizeof(T)*8+1] = {0};
for (size_t z = 0; z < sizeof(T)*8; z++)
b[sizeof(T)*8-1-z] = ((x>>z) & 0x1) ? '1' : '0';
return std::string(b);
}
And can be used like:
unsigned int value32 = 0x1e127ad;
printf( " 0x%x: %s\n", value32, format_binary(value32).c_str() );
unsigned long long value64 = 0x2e0b04ce0;
printf( "0x%llx: %s\n", value64, format_binary(value64).c_str() );
Here is the result:
0x1e127ad: 00000001111000010010011110101101
0x2e0b04ce0: 0000000000000000000000000000001011100000101100000100110011100000
No standard and portable way.
Some implementations provide itoa(), but it's not going to be in most, and it has a somewhat crummy interface. But the code is behind the link and should let you implement your own formatter pretty easily.
I just want to post my solution. It's used to get the zeroes and ones of one byte, but calling this function a few times handles larger data blocks. I use it for 128-bit or larger structs. You can also modify it to take a size_t and a pointer to the data you want to print, so it can be size independent. But it works for me quite well as it is.
void print_binary(unsigned char c)
{
unsigned char i1 = (1 << (sizeof(c)*8-1));
for(; i1; i1 >>= 1)
printf("%d",(c&i1)!=0);
}
void get_binary(unsigned char c, unsigned char bin[])
{
unsigned char i1 = (1 << (sizeof(c)*8-1)), i2=0;
for(; i1; i1>>=1, i2++)
bin[i2] = ((c&i1)!=0);
}
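The size-independent modification mentioned above might look like this. It is a sketch under a few assumptions: get_binary_n is a made-up name, it writes '0'/'1' characters rather than 0/1 values, and it walks the bytes in memory order, so on a little-endian machine a multi-byte integer comes out least significant byte first.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical size-independent variant: writes the bits of `size` bytes
 * starting at `data` into `out` as '0'/'1' characters, bytes in memory
 * order, most significant bit of each byte first.
 * `out` must have room for size * CHAR_BIT + 1 characters. */
void get_binary_n(const void *data, size_t size, char *out)
{
    const unsigned char *p = data;
    assert(data != NULL && out != NULL);
    for (size_t i = 0; i < size; ++i)
        for (int bit = CHAR_BIT - 1; bit >= 0; --bit)
            *out++ = ((p[i] >> bit) & 1) ? '1' : '0';
    *out = '\0';
}
```

For a struct, passing &s and sizeof s dumps every byte, including any padding bytes the compiler inserted.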
Here's how I did it for an unsigned int
void printb(unsigned int v) {
unsigned int i, s = 1<<((sizeof(v)<<3)-1); // s = only most significant bit at 1
for (i = s; i; i>>=1) printf("%d", v & i || 0 );
}
One statement generic conversion of any integral type into the binary string representation using standard library:
#include <bitset>
MyIntegralType num = 10;
printf("%s\n",
std::bitset<sizeof(num) * 8>(num).to_string().insert(0, "0b").c_str()
); // prints "0b" followed by sizeof(num) * 8 binary digits, zero-padded on the left
Or just: std::cout << std::bitset<sizeof(num) * 8>(num);

Resources