Parsing a hexadecimal string representation as an integer - c

How can I parse a string like "0000000f" as an unsigned long long int? And for larger values, how can I parse a string like "0000000f,0000000f" representing respectively the upper and lower 32 bits?
P.S. I can't use library functions for this.

You can use strtoull() from <stdlib.h> this way:
#include <stdlib.h>
unsigned long long parse_u64(const char *s) {
unsigned long long v1;
v1 = strtoull(s, (char **)&s, 16);
if (*s == ',') {
v1 = (v1 << 32) | strtoull(s + 1, NULL, 16);
}
return v1;
}
Note that formatting errors are not detected.
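The missing error detection can be added with strtoull's end pointer and errno. A minimal sketch, assuming the input is either "hhhhhhhh" or "hhhhhhhh,hhhhhhhh"; the name parse_u64_checked and its 0/-1 return convention are hypothetical and not part of the answer above:

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical checked variant: returns 0 on success, -1 on malformed input. */
int parse_u64_checked(const char *s, unsigned long long *out) {
    char *end;
    unsigned long long hi, lo;

    /* strtoull would skip whitespace and accept a sign; reject both up front */
    if (!isxdigit((unsigned char)*s))
        return -1;
    errno = 0;
    hi = strtoull(s, &end, 16);
    if (errno == ERANGE)
        return -1;
    if (*end == '\0') {              /* single value, no comma */
        *out = hi;
        return 0;
    }
    /* two-part form: each half must fit in 32 bits and be followed correctly */
    if (*end != ',' || hi > 0xFFFFFFFFULL || !isxdigit((unsigned char)end[1]))
        return -1;
    errno = 0;
    lo = strtoull(end + 1, &end, 16);
    if (errno == ERANGE || *end != '\0' || lo > 0xFFFFFFFFULL)
        return -1;
    *out = (hi << 32) | lo;
    return 0;
}
```

Trailing garbage, an empty half after the comma, and halves wider than 32 bits are all rejected here; whether that strictness is wanted depends on the input source.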
If you cannot rely on library functions, use this:
int getdigit(int c) {
if (c >= '0' && c <= '9') return c - '0';
if (c >= 'a' && c <= 'f') return c - 'a' + 10;
if (c >= 'A' && c <= 'F') return c - 'A' + 10;
return -1;
}
unsigned long long parse_u64(const char *s) {
unsigned long long v1;
int digit;
for (v1 = 0; *s; s++) {
if (*s == ',')
continue;
digit = getdigit(*s);
if (digit < 0)
break;
v1 = (v1 << 4) | digit;
}
return v1;
}
You can choose to ignore spaces and other characters or to stop parsing as I did.

Like @chqrlie, but with additional error checking.
Sounds like you want a string-to-integer conversion. Simple enough:
unsigned chtohex(char ch) {
if (ch >= '0' && ch <= '9') return ch - '0';
if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
return (unsigned) -1;
}
#include <limits.h> // for ULLONG_MAX
// return 0 on success, 1 on failure
int my_hexstrtoull(const char *s, unsigned long long *dest) {
unsigned long long sum = 0;
while (*s) {
if (*s == ',') {
s++; // skip the separator instead of spinning on it
continue;
}
unsigned ch = chtohex(*s++);
if (ch >= 16) {
return 1; // Bad hex char
}
if (sum > ULLONG_MAX/16) {
return 1; // overflow
}
sum = sum * 16 + ch;
}
*dest = sum;
return 0;
}

Here's a simple solution using sscanf:
#include <stdio.h>
#include <stdlib.h>
unsigned long long int parse(char const * s)
{
unsigned long int a, b;
if (sscanf(s, "%8lx,%8lx", &a, &b) == 2)
return ((unsigned long long int) a << 32) + b;
if (sscanf(s, "%8lx", &a) == 1)
return a;
abort();
}

Related

K&R C Programming Language Exercise 2-3 code returns rubbish

I tried to write a solution from exercise 2-3. After compilation, it returns random numbers on output. I don't really understand where this issue is coming from.
Any help appreciated.
StackOverflow keeps asking for more details. The purpose of the program is listed in the code below.
More details.
Purpose of the code:
Write the function htoi(s), which converts a string of hexadecimal digits (including an optional 0x or 0X) into its equivalent integer value. The allowable digits are 0 through 9, a through f, and A through F.
/*
* Write the function htoi(s), which converts a string of hexadecimal
* digits (including an optional 0x or 0X) into its equivalent integer
* value. The allowable digits are 0 through 9, a through f, and A
* through F.
*/
#include <stdio.h>
#include <math.h>
int hti(char s)
{
const char hexlist[] = "aAbBcCdDeEfF";
int answ = 0;
int i;
for (i=0; s != hexlist[i] && hexlist[i] != '\0'; i++)
;
if (hexlist[i] == '\0')
answ = 0;
else
answ = 10 + (i/2);
return answ;
}
unsigned int htoi(const char s[])
{
int answ;
int power = 0;
signed int i = 0;
int viable = 0;
int hexit;
if (s[i] == '0')
{
i++;
if (s[i] == 'x' || s[i] == 'X')
i++;
}
const int stop = i;
for (i; s[i] != '\0'; i++)
;
i--;
while (viable == 0 && i >= stop)
{
if (s[i] >= '0' && s[i] <= '9')
{
answ = answ + ((s[i] - '0') * pow(16, power));
}
else
{
hexit = hti(s[i]);
if (hexit == 0)
viable = 1;
else
{
hexit = hexit * (pow(16, power));
answ += hexit;
}
}
i--;
power++;
}
if (viable == 1)
return 0;
else
return answ;
}
int main()
{
char test[] = "AC";
int i = htoi(test);
printf("%d\n", i);
return 0;
}
answ is not initialized in htoi, so the additions start from an indeterminate value and the result is garbage. Initialize it to zero.
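Beyond the initialization fix, the pow() calls and the right-to-left scan are not needed. A minimal sketch of a left-to-right variant, assuming an ASCII-compatible character set for the letter ranges; the names hexval and htoi_sketch are hypothetical, and this is not the asker's code:

```c
/* Value of one hex digit, or -1 if c is not a hex digit.
   Assumes 'a'..'f' and 'A'..'F' are contiguous, as in ASCII. */
static int hexval(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Converts an optional 0x/0X prefix followed by hex digits.
   Returns 0 on a bad digit, mirroring the original's error convention. */
unsigned int htoi_sketch(const char s[]) {
    unsigned int answ = 0;   /* initialized, unlike the original */
    int i = 0;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        i = 2;
    for (; s[i] != '\0'; i++) {
        int d = hexval(s[i]);
        if (d < 0)
            return 0;
        answ = answ * 16 + (unsigned int)d;   /* shift one hex digit left, add the new one */
    }
    return answ;
}
```

Accumulating with answ * 16 + d keeps everything in integer arithmetic, so no floating-point rounding from pow() can creep in.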

Getting a 128 bits integer from command line

I'm trying to convert a command-line argument into a key for the Tiny Encryption Algorithm.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv){
unsigned int key[4] = { 0 };
*key = strtoll(argv[1], NULL, 10);
printf("%s key = %llu\n", argv[0], key);
return 0;
}
Here is my input :
./a.out 9223372036854775700
Here is the output :
./a.out key = 140723741574976
So I'm passing a 128 bit key in argv[1]. Shouldn't it be cast properly in memory into the unsigned int array?
So, I'm trying to figure out why this is the output of my program. Does this have something to do with endianness?
long long is only specified to contain at least 64 bits. You might be better off passing your key as hex and parsing it manually into a byte array.
Take a step back and look at what you are trying to implement. The Tiny Encryption Algorithm does not work on a 128-bit integer, but on a 128-bit key; the key is composed of four 32-bit unsigned integers.
What you actually need is a way to parse a decimal (or hexadecimal, or some other base) 128-bit unsigned integer from a string into four 32-bit unsigned integer elements.
I suggest writing a multiply-add function, which takes the quad-32-bit value, multiplies it by a 32-bit constant, and adds another 32-bit constant:
#include <stdint.h>
uint32_t muladd128(uint32_t quad[4], const uint32_t mul, const uint32_t add)
{
uint64_t temp = 0;
temp = (uint64_t)quad[3] * (uint64_t)mul + add;
quad[3] = temp;
temp = (uint64_t)quad[2] * (uint64_t)mul + (temp >> 32);
quad[2] = temp;
temp = (uint64_t)quad[1] * (uint64_t)mul + (temp >> 32);
quad[1] = temp;
temp = (uint64_t)quad[0] * (uint64_t)mul + (temp >> 32);
quad[0] = temp;
return temp >> 32;
}
The above uses most significant first word order. It returns nonzero if the result overflows; in fact, it returns the 32-bit overflow itself.
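To see the carry chain at work, the function can be fed decimal digits one at a time. A minimal sketch that restates muladd128() as a loop so the block compiles on its own; the helper name parse_dec128 and its 0/-1 return convention are hypothetical:

```c
#include <stdint.h>

/* Same arithmetic as above, written as a loop: quad = quad * mul + add,
   most significant word first. Returns the 32 bits that did not fit. */
uint32_t muladd128(uint32_t quad[4], const uint32_t mul, const uint32_t add)
{
    uint64_t temp = (uint64_t)add;
    int i;
    for (i = 3; i >= 0; i--) {
        temp += (uint64_t)quad[i] * mul;   /* cannot overflow 64 bits */
        quad[i] = (uint32_t)temp;          /* keep the low 32 bits */
        temp >>= 32;                       /* carry into the next word */
    }
    return (uint32_t)temp;
}

/* Hypothetical helper: decimal digits only. Returns 0 on success, -1 on
   empty input, a non-digit, or overflow past 128 bits. */
int parse_dec128(uint32_t quad[4], const char *s)
{
    quad[0] = quad[1] = quad[2] = quad[3] = 0;
    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return -1;
        if (muladd128(quad, 10, (uint32_t)(*s - '0')))
            return -1;
    }
    return 0;
}
```

For the input from the question, "9223372036854775700", this leaves quad[2] == 0x7FFFFFFF and quad[3] == 0xFFFFFF94 with the two upper words zero.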
With that, it is very easy to parse a string describing a nonnegative 128-bit integer in binary, octal, decimal, or hexadecimal:
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
static void clear128(uint32_t quad[4])
{
quad[0] = quad[1] = quad[2] = quad[3] = 0;
}
/* muladd128() */
static const char *parse128(uint32_t quad[4], const char *from)
{
if (!from) {
errno = EINVAL;
return NULL;
}
while (*from == '\t' || *from == '\n' || *from == '\v' ||
*from == '\f' || *from == '\r' || *from == ' ')
from++;
if (from[0] == '0' && (from[1] == 'x' || from[1] == 'X') &&
((from[2] >= '0' && from[2] <= '9') ||
(from[2] >= 'A' && from[2] <= 'F') ||
(from[2] >= 'a' && from[2] <= 'f'))) {
/* Hexadecimal */
from += 2;
clear128(quad);
while (1)
if (*from >= '0' && *from <= '9') {
if (muladd128(quad, 16, *from - '0')) {
errno = ERANGE;
return NULL;
}
from++;
} else
if (*from >= 'A' && *from <= 'F') {
if (muladd128(quad, 16, *from - 'A' + 10)) {
errno = ERANGE;
return NULL;
}
from++;
} else
if (*from >= 'a' && *from <= 'f') {
if (muladd128(quad, 16, *from - 'a' + 10)) {
errno = ERANGE;
return NULL;
}
from++;
} else
return from;
}
if (from[0] == '0' && (from[1] == 'b' || from[1] == 'B') &&
(from[2] >= '0' && from[2] <= '1')) {
/* Binary */
from += 2;
clear128(quad);
while (1)
if (*from >= '0' && *from <= '1') {
if (muladd128(quad, 2, *from - '0')) {
errno = ERANGE;
return NULL;
}
from++;
} else
return from;
}
if (from[0] == '0' &&
(from[1] >= '0' && from[1] <= '7')) {
/* Octal */
from += 1;
clear128(quad);
while (1)
if (*from >= '0' && *from <= '7') {
if (muladd128(quad, 8, *from - '0')) {
errno = ERANGE;
return NULL;
}
from++;
} else
return from;
}
if (from[0] >= '0' && from[0] <= '9') {
/* Decimal */
clear128(quad);
while (1)
if (*from >= '0' && *from <= '9') {
if (muladd128(quad, 10, *from - '0')) {
errno = ERANGE;
return NULL;
}
from++;
} else
return from;
}
/* Not a recognized number. */
errno = EINVAL;
return NULL;
}
int main(int argc, char *argv[])
{
uint32_t key[4];
int arg;
for (arg = 1; arg < argc; arg++) {
const char *end = parse128(key, argv[arg]);
if (end) {
if (*end != '\0')
printf("%s: 0x%08x%08x%08x%08x (+ \"%s\")\n", argv[arg], key[0], key[1], key[2], key[3], end);
else
printf("%s: 0x%08x%08x%08x%08x\n", argv[arg], key[0], key[1], key[2], key[3]);
fflush(stdout);
} else {
switch (errno) {
case ERANGE:
fprintf(stderr, "%s: Too large.\n", argv[arg]);
break;
case EINVAL:
fprintf(stderr, "%s: Not a nonnegative integer in binary, octal, decimal, or hexadecimal notation.\n", argv[arg]);
break;
default:
fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
break;
}
}
}
return EXIT_SUCCESS;
}
It is very straightforward to add support for Base64 and Base85, which are sometimes used; or indeed for any base less than 2^32.
And, if you think about the above, it was all down to being precise about what you need.
The code is attempting to print the address of key[0] (the array decays to a pointer) rather than its value. This is not an endianness issue. Enable all compiler warnings to save time.
*key = strtoll(argv[1], NULL, 10); attempts to store a long long (at least 64 bits) into an unsigned int, which is likely only 32 bits.
The string "9223372036854775700" represents a 63 bit number.
First try to use an unsigned long long which is at least a 64-bit number.
int main(int argc, char** argv){
// unsigned int key[4] = { 0 };
unsigned long long key = strtoull(argv[1], NULL, 10);
printf("%s key = %llu\n", argv[0], key);
return 0;
}
C does not specify support for 128-bit integers. User code could be written to cope with that. @C_Elegans's idea of using hexadecimal text is good.
As int could be of various sizes, better to use
#include <stdint.h>
// unsigned int key[4];
uint32_t key[4];
A sample code idea
#include <ctype.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
typedef struct {
uint16_t u[8];
} my_uint128_t;
my_uint128_t strtomy_uint128(const char *s, char **endptr, int base) {
my_uint128_t y = {0};
while (isalnum((unsigned char ) *s)) {
char *endptr;
uint32_t sum = (uint32_t) strtoul((char[2]) {*s, '\0'}, &endptr, base);
if (*endptr) {
break;
}
for (int i = 0; i < 8; i++) {
sum += y.u[i] * (uint32_t) base;
y.u[i] = (uint16_t) sum;
sum >>= 16;
}
if (sum) {
errno = ERANGE;
for (int i = 0; i < 8; i++) {
y.u[i] = UINT16_MAX;
}
}
s++;
}
if (endptr) {
*endptr = (char *) s;
}
return y;
}
void uint128_dump(my_uint128_t x) {
for (int i = 8; i > 0; ) {
i--;
printf("%04" PRIX16 "%c", x.u[i], i ? ' ' : '\n');
}
}
int main(void) {
my_uint128_t a = strtomy_uint128("9223372036854775700", 0, 10);
uint128_dump(a);
}
Output
0000 0000 0000 0000 7FFF FFFF FFFF FF94
Why not do it manually? Take a variable of type unsigned __int128 (a GCC/Clang extension, not standard C), go through each digit of your input, and accumulate it like this:
#include <stdio.h>
#include <string.h>
int main(int argc, char** argv){
unsigned __int128 key = 0;
size_t i;
if (argc < 2) return 1;
for (i=0; i<strlen(argv[1]); i++){
key *= 10; // "shift" current value to make space for adding one more decimal
key += argv[1][i] - '0'; // convert ascii character to number
}
// printf has no conversion for __int128, so print the two 64-bit halves in hex
printf("%s key = 0x%016llx%016llx\n", argv[0], (unsigned long long)(key >> 64), (unsigned long long)key);
return 0;
}
Note that if argv[1] is too long, the value overflows and the key loses its high-order part rather than its low-order part. That may be something you need to handle too, depending on your requirements.

Why is my hex conversion function off by one?

I'm trying to learn c and am confused why my hex to int conversion function returns a value that is off by one.
Note: 0XAAAA == 43690
#include <stdio.h>
#include <math.h>
unsigned int hex_to_int(char hex[4]);
int main()
{
char hex[4] = "AAAA";
unsigned int result = hex_to_int(hex);
printf("%d 0X%X\n", result, result);
return 0;
}
unsigned int hex_to_int(char input[4])
{
unsigned int sum, i, exponent;
for(i = 0, exponent = 3; i < 4; i++, exponent--) {
unsigned int n = (int)input[i] - 55;
n *= pow(16, exponent);
sum += n;
}
return sum;
}
Output:
43691 0XAAAB
Update: I now realize "- 55" is ridiculous, I was going off memory from seeing this:
if (input[i] >= '0' && input[i] <= '9')
val = input[i] - 48;
else if (input[i] >= 'a' && input[i] <= 'f')
val = input[i] - 87;
else if (input[i] >= 'A' && input[i] <= 'F')
val = input[i] - 55;
You have several bugs: the string is not null-terminated (char hex[4] has no room for the terminator), the ASCII-to-value conversion subtracts 55 regardless of the character, and sum is never initialized. Just do this instead:
#include <stdio.h>
#include <stdlib.h>
int main()
{
char x[] = "AAAA";
unsigned int sum = strtoul(x, NULL, 16);
printf("%d 0X%X\n", sum, sum);
return 0;
}
EDIT
If you insist on doing this manually:
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
unsigned int hexstr_to_uint(const char* str);
int main()
{
char x[] = "AAAA";
unsigned int sum = hexstr_to_uint (x);
printf("%d 0X%X\n", sum, sum);
return 0;
}
unsigned int hexstr_to_uint(const char* str)
{
unsigned int sum = 0;
for(; *str != '\0'; str++)
{
sum *= 16;
if(isdigit((unsigned char)*str))
{
sum += *str - '0';
}
else
{
char digit = toupper((unsigned char)*str);
_Static_assert('Z'-'A'==25, "Trash systems not supported.");
if(digit >= 'A' && digit <= 'F')
{
sum += digit - 'A' + 0xA;
}
}
}
return sum;
}
You're just making up logic, there isn't a single value you can subtract from a hexadecimal digit character to convert it into the corresponding number.
If you want to be portable, all that C requires is that the symbols 0 through 9 are consecutive in their encoding. There's no such guarantee for the letters A through F.
Also, involving pow(), a double-precision floating-point function, in this low-level integer work is a bit jarring. The typical way to do this is by multiplication or bitwise shifting.
If you're hell-bent on doing the conversion yourself, I usually do something like this:
#include <ctype.h>
#include <string.h>
unsigned int hex2int(const char *a)
{
unsigned int v = 0;
while(isxdigit((unsigned char) *a))
{
v *= 16;
if(isdigit((unsigned char) *a))
v += *a - '0';
else
{
const char highs[] = "abcdef";
const char * const h = strchr(highs, tolower((unsigned char) *a));
v += 10 + (unsigned int) (h - highs);
}
++a;
}
return v;
}
The above is a bit verbose; you can, for instance, fold the decimal digits into the string used for the letters too. I just tried to be clear. The above should work for any valid C character set encoding, not just ASCII (and it's less passive-aggressive than @Lundin's code, hih :).
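Folding the decimal digits into the lookup string, as suggested, might look like the following. A minimal sketch; the name hex2int_table is hypothetical:

```c
#include <ctype.h>
#include <string.h>

unsigned int hex2int_table(const char *a)
{
    static const char digits[] = "0123456789abcdef";
    unsigned int v = 0;

    /* The loop test matters: strchr would also match the terminating '\0'. */
    for (; *a != '\0'; ++a) {
        const char *p = strchr(digits, tolower((unsigned char)*a));
        if (p == NULL)
            break;                              /* stop at the first non-hex character */
        v = v * 16 + (unsigned int)(p - digits); /* index in the table is the digit value */
    }
    return v;
}
```

Because the digit value is just the position in the table, this makes no assumption about how the letters are encoded.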

Using unsigned int instead of unsigned short changes behaviour

I am attempting the htoi(char*) function from The C Programming Language by K&R (Exercise 2-3, pg. 43).
The function is meant to convert a hexadecimal string to base 10.
I believe I have it working. This is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
enum {hexbase = 16};
typedef enum{false, true} bool;
unsigned int htoi(char* s);
bool hasHexPrefix(char* s);
int main(int argc, char** argv) {
if(argc <= 1) {
printf("Error: Not enough arguments.\n");
return EXIT_FAILURE;
}else {
for(int i = 1; i < argc; i++) {
unsigned int numericVal = htoi(argv[i]);
printf("%s => %u\n",argv[i],numericVal);
}
}
}
unsigned int htoi(char* s) {
unsigned int output = 0;
unsigned int len = (unsigned int)(strlen(s));
unsigned short int firstIndex = hasHexPrefix(s) ? 2 : 0;
/* start from the end of the str (least significant digit) and move to front */
for(int i = len-1; i >= firstIndex; i--) {
int currentChar = s[i];
unsigned int correspondingNumericVal = 0;
if(currentChar >= '0' && currentChar <= '9') {
correspondingNumericVal = currentChar - '0';
}else if(currentChar >= 'a' && currentChar <= 'f') {
correspondingNumericVal = (currentChar - 'a') + 10;
}else if(currentChar >= 'A' && currentChar <= 'F') {
correspondingNumericVal = (currentChar - 'A') + 10;
}else {
printf("Error. Invalid hex digit: %c.\n",currentChar);
}
/* 16^(digitNumber) */
correspondingNumericVal *= pow(hexbase,(len-1)-i);
output += correspondingNumericVal;
}
return output;
}
bool hasHexPrefix(char* s) {
if(s[0] == '0')
if(s[1] == 'x' || s[1] == 'X')
return true;
return false;
}
My issue is with the following line from the htoi(char*) function:
unsigned short int firstIndex = hasHexPrefix(s) ? 2 : 0;
When I remove short to make firstIndex into an unsigned int rather than an unsigned short int, I get an infinite loop.
So when I start from the back of s in htoi(char* s), i >= firstIndex never evaluates to be false.
Why does this happen? Am I missing something trivial or have I done something terribly wrong to cause this undefined behavior?
When firstIndex is an unsigned int, i in i >= firstIndex is converted to unsigned int by the usual arithmetic conversions. So when i becomes negative, it turns into a very large unsigned value in the comparison, and the condition stays true. When firstIndex is an unsigned short int, firstIndex is instead promoted to int, and two signed integers are compared.
You can change:
for(int i = len-1; i >= firstIndex; i--)
to
for(int i = len-1; i >= (int) firstIndex; i--)
to have the same behavior in both cases.
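The difference between the two comparisons can be reproduced in isolation. A minimal sketch, assuming int is wider than unsigned short (true on common platforms, but not guaranteed by the standard); the function names are hypothetical:

```c
/* i is converted to unsigned int before the comparison,
   so a negative i becomes a very large value. */
int ge_unsigned_int(int i, unsigned int first) {
    return i >= first;
}

/* first is promoted to int (when int can hold every unsigned short value),
   so the comparison is an ordinary signed one. */
int ge_unsigned_short(int i, unsigned short first) {
    return i >= first;
}
```

With i == -1 and first == 0, the first function returns 1 and the second returns 0, which is why the loop never terminates once firstIndex is an unsigned int.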

Convert a long hex string in to int array with sscanf

I have an input like
char *input="00112233FFAA";
uint8_t output[6];
What is the easiest way to convert input into output with sscanf? (prefer 1 line with no loop) The solution I have in mind doesn't scale to 20+ hex string.
sscanf(input, "%x%x%x%x%x",output[0], output[1]....output[5]);
Why scanf if this can be easily written by hand:
const size_t numdigits = strlen(input) / 2;
uint8_t * const output = malloc(numdigits);
for (size_t i = 0; i != numdigits; ++i)
{
output[i] = 16 * toInt(input[2*i]) + toInt(input[2*i+1]);
}
unsigned int toInt(char c)
{
if (c >= '0' && c <= '9') return c - '0';
if (c >= 'A' && c <= 'F') return 10 + c - 'A';
if (c >= 'a' && c <= 'f') return 10 + c - 'a';
return -1;
}
If you do not want to use a loop, then you need to explicitly write out all six (or twenty) array locations (although %x is not the correct conversion character - it expects a pointer to unsigned int as its corresponding argument). If you don't want to write them all out, then you need to use a loop - it can be quite simple, though:
for (i = 0; i < 6; i++)
sscanf(&input[i * 2], "%2hhx", &output[i]);
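Wrapped in a function with the return value of sscanf checked, the loop could look like this. A minimal sketch; the name hex_to_bytes and its 0/-1 return convention are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Converts exactly 2*n hex characters from input into n bytes.
   Returns 0 on success, -1 if the length is wrong or a pair fails to parse. */
int hex_to_bytes(const char *input, uint8_t *output, size_t n)
{
    size_t i;

    if (strlen(input) != 2 * n)
        return -1;
    for (i = 0; i < n; i++) {
        unsigned char byte;   /* %hhx expects a pointer to unsigned char */
        if (sscanf(&input[i * 2], "%2hhx", &byte) != 1)
            return -1;
        output[i] = byte;
    }
    return 0;
}
```

Note that sscanf itself is lenient: it accepts a pair such as "0g" (parsing only the 0) or a leading sign, so strict validation would still need a per-character check.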
Here is an alternate implementation.
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h> /* for malloc */
#define _base(x) ((x >= '0' && x <= '9') ? '0' : \
(x >= 'a' && x <= 'f') ? 'a' - 10 : \
(x >= 'A' && x <= 'F') ? 'A' - 10 : \
'\255')
#define HEXOF(x) (x - _base(x))
int main() {
char input[] = "00112233FFAA";
char *p;
uint8_t *output;
if (!(sizeof(input) & 1)) { /* even digits + \0 */
fprintf(stderr,
"Cannot have odd number of characters in input: %zu\n",
sizeof(input) - 1);
return -1;
}
output = malloc(sizeof(input) >> 1);
for (p = input; p && *p; p+=2 ) {
output[(p - input) >> 1] =
((HEXOF(*p)) << 4) + HEXOF(*(p+1));
}
return 0;
}
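@caf already had a good, simple idea.
However, I had to use %02x to get it working. Since %x stores a full unsigned int, read into a temporary rather than directly into output[i]:
for (i = 0; i < 6; i++) {
unsigned int byte;
sscanf(&input[i * 2], "%02x", &byte);
output[i] = (uint8_t) byte;
}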

Resources