Atoi of a long string error - c

I have this code:
char kbits[k];
long int bits;
The kbits string is always filled with '1', for example:
If k = 5,
kbits = "11111"
If k = 9,
kbits = "111111111"
To do this I use this code:
for(i = 0; i < k; i++)
kbits[i] = '1';
And then I run:
bits = atoi(kbits);
So I have the integer of kbits, for example, if kbits = "1111", bits = 1111;
For k <= 10, it runs perfectly fine.
For k > 10 it puts a 7 in last position of kbits (for example, if k = 11, kbits = 11111111117) and bits = 2147483647 (this value is for any value of k, I think it is random?)

atoi interprets its input as a decimal number, not a binary number. Your longer string is causing the value to overflow, in which case the return value of atoi is undefined (although returning INT_MAX is common, which is equal to the value you're seeing).
To interpret the input as a binary number you can use strtol like this:
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <errno.h>
int main()
{
char line[256];
while (printf("\nInput: "),
fgets(line, sizeof line, stdin))
{
// skip initial whitespace and break if input is nothing but whitespace
char *p = line;
while (isspace((unsigned char)*p)) ++p;
if (*p == '\0') break;
// interpret input as a binary number
errno = 0; // strtol sets errno to ERANGE when the value does not fit in a long
long n = strtol(line, &p, 2);
if (p == line)
printf("** Can't convert any characters. **\n");
else if (errno == ERANGE)
printf("** Range error **\n");
else
printf("Value: %ld\n", n); // prints value in decimal
while (isspace(*p)) ++p; // skip whitespace after the value
if (*p != '\0')
printf("** Excess characters in input **\n");
}
return 0;
}

Your input is too large for the return type. Per the docs:
If the converted value cannot be represented, the behavior is undefined.
An int is a signed type of at least 16 bits. On your platform it is probably 32 bits, in which case its maximum value is 2^31-1 = 2147483647, too small for your input of ~11 billion. Use strtol, which returns a long int, instead (or strtoll, which returns a long long, if long is also 32 bits on your platform).
Also, make sure you're terminating that string.
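As a minimal sketch of that advice, the conversion can be wrapped in a small function that reports failure instead of silently overflowing. The name parse_decimal is hypothetical (it is not from the question), and the sketch uses strtoll so that an 11-digit value fits even where long is 32 bits:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert a decimal string to long long.
   Returns 1 on success; returns 0 if no digits were found, if
   unconverted characters remain, or if the value is out of range. */
int parse_decimal(const char *s, long long *out)
{
    char *end;
    errno = 0;                       /* strtoll reports overflow through errno */
    long long v = strtoll(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return 0;
    *out = v;
    return 1;
}
```

Unlike atoi, this has defined behavior for every input, so an oversized string is rejected rather than turned into an arbitrary number.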

2147483647 (this value is for any value of k, I think it is random?)
No, it is not random; it is the largest value a signed 32-bit integer can hold.
Your inputs are too big.
When this happens, the return value of atoi is undefined (per the documentation); the result you're seeing is common in practice.

Define kbits to be one byte longer than the number of 1s you want to store, and initialise it to all 0s.
#define K (5)
char kbits[K + 1] = ""; /* One more for the '\0' termination necessary to terminate a C "string". */
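A short sketch of filling such a buffer follows. The helper name fill_ones and the value of K are assumptions chosen for illustration; the point is only that the byte after the last '1' is set to '\0':

```c
#include <string.h>

#define K 11  /* number of '1' characters; illustrative value */

/* Hypothetical helper: fill buf (which must hold at least K + 1 bytes)
   with K ones and terminate it so that it is a valid C string. */
void fill_ones(char buf[K + 1])
{
    memset(buf, '1', K);  /* the K digit characters */
    buf[K] = '\0';        /* the terminator that atoi/strtol rely on */
}
```

Without the terminator, atoi keeps reading whatever bytes happen to follow the array, which would explain a stray digit appearing after the ones.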

Related

sscanf and scanset stops reading of hex numbers

I try to verify a UUID v4. I try to do this with sscanf: if the UUID can be read completely with sscanf (= total number of characters read - 36), I assume this is a correct UUID. My code up to now:
#include <stdio.h>
int main()
{
char uuid[ 37 ] = "da4dd6a0-5d4c-4dc6-a5e3-559a89aff639";
int a = 0, b = 0, c = 0, d = 0, e = 0, g = 0;
long long int f = 0;
printf( "uuid >%s<, variables read: %d \n", uuid, sscanf( uuid, "%8x-%4x-4%3x-%1x%3x-%12llx%n", &a, &b, &c, &d, &e, &f, &g ) );
printf( " a - %x, b - %x, c - %x, d - %x, e - %x, f - %llx, total number of characters read - %d \n", a, b, c, d, e, f, g );
return 0;
}
which return the following output
uuid >da4dd6a0-5d4c-4dc6-a5e3-559a89aff639<, variables read: 6
a - da4dd6a0, b - 5d4c, c - dc6, d - a, e - 5e3, f - 559a89aff639, total number of characters read - 36
So far, everything okay.
Now I want to include, that the first character after the third hyphen needs to be one of [89ab]. So I changed %1x%3x to %1x[89ab]%3x. But now, the first character is read and the rest not anymore.
The output:
uuid >da4dd6a0-5d4c-4dc6-a5e3-559a89aff639<, variables read: 4
a - da4dd6a0, b - 5d4c, c - dc6, d - a, e - 0, f - 0, total number of characters read - 0
What am I missing? What is wrong with the syntax? Is it possible to read it like this? I tried several combinations of the scanset and the specifier, but nothing works.
Instead of using sscanf() for this task, you might just write a simple dedicated function:
#include <ctype.h>
#include <string.h>
int check_UUID(const char *s) {
int i;
for (i = 0; s[i]; i++) {
if (i == 8 || i == 13 || i == 18 || i == 23) {
if (s[i] != '-')
return 0;
} else {
if (!isxdigit((unsigned char)s[i])) {
return 0;
}
}
}
if (i != 36)
return 0;
// you can add further tests for specific characters:
if (!strchr("89abAB", s[19]))
return 0;
return 1;
}
If you insist on using sscanf(), here is a concise implementation:
#include <stdio.h>
int check_UUID(const char *s) {
int n = 0;
sscanf(s, "%*8[0-9a-fA-F]-%*4[0-9a-fA-F]-%*4[0-9a-fA-F]-%*4[0-9a-fA-F]-%*12[0-9a-fA-F]%n", &n);
return n == 36 && s[n] == '\0';
}
If you want to refine the test for the first character after the third hyphen, add another character class:
#include <stdio.h>
int check_UUID(const char *s) {
int n = 0;
sscanf(s, "%*8[0-9a-fA-F]-%*4[0-9a-fA-F]-%*4[0-9a-fA-F]-%*1[89abAB]%*3[0-9a-fA-F]-%*12[0-9a-fA-F]%n", &n);
return n == 36 && s[n] == '\0';
}
Notes:
The * after the % means do not store the conversion, just skip the characters and the 1 means consume at most 1 character.
For the number of characters parsed by sscanf to reach 36, all hex digit sequences must have exactly the specified width.
%n causes scanf to store the number of characters read so far into the int pointed to by the next argument.
Your conversion specification is useful to get the actual UUID numbers, but the %x format accepts leading white space, an optional sign and an optional 0x or 0X prefix, all of which are invalid inside a UUID. You can first validate the UUID, then convert it to its individual parts if required.
Now I want to include, that the first character after the third hyphen needs to be one of [89ab]. So I changed %1x%3x to %1x[89ab]%3x
Should have been "%1[89ab]%3x" and then saved into a 2 character string. Then convert that small string into a hex value with strtol(..., ..., 16).
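A minimal sketch of that idea for the fourth group alone is shown below. The function name parse_variant_group is hypothetical, and as a deliberate variation the remaining three characters are also read with a scanset rather than %3x, so that a sign or 0x prefix cannot slip through:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a 4-character group such as "a5e3".
   The first character must be one of 8, 9, a, b (either case).
   Returns 1 and stores both values on success, 0 otherwise. */
int parse_variant_group(const char *s, unsigned long *d, unsigned long *e)
{
    char first[2];  /* 1 character + terminator */
    char rest[4];   /* 3 characters + terminator */
    int n = 0;
    if (sscanf(s, "%1[89abAB]%3[0-9a-fA-F]%n", first, rest, &n) != 2 || n != 4)
        return 0;
    *d = strtoul(first, NULL, 16);  /* convert the small strings afterwards */
    *e = strtoul(rest, NULL, 16);
    return 1;
}
```

The scanset needs its own buffer because %[ stores characters, not a number; that is why the original %1x[89ab] stopped matching.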
Instead, I suggest a 2 step validation for a universally unique identifier (UUID):
Check for syntax, then read the value.
I'd avoid "%x" as it allows leading spaces, leading '+','-' and optional leading 0x and narrow inputs.
For validation, perhaps a simple test in code:
#include <ctype.h>
#include <stdio.h>
// byte lengths: 4-2-2-2-6
typedef struct {
unsigned long time_low;
unsigned time_mid;
unsigned time_hi_and_version;
unsigned clock_seq_hi_and_res_clock_seq_low;
unsigned long long node;
} uuid_T;
uuid_T* validate_uuid(uuid_T *dest, const char *uuid_source) {
static const char *uuid_pat = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";
const char *pat = uuid_pat;
const unsigned char *u = (const unsigned char*) uuid_source;
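while (*u) {
// An 'x' in the pattern requires a hex digit; any other pattern character
// (a '-', or the terminator when the input is too long) must match exactly.
if (*pat == 'x' ? !isxdigit(*u) : *u != (unsigned char)*pat) {
return NULL;
}
pat++;
u++;
}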
if (*pat) { // Too short
return NULL;
}
sscanf(uuid_source, "%lx-%x-%x-%x-%llx", &dest->time_low,
&dest->time_mid, &dest->time_hi_and_version,
&dest->clock_seq_hi_and_res_clock_seq_low, &dest->node);
return dest;
}
u is unsigned char *u so isxdigit(*u) is only called with non-negative values and so avoids UB.

Parse string to number in C

I tried to write a function to convert a string to an int:
int convert(char *str, int *n){
int i;
if (str == NULL) return 0;
for (i = 0; i < strlen(str); i++)
if ((isdigit(*(str+i))) == 0) return 0;
*n = *str;
return 1;
}
So what's wrong with my code?
*n = *str means:
Set the 4 bytes of memory that n points to, to the 1 byte of memory that str points to. This is perfectly fine but it's probably not your intention.
Why are you trying to convert a char* to an int* in the first place? If you literally just need to do a conversion and make the compiler happy, you can just do int *foo = (int*)bar where bar is the char*.
Sorry, I don't have the reputation to make this a comment.
The function definitely does not perform as intended.
Here are some issues:
you should include <ctype.h> for isdigit() to be properly defined.
isdigit(*(str+i)) has undefined behavior if str contains negative char values. You should cast the argument:
isdigit((unsigned char)str[i])
the function returns 0 if there is any non digit character in the string. What about "-1" and "+2"? atoi and strtol are more lenient with non digit characters, they skip initial white space, process an optional sign and subsequent digits, stopping at the first non digit.
the loop for (i = 0; i < strlen(str); i++) is very inefficient: strlen may be invoked once per iteration, giving O(N^2) time complexity. Use this instead:
for (i = 0; str[i] != '\0'; i++)
*n = *str does not convert the number represented by the digits in str, it merely stores the value of the first character into n, for example '0' will convert to 48 on ASCII systems. You should instead process every digit in the string, multiplying the value converted so far by 10 and adding the value represented by the digit with str[i] - '0'.
Here is a corrected version with your restrictive semantics:
int convert(const char *str, int *n) {
int value = 0;
if (str == NULL)
return 0;
while (*str) {
if (isdigit((unsigned char)*str)) {
value = value * 10 + *str++ - '0';
} else {
return 0;
}
}
*n = value;
return 1;
}
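The version above can still overflow int on a long run of digits. A hedged sketch of one way to guard against that is shown here; the name convert_checked and the decision to reject an empty string are assumptions, not part of the original answer:

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical variant of convert(): same digits-only semantics, but it
   also rejects an empty string and any value that does not fit in an int. */
int convert_checked(const char *str, int *n)
{
    int value = 0;
    if (str == NULL || *str == '\0')
        return 0;
    for (; *str != '\0'; str++) {
        if (!isdigit((unsigned char)*str))
            return 0;
        int digit = *str - '0';
        /* value * 10 + digit would exceed INT_MAX: stop before it happens */
        if (value > (INT_MAX - digit) / 10)
            return 0;
        value = value * 10 + digit;
    }
    *n = value;
    return 1;
}
```

The check is done before the multiplication because signed overflow is undefined behavior, so it cannot be detected after the fact.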
conversion of char* pointer to int*
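#include <stdio.h>
int main(void)
{
char c, *cc;
int i, *ii;
float f, *ff;
c = 'A'; /* ASCII value of A gets stored in c */
i = 25;
f = 3.14f;
cc = &c;
ii = &i;
ff = &f;
/* %p with a void * cast is the portable way to print an address */
printf("\n Address contained in cc = %p", (void *)cc);
printf("\n Address contained in ii = %p", (void *)ii);
printf("\n Address contained in ff = %p", (void *)ff);
/* one level of indirection gives the value each pointer refers to */
printf("\n value of c = %c", *cc);
printf("\n value of i = %d", *ii);
printf("\n value of f = %f\n", *ff);
return 0;
}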

Binary to Decimal Conversion in C - Input Size Issue

I have to write a C program for one of my classes that converts a given binary number to decimal. My program works for smaller inputs, but not for larger ones. I believe this may be due to the conversion specifier I am using for scanf() but I am not positive. My code is below
#include<stdio.h>
#include<math.h>
int main(void)
{
unsigned long inputNum = 0;
int currentBinary = 0;
int count = 0;
float decimalNumber = 0;
printf( "Input a binary number: " );
scanf( "%lu", &inputNum );
while (inputNum != 0)
{
currentBinary = inputNum % 10;
inputNum = inputNum / 10;
printf("%d\t%d\n", currentBinary, inputNum);
decimalNumber += currentBinary * pow(2, count);
++count;
}
printf("Decimal conversion: %.0f", decimalNumber);
return 0;
}
Running with a small binary number:
Input a binary number: 1011
1 101
1 10
0 1
1 0
Decimal conversion: 11
Running with a larger binary number:
Input a binary number: 1000100011111000
2 399133551
1 39913355
5 3991335
5 399133
3 39913
3 3991
1 399
9 39
9 3
3 0
Decimal conversion: 5264
"1000100011111000" is a 16-digit number. Certainly unsigned long is too small on your platform.
unsigned long is good - up to at least 10 digits (see note 1).
unsigned long long is better - up to at least 20 digits (see note 1).
To get past that:
Below is an any size conversion by reading 1 char at a time and forming an unbounded string.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
// Double the decimal form of string: "512" --> "1024"
char *sdouble(char *s, size_t *len, int carry) {
size_t i = *len;
while (i > 0) {
i--;
int sum = (s[i] - '0')*2 + carry;
s[i] = sum%10 + '0';
carry = sum/10;
}
if (carry) {
(*len)++;
s = realloc(s, *len + 1); // TBD OOM check
memmove(&s[1], s, *len);
s[0] = carry + '0';
}
return s;
}
int main(void) {
int ch;
size_t len = 1;
char *s = malloc(len + 1); // TBD OOM check
strcpy(s, "0");
while ((ch = fgetc(stdin)) >= '0' && ch <= '1') {
s = sdouble(s, &len, ch - '0');
}
puts(s);
free(s);
return 0;
}
100 digits
1111111111000000000011111111110000000000111111111100000000001111111111000000000011111111110000000000
1266413867935323811836706421760
1 When the lead digit is 0 or 1.
When you do this for a large number inputNum
currentBinary = inputNum % 10;
its top portion gets "sliced off" on conversion to int. If you would like to stay within the bounds of an unsigned long, switch currentBinary to unsigned long as well, and use an unsigned long format specifier in printf. Moreover, unsigned long may not be sufficiently large on many platforms, so you need to use unsigned long long.
Better yet, switch to reading the input into a string, validate that it contains only zeros and ones (you have to do that anyway), and do the conversion in a cleaner character-by-character way. This would let you go beyond the 19 or 20 binary digits that fit when the input is read as a decimal 64-bit number, and accept as many binary digits as your integer type has bits.
A 32-bit unsigned long, as on your platform, supports a maximum value of 4294967295, which means that scanf( "%lu", &inputNum ); has already truncated the decimal number 1000100011111000 to a 32-bit value.
I think reading the input into a string would help a lot. In the while loop condition, check whether the string is empty; in the loop body, take the last char of the string, check whether it is a '1' or a '0', and then calculate the decimal value using this information.
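The string-based approach suggested in these answers can be sketched as follows. This version walks the string from the most significant digit instead of the last character, which avoids needing pow(); the name bin_to_ull is hypothetical:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: convert a string of '0'/'1' characters to an
   unsigned long long. Returns 1 on success; returns 0 for an empty
   string, a character other than '0' or '1', or a value that needs
   more bits than unsigned long long provides. */
int bin_to_ull(const char *s, unsigned long long *out)
{
    unsigned long long value = 0;
    if (s == NULL || *s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (*s != '0' && *s != '1')
            return 0;
        if (value > ULLONG_MAX / 2)   /* doubling would lose the top bit */
            return 0;
        value = value * 2 + (unsigned long long)(*s - '0');
    }
    *out = value;
    return 1;
}
```

For the input from the question, "1000100011111000", this yields 35064, and the input length is no longer limited by how many decimal digits scanf can store.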
I was tasked with writing a binary to decimal converter that takes larger binary inputs, but using embedded C programming in which we are not allowed to use library functions such as strlen. I found a simpler way to write this conversion tool using C, with either strlen or sizeof, as shown in the code below. Hope this helps. As you can see, strlen is commented out, but either approach works fine. sizeof also counts the terminating '\0' element of the array, which is why sizeof(number) - 1 is used. Cheers!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
const char number[] = "100111111111111111111111";
int binToDec(const char *);
int main()
{
printf("Output: %d", binToDec(number)); // the array decays to a pointer to its first character
}
int binToDec(const char *n)
{
const char *num = n;
int decimal_value = 0;
int base = 1;
int i;
int len = sizeof(number)-1;
//int len = strlen(number);
for (i=len-1; i>=0; i--)
{
if (num[i] == '1')
decimal_value += base;
base = base * 2;
}
return decimal_value;
}

Grab all integers from irregular strings in C

I am looking for a (relatively) simple way to parse a random string and extract all of the integers from it and put them into an Array - this differs from some of the other questions which are similar because my strings have no standard format.
Example:
pt112parah salin10n m5:isstupid::42$%&%^*%7first3
I would need to eventually get an array with these contents:
112 10 5 42 7 3
And I would like a method more efficient than going character by character through a string.
Thanks for your help
A quick solution. I'm assuming that there are no numbers that exceed the range of long, and that there are no minus signs to worry about. If those are problems, then you need to do a lot more work analyzing the results of strtol() and you need to detect '-' followed by a digit.
The code does loop over all characters; I don't think you can avoid that. But it does use strtol() to process each sequence of digits (once the first digit is found), and resumes where strtol() left off (and strtol() is kind enough to tell us exactly where it stopped its conversion).
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
int main(void)
{
const char data[] = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
long results[100];
int nresult = 0;
const char *s = data;
char c;
while ((c = *s++) != '\0')
{
if (isdigit((unsigned char)c)) // cast avoids undefined behavior for negative char values
{
char *end;
results[nresult++] = strtol(s-1, &end, 10);
s = end;
}
}
for (int i = 0; i < nresult; i++)
printf("%d: %ld\n", i, results[i]);
return 0;
}
Output:
0: 112
1: 10
2: 5
3: 42
4: 7
5: 3
More efficient than going through character by character?
Not possible, because you must look at every character to know that it is not an integer.
Now, given that you have to go through the string character by character, I would recommend simply casting each character as an int and checking that:
//string tmp = ""; declared outside of loop.
//pseudocode for inner loop:
int intVal = (int)c;
if(intVal >=48 && intVal <= 57){ //0-9 are 48-57 when char casted to int.
tmp += c;
}
else if(tmp.length > 0){
array[?] = (int)tmp; // ? is where to add the int to the array.
tmp = "";
}
array will contain your solution.
Just because I've been writing Python all day and I want a break. Declaring an array will be tricky. Either you have to run it twice to work out how many numbers you have (and then allocate the array) or just use the numbers one by one as in this example.
NB the ASCII characters for '0' to '9' are 48 to 57 (i.e. consecutive).
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
int main(int argc, char **argv)
{
char *input = "pt112par0ah salin10n m5:isstupid::42$%&%^*%7first3";
int length = strlen(input);
int value = 0;
int i;
bool gotnumber = false;
for (i = 0; i < length; i++)
{
if (input[i] >= '0' && input[i] <= '9')
{
gotnumber = true;
value = value * 10; // shift up a column
value += input[i] - '0'; // casting the char to an int
}
else if (gotnumber) // we hit this the first time we encounter a non-number after we've had numbers
{
printf("Value: %d \n", value);
value = 0;
gotnumber = false;
}
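}
if (gotnumber) // a number that runs to the end of the string still has to be printed
printf("Value: %d \n", value);
return 0;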
}
EDIT: the previous version didn't deal with 0
Another solution is to use the strtok function
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," abcdefghijklmnopqrstuvwxyz:$%&^*");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " abcdefghijklmnopqrstuvwxyz:$%&^*");
}
return 0;
}
Gives:
112
10
5
42
7
3
Perhaps not the best solution for this task, since you need to specify all characters that will be treated as delimiters. But it is an alternative to the other solutions.
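One way around listing every delimiter is to describe the digits instead. The following is a sketch of a different technique from strtok, using strcspn() to skip to the next digit and strtol() to convert; the function name extract_ints and its parameters are assumptions for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: store up to max non-negative integers found in s
   into out and return how many were stored. The input is not modified. */
size_t extract_ints(const char *s, long *out, size_t max)
{
    static const char digits[] = "0123456789";
    size_t count = 0;
    s += strcspn(s, digits);               /* skip to the first digit, if any */
    while (*s != '\0' && count < max) {
        char *end;
        out[count++] = strtol(s, &end, 10);
        s = end + strcspn(end, digits);    /* skip the non-digits that follow */
    }
    return count;
}
```

Unlike strtok, this leaves the input string untouched, so it also works on string literals and const data.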
And if you don't mind using C++ instead of C (usually there isn't a good reason why not), then you can reduce your solution to just two lines of code (using AXE parser generator):
vector<int> numbers;
auto number_rule = *(*(axe::r_any() - axe::r_num())
& *axe::r_num() >> axe::e_push_back(numbers));
now test it:
std::string str = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
number_rule(str.begin(), str.end());
std::for_each(numbers.begin(), numbers.end(), [](int i) { std::cout << "\ni=" << i; });
and sure enough, you got your numbers back.
And as a bonus, you don't need to change anything when parsing unicode wide strings:
std::wstring str = L"pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
number_rule(str.begin(), str.end());
std::for_each(numbers.begin(), numbers.end(), [](int i) { std::cout << "\ni=" << i; });
and sure enough, you got the same numbers back.
#include <stdio.h>
#include <string.h>
#include <math.h>
int main(void)
{
char *input = "pt112par0ah salin10n m5:isstupid::42$%&%^*%7first3";
char *pos = input;
int integers[(strlen(input) + 1) / 2]; // The maximum possible number of integers is half the length of the string, rounded up, due to the smallest number of digits possible per integer being 1 and the smallest number of characters between two different integers also being 1
unsigned int numInts= 0;
while ((pos = strpbrk(pos, "0123456789")) != NULL) // strpbrk() prototype in string.h
{
sscanf(pos, "%d", &(integers[numInts])); // %d matches the int element being stored
if (integers[numInts] == 0)
pos++;
else
pos += (int) log10(integers[numInts]) + 1; // requires math.h
numInts++;
}
for (int i = 0; i < numInts; i++)
printf("%d ", integers[i]);
return 0;
}
Finding the integers is accomplished via repeated calls to strpbrk() on the offset pointer, with the pointer being offset again by an amount equaling the number of digits in the integer, calculated by finding the base-10 logarithm of the integer and adding 1 (with a special case for when the integer is 0). No need to use abs() on the integer when calculating the logarithm, as you stated the integers will be non-negative. If you wanted to be more space-efficient, you could use unsigned char integers[] rather than int integers[], as you stated the integers will all be <256, but that isn't a necessity.
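As a hedged alternative to computing the digit count with log10(), sscanf() can report how many characters it consumed through %n. The sketch below keeps the strpbrk() search but swaps in that technique; scan_ints is a hypothetical name, and values too large for an int are not handled:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: store up to max integers found in s into out
   and return how many were stored. */
size_t scan_ints(const char *s, int *out, size_t max)
{
    size_t count = 0;
    const char *pos = s;
    while (count < max && (pos = strpbrk(pos, "0123456789")) != NULL) {
        int consumed = 0;
        /* %n stores the number of characters read so far by this call */
        if (sscanf(pos, "%d%n", &out[count], &consumed) != 1)
            break;
        pos += consumed;   /* advance past exactly the digits just read */
        count++;
    }
    return count;
}
```

Because the offset comes from the scan itself, leading zeros such as "007" are skipped as a unit, and no floating-point math is needed.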

K&R Exercise 2-3 "Hex to int converter" Problem

The program I wrote works only for input consisting of a single hexadecimal digit. (Probably not the most elegant solution, but I'm a new programmer.) My question is: how would I go about handling multiple hexadecimal digits, such as 0xAF or 0xFF? I'm not exactly sure, and I seem to have confused myself greatly in the attempt. I'm not asking for someone to hold my hand, but for a tip on where I've gone wrong in this code and thoughts on how to fix it.
Thanks :)
/* Exercise 2-3. Write the function htoi(s), which converts a string of
* hexadecimal digits (including an optional 0x or 0X) into its equivalent
* integer value. The allowable digits are 0...9 - A...F and a...f.
*
*/
#include <stdio.h>
#include <string.h>
#define NL '\n'
#define MAX 24
int htoi(char *hexd);
int
main(void)
{
char str[MAX] = {0};
char hex[] = "0123456789ABCDEFabcdef\0";
int c;
int i;
int x = 0;
while((c = getchar()) != EOF) {
for(i = 0; hex[i] != '\0'; i++) {
if(c == hex[i])
str[x++] = c;
}
if(c == NL) {
printf("%d\n", htoi(str));
x = 0, i = x;
}
}
return 0;
}
int
htoi(char *hexd)
{
int i;
int n = 0;
for(i = 0; isdigit(hexd[i]); i++)
n = (16 * i) + (hexd[i] - '0');
for(i = 0; isupper(hexd[i]); i++) /* Let's just deal with lowercase characters */
hexd[i] = hexd[i] + 'a' - 'A';
for(i = 0; islower(hexd[i]); i++) {
hexd[i] = hexd[i] - 'a';
n = (16 + i) + hexd[i] + 10;
n = hexd[i] + 10;
}
return n;
}
Someone has already asked this (hex to int, k&r 2.3).
Take a look, there are many good answers, but you have to fill in the blanks.
Hex to Decimal conversion [K&R exercise]
Edit:
in
char hex[] = "0123456789ABCDEFabcdef\0";
The \0 is not necessary; hex is already null-terminated. Without it, the array is 22 characters + 1 terminator = 23 bytes long.
I'll pick on one loop, and leave it to you to rethink your implementation. Specifically this:
for(i = 0; isdigit(hexd[i]); i++)
n = (16 * i) + (hexd[i] - '0');
doesn't do what you probably think it does...
It only processes the first span of characters where isdigit() is TRUE.
It stops on the first character where isdigit() is FALSE.
It doesn't run past the end because isdigit('\0') is known to be FALSE. I'm concerned that might be accidentally correct, though.
It does not accumulate a value: each iteration overwrites n with 16 * i plus the current digit, so it is only reliably correct for a single digit 0-9.
Things to think about for the whole program:
Generally, prefer to not modify input strings unless the modification is a valuable side effect. In your example code, you are forcing the string to lower case in-place. Modifying the input string in-place means that a user writing htoi("1234") is invoking undefined behavior. You really don't want to do that.
Only one of the loops over digits is going to process a non-zero number of digits.
What happens if I send 0123456789ABCDEF0123456789ABCDEF to stdin?
What do you expect to get for 80000000? What did you get? Are you surprised?
Personally, I wouldn't use NL for '\n'. C usage pretty much expects to see \n in a lot of contexts where the macro is not convenient, so it is better to just get used to it now...
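To illustrate the points above without modifying the input, here is a minimal sketch of the accumulate-as-you-go pattern. It is one possible shape, not the exercise's reference solution; the name htoi_sketch is hypothetical, and it returns an unsigned value so that oversized input wraps around instead of causing undefined behavior:

```c
#include <ctype.h>

/* Sketch: convert a string of hex digits, with an optional 0x or 0X
   prefix, stopping at the first character that is not a hex digit.
   The input string is only read, never written. */
unsigned int htoi_sketch(const char *s)
{
    unsigned int n = 0;
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        s += 2;                                   /* skip the optional prefix */
    for (; isxdigit((unsigned char)*s); s++) {
        int c = tolower((unsigned char)*s);
        int digit = isdigit(c) ? c - '0' : c - 'a' + 10;
        n = n * 16 + (unsigned int)digit;         /* shift earlier digits up one hex place */
    }
    return n;
}
```

A single loop handles digits and letters together, so a mixed value such as 0xAF is processed in order: each step multiplies the running total by 16 before adding the new digit.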
I think that the MAX size of string should be either 10 or 18 instead of 24. (If you have already checked the int on your machine and followed the reasoning below, it would be beneficial to include it as a comment in your code.)
10 : since htoi() returns an int, int is usually 4 bytes (check your system's too), so the hexadecimal number can be at most 8 digits in length (4 bits to 1 hex digit, 8 bits to a byte), and we want to allow for the optional 0x or 0X.
18 : would be better if htoi() returned a long and long is 8 bytes (again, check your system's), so the hexadecimal number can be at most 16 digits in length, and we want to allow for the optional 0x or 0X.
Please note that that sizes of int and long are machine dependent, and please look at exercise 2.1 in the K&R book to find them.
Here is my version of a classic htoi() function to convert hexadecimal values with multiple digits into decimal integers. It's a full working program; compile it and run.
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
int htoi(const char*);
int getRawInt(char);
int main(int argc, char **argv) {
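char hex[32] = ""; // room for up to 31 characters plus the terminator
printf("Enter a hexadecimal number (e.g. 33A)\n");
if (scanf("%31s", hex) != 1) return 1; // the width limit keeps scanf inside the buffer
printf("Hexadecimal %s in decimal is %d\n", hex, htoi(hex)); // for 33A the result is 826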
return 0;
}
int htoi(const char *hex) {
const int LEN = strlen(hex) -1;
int power = 1;
int dec = 0;
for(int i = LEN; i >= 0; --i) {
dec += getRawInt(hex[i]) * power;
power *= 16;
}
return dec;
}
int getRawInt(char c) {
if(isalpha(c)) {
return toupper(c) - 'A' + 10;
} return c-'0';
}

Resources