Grab all integers from irregular strings in C

I am looking for a (relatively) simple way to parse an arbitrary string, extract all of the integers from it, and put them into an array. This differs from some of the other similar questions because my strings have no standard format.
Example:
pt112parah salin10n m5:isstupid::42$%&%^*%7first3
I would need to eventually get an array with these contents:
112 10 5 42 7 3
And I would like a method more efficient than going character by character through the string.
Thanks for your help

A quick solution. I'm assuming that there are no numbers that exceed the range of long, and that there are no minus signs to worry about. If those are problems, then you need to do a lot more work analyzing the results of strtol() and you need to detect '-' followed by a digit.
The code does loop over all characters; I don't think you can avoid that. But it does use strtol() to process each sequence of digits (once the first digit is found), and resumes where strtol() left off (and strtol() is kind enough to tell us exactly where it stopped its conversion).
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
int main(void)
{
const char data[] = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
long results[100];
int nresult = 0;
const char *s = data;
char c;
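while ((c = *s++) != '\0' && nresult < 100) /* stop before results[] overflows */
{
if (isdigit((unsigned char)c)) /* the cast avoids undefined behavior for negative char values */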
{
char *end;
results[nresult++] = strtol(s-1, &end, 10);
s = end;
}
}
for (int i = 0; i < nresult; i++)
printf("%d: %ld\n", i, results[i]);
return 0;
}
Output:
0: 112
1: 10
2: 5
3: 42
4: 7
5: 3
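If minus signs and out-of-range values do matter, the extra work mentioned above might look roughly like this. This is only a sketch: extract_longs is a hypothetical helper name, and it assumes a '-' counts as a sign only when a digit follows it directly, and that values outside the range of long should simply be skipped.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not from the answer above): stores up to max
 * integers found in s into out and returns how many were stored.
 * A '-' is treated as a sign only when a digit follows it directly. */
size_t extract_longs(const char *s, long *out, size_t max)
{
    size_t n = 0;
    while (*s != '\0' && n < max) {
        int negative = (*s == '-' && isdigit((unsigned char)s[1]));
        if (negative || isdigit((unsigned char)*s)) {
            char *end;
            errno = 0;
            long v = strtol(s, &end, 10);
            if (errno != ERANGE)   /* skip values that do not fit in a long */
                out[n++] = v;
            s = end;               /* resume where strtol stopped */
        } else {
            s++;
        }
    }
    return n;
}
```

Calling extract_longs(data, results, 100) on the example string should fill results with the same six values as the program above.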

More efficient than going through character by character?
Not possible, because you must look at every character to know that it is not an integer.
Now, given that you have to go through the string character by character, I would recommend simply comparing each character against the digit range:
//string tmp = ""; declared outside of loop.
//pseudocode for inner loop:
if(c >= '0' && c <= '9'){ //'0'-'9' are 48-57 in ASCII and consecutive in every C character set.
tmp += c;
}
else if(tmp.length > 0){
array[?] = (int)tmp; // ? is where to add the int to the array.
tmp = "";
}
//after the loop: if tmp.length > 0, store it once more so a trailing number is not lost.
array will contain your solution.

Just because I've been writing Python all day and I want a break. Declaring an array will be tricky. Either you have to run it twice to work out how many numbers you have (and then allocate the array) or just use the numbers one by one as in this example.
NB the ASCII characters for '0' to '9' are 48 to 57 (i.e. consecutive).
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
int main(int argc, char **argv)
{
char *input = "pt112par0ah salin10n m5:isstupid::42$%&%^*%7first3";
int length = strlen(input);
int value = 0;
int i;
bool gotnumber = false;
for (i = 0; i <= length; i++) // <= also visits the terminating '\0', which flushes a trailing number
{
if (input[i] >= '0' && input[i] <= '9')
{
gotnumber = true;
value = value * 10; // shift up a column
value += input[i] - '0'; // convert the digit character to its numeric value
}
else if (gotnumber) // first non-digit after a run of digits
{
printf("Value: %d \n", value);
value = 0;
gotnumber = false;
}
}
return 0;
}
EDIT: the previous version didn't deal with 0

Another solution is to use the strtok function
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," abcdefghijklmnopqrstuvwxyz:$%&^*");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " abcdefghijklmnopqrstuvwxyz:$%&^*");
}
return 0;
}
Gives:
112
10
5
42
7
3
Perhaps not the best solution for this task, since you need to list every character that should be treated as a delimiter. But it is an alternative to the other solutions.
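One way around listing every delimiter might be to describe the digits instead and let strcspn() and strspn() find the boundaries. A rough sketch, where digit_runs is a hypothetical helper name and every run of digits is assumed to fit in a long:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stores up to max digit runs from s in out and
 * returns the number stored. Assumes every run fits in a long. */
size_t digit_runs(const char *s, long *out, size_t max)
{
    static const char digits[] = "0123456789";
    size_t n = 0;
    s += strcspn(s, digits);            /* skip leading non-digits */
    while (*s != '\0' && n < max) {
        out[n++] = strtol(s, NULL, 10); /* convert the run starting at s */
        s += strspn(s, digits);         /* step over the digits */
        s += strcspn(s, digits);        /* step over the gap that follows */
    }
    return n;
}
```

Unlike strtok, this does not modify the input, so it also works on string literals.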

And if you don't mind using C++ instead of C (usually there isn't a good reason why not), then you can reduce your solution to just two lines of code (using AXE parser generator):
vector<int> numbers;
auto number_rule = *(*(axe::r_any() - axe::r_num())
& *axe::r_num() >> axe::e_push_back(numbers));
now test it:
std::string str = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
number_rule(str.begin(), str.end());
std::for_each(numbers.begin(), numbers.end(), [](int i) { std::cout << "\ni=" << i; });
and sure enough, you got your numbers back.
And as a bonus, you don't need to change anything when parsing unicode wide strings:
std::wstring str = L"pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
number_rule(str.begin(), str.end());
std::for_each(numbers.begin(), numbers.end(), [](int i) { std::cout << "\ni=" << i; });
and sure enough, you got the same numbers back.

#include <stdio.h>
#include <string.h>
#include <math.h>
int main(void)
{
char *input = "pt112par0ah salin10n m5:isstupid::42$%&%^*%7first3";
char *pos = input;
unsigned int integers[(strlen(input) + 1) / 2]; // At most half the characters (rounded up) can start a new integer: each integer needs at least 1 digit and at least 1 separator before the next one
unsigned int numInts = 0;
while ((pos = strpbrk(pos, "0123456789")) != NULL) // strpbrk() prototype in string.h
{
sscanf(pos, "%u", &(integers[numInts])); // %u requires an unsigned int *
if (integers[numInts] == 0)
pos++;
else
pos += (int) log10(integers[numInts]) + 1; // requires math.h
numInts++;
}
for (unsigned int i = 0; i < numInts; i++)
printf("%u ", integers[i]);
return 0;
}
Finding the integers is accomplished via repeated calls to strpbrk() on the offset pointer. After each match the pointer is advanced by the number of digits in the integer, calculated by taking the base-10 logarithm of the integer and adding 1 (with a special case for when the integer is 0). There is no need to use abs() on the integer when calculating the logarithm, as you stated the integers will be non-negative. If you wanted to be more space-efficient, you could use unsigned char integers[] rather than int integers[], as you stated the integers will all be <256, but that isn't a necessity. Note that the digit count assumes there are no leading zeros: a run such as 007 would be read more than once.
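If leading zeros can occur, a variation that avoids the logarithm is to let sscanf() report how many characters it consumed through the %n conversion. A minimal sketch, assuming each value fits in an unsigned int (scan_uints is a hypothetical helper name):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: stores up to max unsigned integers from s in out
 * and returns the number stored. Assumes each value fits in an unsigned int. */
size_t scan_uints(const char *s, unsigned int *out, size_t max)
{
    size_t n = 0;
    const char *pos = s;
    while (n < max && (pos = strpbrk(pos, "0123456789")) != NULL) {
        int used = 0;
        if (sscanf(pos, "%u%n", &out[n], &used) != 1)
            break;        /* should not happen after strpbrk, but stay safe */
        pos += used;      /* %n reported how many characters were read */
        n++;
    }
    return n;
}
```

With the input a007b0c12 this should yield 7, 0 and 12, each exactly once.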

Related

C program calling a string to int function, I am unable to convert the input

I would like to convert a string to an int by calling a function from main. The first character is a letter declaring the base of the number, and the rest of the characters in the string are the number. The function works on its own, but when I call it from main it does not print the correct values.
Example of user input using binary:
b1000
b1010
result should be:
b
b
1000
1010
Here is the code:
#include <stdio.h>
#include <string.h>
#include <math.h>
int str_to_int(inputbase) {
char num1[50];
num1[50] = inputbase;
char numcpy1[sizeof(num1) - 1];
int i, len1;
int result1 = 0;
//printf("String: ");
//gets(num1);
//Access first character for base
printf("%c \n", num1[0]);
//Remove first character for number1 and number 2
if (strlen(num1) > 0) {
strcpy(numcpy1, &(num1[1]));
} else {
strcpy(numcpy1, num1);
}
len1 = strlen(numcpy1);
//Turn remaining string characters into an int
for (i = 0; i < len1; i++) {
result1 = result1 * 10 + ( numcpy1[i] - '0' );
}
printf("%d \n", result1);
return result1;
}
int main() {
char *number1[50], *number2[50];
int one, two;
printf("\nAsk numbers: \n");
gets(number1);
gets(number2);
one = str_to_int(number1);
two = str_to_int(number2);
printf("\nVerifying...\n");
printf("%d\n", one);
printf("%d\n", two);
return 0;
}
I suspect your code does not compile because of several errors.
The first one is in the line
int str_to_int(inputbase)
where inputbase is declared without a type.
If this is changed to
int str_to_int(char * inputbase)
the next point for improvement is the line
num1[50] = inputbase;
An assignment like that has several errors:
num1[50] accesses the 51st item, but there are only 50 items, indexed from 0 to 49
the statement num1[0] = inputbase; (or the same with any other valid index) is wrong because the types differ: num1[0] is a char, but inputbase is a pointer
num1 = inputbase; would also be wrong (= cannot be used to copy a string in C, so consider writing a loop or using the standard library function strncpy)
And since this is only the beginning of the problems, I suggest starting with decimal input and a standard function for converting a char* string to int (e.g. atoi or sscanf). Once you have checked that the program is correct, you can, if required, replace the standard conversion with your own str_to_int.
The prototype for your function str_to_int() should specify the type of inputbase. You are passing a string and there is no reason for str_to_int to modify this string, so the type should be const char *inputbase.
Furthermore, you do not need a local copy for the string, just access the first character to determine the base and parse the remaining digits accordingly:
#include <stdlib.h>
int str_to_int(const char *inputbase) {
const char *p = inputbase;
int base = 10; // default to decimal
if (*p == 'b') { // binary
p++;
base = 2;
} else
if (*p == 'o') { // octal
p++;
base = 8;
} else
if (*p == 'h') { // hexadecimal
p++;
base = 16;
}
return strtol(p, NULL, base);
}
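If you also want to reject input such as b102 or a lone b, one possible variant passes an end pointer to strtol() and checks where the conversion stopped. This is a sketch under assumptions: str_to_long_checked is a hypothetical name, and the prefixes b, o and h are the same ones used above.

```c
#include <stdlib.h>

/* Hypothetical variant: returns 1 and stores the value on success,
 * 0 if no digits were found or characters were left over. */
int str_to_long_checked(const char *input, long *value)
{
    int base = 10;                     /* default to decimal */
    if (*input == 'b') { base = 2; input++; }
    else if (*input == 'o') { base = 8; input++; }
    else if (*input == 'h') { base = 16; input++; }
    char *end;
    *value = strtol(input, &end, base);
    /* end == input means nothing was converted;
     * *end != '\0' means some characters were not valid in this base */
    return end != input && *end == '\0';
}
```

For the inputs from the question, b1000 and b1010 should give 8 and 10.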

C convert section of char array to double

I want to convert a section of a char array to a double. For example I have:
char in_string[] = "4014.84954";
Say I want to convert the first 40 to a double with value 40.0. My code so far:
#include <stdio.h>
#include <stdlib.h>
int main(int arg) {
char in_string[] = "4014.84954";
int i = 0;
for(i = 0; i <= sizeof(in_string); i++) {
printf("%c\n", in_string[i]);
printf("%f\n", atof(&in_string[i]));
}
}
On each iteration, atof converts the char array from the pointer I supply all the way to the end of the array. The output is:
4
4014.849540
0
14.849540
1
14.849540
4
4.849540
.
0.849540
8
84954.000000 etc...
How can I convert just a portion of a char array to a double? This must be modular because my real input_string is much more complicated, but I will ensure that each char is a digit 0-9.
The following should work assuming:
I will ensure that the char is a number 0-9.
double toDouble(const char* s, int start, int stop) {
unsigned long long int m = 1;
double ret = 0;
for (int i = stop; i >= start; i--) {
ret += (s[i] - '0') * m;
m *= 10;
}
return ret;
}
For example, for the string 23487 the function will do these calculations:
ret = 0
ret += 7 * 1
ret += 8 * 10
ret += 4 * 100
ret += 3 * 1000
ret += 2 * 10000
ret = 23487
You can copy the desired amount of the string you want to another char array, null terminate it, and then convert it to a double. EG, if you want 2 digits, copy the 2 digits you want into a char array of length 3, ensuring the 3rd character is the null terminator.
Or if you don't want to make another char array, you can back up the (n+1)th char of the char array, replace it with a null terminator (ie 0x00), call atof, and then replace the null terminator with the backed up value. This will make atof stop parsing where you placed your null terminator.
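The second idea could be sketched like this. atof_slice is a hypothetical helper name, and the sketch assumes the array is writable (not a string literal) and holds at least start + n characters:

```c
#include <stdlib.h>

/* Hypothetical helper: converts the n characters starting at s + start.
 * s must be writable and at least start + n characters long. */
double atof_slice(char *s, size_t start, size_t n)
{
    char saved = s[start + n];   /* back up the character after the slice */
    s[start + n] = '\0';         /* make atof stop there */
    double value = atof(s + start);
    s[start + n] = saved;        /* restore the original string */
    return value;
}
```

With char in_string[] = "4014.84954", atof_slice(in_string, 0, 2) should return 40.0 and leave in_string unchanged.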
Just use sscanf. Use a format with a field width, such as "%2lf", and check that the return value is 1.
What about inserting a null character ('\0') at the right position and then restoring the original character afterwards? This means you modify the char array temporarily, but it is back to its original state at the end.
You can create a function that will make the work in a temporary string (on the stack) and return the resulting double:
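#include <stdlib.h>
#include <string.h>
double atofn (const char *src, size_t n) {
    char tmp[50]; // scratch buffer; longer slices are truncated to 49 characters
    if (n >= sizeof tmp)
        n = sizeof tmp - 1;
    strncpy (tmp, src, n);
    tmp[n] = '\0';
    return atof(tmp);
}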
How much simpler could it get than sscanf?
#include <stdio.h>
int main(void) {
    const char *input = "4014.84954";
    double foo;
    /* The field width limits how many characters each conversion may read.
       The calls are kept outside assert() so they still run with NDEBUG defined. */
    if (sscanf(input, "%2lf", &foo) != 1) return 1;
    printf("Processed the first two bytes of input and got: %f\n", foo);
    if (sscanf(input + 2, "%5lf", &foo) != 1) return 1;
    printf("Processed the next five bytes of input and got: %f\n", foo);
    if (sscanf(input + 7, "%lf", &foo) != 1) return 1;
    printf("Processed the rest of the input and got: %f\n", foo);
    return 0;
}

Trying to put some digits into a char array

I'm trying to create a char array made of some letters and numbers (the function was way more complex initially, but I kept simplifying it to figure out why it doesn't work properly). So I have a char array in which I put 2 chars, and I try to add some numbers to it.
For a reason I can't figure out, the numbers do not get added to the array. It might be really stupid, but I'm new to C, so here's the simplified code. Any help is much appreciated, thanks!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char some_string[20];
char *make_str() {
some_string[0] = 'a';
some_string[1] = 'x';
int random = 0;
int rand_copy = 0;
random = (rand());
rand_copy = random;
int count = 2;
while ( rand_copy > 0 ) {
rand_copy = rand_copy / 10;
++count;
}
int i=2;
for (i=2; i<count; i++) {
some_string[i] = random%10;
random = random/10;
}
return (some_string);
}
int main(int argc, const char *argv[]) {
printf("the string is: %s\n",make_str());
return 0;
}
You have several problems:
The resulting string is not zero-terminated. Add some_string[i] = '\0'; after the loop to fix this.
A character (char) is something like "a letter", but random % 10 produces a number (int) which, when converted to a character, results in a control code (ASCII characters 0-9 are control codes). You'd better use some_string[i] = (random % 10) + '0';
You're using a fixed-length string (20 characters), which may be enough, but it can lead to many problems. If you are a beginner and haven't learned dynamic memory allocation, then that's OK for now. But remember that fixed-length buffers are one of the top-10 reasons for buggy C code. And if you have to use fixed-length buffers (there are legitimate reasons for doing this), ALWAYS check that you are not overrunning the buffer. Use predefined constants for buffer lengths.
Unless the whole point of your exercise is to try converting numbers to strings, use a libc function like snprintf for printing anything into a string.
Don't use a global variable (some_string), and if you do (it's OK for a small example), there is no point in returning its value.
Slightly better version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BUF_LENGTH 20
char some_string[BUF_LENGTH];
char *make_str() {
some_string[0] = 'a';
some_string[1] = 'x';
int random = rand();
int rand_copy = random;
int count = 2;
while (rand_copy > 0) {
rand_copy = rand_copy / 10;
++count;
}
int i;
for (i = 2; i < count; i++) {
/* check for buffer overflow. -1 is for terminating zero */
if (i >= BUF_LENGTH - 1) {
printf("error\n");
exit(EXIT_FAILURE);
}
some_string[i] = (random % 10) + '0';
random = random / 10;
}
/* zero-terminate the string */
some_string[i] = '\0';
return some_string;
}
int main(int argc, const char *argv[]) {
printf("the string is: %s\n",make_str());
return 0;
}
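The snprintf suggestion from the list above could look roughly like this. It is a sketch, not a drop-in replacement: make_str_into is a hypothetical name, the caller supplies the buffer, and the digits come out in normal reading order rather than reversed as in the loop version.

```c
#include <stdio.h>

/* Hypothetical variant of make_str: writes "ax" followed by the decimal
 * digits of number into buf. Returns 0 on success, -1 if it did not fit. */
int make_str_into(char *buf, size_t size, int number)
{
    /* snprintf never writes more than size bytes and always terminates
     * the string when size > 0; it returns the length it wanted to write */
    int written = snprintf(buf, size, "ax%d", number);
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

A caller would declare something like char some_string[BUF_LENGTH]; and pass sizeof some_string as the size, together with the result of rand().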

C Library function for converting a string of hex digits to ints?

I have a variable length string where each character represents a hex digit. I could iterate through the characters and use a case statement to convert it to hex but I feel like there has to be a standard library function that will handle this. Is there any such thing?
Example of what I want to do. "17bf59c" -> int intarray[7] = { 1, 7, 0xb, 0xf, 5, 9, 0xc}
No, there's no such function, probably because (and now I'm guessing, I'm not a C standard library architect by a long stretch) it's something that's quite easy to put together from existing functions. Here's one way of doing it decently:
#include <ctype.h>  /* tolower */
#include <stdlib.h> /* malloc */
int * string_to_int_array(const char *string, size_t length)
{
    int *out = malloc(length * sizeof *out);
    if(out != NULL)
    {
        size_t i;
        for(i = 0; i < length; i++)
        {
            const char here = tolower((unsigned char)string[i]);
            /* digit value: subtract '0' (the character), not '\0' */
            out[i] = (here <= '9') ? (here - '0') : (10 + (here - 'a'));
        }
    }
    return out;
}
Note: the above is untested.
Also note things that maybe aren't obvious, but still subtly important (in my opinion):
Use const for pointer arguments that are treated as "read only" by the function.
Don't repeat the type that out is pointing at, use sizeof *out.
Don't cast the return value of malloc() in C.
Check that malloc() succeeded before using the memory.
Don't hard-code ASCII values, use character constants.
The above still assumes an encoding where 'a'..'f' are contiguous, and would likely break on e.g. EBCDIC. You get what you pay for, sometimes. :)
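If you would rather not rely on the letters being contiguous at all, one option is to look each character up in a string of digits. A small sketch (hex_digit_value is a hypothetical helper name):

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 0..15 for a hex digit, -1 otherwise.
 * Uses a lookup string, so it does not depend on letters being contiguous. */
int hex_digit_value(char c)
{
    static const char digits[] = "0123456789abcdef";
    if (c == '\0')
        return -1;   /* strchr would otherwise match the terminator */
    const char *p = strchr(digits, tolower((unsigned char)c));
    return p ? (int)(p - digits) : -1;
}
```

The -1 return also gives the caller a way to notice characters that are not hex digits, which the version above does not check for.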
Using strtol:
void to_int_array (int *dst, const char *hexs)
{
char buf[2] = {0};
char c;
while ((c = *hexs++)) {
buf[0] = c;
*dst++ = strtol(buf,NULL,16);
}
}
Here's another version that allows you to pass in the output array. Most of the time, you don't need to malloc, and that's expensive. A stack variable is typically fine, and you know the output is never going to be bigger than your input. You can still pass in an allocated array, if it's too big, or you need to pass it back up.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
/* str is parsed into individual ints stored in output.
* output needs room for at least strlen(str) elements.
* returns number of parsed elements. May be fewer than strlen(str)
* if there are invalid characters in str.
*/
int string_to_array(const char *str, int *output)
{
int *out = output;
for (; *str; str++) {
if (isxdigit(*str & 0xff)) {
char ch = tolower(*str & 0xff);
*out++ = (ch >= 'a' && ch <= 'z') ? ch - 'a' + 10 : ch - '0';
}
}
return out - output;
}
int main(void)
{
int values[10];
int len = string_to_array("17bzzf59c", values);
int i = 0;
for (i = 0; i < len; i++)
printf("%x ", values[i]);
printf("\n");
return EXIT_SUCCESS;
}
#include <stdio.h>
int main(){
char data[] = "17bf59c";
const int len = sizeof(data)/sizeof(char)-1;
int i;
unsigned int value[sizeof(data)/sizeof(char)-1]; /* %x expects an unsigned int * */
for(i=0;i<len;++i)
sscanf(data+i, "%1x",value + i);
for(i=0;i<len;++i)
printf("0x%x\n", value[i]);
return 0;
}

K&R Exercise 2-3 "Hex to int converter" Problem

The program I wrote works only for input consisting of a single hexadecimal digit. (Probably not the most elegant solution, but I'm a new programmer.) My question is, how would I go about handling multiple hexadecimal digits, such as 0xAF or 0xFF? I'm not exactly sure, and I seem to have confused myself greatly in the attempt. I'm not asking for someone to hold my hand, but for a tip on where I've gone wrong in this code and thoughts on how to fix it.
Thanks :)
/* Exercise 2-3. Write the function htoi(s), which converts a string of
* hexadecimal digits (including an optional 0x or 0X) into its equivalent
* integer value. The allowable digits are 0...9 - A...F and a...f.
*
*/
#include <stdio.h>
#include <string.h>
#define NL '\n'
#define MAX 24
int htoi(char *hexd);
int
main(void)
{
char str[MAX] = {0};
char hex[] = "0123456789ABCDEFabcdef\0";
int c;
int i;
int x = 0;
while((c = getchar()) != EOF) {
for(i = 0; hex[i] != '\0'; i++) {
if(c == hex[i])
str[x++] = c;
}
if(c == NL) {
printf("%d\n", htoi(str));
x = 0, i = x;
}
}
return 0;
}
int
htoi(char *hexd)
{
int i;
int n = 0;
for(i = 0; isdigit(hexd[i]); i++)
n = (16 * i) + (hexd[i] - '0');
for(i = 0; isupper(hexd[i]); i++) /* Let's just deal with lowercase characters */
hexd[i] = hexd[i] + 'a' - 'A';
for(i = 0; islower(hexd[i]); i++) {
hexd[i] = hexd[i] - 'a';
n = (16 + i) + hexd[i] + 10;
n = hexd[i] + 10;
}
return n;
}
Someone has already asked this (hex to int, K&R 2.3).
Take a look, there are many good answers, but you have to fill in the blanks.
Hex to Decimal conversion [K&R exercise]
Edit:
in
char hex[] = "0123456789ABCDEFabcdef\0";
The \0 is not necessary. hex is already NUL-terminated: without it, the array holds the 22 characters 0-9, A-F and a-f plus the implicit terminator, 23 bytes in total.
I'll pick on one loop, and leave it to you to rethink your implementation. Specifically this:
for(i = 0; isdigit(hexd[i]); i++)
n = (16 * i) + (hexd[i] - '0');
doesn't do what you probably think it does...
It only processes the first span of characters where isdigit() is TRUE.
It stops on the first character where isdigit() is FALSE.
It doesn't run past the end because isdigit('\0') is known to be FALSE. I'm concerned that might be accidentally correct, though.
It does correctly convert a hex number that can be expressed solely with digits 0-9.
Things to think about for the whole program:
Generally, prefer to not modify input strings unless the modification is a valuable side effect. In your example code, you are forcing the string to lower case in-place. Modifying the input string in-place means that a user writing htoi("1234") is invoking undefined behavior. You really don't want to do that.
Only one of the loops over digits is going to process a non-zero number of digits.
What happens if I send 0123456789ABCDEF0123456789ABCDEF to stdin?
What do you expect to get for 80000000? What did you get? Are you surprised?
Personally, I wouldn't use NL for '\n'. C usage pretty much expects to see \n in a lot of contexts where the macro is not convenient, so it is better to just get used to it now...
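For comparison once you have rethought your own version, here is one possible shape for the accumulation. It is only a sketch, not the asker's code: htoi_sketch is a hypothetical name, it assumes an ASCII-like encoding where 'a'..'f' are contiguous, and it returns unsigned int so that large inputs wrap instead of overflowing a signed int.

```c
#include <ctype.h>

/* A sketch of htoi: returns the value of the hex digits in s, stopping
 * at the first character that is not a hex digit.
 * Overflow is not checked; unsigned arithmetic simply wraps. */
unsigned int htoi_sketch(const char *s)
{
    unsigned int n = 0;
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        s += 2;                                /* optional prefix */
    for (; isxdigit((unsigned char)*s); s++) {
        int c = tolower((unsigned char)*s);
        int digit = isdigit(c) ? c - '0' : c - 'a' + 10;
        n = 16 * n + (unsigned int)digit;      /* shift one hex place, add digit */
    }
    return n;
}
```

The key difference from the loops in the question is that a single loop handles digits and letters alike, multiplies the running total (not the index) by 16, and never writes to the input string.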
I think that the MAX size of the string should be either 10 or 18 instead of 24. (If you have already checked the size of int on your machine and followed the reasoning below, it would be beneficial to include it as a comment in your code.)
10 : since htoi() returns an int, and int is usually 4 bytes (check your system's too), the hexadecimal number can be at most 8 digits in length (4 bits to 1 hex digit, 8 bits to a byte), and we want to allow for the optional 0x or 0X.
18 : would be better if htoi() returned a long and it is 8 bytes (again, check your system's), so the hexadecimal number can be at most 16 digits in length, and we want to allow for the optional 0x or 0X.
Please note that the sizes of int and long are machine dependent, and please look at exercise 2.1 in the K&R book to find them.
Here is my version of a classic htoi() function to convert a string of multiple hexadecimal digits into a decimal integer. It's a full working program; compile it and run it.
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
int htoi(const char*);
int getRawInt(char);
int main(int argc, char **argv) {
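char hex[32] = ""; // room for 31 characters plus the terminator
printf("Enter a hexadecimal number (e.g. 33A)\n");
if (scanf("%31s", hex) != 1) return 1; // the width keeps scanf from overflowing hex
printf("Hexadecimal %s in decimal is %d\n", hex, htoi(hex)); // 33A gives 826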
return 0;
}
int htoi(const char *hex) {
const int LEN = strlen(hex) -1;
int power = 1;
int dec = 0;
for(int i = LEN; i >= 0; --i) {
dec += getRawInt(hex[i]) * power;
power *= 16;
}
return dec;
}
int getRawInt(char c) {
if(isalpha((unsigned char)c)) {
return toupper((unsigned char)c) - 'A' + 10;
}
return c-'0';
}

Resources