Find the length of an integer within a string - c

If I've got a text file like:
8f5
I can easily use strstr to parse the values 8 and 5 out of it.
As such:
//while fgets.. etc (other variables and declarations before it)
char * ptr = strstr(str,"f");
if(ptr != NULL)
{
int a = atol(ptr-1); // value of 8
int b = atol(ptr+1); // value of 5
}
But what if the values were two digits long? I could add +2 and -2 to each atol call, but I can't predict whether a value will be below 10 or not, for instance
12f6
or
15f15
The values are random each time (i.e. either one digit or two). Is there a way to check the length of the values in the string, and then use atol()?

Use atol(str) and atol(ptr+1), if I am reading the question correctly. This will get you the two numbers separated by the f, regardless of how long they are.
Set *ptr = '\0' first if you don't wish to rely on the fact that garbage characters stop atol from parsing.
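A minimal sketch of that suggestion, assuming the input always looks like digits, an f, then more digits. The helper name split_on_f is made up for illustration; it is not from the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "<number>f<number>" with atol().
   Returns 1 on success, 0 if the string contains no 'f'. */
int split_on_f(const char *str, long *a, long *b)
{
    const char *ptr = strchr(str, 'f');   /* locate the separator */
    if (ptr == NULL)
        return 0;
    *a = atol(str);       /* parsing stops at 'f', the first non-digit */
    *b = atol(ptr + 1);   /* parse whatever follows the 'f' */
    return 1;
}
```

Because atol() reads from the start of the string and stops at the first non-digit, the number of digits on either side of the f does not matter.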

If the text is always similar to the one you posted, then you can get the three parts of the string with the following code, and you can parse another token if there is a white space between them
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h> /* strtol() */
int main(void)
{
    char string[] = "12f5 1234x2912";
    char *next;
    next = string;
    while (*next != '\0') /* While not at the end of the string */
    {
        char separator[100];
        size_t counter;
        int firstNumber;
        int secondNumber;
        /* Get the first number */
        firstNumber = (int) strtol(next, &next, 10);
        counter = 0;
        /* Skip all non-numeric characters and store them in `separator' */
        while ((*next != '\0') && (isdigit((unsigned char) *next) == 0))
        {
            if (counter < sizeof(separator) - 1) /* never overrun `separator' */
                separator[counter++] = *next;
            next++;
        }
        /* nul terminate `separator' */
        separator[counter] = '\0';
        /* extract the second number */
        secondNumber = (int) strtol(next, &next, 10);
        /* show me how you did it */
        printf("%d:%s:%d\n", firstNumber, separator, secondNumber);
        /* skip any number of white space characters */
        while ((*next != '\0') && (isspace((unsigned char) *next) != 0))
            next++;
    }
    return 0;
}
In the example above you can see that there are two strings being parsed; you can read the strtol() manual page to understand why this algorithm works.
Normally you should not use the atoi() or atol() functions because you can't validate the input string: there is no way to know whether the function succeeded or not.
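To make the validation point concrete, here is a small sketch of how strtol() lets you detect the failures that atoi() hides. The wrapper name parse_int and its return convention are assumptions made for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert the leading decimal number in `s` to an int.
   Returns 1 and sets *end just past the number on success,
   0 if no digits were found or the value does not fit in an int. */
int parse_int(const char *s, int *out, const char **end)
{
    char *stop;
    long value;

    errno = 0;
    value = strtol(s, &stop, 10);
    if (stop == s)                       /* no conversion was performed */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                        /* out of range for int */
    *out = (int)value;
    *end = stop;
    return 1;
}
```

With atoi() both "0" and "abc" give 0; here the second case is reported as a failure because the end pointer never moved.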

Related

fscanf with whitespaces as separators - what format should I use?

I have a txt file that its lines are as follows
[7 chars string][whitespace][5 chars string][whitespace][integer]
I want to use fscanf() to read all these into memory, and I'm confused about what format should I use.
Here's an example of such line:
hello box 94324
Notice the filling whitespaces in each string, apart from the separating whitespace.
Edit: I know about the recommendation to use fgets() first, I cannot use it here.
Edit: here's my code
typedef struct Product {
char* id; //Product ID number. This is the key of the search tree.
char* productName; //Name of the product.
int currentQuantity; //How many items are there in stock, currently.
} Product;
int main()
{
FILE *initial_inventory_file = NULL;
Product product = { NULL, NULL, 0 };
//open file
initial_inventory_file = fopen(INITIAL_INVENTORY_FILE_NAME, "r");
product.id = malloc(sizeof(char) * 10); //- Product ID: 9 digits exactly. (10 for null character)
product.productName = malloc(sizeof(char) * 11); //- Product name: 10 chars exactly.
//go through each line in inital inventory
while (fscanf(initial_inventory_file, "%9c %10c %i", product.id, product.productName, &product.currentQuantity) != EOF)
{
printf("%9c %10c %i\n", product.id, product.productName, product.currentQuantity);
}
//cleanup...
...
}
Here's a file example: (it's actually 10 chars, 9 chars, and int)
022456789 box-large 1234
023356789 cart-small 1234
023456789 box 1234
985477321 dog food 2
987644421 cat food 5555
987654320 snaks 4444
987654321 crate 9999
987654322 pillows 44
Assuming your input file is well-formed, this is the most straightforward version:
char str1[8] = {0};
char str2[6] = {0};
int val;
...
int result = fscanf( input, "%7s %5s %d", str1, str2, &val );
If result is equal to 3, you successfully read all three inputs. If it's less than 3 but not EOF, then you had a matching failure on one or more of your inputs. If it's EOF, you've either hit the end of the file or there was an input error; use feof( input ) to test for EOF at that point.
If you can't guarantee your input file is well-formed (which most of us can't), you're better off reading in the entire line as text and parsing it yourself. You said you can't use fgets, but there's a way to do it with fscanf:
char buffer[128]; // or whatever size you think would be appropriate to read a line at a time
/**
* " %127[^\n]" tells scanf to skip over leading whitespace, then read
* up to 127 characters or until it sees a newline character, whichever
* comes first; the newline character is left in the input stream.
*/
if ( fscanf( input, " %127[^\n]", buffer ) == 1 )
{
// process buffer
}
You can then parse the input buffer using sscanf:
int result = sscanf( buffer, "%7s %5s %d", str1, str2, &val );
if ( result == 3 )
{
// process inputs
}
else
{
// handle input error
}
or by some other method.
EDIT
Edge cases to watch out for:
Missing one or more inputs per line
Malformed input (such as non-numeric text in the integer field)
More than one set of inputs per line
Strings that are longer than 7 or 5 characters
Value too large to store in an int
EDIT 2
The reason most of us don't recommend fscanf is because it sometimes makes error detection and recovery difficult. For example, suppose you have the input records
foo bar 123r4
blurga blah 5678
and you read it with fscanf( input, "%7s %5s %d", str1, str2, &val );. fscanf will read 123 and assign it to val, leaving r4 in the input stream. On the next call, r4 will get assigned to str1, blurga will get assigned to str2, and you'll get a matching failure on blah. Ideally you'd like to reject the whole first record, but by the time you know there's a problem it's too late.
If you read it as a string first, you can parse and check each field, and if any of them are bad, you can reject the whole thing.
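As a rough sketch of that "reject the whole record" idea, assuming the line has already been read into a buffer: the helper name parse_line is hypothetical, and the %n check for leftover characters is one possible way to do it, not the only one.

```c
#include <stdio.h>

/* Hypothetical helper: parse "<first> <second> <integer>" from one line.
   Returns 1 only if all three fields convert and nothing but whitespace
   follows the integer; otherwise the whole record is rejected. */
int parse_line(const char *line, char first[8], char second[6], int *val)
{
    int consumed = 0;

    /* " %n" skips trailing whitespace, then records how far we got */
    if (sscanf(line, "%7s %5s %d %n", first, second, val, &consumed) != 3)
        return 0;
    return line[consumed] == '\0';   /* leftover text such as "r4" fails here */
}
```

For the record "foo bar 123r4" the three conversions succeed, but the scan stops at the r, so the function returns 0 and the next line is unaffected.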
Let's assume the input is
<LWS>* <first> <LWS>+ <second> <LWS>+ <integer>
where <LWS> is any whitespace character, including newlines; <first> has one to seven non-whitespace characters; <second> has one to five non-whitespace characters; <integer> is an optionally signed integer (in hexadecimal if it begins with 0x or 0X, in octal if it begins with 0, or in decimal otherwise); * indicates zero or more of the preceding element; and + indicates one or more of the preceding element.
Let's say you have a structure,
struct record {
char first[8]; /* 7 characters + end-of-string '\0' */
char second[6]; /* 5 characters + end-of-string '\0' */
int number;
};
then you can read the next record from stream in into the structure pointed to by the caller using e.g.
#include <stdlib.h>
#include <stdio.h>
/* Read a record from stream 'in' into *'rec'.
Returns: 0 if success
-1 if invalid parameters
-2 if read error
-3 if non-conforming format
-4 if bug in function
+1 if end of stream (and no data read)
*/
int read_record(FILE *in, struct record *rec)
{
int rc;
/* Invalid parameters? */
if (!in || !rec)
return -1;
/* Try scanning the record. */
rc = fscanf(in, " %7s %5s %d", rec->first, rec->second, &(rec->number));
/* All three fields converted correctly? */
if (rc == 3)
return 0; /* Success! */
/* Only partially converted? */
if (rc > 0)
return -3;
/* Read error? */
if (ferror(in))
return -2;
/* End of input encountered? */
if (feof(in))
return +1;
/* Must be a bug somewhere above. */
return -4;
}
The conversion specifier %7s converts up to seven non-whitespace characters, and %5s up to five; the array (or char pointer) must have room for an additional end-of-string nul byte, '\0', which the scanf() family of functions add automatically.
If you do not specify the length limit, and use %s, the input can overrun the specified buffer. This is a common cause for the common buffer overflow bug.
The return value from the scanf() family of functions is the number of successful conversions (possibly 0), or EOF if an error occurs. Above, we need three conversions to fully scan a record. If we scan just 1 or 2, we have a partial record. Otherwise, we check if a stream error occurred, by checking ferror(). (Note that you want to check ferror() before feof(), because an error condition may also set feof().) If not, we check if the scanning function encountered end-of-stream before anything was converted, using feof().
If none of the above cases were met, then the scanning function returned zero or negative with neither ferror() nor feof() returning true. Because the scanning pattern starts with (whitespace and) a conversion specifier, it should never return zero. The only nonpositive return value from the scanf() family of functions is EOF, which should cause feof() to return true. So, if none of the above cases were met, there must be a bug in the code, triggered by some odd corner case in the input.
A program that reads structures from some stream into a dynamically allocated buffer typically implements the following pseudocode:
Set ptr = NULL # Dynamically allocated array
Set num = 0 # Number of entries in array
Set max = 0 # Number of entries allocated for in array
Loop:
If (num >= max):
Calculate new max; num + 1 or larger
Reallocate ptr
If reallocation failed:
Report out of memory
Abort program
End if
End if
rc = read_record(stream, ptr + num)
If rc == 1:
Break out of loop
Else if rc != 0:
Report error (based on rc)
Abort program
End if
End Loop
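The pseudocode above could look roughly like this in C. This is only a sketch: read_all is a hypothetical name, the doubling growth policy is an assumption, read_record() is condensed from the version shown earlier, and errors are reported by returning NULL rather than aborting the program.

```c
#include <stdio.h>
#include <stdlib.h>

struct record {
    char first[8];    /* 7 characters + '\0' */
    char second[6];   /* 5 characters + '\0' */
    int  number;
};

/* Condensed read_record(): 0 on success, +1 at end of stream,
   negative on error. */
static int read_record(FILE *in, struct record *rec)
{
    int rc = fscanf(in, " %7s %5s %d", rec->first, rec->second, &rec->number);
    if (rc == 3) return 0;
    if (rc > 0) return -3;
    if (ferror(in)) return -2;
    return feof(in) ? +1 : -4;
}

/* Hypothetical helper: read every record from `in` into a malloc'd array.
   Returns the array (caller frees) and stores the count in *count;
   returns NULL with *count == 0 on error or on an empty stream. */
struct record *read_all(FILE *in, size_t *count)
{
    struct record *ptr = NULL;   /* dynamically allocated array */
    size_t num = 0;              /* entries in use */
    size_t max = 0;              /* entries allocated */

    *count = 0;
    for (;;) {
        int rc;
        if (num >= max) {
            size_t new_max = max ? 2 * max : 4;   /* assumed growth policy */
            struct record *tmp = realloc(ptr, new_max * sizeof *tmp);
            if (tmp == NULL) { free(ptr); return NULL; }
            ptr = tmp;
            max = new_max;
        }
        rc = read_record(in, ptr + num);
        if (rc == 1)
            break;                                /* clean end of stream */
        if (rc != 0) { free(ptr); return NULL; }  /* read or format error */
        num++;
    }
    if (num == 0) { free(ptr); return NULL; }
    *count = num;
    return ptr;
}
```

Assigning the realloc() result to a temporary first keeps the old array reachable if the reallocation fails.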
The issue in your code using the "%9c ..."-format is that %9c does not write the string terminating character. So your string is probably filled with garbage and not terminated at all, which leads to undefined behaviour when printing it out using printf.
If you set the complete content of the strings to 0 before the first scan, it should work as intended. To achieve this, you can use calloc instead of malloc; this will initialise the memory with 0.
Note that the code also has to consume the newline character somehow, which is solved by an additional fscanf(f,"%*c") statement (the * indicates that the value is consumed, but not stored to a variable). This works only if there is no other whitespace between the last digit and the newline character:
int main()
{
FILE *initial_inventory_file = NULL;
Product product = { NULL, NULL, 0 };
//open file
initial_inventory_file = fopen(INITIAL_INVENTORY_FILE_NAME, "r");
product.id = calloc(sizeof(char), 10); //- Product ID: 9 digits exactly. (10 for null character)
product.productName = calloc(sizeof(char), 11); //- Product name: 10 chars exactly.
//go through each line in inital inventory
while (fscanf(initial_inventory_file, "%9c %10c %i", product.id, product.productName, &product.currentQuantity) == 3)
{
printf("%9s %10s %i\n", product.id, product.productName, product.currentQuantity);
fscanf(initial_inventory_file,"%*c");
}
//cleanup...
}
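Instead of relying on zero-filled memory, you can also terminate the buffers yourself after a %Nc conversion. A minimal sketch, assuming each line has already been read into a string and that names are left-aligned and padded with spaces to 10 characters; parse_product is a hypothetical helper, not part of the question's code.

```c
#include <stdio.h>

/* Hypothetical helper: parse one fixed-width inventory line of the form
   <9-char id> <10-char name, space padded> <integer>.
   Returns 1 on success, 0 otherwise. */
int parse_product(const char *line, char id[10], char name[11], int *qty)
{
    if (sscanf(line, "%9c %10c %d", id, name, qty) != 3)
        return 0;
    id[9] = '\0';      /* %9c stored exactly 9 chars and no terminator */
    name[10] = '\0';   /* same for %10c */
    return 1;
}
```

Unlike %s, the %c conversions keep embedded spaces, so a name such as "dog food" survives (together with its padding).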
Have you tried the format specifiers?
char seven[8] = {0};
char five[6] = {0};
int myInt = 0;
// loop here
fscanf(fp, "%s %s %d", seven, five, &myInt);
// save to structure / do whatever you want
If you're sure that the format and the string lengths are always fixed, you could also iterate over the input character by character (using something like fgetc()) and process it manually. The example above could cause segmentation faults if a string in the file exceeds 7 or 5 characters, because %s has no length limit.
EDIT Manual Scanning Loop:
char seven[8] = {0};
char five[6] = {0};
int myInt = 0;
int ch;
// loop this part
for (int i = 0; i < 7; i++) {
    seven[i] = fgetc(fp);
}
ch = fgetc(fp);    // consume space
assert(ch == ' '); // keep fgetc() outside assert(): assert() vanishes when NDEBUG is defined
for (int i = 0; i < 5; i++) {
    five[i] = fgetc(fp);
}
ch = fgetc(fp);    // consume space
assert(ch == ' '); // needs <assert.h>
fscanf(fp, "%d", &myInt);

Filtering String to a legit string in C

I am writing a program in C. The incoming string looks like *H1W000500; this is a legit string, and I copy the contents of the string after *H1W, i.e. 000500, to an integer type.
But I want to filter out the string if it is not legit, for example *H1W..... or *H1W~##$. If the string is not legit, do not copy the contents and skip it. Only copy the contents if the string is legit, as written above.
Here is what I am doing, but whenever an irrelevant string comes in, it copies a zero value, which is undesirable.
char ReceivedData[50];
unsigned int Head1Weight;
p = strstr(ReceivedData, "*H1W");
if(p)
{
Head1Weight = strtoul(p+4,&ptr,10);
}
You are close, but your use of strstr can be better expressed with strncmp to compare the first 4 chars of receiveddata. (if your target string exists in the middle of receiveddata, then strstr is fine) You also need to provide error checking on your strtoul conversion. Putting those pieces together you could do something like the following (note: this is shown for a single value, in a loop, change return to continue as noted in the comments)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <errno.h>
/* declare constants, avoid magic number use in code */
enum { PRE = 4, BASE = 10, MAX = 50 };
int main (void) {
char receiveddata[MAX] = "*H1W000500", *p = NULL;
unsigned long head1weight;
if (strncmp (receiveddata, "*H1W", PRE) != 0) /* cmp 4 chars */
return 1; /* you would continue here */
if (strlen (receiveddata) <= PRE) /* more chars exist? */
return 1; /* you would continue here */
errno = 0; /* set errno to known value */
head1weight = strtoul (&receiveddata[PRE], &p, BASE); /* no cast: keep the full unsigned long for the ULONG_MAX check */
/* check for errors on conversion */
if ((errno == ERANGE && (head1weight == ULONG_MAX)) ||
(errno != 0 && head1weight == 0)) {
perror ("strtoul");
return 1; /* you would continue here */
}
if (&receiveddata[PRE] == p) { /* check if chars converted */
fprintf (stderr, "No digits were found\n");
return 1; /* you would continue here */
}
printf ("head1weight : %lu\n", head1weight);
return 0;
}
Example Use/Output
$ ./bin/parsetounsigned
head1weight : 500
Look it over and let me know if you have further questions.
(note: C generally avoids the use of MixedCase and camelCase variable names in favor of all lower-case, reserving all upper-case for use with constants and macros. It is style, so it is completely up to you...)
From the Linux man page on strtoul.
If there were no digits at all, strtoul() stores the original value of nptr in *endptr (and returns 0).
So if, after the strtoul, ptr is the same as the starting pointer, you know there were no legitimate characters.
char* ptr;
unsigned long Head1Weight = strtoul(p + 4, &ptr, 10);
if (ptr == p + 4)
{
// There were no digits
}
else if (strlen(ptr) > 0)
{
// There were characters in the string after the end of the number
}
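Putting the end-pointer check into a function, a sketch might look like the following. The name parse_head_weight is hypothetical, and the strict policy (no sign, no spaces, no trailing text) is an assumption about what counts as "legit", since the question does not spell it out.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: accept "*H1W" followed only by decimal digits.
   Returns 1 and stores the value in *weight on success; returns 0 and
   leaves *weight untouched otherwise. */
int parse_head_weight(const char *s, unsigned long *weight)
{
    static const char prefix[] = "*H1W";
    const size_t plen = sizeof prefix - 1;
    char *end;
    unsigned long value;

    if (strncmp(s, prefix, plen) != 0)
        return 0;                         /* wrong prefix */
    if (!isdigit((unsigned char)s[plen]))
        return 0;                         /* rejects "", '+', '-', spaces, symbols */
    errno = 0;
    value = strtoul(s + plen, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return 0;                         /* overflow or trailing junk */
    *weight = value;
    return 1;
}
```

Since the output is only written on success, a bad string can no longer overwrite the previous weight with zero.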
OP's p = strstr(ReceivedData, "*H1W"); if(p) { is insufficient as it passes strstr("abcd12*H1W", "*H1W"), although it is a start.
OP has validation goals that are not specific enough. "the contents of the string after *H1W i.e 000500 to an integer type."
"+123", "-123", " 123" evaluate to integers, are they valid?
"123 " can evaluate to an integer, is that valid?
The sample implies the numeric part should be exactly 6 decimal digits, yet that is uncertain.
Sample code uses unsigned, could that be 16-bit? Only "000000" to "065535" valid?
"-123" converts successfully via strtoul(), is that valid for this goal?
Should this pass "*H1W000500xyz"? Is extra text allowed or to be ignored?
This is common in writing code as the specifications initially have interpretation issues and then tend to evolve.
Code should allow for evolution.
Let us start with *H1W followed by exactly 6 decimal digits with sscanf(). Code below uses "%n" to record the scanning position after checking for digits. This approach needs additional work should PREFIX contain %.
// PREFIX should not contain %
#define PREFIX "*H1W"
#define DIGIT_FMT "%*[0-9]"
#define VALID_LENGTH 10
char ReceivedData[50];
unsigned long Head1Weight = 0;
int n = 0;
sscanf(ReceivedData, PREFIX DIGIT_FMT "%n", &n);
if (n == VALID_LENGTH && ReceivedData[VALID_LENGTH] == '\0') {
Head1Weight = strtoul(ReceivedData + sizeof PREFIX - 1, NULL, 10);
}

How to check if an index contains a symbol?

I want to check to make sure that a given string contained in an array called secretWord has no symbols in it (e.g. $ % & #). If it does have a symbol in it, I make the user re-enter the string. It takes advantage of recursion to keep asking until they enter a string that does not contain a symbol.
The only symbol I do accept is the NULL symbol (the symbol represented by the ASCII value of zero). This is because I fill all the empty space in the array with NULL symbols.
My function is as follows:
void checkForSymbols(char *array, int arraysize){ //Checks for symbols in the array and if there are any it recursively calls this function until it gets input without them.
for (int i = 0; i < arraysize; i++){
if (!isdigit(array[i]) && !isalpha(array[i]) && array[i] != (char) 0){
flushArray(array, arraysize);
printf("No symbols are allowed in the word. Please try again: ");
fgets(secretWord, sizeof(secretWord) - 1, stdin);
checkForSymbols(secretWord, sizeof(secretWord));
}//end if (!isdigit(array[i]) && !isalpha(array[i]) && array[i] != 0)
else
continue;
}//end for(i = 0; i < sizeof(string[]); i++){
}//end checkForSymbols
The problem: When I enter any input (see example below), the if statement runs (it prints No symbols are allowed in the word. Please try again: and asks for new input).
I assume the problem obviously stems from the statement if (!isdigit(array[i]) && !isalpha(array[i]) && array[i] != (char) 0). But I have tried changing the (char) 0 part to '\0' and 0 as well and neither change had any effect.
How do I compare if what is in the index is a symbol, then? Why are strings without symbols setting this if statement off?
And if any of you are wondering what the "flushArray" method I used was, here it is:
void flushArray(char *array, int arraysize){ //Fills in the entire passed array with NULL characters
for (int i = 0; i < arraysize; i++){
array[i] = 0;
}
}//end flushArray
This function is called on the third line of my main() method, right after a print statement on the first line that asks users to input a word, and an fgets() statement on the second line that gets the input that this checkForSymbols function is used on.
As per request, an example would be if I input "Hello" as the secretWord string. The program then runs the function on it, and the if statement is for some reason triggered, causing it to
Replace all values stored in the secretWord array with the ASCII value of 0. (AKA NULL)
Prints No symbols are allowed in the word. Please try again: to the console.
Waits for new input that it will store in the secretWord array.
Calls the checkForSymbols() method on these new values stored in secretWord.
And no matter what you input as new secretWord, the checkForSymbols() method's if statement fires and it repeats steps 1 - 4 all over again.
Thank you for being patient and understanding with your help!
You can do something like this to find symbols in your input; put the code at the appropriate place in your program.
#include <stdio.h>
#include <string.h>
int main () {
char invalids[] = "#.<#>";
char * temp;
temp=strchr(invalids,'s');//is s an invalid character?
if (temp!=NULL) {
printf ("Invalid character");
} else {
printf("Valid character");
}
return 0;
}
This checks whether s is an invalid character. You can do the same with a plain array of characters:
#include <string.h>
char invalid[] = { '#', '#', '&', '$', '<' }; // note last element isn't '\0'
if (memchr(invalid, 'a', sizeof(invalid))) {
    // do stuff
}
memchr is used if your array is not null terminated.
As suggested by @David C. Rankin, you can also use strpbrk, like this:
#include <stdio.h>
#include <string.h>
int main () {
const char str1[] = ",*##_$&+.!";
const char str2[] = "##"; //input string
char *ret;
ret = strpbrk(str1, str2);
if(ret) {
printf("First matching character: %c\n", *ret);
} else {
printf("Continue");
}
return(0);
}
The only symbol I do accept is the NULL symbol (the symbol represented by the ASCII value of zero). This is because I fill all the empty space in the array with NULL symbols.
NULL is a pointer; if you want a character value 0, you should use 0 or '\0'. I assume you're using memset or strncpy to ensure the trailing bytes are zero? Nope... What a shame, your MCVE could be so much shorter (and complete). :(
void checkForSymbols(char *array, int arraysize){
/* ... */
if (!isdigit(array[i]) && !isalpha(array[i]) /* ... */
As per section 7.4p1 of the C standard, ...
In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
Not all char values are representable as an unsigned char or equal to EOF, and so it's possible (and highly likely given the nature of this question) that the code above invokes undefined behaviour.
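The usual way to stay within that requirement is to cast the argument to unsigned char before calling any <ctype.h> function. A small sketch (the function name only_alnum is made up for illustration):

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if every character before the
   terminating '\0' is a letter or digit, 0 otherwise. */
int only_alnum(const char *s)
{
    for (; *s != '\0'; s++) {
        /* The cast keeps the argument in the range isalnum() requires,
           even where plain char is signed and *s is negative. */
        if (!isalnum((unsigned char)*s))
            return 0;
    }
    return 1;
}
```

Stopping at the terminating '\0' also means the padding bytes after the string never need to be examined.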
As you haven't completed your question (by providing an MCVE, and describing what errors are occurring) I'm assuming that the question you're trying to ask might be a duplicate of this question, this question, this question, this question and probably a whole lot of others... If so, did you try Googling the error message? That's probably the first thing you should've done. Should that fail in the future, ask a question about the error message!
As per request, an example would be if I input "Hello" as the secretWord string.
I assume secretWord is declared as char secretWord[] = "Hello"; in your example, and not char *secretWord = "Hello";. The two types are distinct, and your book should clarify that. If not, which book are you reading? I can probably recommend a better book, if you'd like.
Any attempt to modify a string literal (i.e. char *array = "Hello"; flushArray(array, ...)) is undefined behaviour, as explained by answers to this question (among many others, I'm sure).
It seems a solution to this problem might be available by using something like this...
In response to your comment, you are probably making it a bit tougher on yourself than it needs to be. You have two issues to deal with (one you are not seeing). The first being to check the input to validate only a-zA-Z0-9 are entered. (you know that). The second being you need to identify and remove the trailing '\n' read and included in your input by fgets. (that one may be tripping you up)
You don't show how the initial array is filled, but given your use of fgets on secretWord (see footnote 1), I suspect you are also using fgets for array, which is exactly what you should be using. However, you need to remove the '\n' included at the end of the buffer filled by fgets before you call checkforsymbols. Otherwise you have character 0xa (the '\n') at the end, which, of course, is not a-zA-Z0-9 and will cause your check to fail.
To remove the trailing '\n', all you need to do is check the last character in your buffer. If it is a '\n', then simply overwrite it with the nul-terminating character (either 0 or the equivalent character representation '\0' -- your choice). You simply need the length of the string (which you get with strlen from string.h) and then check if (string[len - 1] == '\n'), making sure len is not zero first. For example:
size_t len = strlen (str); /* get length of str */
if (len && str[len - 1] == '\n') /* check for trailing '\n'; len > 0 guards against str[-1] */
    str[--len] = 0; /* overwrite with nul-byte */
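An alternative sketch that strips the newline without computing len - 1, so it is also safe on an empty string. The helper name chomp is an assumption; strcspn() itself is standard.

```c
#include <string.h>

/* Hypothetical helper: cut `str` at its first '\n' (if any) and
   return the resulting length. Safe on an empty string. */
size_t chomp(char *str)
{
    size_t len = strcspn(str, "\n");   /* index of '\n', or of '\0' if none */
    str[len] = '\0';
    return len;
}
```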
A third issue, important, but not directly related to the comparison, is to always choose a type for your function that will return an indication of Success/Failure as needed. In your case the choice of void gives you nothing to check to determine whether there were any symbols found or not. You can choose any type you like int, char, char *, etc.. All will allow the return of a value to gauge success or failure. For testing strings, the normal choice is char *, returning a valid pointer on success or NULL on failure.
A fourth issue when taking input is you always need to handle the case where the user chooses to cancel input by generating a manual EOF with either ctrl+d on Linux or ctrl+z on Windows. The return of NULL by fgets gives you that ability. But with it (and every other input function), you have to check the return and make use of the return information in order to validate the user input. Simply check whether fgets returns NULL on your request for input, e.g.
if (!fgets (str, MAXS, stdin)) { /* read/validate input */
fprintf (stderr, "EOF received -> user canceled input.\n");
return 1; /* change as needed */
}
For your specific case where you only want a-zA-Z0-9, all you need to do is iterate down the string the user entered, checking each character to make sure it is a-zA-Z0-9 and return failure if anything else is encountered. This is made easy given that every string in C is nul-terminated. So you simply assign a pointer to the start of your string (e.g. char *p = str;) and then use either a for or while loop to check each character, e.g.
for (; *p != 0; p++) { do stuff }
that can be written in shorthand:
for (; *p; p++) { do stuff }
or use while:
while (*p) { do stuff; p++; }
Putting all of those pieces together, you could write your function to take a string as its only parameter and return NULL if a symbol is encountered, or return a pointer to your original string on success, e.g.
char *checkforsymbols (char *s)
{
if (!s || !*s) return NULL; /* validate string and not empty */
char *p = s; /* pointer to iterate over string */
for (; *p; p++) /* for each char in s */
if ((*p < 'a' || *p > 'z') && /* char is not a-z */
(*p < 'A' || *p > 'Z') && /* char is not A-Z */
(*p < '0' || *p > '9')) { /* char is not 0-9 */
fprintf (stderr, "error: '%c' not allowed in input.\n", *p);
return NULL; /* indicate failure */
}
return s; /* indicate success */
}
A short complete test routine could be:
#include <stdio.h>
#include <string.h>
#define MAXS 256
char *checkforsymbols (char *s);
int main (void) {
char str[MAXS] = "";
size_t len = 0;
for (;;) { /* loop until str w/o symbols */
printf (" enter string: "); /* prompt for user input */
if (!fgets (str, MAXS, stdin)) { /* read/validate input */
fprintf (stderr, "EOF received -> user canceled input.\n");
return 1;
}
len = strlen (str); /* get length of str */
if (len && str[len - 1] == '\n') /* check for trailing '\n' (len > 0 first) */
str[--len] = 0; /* overwrite with nul-byte */
if (checkforsymbols (str)) /* check for symbols */
break;
}
printf (" valid str: '%s'\n", str);
return 0;
}
char *checkforsymbols (char *s)
{
if (!s || !*s) return NULL; /* validate string and not empty */
char *p = s; /* pointer to iterate over string */
for (; *p; p++) /* for each char in s */
if ((*p < 'a' || *p > 'z') && /* char is not a-z */
(*p < 'A' || *p > 'Z') && /* char is not A-Z */
(*p < '0' || *p > '9')) { /* char is not 0-9 */
fprintf (stderr, "error: '%c' not allowed in input.\n", *p);
return NULL; /* indicate failure */
}
return s; /* indicate success */
}
Example Use/Output
$ ./bin/str_chksym
enter string: mydoghas$20worthoffleas
error: '$' not allowed in input.
enter string: Baddog!
error: '!' not allowed in input.
enter string: Okheisagood10yearolddog
valid str: 'Okheisagood10yearolddog'
or if the user cancels user input:
$ ./bin/str_chksym
enter string: EOF received -> user canceled input.
footnote 1.
C generally prefers the use of all lower-case variable names, while reserving all upper-case for macros and defines. Leave MixedCase or camelCase variable names for C++ and Java. However, since this is a matter of style, this is completely up to you.

How do I parse a string in C?

I am a beginner learning C; so, please go easy on me. :)
I am trying to write a very simple program that takes each word of a string into a "Hi (input)!" sentence (it assumes you type in names). Also, I am using arrays because I need to practice them.
My problem is that some garbage gets put into the arrays somewhere, and it messes up the program. I tried to figure out the problem but to no avail; so, it is time to ask for expert help. Where have I made mistakes?
p.s.: It also has an infinite loop somewhere, but it is probably the result of the garbage that is put into the array.
#include <stdio.h>
#define MAX 500 //Maximum Array size.
int main(int argc, const char * argv[])
{
int stringArray [MAX];
int wordArray [MAX];
int counter = 0;
int wordCounter = 0;
printf("Please type in a list of names then hit ENTER:\n");
// Fill up the stringArray with user input.
stringArray[counter] = getchar();
while (stringArray[counter] != '\n') {
stringArray[++counter] = getchar();
}
// Main function.
counter = 0;
while (stringArray[wordCounter] != '\n') {
// Puts first word into temporary wordArray.
while ((stringArray[wordCounter] != ' ') && (stringArray[wordCounter] != '\n')) {
wordArray[counter++] = stringArray[wordCounter++];
}
wordArray[counter] = '\0';
//Prints out the content of wordArray.
counter = 0;
printf("Hi ");
while (wordArray[counter] != '\0') {
putchar(wordArray[counter]);
counter++;
}
printf("!\n");
//Clears temporary wordArray for new use.
for (counter = 0; counter == MAX; counter++) {
wordArray[counter] = '\0';
}
wordCounter++;
counter = 0;
}
return 0;
}
Solved it! I needed to add the following if statement at the end, where I incremented the wordCounter. :)
if (stringArray[wordCounter] != '\n') {
wordCounter++;
}
You are using int arrays to represent strings, probably because getchar() returns an int. However, strings are better represented as char arrays, since that's what they are, in C. The fact that getchar() returns an int is certainly confusing; it's because it needs to be able to return the special value EOF, which doesn't fit in a char. Therefore it uses int, which is a "larger" type (able to represent more different values). So, it can fit all the char values, and EOF.
With char arrays, you can use C's string functions directly:
char stringArray[MAX];
if(fgets(stringArray, sizeof stringArray, stdin) != NULL)
printf("You entered %s", stringArray);
Note that fgets() will leave the end-of-line character in the string, so you might want to strip it out. I suggest implementing an in-place function that trims off leading and trailing whitespace; it's a good exercise as well.
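If you want to compare your attempt against something, here is one possible sketch of such an in-place trim. The name trim and the choice to return the same pointer are assumptions for this example.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: strip leading and trailing whitespace in place.
   Returns `s` for convenience. */
char *trim(char *s)
{
    char *start = s;
    size_t len;

    while (isspace((unsigned char)*start))   /* skip leading whitespace */
        start++;
    len = strlen(start);
    while (len > 0 && isspace((unsigned char)start[len - 1]))
        len--;                               /* drop trailing whitespace */
    memmove(s, start, len);                  /* shift the kept part to the front */
    s[len] = '\0';
    return s;
}
```

memmove() is used rather than memcpy() because the source and destination overlap.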
for (counter = 0; counter == MAX; counter++) {
wordArray[counter] = '\0';
}
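You never enter this loop: counter starts at 0, so the condition counter == MAX is false on the very first test and the body never runs. The condition should be counter < MAX.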
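For reference, the whole task from the question can be sketched with char arrays and no clearing loop at all. This is only an illustration: greet_all is a hypothetical name, and writing the greetings into a caller-supplied buffer (rather than printing them directly) is a choice made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "Hi <word>!\n" into `out` for each
   whitespace-separated word in `line`. Returns the number of words
   greeted; stops early if `out` is full. */
size_t greet_all(const char *line, char *out, size_t outsize)
{
    const char *delims = " \t\n";
    size_t used = 0, words = 0;

    if (outsize > 0)
        out[0] = '\0';
    for (;;) {
        size_t wlen;
        int n;

        line += strspn(line, delims);     /* skip separators */
        wlen = strcspn(line, delims);     /* length of the next word */
        if (wlen == 0)
            break;                        /* reached the end of the line */
        n = snprintf(out + used, outsize - used, "Hi %.*s!\n", (int)wlen, line);
        if (n < 0 || (size_t)n >= outsize - used) {
            if (outsize > used)
                out[used] = '\0';         /* discard the partial greeting */
            break;                        /* output buffer exhausted */
        }
        used += (size_t)n;
        words++;
        line += wlen;
    }
    return words;
}
```

Because each word is described by a pointer and a length, no temporary word array has to be filled or reset between words.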
user1799795,
For what it's worth (now that you've solved your problem) I took the liberty of showing you how I'd do this given the restriction "use arrays", and explaining a bit about why I'd do it that way... Just beware that while I am an experienced programmer I'm no C guru... I've worked with guys who absolutely blew me into the C-weeds (pun intended).
#include <stdio.h>
#include <string.h>
#define LINE_SIZE 500
#define MAX_WORDS 50
#define WORD_SIZE 20
// Main function.
int main(int argc, const char * argv[])
{
int counter = 0;
// ----------------------------------
// Read a line of input from the user (ie stdin)
// ----------------------------------
char line[LINE_SIZE];
printf("Please type in a list of names then hit ENTER:\n");
while ( fgets(line, LINE_SIZE, stdin) == NULL )
fprintf(stderr, "You must enter something. Pretty please!");
// A note on that LINE_SIZE parameter to the fgets function:
// wherever possible it's a good idea to use the version of the standard
// library function that allows you to specify the maximum length of the
// string (or indeed any array) because that dramatically reduces the
// incidence of "string overruns", which are a major source of bugs in c
// programmes.
// Also note that fgets includes the end-of-line character/sequence in
// the returned string, so you have to ensure there's room for it in the
// destination string, and remember to handle it in your string processing.
// -------------------------
// split the line into words
// -------------------------
// the current word
char word[WORD_SIZE];
int wordLength = 0;
// the list of words
char words[MAX_WORDS][WORD_SIZE]; // an array of up to 50 words of
// up to 19 characters each (plus a null terminator)
int wordCount = 0; // the number of words in the array.
// The below loop syntax is a bit cryptic.
// The "char *c=line;" initialises the char-pointer "c" to the start of "line".
// The " *c;" is ultra-shorthand for: "is the-char-at-c not equal to zero".
// All strings in C end with a "null terminator" character, which has the
// integer value of zero, and is commonly expressed as '\0' or 0 (but not
// NULL, which is a null POINTER constant). In the C language any integer may
// be evaluated as a boolean (true|false) expression, where 0 is false, and
// everything-else is true. So: If the character at the address-c is not
// zero (the null terminator) then go-round the loop again. Capiche?
// The "++c" moves the char-pointer to the next character in the line. I use
// the pre-increment "++c" in preference to the more common post-increment
// "c++" out of habit; for a plain pointer, compilers generate the same code.
//
// Note that this syntax is commonly used by "low level programmers" to loop
// through strings. There is an alternative which is less cryptic and is
// therefore preferred by most programmers, even though it's not quite as
// efficient. In this case the loop would be:
// int lineLength = strlen(line);
// for ( int i=0; i<lineLength; ++i)
// and then to get the current character
// char ch = line[i];
// We get the length of the line once, because the strlen function has to
// loop through the characters in the array looking for the null-terminator
// character at its end (guess what its implementation looks like ;-)...
// which is inherently an "expensive" operation (totally dependent on the
// length of the string) so we at least avoid repeating this operation.
//
// I know I might sound like I'm banging on about not-very-much but once you
// start dealing with "real world" magnitude datasets then such habits,
// formed early on, pay huge dividends in the ability to write performant
// code the first time round. Premature optimisation is evil, but my code
// hardly ever NEEDS optimising, because it was "fairly efficient"
// to start with. Yeah?
for ( char *c=line; *c; ++c ) { // foreach char in line.
char ch = *c; // "ch" is the character value-at the-char-pointer "c".
if ( ch==' ' // if this char is a space,
|| ch=='\n' // or we've reached the EOL char
) {
// 1. add the word to the end of the words list (skipping empty
// words, and stopping once the list is full).
// note that we copy only wordLength characters, because "word" has
// no null-terminator, and then terminate the copy ourselves so
// that printf("%s") knows where each word ends.
if ( wordLength>0 && wordCount<MAX_WORDS ) {
strncpy(words[wordCount], word, wordLength);
words[wordCount++][wordLength] = '\0';
}
// 2. and "clear" the word buffer.
wordLength=0;
} else if (wordLength==WORD_SIZE-1) { // this word is too long
// so split this word into two words.
if ( wordCount<MAX_WORDS ) {
strncpy(words[wordCount], word, wordLength);
words[wordCount++][wordLength] = '\0';
}
wordLength=0;
word[wordLength++] = ch;
} else {
// otherwise: append this character to the end of the word.
word[wordLength++] = ch;
}
}
// -------------------------
// print out the words
// -------------------------
for ( int w=0; w<wordCount; ++w ) {
printf("Hi %s!\n", words[w]);
}
return 0;
}
In the real world one can't make such restrictive assumptions about the maximum length of words, or how many there will be, and if such restrictions are given they're almost always arbitrary and therefore proven wrong all too soon. So straight off the bat for this problem, I'd be inclined to use a linked list instead of the "words" array. Wait till you get to "dynamic data structures"... You'll love 'em ;-)
Cheers. Keith.
PS: You're going pretty well... My advice is "just keep on truckin"... this gets a LOT easier with practice.
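The linked-list idea mentioned above could look roughly like the sketch below. It is only an illustration, not part of the original answer: the names word_node, word_list_append and word_list_free are made up, and each word gets its own heap allocation so neither the word length nor the word count is fixed in advance.

```c
#include <stdlib.h>
#include <string.h>

/* One node per word; the word lives in its own heap allocation. */
struct word_node {
    char *text;
    struct word_node *next;
};

/* Append a copy of the first len characters of text to the list.
   *head and *tail are updated; returns 0 on success, -1 if malloc fails. */
int word_list_append(struct word_node **head, struct word_node **tail,
                     const char *text, size_t len)
{
    struct word_node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->text = malloc(len + 1);
    if (node->text == NULL) {
        free(node);
        return -1;
    }
    memcpy(node->text, text, len);
    node->text[len] = '\0';          /* terminate the copy ourselves */
    node->next = NULL;

    if (*tail != NULL)
        (*tail)->next = node;        /* link after the current last node */
    else
        *head = node;                /* first node in an empty list */
    *tail = node;
    return 0;
}

/* Release every node and its word. */
void word_list_free(struct word_node *head)
{
    while (head != NULL) {
        struct word_node *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

Start with both head and tail set to NULL, call word_list_append() once per word found, walk the list from head to print, and finish with word_list_free(head).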

get last integer value from char array

Consider a char array like this:
43 234 32 32
I want the last value, which is 32 here, in integer form.
The string size/length is not known. In the above example there are 4 numbers, but the size will vary.
How can this be done?
I have copied these values from the file into a char array. Now I want the last number in an integer variable.
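When you copy the characters, keep a count of how many you copied. Then terminate the array, walk back to the start of the last number, and convert it:
int count = 0;
char c;
while ((c = readCharFromFile()) != '\0') { /* readCharFromFile() stands in for however you read the file */
array[count++] = c;
}
array[count] = '\0';
while (count > 0 && array[count - 1] != ' ') /* back up to the last space */
count--;
int last = atoi(&array[count]); /* 32 for "43 234 32 32" */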
There are many ways to solve this.
Convert every token (space delimited string) into a number and when the tokens run out return the last value converted.
Scan the line for tokens until you get to the end and then convert the last token into a number.
Start at the end of the line. Skip spaces and store digits until the first space is encountered and then convert the result to a number.
Split the string into an array of strings and convert the last one into a number.
I could go on and on but you get the idea I hope.
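As a sketch of the first approach (convert every token and keep the last value converted), strtol() can do both the tokenising and the conversion, because it skips leading whitespace and reports where it stopped. The function name last_int_strtol and the out-parameter convention are assumptions made for this example, and overflow is not checked.

```c
#include <stdlib.h>

/* Store the last integer found in s in *out.
   Returns 1 on success, 0 if s does not start with any integer. */
int last_int_strtol(const char *s, long *out)
{
    int found = 0;
    char *end;

    for (;;) {
        long value = strtol(s, &end, 10);  /* skips leading whitespace itself */
        if (end == s)                      /* nothing was converted: stop */
            break;
        *out = value;                      /* remember the most recent number */
        found = 1;
        s = end;                           /* continue after this number */
    }
    return found;
}
```

Trailing whitespace or a trailing newline is harmless here, because the loop simply ends when strtol() converts nothing. Scanning also stops at the first non-numeric token, so for input such as "43 abc 5" this sketch reports 43.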
int getLastInt(char *data)
{
size_t i = strlen(data);
if(!i--) return -1; // failure
for(;i;--i)
{
if(data[i] == ' ')
{
return atoi(&data[i+1]);
}
}
return -1; // failure
}
Should work as long as the data has a space + actual text.
You could also skip the strlen and just loop forward, which could be faster depending on your system's strlen.
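A rough sketch of the forward-looping variant: it remembers where the text after the most recent space starts, so the string is traversed only once. The name get_last_int_forward is made up for this example, and it keeps the same -1 failure value when there is no space at all.

```c
#include <stdlib.h>

/* Single forward pass: track the start of the text after the last space. */
int get_last_int_forward(const char *data)
{
    const char *last = NULL;   /* text following the most recent space */

    for (const char *p = data; *p != '\0'; ++p) {
        if (*p == ' ')
            last = p + 1;
    }
    if (last == NULL)
        return -1;             /* failure: no space found */
    return atoi(last);
}
```

Like the backward version, this assumes there is actual text after the last space. With trailing spaces atoi() sees an empty string and returns 0.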
If there is no trailing whitespace on the line:
int last_int(const char *s)
{
const char *ptr = strrchr(s, ' ');
if (ptr == NULL) {
ptr = s;
} else {
ptr++;
}
return atoi(ptr);
}
If there can be trailing whitespace, then you'll need to do something like what ProdigySim suggested, but with more states, to walk backwards past the trailing whitespace (if any), then past the number, then call atoi(). No matter what you do, you'll need to watch out for boundary conditions and edge cases and decide how you want to handle them.
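One way the backwards walk with trailing whitespace might look, as a sketch only. The name last_int_trailing_ws is an assumption, and returning 0 when no number is found is a choice made for this example. You may prefer a separate error indication.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns the last integer in s, or 0 if s does not end with a number. */
int last_int_trailing_ws(const char *s)
{
    size_t end = strlen(s);

    /* State 1: walk backwards past trailing whitespace. */
    while (end > 0 && isspace((unsigned char)s[end - 1]))
        end--;

    /* State 2: walk backwards past the digits of the last number. */
    size_t start = end;
    while (start > 0 && isdigit((unsigned char)s[start - 1]))
        start--;

    /* Include a leading sign, if digits were found and a sign precedes them. */
    if (start > 0 && start < end && (s[start - 1] == '-' || s[start - 1] == '+'))
        start--;

    /* atoi() stops at the first non-digit, so the trailing whitespace is harmless. */
    return atoi(s + start);
}
```

The casts to unsigned char matter: passing a negative char value to isspace() or isdigit() is undefined behaviour.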
I guess you want to use a combination of strtok_r and atoi
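A minimal sketch of that combination, assuming a POSIX system (strtok_r() is POSIX, not standard C) and a writable buffer, since strtok_r() overwrites the delimiters with '\0'. The function name last_int_strtok is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* makes strtok_r() visible in strict modes */
#include <stdlib.h>
#include <string.h>

/* Returns the last whitespace-separated number in line, or 0 if there is none.
   Note: line is modified by strtok_r(). */
int last_int_strtok(char *line)
{
    char *save = NULL;
    int last = 0;

    for (char *tok = strtok_r(line, " \t\r\n", &save);
         tok != NULL;
         tok = strtok_r(NULL, " \t\r\n", &save)) {
        last = atoi(tok);   /* overwritten until the tokens run out */
    }
    return last;
}
```

Because the delimiter set includes the newline, a line read with fgets() can be passed in directly. If the original text must be preserved, run this on a copy.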

Resources