How to scan multiple words using sscanf in C? - c

I'm trying to scan a line that contains multiple words in C. Is there a way to scan it word by word and store each word as a different variable?
For example, I have the following types of lines:
A is the 1 letter;
B is the 2 letter;
C is the 3 letter;
If I'm parsing the first line, "A is the 1 letter", and I have the following code, what do I put in each case so I can get the individual tokens and store them as variables? To clarify, by the end of this code, I want "is", "the", "1", and "letter" in different variables.
I have the following code:
while (feof(theFile) != 1) {
string = "A is the 1 letter"
first_word = sscanf(string);
switch(first_word):
case "A":
what to put here?
case "B":
what to put here?
...

You shouldn't use feof() like that. You should use fgets() or equivalent. You probably need to use the little-known (but present in standard C89) conversion specifier %n.
#include <stdio.h>
int main(void)
{
    char buffer[1024];
    while (fgets(buffer, sizeof(buffer), stdin) != 0)
    {
        char *str = buffer;
        char word[256];
        int posn;
        while (sscanf(str, "%255s%n", word, &posn) == 1)
        {
            printf("Word: <<%s>>\n", word);
            str += posn;
        }
    }
    return(0);
}
This reads a line, then uses sscanf() iteratively to fetch words from the line. The %n format specifier doesn't count towards the successful conversions, hence the comparison with 1. Note the use of %255s to prevent overflows in word. Note too that sscanf() could write a null after the 255 count specified in the conversion specification, hence the difference of one between the declaration of char word[256]; and the conversion specifier %255s.
Clearly, it is up to you to decide what to do with each word as it is extracted; the code here simply prints it.
One advantage of this technique over any solution based on strtok() is that sscanf() does not modify the input string so if you need to report an error, you have the original input line to use in the error report.
After the question was edited, it seems that punctuation such as the semicolon is not wanted in a word; the code above would include punctuation as part of the word. In that case, you have to think a bit harder about what to do. The starting point might well be using an alphanumeric scan-set as the conversion specification in place of %255s:
"%255[a-zA-Z_0-9]%n"
You probably then have to look at what's in the character at the start of the next component and skip it if it is not alphanumeric:
if (!isalnum((unsigned char)*str))
{
    if (sscanf(str, "%*[^a-zA-Z_0-9]%n", &posn) == 0)
        str += posn;
}
Leading to:
#include <stdio.h>
#include <ctype.h>
int main(void)
{
    char buffer[1024];
    while (fgets(buffer, sizeof(buffer), stdin) != 0)
    {
        char *str = buffer;
        char word[256];
        int posn;
        while (sscanf(str, "%255[a-zA-Z_0-9]%n", word, &posn) == 1)
        {
            printf("Word: <<%s>>\n", word);
            str += posn;
            if (!isalnum((unsigned char)*str))
            {
                if (sscanf(str, "%*[^a-zA-Z_0-9]%n", &posn) == 0)
                    str += posn;
            }
        }
    }
    return(0);
}
You'll need to consider the I18N and L10N aspects of the alphanumeric ranges chosen; what's available may depend on your implementation (POSIX doesn't specify support in scanf() scan-sets for the notations such as [[:alnum:]], unfortunately).

You can use strtok() to tokenize or split strings. Please refer to the following link for an example: http://www.cplusplus.com/reference/cstring/strtok/
You can use an array of character pointers and assign the tokens to it.
Example:
char *tokens[100];
int i = 0;
char *token = strtok(string, " ");   /* string must be a modifiable char array */
while (token != NULL && i < 100) {   /* stop before overflowing tokens */
    tokens[i] = token;
    token = strtok(NULL, " ");
    i++;
}
printf("Total Tokens: %d\n", i);

Note that the %s specifier skips leading whitespace. So you can write (this example uses C++ for the string and the output):
std::string s = "A is the 1 letter";
typedef char Word[128];
Word words[6];
int wordsRead = sscanf(s.c_str(), "%127s%127s%127s%127s%127s%127s", words[0], words[1], words[2], words[3], words[4], words[5]);
std::cout << wordsRead << " words read" << std::endl;
for (int i = 0; i < wordsRead; ++i)
    std::cout << "'" << words[i] << "'" << std::endl;
Note how this approach (unlike strtok) effectively requires an assumption about the maximum number of words to read, as well as their lengths. The width is 127 rather than 128 because sscanf() also writes a terminating null into each 128-byte buffer.

I would recommend using strtok().
Here is the example from http://www.cplusplus.com/reference/cstring/strtok/
#include <stdio.h>
#include <string.h>
int main ()
{
    char str[] ="- This, a sample string.";
    char * pch;
    printf ("Splitting string \"%s\" into tokens:\n",str);
    pch = strtok (str," ,.-");
    while (pch != NULL)
    {
        printf ("%s\n",pch);
        pch = strtok (NULL, " ,.-");
    }
    return 0;
}
Output will be:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string

Related

Parsing a text file with different types in C

I'm having some trouble parsing a text file. Each line of the text has a name followed by three float values, all separated by a blank space. I want to store the name in a string and the numbers in an array. I know I have to read each line using fgets and then use strtok, but I don't understand how strtok works. Do I have to call strtok four times? How do I assign each "piece" to my variables?
Thank you for your time!
strtok splits a string into tokens separated by any of the given delimiter characters. You must call it repeatedly until it returns NULL.
char *strtok(char *str, const char *delim)
On the first call, pass the string (char*) as the argument str; on every later call, pass NULL, which tells strtok to keep looking for the next token from the point where the previous call stopped.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    char line[] = "name 1.45 2.55 3.65";
    char* name = NULL;
    double values[3] = { 0 };
    char* ptr = NULL;
    int i = 0;
    ptr = strtok(line, " "); // Get the first token (the name)
    if (ptr != NULL) // Token found
    {
        /* 'ptr' points at the first token inside 'line'; strtok has replaced
           the delimiter after it with a terminating null character */
        name = malloc(strlen(ptr) + 1); // +1 for the terminating null
        if (name == NULL)
            return 1;
        strcpy(name, ptr);
        while ((i < 3) && (ptr != NULL))
        {
            ptr = strtok(NULL, " "); // Call 'strtok' with NULL to get the next token
            if (ptr != NULL) // Token found
            {
                /* 'ptr' now points at the next token, null-terminated */
                values[i] = atof(ptr); // Convert to double
            }
            i++;
        }
        printf("%s %f %f %f\n", name, values[0], values[1], values[2]);
        free(name);
    }
    return 0;
}

The difference between using strtok() on an input string and on a declared string

To understand the behavior of strtok() in ANSI C, I wrote two programs.
#include <stdio.h>
#include <string.h>
int main()
{
    char str[101] = "This is";
    char *pch;
    printf("Splitting string %s into tokens : \n",str);
    pch = strtok(str," ");
    while(pch != NULL)
    {
        printf("%s\n",pch);
        pch = strtok(NULL, " ");
    }
    return 0;
}
The result of this program is
Splitting string "This is " into tokens:
This
is
Next, I changed it a little bit.
#include <stdio.h>
#include <string.h>
int main()
{
    char str[101] = "";
    char *pch;
    scanf("%s",str); // After launching the program, I typed "This is "
    str[strcspn(str,"\n")] = '\0';
    printf("Splitting string %s into tokens : \n",str);
    pch = strtok(str," ");
    while(pch != NULL)
    {
        printf("%s\n",pch);
        pch = strtok(NULL, " ");
    }
    return 0;
}
It prints
Splitting string "This" into tokens:
This
I can't understand why the second word is gone when I use stdin.
The problem isn't with strtok, but with your use of scanf and the "%s" format specifier. That specifier reads whitespace-delimited strings, i.e. you cannot use "%s" to read anything with a space in it.
The natural solution is to use fgets instead, which you have already prepared for by "removing the newline" (which scanf would not usually read anyway).
It should have been pretty obvious that strtok can't be involved, since you print the input string before even calling strtok.

how to divide words with strtok in an array of chars in c

I have a struct named excuses that has chars, I need to store at least 20 excuses. Then, I need to divide each word of each excuse in an array.
How can I do that?
#define excuseLength 256
typedef struct{
    char sentence[excuseLength];
}excuse;
excuse listExcuses[20];
for (int listExcuses_i = 0; listExcuses_i < 20; listExcuses_i++)
{
    char *input;
    scanf("%s", input);
    strcpy(listExcuses[listExcuses_i].sentence, input);
    char* token = strtok(input, " ");
    while(token != NULL){
        printf("token: %s\n", token);
        token = strtok(NULL, " ");
    }
}
Here are some things you can add to your solution:
Check fgets() for return value, as it returns NULL on error.
If you decide to still use scanf(), make sure to use scanf("%255s", input) for char input[256]. Using the format specifier %255s instead of the simple %s limits how much input is stored, so the buffer cannot overflow. Overall, it is just better to read input using fgets().
Remove '\n' character appended by fgets(). This is also good for checking that you don't enter more characters than the limit of 256 in input, and that your sentences don't have a trailing newline after each of them. If you don't remove this newline, then your strtok() delimiter would have to be " \n" instead.
#define constants in your code, and use const char* for string literals, such as the delimiter for strtok().
You can also add some code to check for empty inputs from fgets(). You could simply use a separate counter, and only increment this counter for valid strings found.
It's also strange to have struct with one member, usually structs contain more than one member. You could simply bypass using a struct and use a 2D char array declared as char listexcuses[NUMEXCUSES][EXCUSELENGTH]. This array can hold up to 20 strings, each of which has a maximum length of 256.
Here is some modified code of your approach:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define EXCUSELENGTH 256
#define NUMEXCUSES 20
typedef struct {
    char sentence[EXCUSELENGTH];
} excuse;
int main(void) {
    excuse listexcuses[NUMEXCUSES];
    char input[EXCUSELENGTH] = {'\0'};
    char *word = NULL;
    const char *delim = " ";
    size_t slen, count = 0;
    for (size_t i = 0; i < NUMEXCUSES; i++) {
        printf("\nEnter excuse number %zu:\n", count+1);
        if (fgets(input, EXCUSELENGTH, stdin) == NULL) {
            fprintf(stderr, "Error from fgets(), cannot read line\n");
            exit(EXIT_FAILURE);
        }
        slen = strlen(input);
        if (slen > 0 && input[slen-1] == '\n') {
            input[slen-1] = '\0';
        } else {
            fprintf(stderr, "Too many characters entered in excuse %zu\n", count+1);
            exit(EXIT_FAILURE);
        }
        if (*input) {
            strcpy(listexcuses[count].sentence, input);
            count++;
            printf("\nTokens found:\n");
            word = strtok(input, delim);
            while (word != NULL) {
                printf("%s\n", word);
                word = strtok(NULL, delim);
            }
        }
    }
    return 0;
}
As you need to eventually store these tokens somewhere, you will need another form of storing this data. Since you don't know how many tokens you can get, or how long each token is, you may need to use something like char **tokens. This is not an array, but it is a pointer to a pointer. Using this would allow any number of words and any lengths of each word to be stored. You will need dynamic memory allocation for this. The answer in this post will help.
I replaced scanf with fgets and declared input as char input[256], and now it works!
#define excuseLength 256
#define numberExcuses 20
typedef struct{
    char sentence[excuseLength];
}excuse;
excuse listExcuses[20];
for (int listExcuses_i = 0; listExcuses_i < numberExcuses; listExcuses_i++)
{
    char input[256];
    fgets(input, 256, stdin);
    strcpy(listExcuses[listExcuses_i].sentence, input);
    char* token = strtok(input, " ");
    while(token != NULL){
        printf("token: %s\n", token);
        token = strtok(NULL, " ");
    }
}

Using Pointers and strtok()

I'm building a linked list and need your assistance please as I'm new to C.
I need to input a string that looks like this: (word)_#_(year)_#_(DEFINITION(UPPER CASE))
Ex: Enter a string
Input: invest_#_1945_#_TRADE
Basically, I'm looking to build a function that scans the DEFINITION and gives me back the word it relates to.
Enter a word to search in the dictionary
Input: TRADE
Output: Found "TRADE" in the word "invest"
So far I have managed to split the string using the strtok() function, but I'm not sure how to print the first word.
Here's what I could come up with:
char split(char words[99],char *p)
{
    p=strtok(words, "_#_");
    while (p!=NULL)
    {
        printf("%s\n",p);
        p = strtok(NULL, "_#_");
    }
    return 0;
}
int main()
{
    char hello[99];
    char *s = NULL;
    printf("Enter a string you want to split\n");
    scanf("%s", hello);
    split(hello,s);
    return 0;
}
Any ideas on what should I do?
I reckon that your problem is how to extract the three bits of information from your formatted string.
The function strtok does not work as you think it does: The second argument is not a literal delimiting string, but a string that serves as a set of characters that are delimiters.
In your case, sscanf seems to be the better choice:
#include <stdlib.h>
#include <stdio.h>
int main()
{
    const char *line = "invest_#_1945_#_TRADE";
    char word[41];  /* 40 characters plus the terminating null */
    int year;
    char def[41];   /* 40 characters plus the terminating null */
    int n;
    n = sscanf(line, "%40[^_]_#_%d_#_%40s", word, &year, def);
    if (n == 3) {
        printf("word: %s\n", word);
        printf("year: %d\n", year);
        printf("def'n: %s\n", def);
    } else {
        printf("Unrecognized line.\n");
    }
    return 0;
}
The function sscanf examines a given string according to a given pattern. Roughly, that pattern consists of format specifiers that begin with a percent sign, of spaces, which denote any amount of white-space characters (including none), and of other characters that have to be matched verbatim. The format specifiers yield a result, which has to be stored. Therefore, for each specifier, a result variable must be given after the format string.
In this case, there are several chunks:
%40[^_] reads up to 40 characters that are not the underscore into a char array. This is a special case of reading a string. Strings in sscanf are really words and may not contain white space. The underscore, however, would be part of a string, so in order not to eat up the underscore of the first delimiter, you have to use the notation [^(chars)], which means: Any sequence of chars that do not contain the given chars. (The caret does the negation here, [(chars)] would mean any sequence of the given chars.)
_#_ matches the first delimiter literally, i.e. only if the next chars are underscore, hash mark, underscore.
%d reads a decimal number into an integer. Note that the address of the integer has to be given here with &.
_#_ matches the second delimiter.
%40s reads a string of up to 40 non-whitespace characters into a char array.
The function returns the number of matched results, which should be three if the line is valid. The function sscanf can be cumbersome, but is probably your best bet here for quick and dirty input.
#include <stdio.h>
#include <string.h>
char *strtokByWord_r(char *str, const char *word, char **store){
    char *p, *ret;
    if(str != NULL){
        *store = str;
    }
    if(*store == NULL) return NULL;
    p = strstr(ret=*store, word);
    if(p){
        *p='\0';
        *store = p + strlen(word);
    } else {
        *store = NULL;
    }
    return ret;
}
char *strtokByWord(char *str, const char *word){
    static char *store = NULL;
    return strtokByWord_r(str, word, &store);
}
int main(){
    char input[]="invest_#_1945_#_TRADE";
    char *array[3];
    char *p;
    int i, size = sizeof(array)/sizeof(char*);
    for(i=0, p=input;i<size;++i){
        if(NULL!=(p=strtokByWord(p, "_#_"))){
            array[i]=p;//strdup(p);
            p=NULL;
        } else {
            array[i]=NULL;
            break;
        }
    }
    for(i = 0;i<size;++i)
        printf("array[%d]=\"%s\"\n", i, array[i]);
    /* result
    array[0]="invest"
    array[1]="1945"
    array[2]="TRADE"
    */
    return 0;
}

Arrays in C not working

Well, I declared a global array of chars like this char * strarr[];
in a method I am tokenising a line and try to put everything into that array like this
*line = strtok(s, " ");
while (line != NULL) {
*line = strtok(NULL, " ");
}
seems like this is not working.. How can I fix it?
Thanks
Any number of things could be going wrong with the code you haven't shown us, such as undefined behaviour from calling strtok on a string constant, or getting your parameters wrong when calling the function.
But the most likely problem from the code we can see is the use of *line instead of line, assuming that line is of type char *.
Use the following code as a baseline:
#include <stdio.h>
#include <string.h>
int main (void) {
    char str[] = "My name is paxdiablo";
    // Start tokenising words.
    char *line = strtok (str, " ");
    while (line != NULL) {
        // Print current token and get next word.
        printf ("[%s]\n", line);
        line = strtok(NULL, " ");
    }
    return 0;
}
This outputs:
[My]
[name]
[is]
[paxdiablo]
and should be easily modifiable into something you can use.
Be aware that, if you're trying to save the character pointers returned from strtok (which would make sense for using *line), they are transitory and will not be what you expect after you're done. That's because modifications are made in-place within the source string. You can do it with something like:
#include <stdio.h>
#include <string.h>
int main (void) {
    char *word[4]; // The array of words.
    size_t i; // General counter.
    size_t nextword = 0; // For preventing array overflow.
    char str[] = "My name is paxdiablo";
    // Start tokenising.
    char *line = strtok (str, " ");
    while (line != NULL) {
        // If array not full, duplicate string to array and advance index.
        if (nextword < sizeof(word) / sizeof(*word))
            word[nextword++] = strdup (line);
        // Get next word.
        line = strtok(NULL, " ");
    }
    // Print out all stored words.
    for (i = 0; i < nextword; i++)
        printf ("[%s]\n", word[i]);
    return 0;
}
Note the specific size of the word array in that code above. The use of char * strarr[] in your code, along with the message tentative array definition assumed to have one element is almost certainly where the problem lies.
If your implementation doesn't come with a strdup, you can get a reasonably-priced one here :-)
