Why does strtok() append a space to the last token in C?

I am working on creating a simple version of Minix. I am using fgets() to grab user input. I am then using strtok() to split the string up with the delimiter " ". The problem is when I call strtok(NULL, " "), my token stored appends a space to the last char. So if I pass "minimount imagefile.img", my program will grab minimount and store it in variable cmd, then it will grab "imagefile.img " and place it in variable flag. Notice the space at the end of the flag variable is added after the token method.
Is there a way to get just the string, without a space at the end, after strtok() is called? Or is there a way to manipulate the string to remove the appended space?
printf("Minix: ");
fgets(cmd, BUFFLIM, stdin);
//parses string using delimiter " "
char *token = strtok(cmd, " ");
//assigns flag to what is after delimiter
char *f = strtok(NULL, " ");
//printf("cmd:%s\nf:%s\n", cmd, f);
printf("cmd:%s\nf:%s", cmd, f);
Output:
cmd:"minimount"
f:"imagefile.img "

The standard function fgets can append the new line character '\n' to the entered string provided that there is enough space in the corresponding character array.
So use
char *f = strtok(NULL, " \n");
instead of
char *f = strtok(NULL, " ");
From the C Standard (7.21.7.2 The fgets function)
2 The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array.
Another approach is to remove the new line character from the entered string first, for example
#include <string.h>
#include <stdio.h>
//...
fgets(cmd, BUFFLIM, stdin);
cmd[ strcspn( cmd, "\n" ) ] = '\0';
As for your code snippet, it seems you get the result shown by the following demonstration program.
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[100];
fgets( s, sizeof( s ), stdin );
char *cmd = strtok( s, " " );
char *f = strtok( NULL, " " );
printf( "cmd:\"%s\"\nf:\"%s\"", cmd, f );
return 0;
}
The program output is
cmd:"minimount"
f:"imagefile.img
"

Related

The difference between using strtok() to inputed string or declared string

To understand the behavior of strtok() in ANSI C, I wrote two programs.
#include <stdio.h>
#include <string.h>
int main()
{
char str[101] = "This is";
char *pch;
printf("Splitting string %s into tokens : \n",str);
pch = strtok(str," ");
while(pch != NULL)
{
printf("%s\n",pch);
pch = strtok(NULL, " ");
}
return 0;
}
The result of this program is
Splitting string "This is " into tokens:
This
is
Next, I changed it a little bit.
#include <stdio.h>
#include <string.h>
int main()
{
char str[101] = "";
char *pch;
scanf("%s",str); //After launching the program, I typed "This is "
str[strcspn(str,"\n")] = '\0';
printf("Splitting string %s into tokens : \n",str);
pch = strtok(str," ");
while(pch != NULL)
{
printf("%s\n",pch);
pch = strtok(NULL, " ");
}
return 0;
}
It prints
Splitting string "This" into tokens:
This
I can't understand why the second word is gone when I use stdin.
The problem isn't with strtok, but with your use of scanf and the "%s" format specifier. That format specifier reads whitespace-delimited strings, i.e., you cannot use "%s" to read anything with a space in it.
The natural solution is to use fgets instead, which you have already prepared for by "removing the newline" (which scanf would not usually read anyway).
It should have been pretty obvious that the strtok can't be involved, since you print the input string before even calling strtok.

How to get the second string of the input

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char command[256];
char *token;
const char s[2] = " ";
fprintf(stdout, "$ Please enter a command \n");
fflush( stdout );
fgets ( command, 256, stdin );
token = strtok(command, s);
if (strcmp(token, "loaddungeon") == 0){
fprintf(stdout, "$ loaded successfully \n");
fflush( stdout );
}
}
I am trying to use strtok to get the second string of the input. For instance, if the input is "loaddungeon dfile.txt", what I want to get is the "dfile.txt". My function is able to get the string "loaddungeon". But I have no idea how to get the second string "dfile.txt". Can anyone tell me how to do it?
(Consider the input is always "loaddungeon dfile.txt".)
To read the second string, you need to pass NULL to strtok(). Keep in mind that fgets() retains the newline character from the input line, so you should change your delimiter definition from char s[2] = " "; to char s[] = " \r\n";, or const char *s = " \r\n";. This way the second token will not include any newline characters. Also note that strtok() returns a NULL pointer if no token is found, so the code below tests for this before printing the tokens.
But, since you say that there are only two strings, I would consider just using sscanf() for this. Using the %s conversion specifier, sscanf() will read characters into a string until a whitespace character is encountered, but will not include this whitespace character in the string. When you use the %s specifier in a scanf() type function, you should specify a maximum field width to avoid buffer overflow. This maximum width should be one less than the size of the buffer to leave room for the '\0' string terminator, 255 in this case. The sscanf() function returns the number of successful assignments made, which should be 2 in this case. The sscanf() approach shown below (commented out) checks this return value before printing the strings.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BUFFER_MAX 256
int main(void) {
char command[BUFFER_MAX];
char *token1 = NULL;
char *token2 = NULL;
const char *s = " \r\n";
fprintf(stdout, "$ Please enter a command \n");
fflush( stdout );
fgets ( command, BUFFER_MAX, stdin );
token1 = strtok(command, s);
token2 = strtok(NULL, s);
if (token1 && token2 && strcmp(token1, "loaddungeon") == 0) {
fprintf(stdout, "$ loaded successfully: %s\n", token2);
fflush( stdout );
}
/* or instead do this */
/*
char word1[BUFFER_MAX], word2[BUFFER_MAX];
if (sscanf(command, "%255s %255s", word1, word2) == 2) {
if (strcmp(word1, "loaddungeon") == 0){
fprintf(stdout, "$ loaded successfully: %s\n", word2);
fflush( stdout );
}
}
*/
return 0;
}
Each call to strtok returns a pointer to the next token found in the given string (or NULL if there is none left). To retrieve the second token with space as a delimiter, you need to call strtok a second time, passing NULL as the first argument.
#include <stdio.h>
#include <string.h>
int main(void)
{
char command[256];
char *token1 = NULL;
char *token2 = NULL;
const char s[] = " \n"; // also split on the newline that fgets keeps
fprintf(stdout, "$ Please enter a command \n");
fflush(stdout);
if (fgets(command, sizeof command, stdin) == NULL) {
return 1; // no input
}
token1 = strtok(command, s); // now points to first word
if (NULL != token1) {
token2 = strtok(NULL, s); // now points to second word
}
if (NULL != token2) {
if (strcmp(token1, "loaddungeon") == 0){ // the command is the first word
fprintf(stdout, "$ loaded successfully: %s\n", token2);
fflush(stdout);
}
}
return 0;
}

Access the next word/string

I have a simple C program that reads a file line by line, tokenizes each line, and prints the current token. My problem is that I want to print the next token if some conditions are satisfied. Do you have any idea how to do it? I really need your help for this project. Thank you.
Here is the code:
#include <stdio.h>
#include <string.h>
int main(void){
FILE *input;
FILE *output;
//char filename[100];
const char *filename = "sample1.txt";
input=fopen(filename,"r");
output=fopen("test.st","w");
char word[1000];
char *token;
int num =0;
char var[100];
fprintf(output,"LEXEME, TOKEN");
while( fgets(word, 1000, input) != NULL ){ //reads a line
token = strtok(word, " \t\n" ); // tokenize the line
while(token!=NULL){ // while line is not equal to null
fprintf(output,"\n");
if (strcmp(token,"SIOL")==0)
fprintf(output,"SIOL, SIOL");
else if (strcmp(token,"DEFINE")==0)
fprintf(output,"DEFINE, DEFINE");
else if (strcmp(token,"INTEGER")==0){
fprintf(output,"INTEGER, INTEGER");
strcpy(var,token+1);
fprintf(output,"\n%s,Ident",var);
}
else{
printf("%s\n", token);
}
token = strtok(NULL, " \t\n" ); //tokenize the word
}
}
fclose(output);
return 0;
}
Continuing from my comment. I'm not sure I completely understand what you need, but if you have the string:
"The quick brown fox";
And, you want to tokenize the string, printing the next word, only if a condition concerning the current word is met, then you need to adjust your thinking just a bit. In your example, you want to print the next word "quick", only if the current word is "The".
The adjustment in thinking is how you look at the test. Instead of thinking about printing the next word if the current matches some condition, you need to save the last word, and only print the current if the last word matches some condition -- "The" in your example.
To handle that situation, you can use a fixed-size character array of at least 47 characters (the longest word in Merriam-Webster's Unabridged Dictionary is 46 characters). I'll use 48 in the example below. You may be tempted to just save a pointer to the last word, but strtok returns pointers into the buffer being tokenized, so a saved pointer is only valid until that buffer is overwritten (for example, by the next fgets call in your read loop). Making a copy of the word avoids that problem.
Putting the pieces together, you could do something like the following. It saves the prior token in last and then compares the current word to the last and prints the current word if last == "The":
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXW 48
int main (void) {
char str[] = "The quick brown fox";
char last[MAXW] = {0};
char *p;
for (p = strtok (str, " "); p; p = strtok (NULL, " "))
{
if (*last && strcmp (last, "The") == 0)
printf (" '%s'\n", p);
strncpy (last, p, MAXW - 1); /* last[MAXW - 1] stays 0, so last is always terminated */
}
return 0;
}
Output
$ ./bin/str_chk_last
'quick'
Let me know if you have any questions.
Test Explanation
As written in the comment *last is simply shorthand for last[0]. So the first part of the test, *last is just testing if ((last[0] != 0) && ... Since last was initially declared and initialized:
char last[MAXW] = {0};
All chars in last are 0 for the first pass through the loop. By including the check last[0] != 0, that just causes the printf to be skipped the first time the for loop executes. The longhand for the test would look like:
if ((last[0] != 0) && strcmp (last, "The") == 0)
printf (" '%s'\n", p);
Which in pseudo code just says:
if (NOT first iteration && last == "The")
printf (" '%s'\n", p);
Let me know if that doesn't make sense.
It is easy to achieve with the strtok function. Note that if you pass a null pointer as the first argument, the function continues scanning the same string from where the previous successful call ended. So if you need the next token, just call
char* token = strtok(NULL, delimiters);
See small example below
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[] = "The quick brown fox";
// split str by space
char* token = strtok(str, " ");
// if a token is found
if(token != NULL) {
// print current token
printf("%s\n", token);
// if token is "The"
if(strcmp(token, "The") == 0) {
// print next token, if there is one
char* next = strtok(NULL, " ");
if(next != NULL) {
printf("%s\n", next);
}
}
}
return 0;
}
The output will be
The
quick

How to cut a string using 2 delimiters

How to cut a string using 2 delimiters in C?
I'm getting a string from the user in this platform:
cp <path1> <path2>
I need to get the paths into new strings (one string per path).
I tried to use strstr and strtok but it doesn't work.
I don't know the length of the paths. I only know that each one starts with " \" (these are the delimiters that I have: space + \).
This is what I tried:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char *c;
char *ch = malloc(1024);
while (strcmp(ch, "exit"))
{
scanf("%[^\n]%*c", ch); //what was the input (cp /dor/arthur /king/apple)
c = malloc(strlen(ch) + 1); // room for the copy plus the terminating '\0'
strcpy(c, ch);
char *pch = strtok(c, " //");
printf("this is : %s \n", pch); //printed "this is: cp"
}
}
Use strtok().
You can use the 2 delimiters (space + \) with strtok() in this way:
str = strtok(str, " \\");
Is this in the main function? If it is, main has the parameters argc (int) and argv (an array of strings), which you can use to get what you want if the paths are passed on the command line.

How to scan multiple words using sscanf in C?

I'm trying to scan a line that contains multiple words in C. Is there a way to scan it word by word and store each word as a different variable?
For example, I have the following types of lines:
A is the 1 letter;
B is the 2 letter;
C is the 3 letter;
If I'm parsing through the first line: "A is the 1 letter" and I have the following code, what do I put in each case so I can get the individual tokens and store them as variables. To clarify, by the end of this code, I want "is," "the," "1," "letter" in different variables.
I have the following code:
while (feof(theFile) != 1) {
string = "A is the 1 letter"
first_word = sscanf(string);
switch(first_word):
case "A":
what to put here?
case "B":
what to put here?
...
You shouldn't use feof() like that. You should use fgets() or equivalent. You probably need to use the little-known (but present in standard C89) conversion specifier %n.
#include <stdio.h>
int main(void)
{
char buffer[1024];
while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
char *str = buffer;
char word[256];
int posn;
while (sscanf(str, "%255s%n", word, &posn) == 1)
{
printf("Word: <<%s>>\n", word);
str += posn;
}
}
return(0);
}
This reads a line, then uses sscanf() iteratively to fetch words from the line. The %n format specifier doesn't count towards the successful conversions, hence the comparison with 1. Note the use of %255s to prevent overflows in word. Note too that sscanf() could write a null after the 255 count specified in the conversion specification, hence the difference of one between the declaration of char word[256]; and the conversion specifier %255s.
Clearly, it is up to you to decide what to do with each word as it is extracted; the code here simply prints it.
One advantage of this technique over any solution based on strtok() is that sscanf() does not modify the input string so if you need to report an error, you have the original input line to use in the error report.
After the question was edited, it seems that punctuation such as the semi-colon is not wanted in a word; the code above would include punctuation as part of the word. In that case, you have to think a bit harder about what to do. The starting point might well be using an alphanumeric scan-set as the conversion specification in place of %255s:
"%255[a-zA-Z_0-9]%n"
You probably then have to look at what's in the character at the start of the next component and skip it if it is not alphanumeric:
if (!isalnum((unsigned char)*str))
{
if (sscanf(str, "%*[^a-zA-Z_0-9]%n", &posn) == 0)
str += posn;
}
Leading to:
#include <stdio.h>
#include <ctype.h>
int main(void)
{
char buffer[1024];
while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
char *str = buffer;
char word[256];
int posn;
while (sscanf(str, "%255[a-zA-Z_0-9]%n", word, &posn) == 1)
{
printf("Word: <<%s>>\n", word);
str += posn;
if (!isalnum((unsigned char)*str))
{
if (sscanf(str, "%*[^a-zA-Z_0-9]%n", &posn) == 0)
str += posn;
}
}
}
return(0);
}
You'll need to consider the I18N and L10N aspects of the alphanumeric ranges chosen; what's available may depend on your implementation (POSIX doesn't specify support in scanf() scan-sets for the notations such as [[:alnum:]], unfortunately).
You can use strtok() to tokenize or split strings. Please refer to the following link for an example: http://www.cplusplus.com/reference/cstring/strtok/
You can take array of character pointers and assign tokens to them.
Example:
char *tokens[100];
int i = 0;
char *token = strtok(string, " ");
while (token != NULL && i < 100) { /* stop before tokens[] overflows */
tokens[i] = token;
token = strtok(NULL, " ");
i++;
}
printf("Total Tokens: %d", i);
Note the %s specifier strips whitespace. So you can write:
std::string s = "A is the 1 letter";
typedef char Word[128];
Word words[6];
int wordsRead = sscanf(s.c_str(), "%127s%127s%127s%127s%127s%127s", words[0], words[1], words[2], words[3], words[4], words[5] ); // width 127 leaves room for the terminating '\0'
std::cout << wordsRead << " words read" << std::endl;
for(int i = 0; i < wordsRead; ++i)
std::cout << "'" << words[i] << "'" << std::endl;
Note how this approach (unlike strtok) effectively requires an assumption about the maximum number of words to read, as well as their lengths.
I would recommend using strtok().
Here is the example from http://www.cplusplus.com/reference/cstring/strtok/
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
Output will be:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string
