How does strcat affect the strtok? - c

Assume we need to copy user's input into another string by concatenating the tokens of input, e.g., "hello world" -> "helloworld".
#include <stdio.h>
#include <string.h>
int main(void) {
char buffer[50];
printf("\nEnter a string: ");
while (fgets(buffer, sizeof(buffer), stdin) != 0) {
size_t size = strlen(buffer);
if (size > 0 && buffer[size - 1] == '\n') {
char input[1]; // set it too small
buffer[size - 1] = '\0';
char *tok = strtok(buffer, " "); // works fine
do {
strcat(input, tok); // append to "input" that has not enough space
printf("\nfound token: %s", tok);
tok = strtok(NULL, " "); // produces garbage
} while (tok);
break;
}
}
Running the code above:
Enter a string: hello world
found token: hello
found token: w
found token: r
*** stack smashing detected ***: <unknown> terminated
I struggle to understand how is strtok related to strcat failing to append tok. They are not sharing variables except for tok which is (according to the docs) copied by strcat, so whatever strcat is doing shouldn't affect the strtok behavior and the program should crash on the second strcat call at least, right? But we see that strcat is getting called 3 times before stack smashing gets detected. Can you please explain why?

For starters this array
char input[1];
is not initialized and does not contain a string.
So this call of strcat
strcat(input, tok);
invokes undefined behavior also because the array input is not large enough to store the copied string. It can overwrite memory beyond the array.

There are multiple problems in the code:
char input[1]; is too small to do anything. You cannot concatenate the tokens from the line into this minuscule array. You must define it with a sufficient length, namely the same length as buffer for simplicity.
input must be initialized as an empty string for strcat(input, tok); to have defined behavior. As coded, the first call to strcat corrupts other variables causing the observed behavior, but be aware anything else could happen as a result of this undefined behavior.
char *tok = strtok(buffer, " "); works fine but may return a null pointer if buffer contains only whitespace if anything. The do loop will then invoke undefined behavior on strcat(input, tok). Use a for or while loop instead.
there is a missing } in the code, it is unclear whether you mean to break from the while loop after the first iteration or only upon getting the end of the line.
Here is a modified version:
#include <stdio.h>
#include <string.h>
int main(void) {
char buffer[50];
char input[sizeof buffer] = "";
printf("Enter a string: ");
if (fgets(buffer, sizeof(buffer), stdin)) {
char *tok = strtok(buffer, " \n");
while (tok) {
strcat(input, tok);
printf("found token: %s\n", tok);
tok = strtok(NULL, " \n");
}
printf("token string: %s\n", input);
}
return 0;
}

Related

Reading from an input file and storing words into an array [duplicate]

This question already has an answer here:
Unexpected strtok() behaviour
(1 answer)
Closed 4 years ago.
The end goal is to output a text file where repeating words are encoded as single digits instead. The current problem I'm having is reading words and storing them into an array.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define MAX_CODE 120
void main() {
FILE *inputfile = fopen("input.txt","rw");
char buffer[128];
char *token;
char *words[MAX_CODE];
int i = 0;
while(fgets(buffer, 128, inputfile)){
token = strtok(buffer," ");
printf("Token %d was %s",i,token);
while(token != NULL) {
words[i] = malloc(strlen(token)+1);
strcpy(words[i], token);
i++;
token = strtok(buffer," ");
}
}
for(int i = 0; i<3; i++) printf("%d\n%s\n",i,words[i]);
printf("End");
}
What I get is segmentation fault errors, or nothing. What I want is for words to be an array of strings. I'm allocating memory for each string, so where am I going wrong?
Your second call to strtok should pass NULL for the first argument. Otherwise, strtok will parse the first token over and over again.
token = strtok(buffer," ");
printf("Token %d was %s\n",i,token);
while(i < MAX_CODE && token != NULL) {
words[i] = malloc(strlen(token)+1);
strcpy(words[i], token);
i++;
token = strtok(NULL," ");
}
The check against MAX_CODE is for the safety's sake, in case you ever increase the size of your buffer or reduce the value of MAX_CODE. In your current code, the maximum number of space delimited tokens you can hold in a 128 byte buffer is 64.
From cppreference:
If str != NULL, the call is treated as the first call to strtok for this particular string. ...
If str == NULL, the call is treated as a subsequent calls to strtok: the function continues from where it left in previous invocation. ...

Segmentation fault (core dumped) c

Here is a weird problem:
token = strtok(NULL, s);
printf(" %s\n", token); // these two lines can read the token and print
However!
token = strtok(NULL, s);
printf("%s\n", token); // these two lines give me a segmentation fault
Idk whats happened, because I just add a space before %s\n, and I can see the value of token.
my code:
int main() {
FILE *bi;
struct _record buffer;
const char s[2] = ",";
char str[1000];
const char *token;
bi = fopen(DATABASENAME, "wb+");
/*get strings from input, and devides it into seperate struct*/
while(fgets(str, sizeof(str), stdin)!= NULL) {
printf("%s\n", str); // can print string line by line
token = strtok(str, s);
strcpy(buffer.id, token);
printf("%s\n", buffer.id); //can print the value in the struct
while(token != NULL){
token = strtok(NULL, s);
printf("%s\n", token); // problem starts here
/*strcpy(buffer.lname, token);
printf("%s\n", buffer.lname); // cant do anything with token */
}}
fclose(bi);
return 1;}
Here is the example of string I read from stdin and after parsed(I just tried to strtok the first two elements to see if it works):
<15322101,MOZNETT,JOSE,n/a,n/a,2/23/1943,MALE,824-75-8088,42 SMITH AVENUE,n/a,11706,n/a,n/a,BAYSHORE,NY,518-215-5848,n/a,n/a,n/a
<
< 15322101
< MOZNETT
In the first version your compiler transforms printf() into a
puts() and puts does not allow null pointers, because internally
invokes the strlen() to determine the lenght of the string.
In the case of the second version you add a space in front of format
specifier. This makes it impossible for the compiler to call puts
without appending this two string together. So it invokes the actual
printf() function, which can handle NULL pointers. And your code
works.
Your problem reduces to the following question What is the behavior of printing NULL with printf's %s specifier?
.
In short NULL as an argument to a printf("%s") is undefined. So you need to check for NULL as suggested by #kninnug
You need to change you printf as follows:
token = strtok(NULL, s);
if (token != NULL) printf("%s\n", token);
Or else
printf ("%s\n", token == NULL ? "" : token);

Access the next word/string

I have a simple C-based code to read a file. Read the input line by line. Tokenize the line and prints the current token. My problem is, I want to print the next token if some conditions are satisfied. Do you have any idea how to do it. I really need your help for this project. Thank you
Here is the code:
main(){
FILE *input;
FILE *output;
//char filename[100];
const char *filename = "sample1.txt";
input=fopen(filename,"r");
output=fopen("test.st","w");
char word[1000];
char *token;
int num =0;
char var[100];
fprintf(output,"LEXEME, TOKEN");
while( fgets(word, 1000, input) != NULL ){ //reads a line
token = strtok(word, " \t\n" ); // tokenize the line
while(token!=NULL){ // while line is not equal to null
fprintf(output,"\n");
if (strcmp(token,"SIOL")==0)
fprintf(output,"SIOL, SIOL", token);
else if (strcmp(token,"DEFINE")==0)
fprintf(output,"DEFINE, DEFINE", token);
else if (strcmp(token,"INTEGER")==0){
fprintf(output,"INTEGER, INTEGER");
strcpy(var,token+1);
fprintf(output,"\n%s,Ident",var);
}
else{
printf("%s\n", token);
}
token = strtok(NULL, " \t\n" ); //tokenize the word
}}fclose(output);return 0;}
Continuing from my comment. I'm not sure I completely understand what you need, but if you have the string:
"The quick brown fox";
And, you want to tokenize the string, printing the next word, only if a condition concerning the current word is met, then you need to adjust your thinking just a bit. In your example, you want to print the next word "quick", only if the current word is "The".
The adjustment in thinking is how you look at the test. Instead of thinking about printing the next word if the current matches some condition, you need to save the last word, and only print the current if the last word matches some condition -- "The" in your example.
To handle that situation, you can make use of a statically declared character array of at least 47 characters (the longest word in Merriam-Websters Unabridged Dictionary is 46-character). I'll use 48 in the example below. You may be tempted to just save a pointer to the last word, but when using strtok there is no guarantee that the memory address returned by the previous iteration is preserved -- so make a copy of the word.
Putting the pieces together, you could do something like the following. It saves the prior token in last and then compares the current word to the last and prints the current word if last == "The":
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXW 48
int main (void) {
char str[] = "The quick brown fox";
char last[MAXW] = {0};
char *p;
for (p = strtok (str, " "); p; p = strtok (NULL, " "))
{
if (*last && strcmp (last, "The") == 0)
printf (" '%s'\n", p);
strncpy (last, p, MAXW);
}
return 0;
}
Output
$ ./bin/str_chk_last
'quick'
Let me know if you have any questions.
Test Explanation
As written in the comment *last is simply shorthand for last[0]. So the first part of the test, *last is just testing if ((last[0] != 0) && ... Since last was initially declared and initialized:
char last[MAXW] = {0};
All chars in last are 0 for the first pass through the loop. By including the check last[0] != 0, that just causes the printf to be skipped the first time the for loop executes. The longhand for the test would look like:
if ((last[0] != 0) && strcmp (last, "The") == 0)
printf (" '%s'\n", p);
Which in pseudo code just says:
if (NOT first iteration && last == "The")
printf (" '%s'\n", p);
Let me know if that doesn't make sense.
It is easy to achieve with strtok function. Note that if you put null pointer as the first argument, the function continues scanning the same string where a previous successful call to the function ended. So if you need next token just call
char* token = strtok(NULL, delimeters);
See small example below
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[] = "The quick brown fox";
// split str by space
char* token = strtok(str, " ");
// if a token is found
if(token != NULL) {
// print current token
printf("%s\n", token);
// if token is "The"
if(strcmp(token, "The") == 0) {
// print next token
printf("%s\n", strtok(NULL, " "));
}
}
return 0;
}
The output will be
The
quick

strtok and strncat error

I want to add string "ay" to each word by using both strtok and strncat. But there seemed to be a conflict somewhere that I cannot find. It only gives me the first word "Computeray" for an output. Help?
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[] = "Computer science is hard";
char* Token;
char* work = "ay";
Token = strtok(str, " ");
while (Token != NULL)
{
strncat(Token, work, 2);
printf("%s", Token);
Token = strtok(NULL, " ");
}
return 0;
}
You're modifying the string (with strcat) and expecting strtok to still behave properly - that's not going to work. Instead of using strcat, just print the "ay" separately:
while (Token != NULL)
{
printf("%say ", Token);
Token = strtok(NULL, " ");
}
Even if it were working the way you'd like, you'd be overwriting a bunch of your input along the way. Probably not what you were going for - if you need to build up a whole new string, you should do it into a new buffer, instead of overwriting the input.

Strtok and Strcat conflict

I am trying to work with strtok and strcat but the second printf never shows up. Here is the code:
int i = 0;
char *token[128];
token[i] = strtok(tmp, "/");
printf("%s\n", token[i]);
i++;
while ((token[i] = strtok(NULL, "/")) != NULL) {
strcat(token[0], token[i]);
printf("%s", token[i]);
i++;
}
If my input is 1/2/3/4/5/6 for tmp then the console output would be 13456. The 2 is always missing. Does anyone know how to fix this?
The two is always missing because on the first iteration of your loop you overwrite it with the call to strcat.
After entry to the loop your buffer contains: "1\02\03/4/5/6" internal strtok pointer is pointing to "3". tokens[1] points to "2".
You then call strcat: "12\0\03/4/5/6" so your token[i] pointer is pointing to "\0". The first print prints nothing.
Subsequent calls are OK because the null characters do not overwrite the input data.
To fix it you should build up your output string into a second buffer, not the one you are parsing.
A working(?) version:
#include <stdio.h>
#include <string.h>
int main(void)
{
int i = 0;
char *token[128];
char tmp[128];
char removed[128] = {0};
strcpy(tmp, "1/2/3/4/5/6");
token[i] = strtok(tmp, "/");
strcat(removed, token[i]);
printf("%s\n", token[i]);
i++;
while ((token[i] = strtok(NULL, "/")) != NULL) {
strcat(removed, token[i]);
printf("%s", token[i]);
i++;
}
return (0);
}
strtok modifies the input string in place and returns pointers to that string. You then take one of those pointers (token[0]) and pass it to another operation (strcat) that writes to that pointer. The writes are clobbering each other.
If you want to concatenate all the tokens, you should allocate a separate char* to strcpy to.

Resources