Given a file with the following line:
word1 word2 word3 word4
I tried to write the following code:
FILE* f = fopen("my_file.txt", "r");
char line[MAX_BUFFER + 1];
if (fgets(line, MAX_LENGTH_LINE, f) == NULL) {
return NULL;
}
char* word = strtok(line, " ");
for (int i = 0; i < 4; i++) {
printf("%s ", word);
word = strtok(NULL, " ");
}
For prints the "words".
It's working. But, I don't understand something.
How it's acheive the last word word4? (I don't understand it because that after "word4" not exists a space)..
I'm not quite sure what you're asking. Are you asking how the program was able to correctly read word4 from the file even though it wasn't followed by a space? Or are you asking why, when the program printed word4 back out, it didn't seem to print a space after it?
The answer to the first question is that strtok is designed to give you tokens separated by delimiters, not terminated by delimiters. There is no requirement that the last token be followed by a delimiter.
To see the answer to the second question, it may be more clear if we adjust the program and its printout slightly:
char* word = strtok(line, " ");
for (int i = 0; word != NULL; i++) {
printf("%d: \"%s\"\n", i, word);
word = strtok(NULL, " ");
}
I have made two changes here:
The loop runs until word is NULL, that is, as long as strtok finds another word on the line. (This is to make sure we see all the words, and to make sure we're not trying to treat the fourth word specially in any way. If you were trying to treat the fourth word specially in some way, please say so.)
The words are printed back out surrounded by quotes, so that we can see exactly what they contain.
When I run the modified program, I see:
0: "word1"
1: "word2"
2: "word3"
3: "word4
"
That last line looks very strange at first, but the explanation is straightforward. You originally read the line using fgets, which does copy the terminating \n character into the line buffer. So it ends up staying tacked onto word4; that is, the fourth "word" is "word4\n".
For this reason, it's often a good idea to include \n in the set of whitespace delimiter characters you hand to strtok -- that is, you can call strtok(line, " \n") instead. If I do that (in both of the strtok calls), the output changes to
0: "word1"
1: "word2"
2: "word3"
3: "word4"
which may be closer to what you expected.
Your code doesn't check the return value of strtok(), it may be unsafe in some cases.
/* Split string
#content origin string content
#delim delimiter for splitting
#psize pointer pointing at the variable to store token size
#return tokens after splitting
*/
const char **split(char *content, const char *delim, int *psize)
{
char *token;
const char **tokens;
int capacity;
int size = 0;
token = strtok(content, delim);
if (!token)
{
return NULL;
}
// Initialize tokens
tokens = malloc(sizeof(char *) * 64);
if (!tokens)
{
exit(-1);
}
capacity = 64;
tokens[size++] = token;
while ((token = strtok(NULL, delim)))
{
if (size >= capacity)
{
tokens = realloc(tokens, sizeof(char *) * capacity * 2);
if (!tokens)
{
exit(-1);
}
capacity *= 2;
}
tokens[size++] = token;
}
// if (size < capacity)
// {
// tokens = realloc(tokens, sizeof(char *) * size);
// if (!tokens)
// {
// exit(-1);
// }
// }
*psize = size;
return tokens;
}
I have this code in my program:
char* tok = NULL;
char move[100];
if (fgets(move, 100, stdin) != NULL)
{
/* then split into tokens using strtok */
tok = strtok(move, " ");
while (tok != NULL)
{
printf("Element: %s\n", tok);
tok = strtok(NULL, " ");
}
}
I have tried adding printf statements before and after fgets, and the one before gets printed, but the one after does not.
I cannot see why this fgets call is causing a segmentation failure.
If someone has any idea, I would much appreciate it.
Thanks
Corey
The strtok runtime function works like this
the first time you call strtok you provide a string that you want to tokenize
char s[] = "this is a string";
in the above string space seems to be a good delimiter between words so lets use that:
char* p = strtok(s, " ");
what happens now is that 's' is searched until the space character is found, the first token is returned ('this') and p points to that token (string)
in order to get next token and to continue with the same string NULL is passed as first argument since strtok maintains a static pointer to your previous passed string:
p = strtok(NULL," ");
p now points to 'is'
and so on until no more spaces can be found, then the last string is returned as the last token 'string'.
more conveniently you could write it like this instead to print out all tokens:
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
puts(p);
}
EDITED HERE:
If you want to store the returned values from strtok you need to copy the token to another buffer e.g. strdup(p); since the original string (pointed to by the static pointer inside strtok) is modified between iterations in order to return the token.
So I wrote the following code in linux(Ubuntu) using the emacs text editor it basically supposed to split the string on the delimeter passed in . When I ran it it segfaulted I ran it though GDB and it gives me an error at strcpy(which I don't invoke) but is probably done implicitly in sprintf. I didn't think I was doing anything wrong so I booted into windows and ran it through visual studio and it works fine I am new to writing C in Linux and know the problem is in the While loop where i call sprintf() (which is odd because the call outside of the loop writes without causing an error) to write the token to the array. If anyone can tell me where I am going wrong I would greatly appreciate it. Here is the code
/* split()
Description:
- takes a string and splits it into substrings "on" the
<delimeter>*/
void split(char *string, char *delimiter)
{
int i;
int count = 0;
char *token;
//large temporary buffer to over compensate for the fact that we have
//no idea how many arguments will be passed with a command
char *bigBuffer[25];
for(i = 0; i < 25; i++)
{
bigBuffer[i] = (char*)malloc(sizeof(char) * 50);
}
//get the first token and add it to <tokens>
token = strtok(string, delimiter);
sprintf(bigBuffer[0], "%s", token);
//while we have not encountered the end of the string keep
//splitting on the delimeter and adding to <bigBuffer>
while(token != NULL)
{
token = strtok(NULL, delimiter);
sprintf(bigBuffer[++count], "%s", token);
}
//for(i = 0; i < count; i++)
//printf("i = %d : %s\n", i, bigBuffer[i]);
for(i = 0; i< 25; i++)
{
free(bigBuffer[i]);
}
} //end split()
You aren't checking for NULL from the return of strtok on the last iteration of the loop ... so strtok can return NULL, yet you still pass the NULL value in the token pointer to sprintf.
Change your while-loop to the following:
while(token = strtok(NULL, delimiter)) sprintf(bigBuffer[++count], "%s", token);
That way you can never pass a NULL pointer to strtok because the while-loop NULL-pointer check will enforce that token always has a valid value when sprintf is called with it as an argument.
You should ask gdb for a full traceback of where your program crashed. The fact that you don't know precisely where it crashed means you didn't ask it for a full traceback, which is important.
I have a problem using strtok in C. I get a user input from the command line using fgets and I want to tokenize it with pipe ("|") as the delimeter and put the result in a double pointer variable. Here's my code:
char** argv;
char *token;
token = strtok(userInput, "|");
while(token != NULL){
*(argv++) = token;
token = strtok(NULL, "|");
}
*argv = '\0';
I then use this code to verify if it's well tokenized
while(*argv!= NULL)
{
if((strcmp(*argv, "|") == 0){
count = count + 1;
}
argv++;
}
printf("%d pipes", count);
But it doesn't work. char** argv contains nothing. The execution of the code stops and it returns -1. When I try to print argv, argv contains no values.
Any ideas please? Thanks.
Edit:
What i want to do is this
userInput = "abc|cde";
After using strtok. I want to have an **argv
**argv = "abc";
One problem is that you don't seem to be initializing argv. You need to allocate enough memory for it to hold as many char *s as are needed. Otherwise you're writing to some random block of memory. (Is it just that you haven't shown us the relevant code?)
Another problem is that you're actually modifying argv, so at the end of that loop, it's pointing one past the last token (and then you set *argv to NULL); but your verification code assumes that it's pointing to the first token, and starts by confirming that *argv is not NULL. (Is it just that you haven't shown us some relevant code?) Edited to add: I see from your comment above that "argv contains no values". I'm pretty confident that this is the reason why.
Incidentally, you're confusing '\0' (a null byte) with NULL (a null pointer). Technically this works out correctly — '\0' gets promoted to 0, 0 gets converted to NULL — but I find it a bit worrisome that you're confusing them, since conceptually they are quite different. You should write *argv = NULL rather than *argv = '\0', for clarity if nothing else.
your tokenizing code works like this :
if
userInput = "a|b|c"
then
argv = { "a", "b", "c" }
you might be expecting that
argv = {"a","|","b","|","c"}
Your code to count pipes should be :
while(*argv != NULL)
{
count = count + 1;
argv++;
}
printf("%d pipes", count-1);
I think it will work
what I am using is this format for searching 300x400. looking for the "x" to get rid of the x and use both sides, the 300 and 400 . this works for me.
char *tok1, *tok2, *saveptr;
tok1 = strtok_r(argv, "x", &saveptr);
tok2 = strtok_r(NULL, "x", &saveptr);
printf("this tok1 %s this is tok2 %s\n", tok1, tok2);
using strtok_r function
the problem is your argv does not point to first element when you try to get result from it.
problem occurs here:
*(argv++) = token
argv (a pointer to char*) is increased when you add token pointer to argv array (I assume you have initialized it correctly). So when you use the second part of code to get the result, argv already points to the last element, in your case '\0', which will produce no output.
And, you are mixing '\0' with NULL, although they are both right grammatically, but it using NULL is better in your case, because it means a pointer, but '\0' means null-terminate in a C-string
You can change the your code to the following:
/* Init argv array */
char** argv;
size_t argc=0; // token count
char *token;
token = strtok(userInput, "|");
while(token != NULL){
argv[argc++] = token;
token = strtok(NULL, "|");
}
argv[argc] = NULL; // the last element of argv array is a NULL pointer
/* get result from argv */
while(*argv!= NULL)
{
if((strcmp(*argv, "|") == 0){
count = count + 1;
}
argv++;
}
printf("%d pipes", count);
Please explain to me the working of strtok() function. The manual says it breaks the string into tokens. I am unable to understand from the manual what it actually does.
I added watches on str and *pch to check its working when the first while loop occurred, the contents of str were only "this". How did the output shown below printed on the screen?
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
Output:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string
the strtok runtime function works like this
the first time you call strtok you provide a string that you want to tokenize
char s[] = "this is a string";
in the above string space seems to be a good delimiter between words so lets use that:
char* p = strtok(s, " ");
what happens now is that 's' is searched until the space character is found, the first token is returned ('this') and p points to that token (string)
in order to get next token and to continue with the same string NULL is passed as first
argument since strtok maintains a static pointer to your previous passed string:
p = strtok(NULL," ");
p now points to 'is'
and so on until no more spaces can be found, then the last string is returned as the last token 'string'.
more conveniently you could write it like this instead to print out all tokens:
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
puts(p);
}
EDIT:
If you want to store the returned values from strtok you need to copy the token to another buffer e.g. strdup(p); since the original string (pointed to by the static pointer inside strtok) is modified between iterations in order to return the token.
strtok() divides the string into tokens. i.e. starting from any one of the delimiter to next one would be your one token. In your case, the starting token will be from "-" and end with next space " ". Then next token will start from " " and end with ",". Here you get "This" as output. Similarly the rest of the string gets split into tokens from space to space and finally ending the last token on "."
strtok maintains a static, internal reference pointing to the next available token in the string; if you pass it a NULL pointer, it will work from that internal reference.
This is the reason strtok isn't re-entrant; as soon as you pass it a new pointer, that old internal reference gets clobbered.
strtok doesn't change the parameter itself (str). It stores that pointer (in a local static variable). It can then change what that parameter points to in subsequent calls without having the parameter passed back. (And it can advance that pointer it has kept however it needs to perform its operations.)
From the POSIX strtok page:
This function uses static storage to keep track of the current string position between calls.
There is a thread-safe variant (strtok_r) that doesn't do this type of magic.
strtok will tokenize a string i.e. convert it into a series of substrings.
It does that by searching for delimiters that separate these tokens (or substrings). And you specify the delimiters. In your case, you want ' ' or ',' or '.' or '-' to be the delimiter.
The programming model to extract these tokens is that you hand strtok your main string and the set of delimiters. Then you call it repeatedly, and each time strtok will return the next token it finds. Till it reaches the end of the main string, when it returns a null. Another rule is that you pass the string in only the first time, and NULL for the subsequent times. This is a way to tell strtok if you are starting a new session of tokenizing with a new string, or you are retrieving tokens from a previous tokenizing session. Note that strtok remembers its state for the tokenizing session. And for this reason it is not reentrant or thread safe (you should be using strtok_r instead). Another thing to know is that it actually modifies the original string. It writes '\0' for teh delimiters that it finds.
One way to invoke strtok, succintly, is as follows:
char str[] = "this, is the string - I want to parse";
char delim[] = " ,-";
char* token;
for (token = strtok(str, delim); token; token = strtok(NULL, delim))
{
printf("token=%s\n", token);
}
Result:
this
is
the
string
I
want
to
parse
The first time you call it, you provide the string to tokenize to strtok. And then, to get the following tokens, you just give NULL to that function, as long as it returns a non NULL pointer.
The strtok function records the string you first provided when you call it. (Which is really dangerous for multi-thread applications)
strtok modifies its input string. It places null characters ('\0') in it so that it will return bits of the original string as tokens. In fact strtok does not allocate memory. You may understand it better if you draw the string as a sequence of boxes.
To understand how strtok() works, one first need to know what a static variable is. This link explains it quite well....
The key to the operation of strtok() is preserving the location of the last seperator between seccessive calls (that's why strtok() continues to parse the very original string that is passed to it when it is invoked with a null pointer in successive calls)..
Have a look at my own strtok() implementation, called zStrtok(), which has a sligtly different functionality than the one provided by strtok()
char *zStrtok(char *str, const char *delim) {
static char *static_str=0; /* var to store last address */
int index=0, strlength=0; /* integers for indexes */
int found = 0; /* check if delim is found */
/* delimiter cannot be NULL
* if no more char left, return NULL as well
*/
if (delim==0 || (str == 0 && static_str == 0))
return 0;
if (str == 0)
str = static_str;
/* get length of string */
while(str[strlength])
strlength++;
/* find the first occurance of delim */
for (index=0;index<strlength;index++)
if (str[index]==delim[0]) {
found=1;
break;
}
/* if delim is not contained in str, return str */
if (!found) {
static_str = 0;
return str;
}
/* check for consecutive delimiters
*if first char is delim, return delim
*/
if (str[0]==delim[0]) {
static_str = (str + 1);
return (char *)delim;
}
/* terminate the string
* this assignmetn requires char[], so str has to
* be char[] rather than *char
*/
str[index] = '\0';
/* save the rest of the string */
if ((str + index + 1)!=0)
static_str = (str + index + 1);
else
static_str = 0;
return str;
}
And here is an example usage
Example Usage
char str[] = "A,B,,,C";
printf("1 %s\n",zStrtok(s,","));
printf("2 %s\n",zStrtok(NULL,","));
printf("3 %s\n",zStrtok(NULL,","));
printf("4 %s\n",zStrtok(NULL,","));
printf("5 %s\n",zStrtok(NULL,","));
printf("6 %s\n",zStrtok(NULL,","));
Example Output
1 A
2 B
3 ,
4 ,
5 C
6 (null)
The code is from a string processing library I maintain on Github, called zString. Have a look at the code, or even contribute :)
https://github.com/fnoyanisi/zString
This is how i implemented strtok, Not that great but after working 2 hr on it finally got it worked. It does support multiple delimiters.
#include "stdafx.h"
#include <iostream>
using namespace std;
char* mystrtok(char str[],char filter[])
{
if(filter == NULL) {
return str;
}
static char *ptr = str;
static int flag = 0;
if(flag == 1) {
return NULL;
}
char* ptrReturn = ptr;
for(int j = 0; ptr != '\0'; j++) {
for(int i=0 ; filter[i] != '\0' ; i++) {
if(ptr[j] == '\0') {
flag = 1;
return ptrReturn;
}
if( ptr[j] == filter[i]) {
ptr[j] = '\0';
ptr+=j+1;
return ptrReturn;
}
}
}
return NULL;
}
int _tmain(int argc, _TCHAR* argv[])
{
char str[200] = "This,is my,string.test";
char *ppt = mystrtok(str,", .");
while(ppt != NULL ) {
cout<< ppt << endl;
ppt = mystrtok(NULL,", .");
}
return 0;
}
For those who are still having hard time understanding this strtok() function, take a look at this pythontutor example, it is a great tool to visualize your C (or C++, Python ...) code.
In case the link got broken, paste in:
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "Hello, my name is? Matthew! Hey.";
char* p;
for (char *p = strtok(s," ,?!."); p != NULL; p = strtok(NULL, " ,?!.")) {
puts(p);
}
return 0;
}
Credits go to Anders K.
Here is my implementation which uses hash table for the delimiter, which means it O(n) instead of O(n^2) (here is a link to the code):
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define DICT_LEN 256
int *create_delim_dict(char *delim)
{
int *d = (int*)malloc(sizeof(int)*DICT_LEN);
memset((void*)d, 0, sizeof(int)*DICT_LEN);
int i;
for(i=0; i< strlen(delim); i++) {
d[delim[i]] = 1;
}
return d;
}
char *my_strtok(char *str, char *delim)
{
static char *last, *to_free;
int *deli_dict = create_delim_dict(delim);
if(!deli_dict) {
/*this check if we allocate and fail the second time with entering this function */
if(to_free) {
free(to_free);
}
return NULL;
}
if(str) {
last = (char*)malloc(strlen(str)+1);
if(!last) {
free(deli_dict);
return NULL;
}
to_free = last;
strcpy(last, str);
}
while(deli_dict[*last] && *last != '\0') {
last++;
}
str = last;
if(*last == '\0') {
free(deli_dict);
free(to_free);
deli_dict = NULL;
to_free = NULL;
return NULL;
}
while (*last != '\0' && !deli_dict[*last]) {
last++;
}
*last = '\0';
last++;
free(deli_dict);
return str;
}
int main()
{
char * str = "- This, a sample string.";
char *del = " ,.-";
char *s = my_strtok(str, del);
while(s) {
printf("%s\n", s);
s = my_strtok(NULL, del);
}
return 0;
}
strtok() stores the pointer in static variable where did you last time left off , so on its 2nd call , when we pass the null , strtok() gets the pointer from the static variable .
If you provide the same string name , it again starts from beginning.
Moreover strtok() is destructive i.e. it make changes to the orignal string. so make sure you always have a copy of orignal one.
One more problem of using strtok() is that as it stores the address in static variables , in multithreaded programming calling strtok() more than once will cause an error. For this use strtok_r().
strtok replaces the characters in the second argument with a NULL and a NULL character is also the end of a string.
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
you can scan the char array looking for the token if you found it just print new line else print the char.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char *s;
s = malloc(1024 * sizeof(char));
scanf("%[^\n]", s);
s = realloc(s, strlen(s) + 1);
int len = strlen(s);
char delim =' ';
for(int i = 0; i < len; i++) {
if(s[i] == delim) {
printf("\n");
}
else {
printf("%c", s[i]);
}
}
free(s);
return 0;
}
So, this is a code snippet to help better understand this topic.
Printing Tokens
Task: Given a sentence, s, print each word of the sentence in a new line.
char *s;
s = malloc(1024 * sizeof(char));
scanf("%[^\n]", s);
s = realloc(s, strlen(s) + 1);
//logic to print the tokens of the sentence.
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
printf("%s\n",p);
}
Input: How is that
Result:
How
is
that
Explanation: So here, "strtok()" function is used and it's iterated using for loop to print the tokens in separate lines.
The function will take parameters as 'string' and 'break-point' and break the string at those break-points and form tokens. Now, those tokens are stored in 'p' and are used further for printing.
strtok is replacing delimiter with'\0' NULL character in given string
CODE
#include<iostream>
#include<cstring>
int main()
{
char s[]="30/4/2021";
std::cout<<(void*)s<<"\n"; // 0x70fdf0
char *p1=(char*)0x70fdf0;
std::cout<<p1<<"\n";
char *p2=strtok(s,"/");
std::cout<<(void*)p2<<"\n";
std::cout<<p2<<"\n";
char *p3=(char*)0x70fdf0;
std::cout<<p3<<"\n";
for(int i=0;i<=9;i++)
{
std::cout<<*p1;
p1++;
}
}
OUTPUT
0x70fdf0 // 1. address of string s
30/4/2021 // 2. print string s through ptr p1
0x70fdf0 // 3. this address is return by strtok to ptr p2
30 // 4. print string which pointed by p2
30 // 5. again assign address of string s to ptr p3 try to print string
30 4/2021 // 6. print characters of string s one by one using loop
Before tokenizing the string
I assigned address of string s to some ptr(p1) and try to print string through that ptr and whole string is printed.
after tokenized
strtok return the address of string s to ptr(p2) but when I try to print string through ptr it only print "30" it did not print whole string. so it's sure that strtok is not just returning adress but it is placing '\0' character where delimiter is present.
cross check
1.
again I assign the address of string s to some ptr (p3) and try to print string it prints "30" as while tokenizing the string is updated with '\0' at delimiter.
2.
see printing string s character by character via loop the 1st delimiter is replaced by '\0' so it is printing blank space rather than ''