I'm using strtok() to parse a line into individual words and check them for something. If that thing is found, another function is called which must also use strtok().
I know strtok() is not re-entrant. Does that mean that if I call it in the second function, my position in the string in the first function will be lost? If so, would using strtok() in the first function and strtok_r() in the second solve the problem? Is there another solution?
Edit:
Thanks. It is indeed not possible to use strtok in two functions like this, and apparently strtok_r is not standard C. Redesign it is...
Since strtok internally uses a global (static) variable to store how far it has advanced in the string, interleaving calls to strtok will fail, just as you suspect. Your options are:
switch to strtok_r, which has a similar API, but is not standard C (it is in POSIX, though);
avoid strtok altogether in favor of some other function that doesn't carry hidden global state, such as strsep (also non-standard);
make sure your first function fully exhausts strtok before calling another function that can call strtok.
All in all, strtok is a function best avoided.
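For the third option, here is a minimal sketch: run the outer strtok loop to completion first, saving the token pointers, and only then call the helper that uses strtok. The names collect_tokens and count_words are hypothetical; count_words stands in for "another function that calls strtok".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Phase 1: run strtok to completion and remember every
   token. The pointers point into `line`, which must outlive them. */
size_t collect_tokens(char *line, const char *delim, char **out, size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(line, delim); tok != NULL && n < max;
         tok = strtok(NULL, delim))
        out[n++] = tok;
    return n;
}

/* Hypothetical stand-in for "another function that also uses strtok":
   counts the space-separated words in the string it is given. */
size_t count_words(char *s)
{
    size_t n = 0;
    for (char *w = strtok(s, " "); w != NULL; w = strtok(NULL, " "))
        n++;
    return n;
}
```

Phase 2 is then an ordinary loop such as `for (size_t i = 0; i < n; i++) count_words(cmds[i]);`. Every call may use strtok freely because the outer scan is already over. The cost is a fixed upper bound (`max`) on the number of tokens.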
The library function strtok uses an internal static state for the current parsing position:
when called with a string (a non-null first argument), it starts a new parse;
when called with NULL as the first argument, it continues from its internal state.
If you directly or indirectly call strtok from your parse loop, the internal state will be updated and the call with NULL from the outer scope will not continue from the previous state, possibly invoking undefined behavior.
The POSIX function strtok_r takes an explicit state argument, so it can be used in nested contexts. If it is available on your system, use it everywhere you currently use strtok. Alternatively, you could use a different method based on strchr() or strcspn().
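A minimal sketch of the strcspn() alternative, assuming you only need each token's start and length: the caller owns the cursor, so two scans can be nested freely, and the input is never modified. next_token is a hypothetical helper, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the next token in *cursor without modifying the
   string and without hidden state. Like strtok, runs of delimiters are
   skipped. Returns NULL when no tokens remain; otherwise stores the token
   length in *len and advances *cursor to the end of the token. */
const char *next_token(const char **cursor, const char *delim, size_t *len)
{
    const char *p = *cursor + strspn(*cursor, delim); /* skip leading delimiters */
    if (*p == '\0')
        return NULL;                                  /* nothing left */
    *len = strcspn(p, delim);                         /* length of this token */
    *cursor = p + *len;                               /* resume point for next call */
    return p;
}
```

Because the token is not NUL-terminated, print it with `printf("%.*s\n", (int)len, tok)` or copy it out before use.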
strtok_r is standardized in POSIX. Depending on your target system, it may or may not be available. macOS and most Unix systems are POSIX compliant. On Windows, the Microsoft C runtime provides the same functionality as strtok_s, with the same arguments. If it is not available, you can define it in your program and compile it conditionally.
Here is a simple implementation you can use:
#include <string.h>

char *strtok_r(char *s, const char *delim, char **context) {
    char *token = NULL;

    if (s == NULL)
        s = *context;

    /* skip initial delimiters */
    s += strspn(s, delim);
    if (*s != '\0') {
        /* we have a token */
        token = s;
        /* skip the token */
        s += strcspn(s, delim);
        if (*s != '\0') {
            /* cut the string to terminate the token */
            *s++ = '\0';
        }
    }
    *context = s;
    return token;
}
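To check that the function above really can be nested, here is a sketch that parses "key=value;key=value" strings with one context pointer per loop. The function is copied under the hypothetical name my_strtok_r so it cannot clash with a strtok_r your C library may already declare; sum_values is likewise just an illustration.

```c
#include <stdlib.h>
#include <string.h>

/* The implementation above, renamed (hypothetical name) to avoid a clash
   with a system-provided strtok_r. */
char *my_strtok_r(char *s, const char *delim, char **context)
{
    char *token = NULL;
    if (s == NULL)
        s = *context;
    s += strspn(s, delim);          /* skip initial delimiters */
    if (*s != '\0') {
        token = s;
        s += strcspn(s, delim);     /* skip the token */
        if (*s != '\0')
            *s++ = '\0';            /* terminate the token */
    }
    *context = s;
    return token;
}

/* Nested use: the outer and inner loops each own a context pointer, so the
   inner parse cannot disturb the outer one. Sums the numeric values of
   "key=value" pairs separated by ';'; pairs without a value are skipped. */
long sum_values(char *spec)
{
    char *outer = NULL, *inner = NULL;
    long sum = 0;
    for (char *pair = my_strtok_r(spec, ";", &outer); pair != NULL;
         pair = my_strtok_r(NULL, ";", &outer)) {
        char *key = my_strtok_r(pair, "=", &inner);
        char *val = my_strtok_r(NULL, "=", &inner);
        if (key != NULL && val != NULL)
            sum += strtol(val, NULL, 10);
    }
    return sum;
}
```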
Q1 :
does that mean that if i call it in the second function my position in the string in the first function will be lost?
A1 : Yes, it does. If you do that, a new scanning sequence starts with the other string you provided, and the state needed for subsequent calls on your first string is lost.
Q2 :
If so, would using strtok() in the first function and strtok_r() in the second solve the problem?
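A2 : It would, as long as nothing else called from the first loop uses plain strtok, because strtok_r keeps its state in the pointer you pass it. The better approach, though, would be to rework your program design.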
Q3 :
Is there another solution?
A3 : If the cost of design changes is too high, I would suggest keeping a copy of the first string and a pointer to the last token found. That lets you continue from the last position (for example, by getting a start pointer with strstr) once you are done with your second string.
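A sketch of the A3 idea, assuming you can still call strtok on your own buffer after the inner function returns: remember where the last token ended, and start a fresh strtok scan just past it. This variation uses an offset into the buffer rather than strstr; count_fields and total_fields are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical inner routine that uses strtok itself and therefore destroys
   the caller's strtok position. Counts ','-separated fields. */
size_t count_fields(char *s)
{
    size_t n = 0;
    for (char *t = strtok(s, ","); t != NULL; t = strtok(NULL, ","))
        n++;
    return n;
}

/* Outer loop over space-separated words. Instead of calling strtok(NULL, ...)
   after count_fields() has reset the hidden state, it remembers where the
   last token ended and starts a new strtok scan right after it. */
size_t total_fields(char *buf)
{
    size_t len = strlen(buf);            /* measured before strtok writes any '\0' */
    size_t total = 0;
    char *tok = strtok(buf, " ");
    while (tok != NULL) {
        size_t end = (size_t)(tok - buf) + strlen(tok); /* offset just past tok */
        total += count_fields(tok);      /* clobbers strtok's internal state */
        /* buf[end] was a delimiter that strtok replaced with '\0', unless the
           token ran to the end of the string */
        tok = (end < len) ? strtok(buf + end + 1, " ") : NULL;
    }
    return total;
}
```

This relies on strtok having overwritten the delimiter after each non-final token with '\0', which is what strtok does. It is still more fragile than switching both loops to strtok_r.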
As noted on the cppreference page for strtok (http://en.cppreference.com/w/c/string/byte/strtok):
Each call to strtok modifies a static variable: is not thread safe.
So, yes: you can't call this function from two different threads at the same time, nor can you nest calls like the following:
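#include <stdio.h>
#include <string.h>

void anotherfunction(void);

int main(void)
{
    char input[] = "something that needs to be tokenize";
    char *token = strtok(input, " ");
    while (token) {
        puts(token);
        anotherfunction();          /* runs its own strtok loop, replacing the saved position */
        token = strtok(NULL, " ");  /* does NOT continue through input */
    }
    return 0;
}

void anotherfunction(void)
{
    char input[] = "another string needs to be tokenize";
    char *tok = strtok(input, " ");
    while (tok) {
        puts(tok);
        tok = strtok(NULL, " ");
    }
}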
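With POSIX strtok_r, each loop keeps its own position and the example works. (In the strtok version, the outer strtok(NULL, " ") would even resume inside anotherfunction's local array, which no longer exists once that function has returned.) The sketch below counts tokens instead of printing them so the result is easy to check; the function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r is POSIX, not ISO C */
#include <stddef.h>
#include <string.h>

/* Inner loop with its own save pointer. Returns the number of words
   (the original printed each one). */
size_t another_function(void)
{
    char input[] = "another string needs to be tokenized";
    char *save = NULL;
    size_t n = 0;
    for (char *tok = strtok_r(input, " ", &save); tok != NULL;
         tok = strtok_r(NULL, " ", &save))
        n++;
    return n;
}

/* Outer loop: its save pointer is untouched by another_function(),
   so every outer token is visited. Returns the number of outer tokens. */
size_t outer_loop(void)
{
    char input[] = "something that needs to be tokenized";
    char *save = NULL;
    size_t n = 0;
    for (char *token = strtok_r(input, " ", &save); token != NULL;
         token = strtok_r(NULL, " ", &save)) {
        n++;
        another_function();
    }
    return n;
}
```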
This question already has answers here: Nested strtok function problem in C (2 answers). Closed 6 years ago.
I realize the title is confusing; I couldn't think of a clearer way to word it. Basically, I am calling a strtok loop inside a strtok loop, and when runCommand (which contains the inner strtok loop) returns, my outer strtok loop stops. It simply exits the loop, even when there are other commands after the first semicolon. When I don't call runCommand(), it works as expected and parses all my semicolon-separated commands.
The goal of this code is to parse a line of commands separated by semicolons, then parse command and command arguments to enter into execvp later. This is the only part I am having trouble with. Here it is:
void parseCommand(char *userLine)
{
    if(strchr(userLine, ';'))
    {
        // Get first token
        token = strtok(userLine, ";");
        // Loop through all tokens
        while(token != NULL)
        {
            // Make a copy
            char *copy = malloc(strlen(token) + 1);
            strcpy(copy, token);
            runCommand(copy);
            free(copy);
            printf("process returned!\n");
            token = strtok(NULL, ";");
        }
    }
}
void runCommand(char *token)
{
    char *args[20];
    char **command = args;
    //Tokenize each command based on space
    char *temp = strtok(token, " \n");
    while (temp != NULL)
    {
        *command++ = temp;
        temp = strtok(NULL, " \n");
    }
    *command = NULL;
    // code for fork and execvp here
}
Can someone explain why runCommand is screwing up my first function's parsing? I REALLY don't understand why it's not working with a copy of my original token. Probably simple, but I've looked at it too long?
The function strtok is not reentrant. It remembers its current position, which is why you can pass NULL on subsequent calls to continue with the same string.
Consider using strtok_r (POSIX) or strtok_s (Microsoft's C runtime, same arguments), which let the caller hold the state. These can be used in a nested fashion. Beware that the optional C11 Annex K strtok_s has a different signature with an extra length argument.
strtok doesn't know about the context in which it's executing; it keeps its state globally.
Try using strtok_r, which allows you to specify a context so that multiple separate uses won't interfere with each other.
From the man page:
Different strings may be parsed concurrently using sequences of calls to strtok_r() that specify different saveptr arguments.
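Applied to the parseCommand/runCommand code from the question, a sketch with one save pointer per loop could look like this. It also bounds the argument array, which the original runCommand did not. build_argv, parse_commands and MAX_ARGS are hypothetical names, and the fork/execvp step is left as a comment. No copy of each command is needed: every command is its own NUL-terminated piece of the line, so it can be tokenized in place.

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r is POSIX, not ISO C */
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 20   /* hypothetical limit, matching the question's args[20] */

/* Split one command into a NULL-terminated argv-style array, in place.
   Uses its own save pointer, so it is safe to call from inside another
   strtok_r loop. Stores at most MAX_ARGS - 1 arguments; returns the count. */
size_t build_argv(char *command, char *args[MAX_ARGS])
{
    char *save = NULL;
    size_t n = 0;
    for (char *a = strtok_r(command, " \n", &save); a != NULL && n < MAX_ARGS - 1;
         a = strtok_r(NULL, " \n", &save))
        args[n++] = a;
    args[n] = NULL;                  /* execvp expects a NULL terminator */
    return n;
}

/* Walk the ';'-separated commands; the outer save pointer survives the
   nested build_argv() calls. Returns how many non-empty commands were seen. */
size_t parse_commands(char *line)
{
    char *save = NULL;
    size_t count = 0;
    for (char *cmd = strtok_r(line, ";", &save); cmd != NULL;
         cmd = strtok_r(NULL, ";", &save)) {
        char *args[MAX_ARGS];
        if (build_argv(cmd, args) > 0)
            count++;                 /* fork() and execvp(args[0], args) would go here */
    }
    return count;
}
```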
I am learning string manipulation with the C standard library functions. While doing so, I came across the strtok function and the following code.
#include <string.h>
#include <stdio.h>
int main()
{
    char str[80] = "This is - www.tutorialspoint.com - website";
    const char s[2] = "-";
    char *token;
    /* get the first token */
    token = strtok(str, s);
    /* walk through other tokens */
    while( token != NULL )
    {
        printf( " %s\n", token );
        token = strtok(NULL, s);
    }
    return(0);
}
I don't understand why, in the while loop, strtok is called with NULL. Why is NULL used here? The strtok function definition says something like "this function breaks the first parameter string into a series of tokens using the second parameter as delimiters."
Because it uses an internal static pointer to the string you are working with: if you want it to keep operating on the same string, you call it with NULL as the first argument and let it use its internal pointer. If you call it with a non-null first argument, it overwrites that pointer with the new one.
This means in turn, that strtok() is not reentrant. So you normally just use it in simple situations, more complex situations where reentrance is important (like multithreaded programs, or working on multiple strings) require different approaches.
One option, on POSIX systems, is strtok_r(), which takes one extra argument to use as its "internal" pointer.
Check this manual to learn more about it.
strtok uses an internal (static) state to tokenize a string. When called with NULL, it goes to the next token in the string that was passed in the first call.
It is worth mentioning that this property (internal state) makes it unsafe to use in a multi-threaded environment. A safer version is strtok_r, which keeps the state in a pointer supplied by the caller.
On the first call, you pass the char array that holds the text you want parsed.
On later calls, you pass NULL as the first parameter to tell the function to resume from where it stopped in that string. Note that strtok parses the array in place: after the first call, the delimiter that ended the first token has been overwritten with '\0'. If you pass the array again instead of NULL, strtok starts over at the beginning, finds only the first token again, and you lose your place and effectively the rest of your string.
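char *c_Ptr = NULL;          //temp, holds individual sections
char c_Ptr1[1000] = {0};     //line buffer, zero-initialized
if (fgets(c_Ptr1, sizeof c_Ptr1, f_ptr) != NULL) { //grabs a line from an open file
    c_Ptr = strtok(c_Ptr1, ",");   //first call starts at the beginning of the buffer
    c_Ptr = strtok(NULL, ",");     //NULL: continue where the previous call stopped
}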
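A small sketch of the in-place behaviour described above, using a hypothetical second_field function: strtok returns pointers into the array itself and writes '\0' over each delimiter it stops at. '\n' is included in the delimiter set on the assumption that the line comes from fgets, which keeps the newline.

```c
#include <string.h>

/* Hypothetical helper: return the second comma-separated field of `line`,
   or NULL if there is none. No copy is made: the result points into `line`,
   and afterwards `line` itself reads as just the first field. */
char *second_field(char *line)
{
    if (strtok(line, ",\n") == NULL)   /* first call: pass the buffer */
        return NULL;
    return strtok(NULL, ",\n");        /* NULL: continue in the same buffer */
}
```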
Just for the fun of it I am writing a program that will take a user inputted string (or maybe even a text document) and scramble the words within the string.
I am attempting to use the strtok function to separate each word in the string. At the moment I feel like my current implementation of strtok is sloppy:
int main(int argc, char *argv[])
{
    char *string, *word;
    if(!(string = getstr())) //function I wrote to retrieve a string
    {
        fputs("Error.\n", stderr);
        exit(1);
    }
    char array[strlen(string) + 1]; //declare an array sized to the length of the string
    strcpy(array, string); //copy the string into the array
    free(string);
    if((word = strtok(array, " ")))
    {
        //later I'll just write each word into a matrix, not important right now.
        while((word = strtok(NULL, " ")))
        {
            //later I'll just write each word into a matrix, not important right now.
        }
    }
    return 0;
}
I feel like there must be a cleaner way of implementing strtok without declaring an array midway through the program. It just doesn't feel correct to me. Is using strtok the correct way to go about this? I would rather not use a fixed size array, as I like everything to be dynamic, which is why I'm starting to doubt using strtok is the correct way to go.
If your string is malloc'ed, as your free() suggests, then you don't need to copy it into a new buffer. Use the buffer you were provided, and free it only after you are done with the words.
You only need to duplicate it if it was given to you by a const char * i.e. you're not allowed to modify the content of the buffer.
It's also better to use strtok_r, as the regular strtok is not reentrant.
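A sketch of the "use the buffer you were provided" advice, assuming the string is writable and heap-allocated (strdup stands in for the asker's getstr() in the usage below): the words are split in place with strtok_r, and the array of word pointers grows with realloc, so nothing has a fixed size. split_words is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r and strdup are POSIX */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split the writable buffer `string` into words in place.
   Returns a malloc'ed, NULL-terminated array of pointers into `string` and
   stores the word count in *count, or returns NULL if allocation fails. */
char **split_words(char *string, size_t *count)
{
    size_t n = 0, cap = 8;
    char **words = malloc(cap * sizeof *words);
    char *save = NULL;
    if (words == NULL)
        return NULL;
    for (char *w = strtok_r(string, " \t\n", &save); w != NULL;
         w = strtok_r(NULL, " \t\n", &save)) {
        if (n + 1 == cap) {                    /* keep one slot for the NULL */
            char **bigger = realloc(words, 2 * cap * sizeof *words);
            if (bigger == NULL) {
                free(words);
                return NULL;
            }
            words = bigger;
            cap *= 2;
        }
        words[n++] = w;
    }
    words[n] = NULL;
    *count = n;
    return words;
}
```

The words point into `string`, so free the returned array first, and free `string` only once you no longer need the words.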
You can use scanf() instead of getstr() and strtok(), since %s already reads one whitespace-separated word at a time:
char word[100];
while (scanf("%99s", word) == 1) {   // %99s leaves room for the terminating '\0'
    // use the word string here
}
The user ends input with EOF:
Ctrl+D (Linux)
Ctrl+Z (Windows)
My application produces strings like the one below. I need to parse the values between the separators into individual values.
2342|2sd45|dswer|2342||5523|||3654|Pswt
I am using strtok to do this in a loop. For the fifth token, I am getting 5523. However, I need to account for the empty value between the two separators || as well. 5523 should be the sixth token, as per my requirement.
token = (char *)strtok(strAccInfo, "|");
for (iLoop=1;iLoop<=106;iLoop++) {
    token = (char *)strtok(NULL, "|");
}
Any suggestions?
In that case I often prefer a p2 = strchr(p1, '|') loop with a memcpy(s, p1, p2-p1) inside. It's fast, does not destroy the input buffer (so it can be used with const char *) and is really portable (even on embedded).
It's also reentrant; strtok isn't. (BTW: reentrant has nothing to do with multi-threading. strtok breaks already with nested loops. One can use strtok_r but it's not as portable.)
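A minimal sketch of the strchr()/memcpy() approach described above, assuming you want one field by index: empty fields count, and the input stays const. get_field is a hypothetical name; with the question's record, index 5 (the sixth field) is 5523.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy field number `index` (0-based) of a
   '|'-separated record into out (capacity outsize, always NUL-terminated,
   truncated if needed). Empty fields count, so "a||b" has three fields.
   Returns 1 on success, 0 if the record has fewer fields. */
int get_field(const char *rec, size_t index, char *out, size_t outsize)
{
    const char *p1 = rec;
    for (;;) {
        const char *p2 = strchr(p1, '|');          /* end of the current field */
        size_t len = p2 ? (size_t)(p2 - p1) : strlen(p1);
        if (index == 0) {
            if (outsize == 0)
                return 0;
            if (len >= outsize)
                len = outsize - 1;                 /* truncate to fit */
            memcpy(out, p1, len);
            out[len] = '\0';
            return 1;
        }
        if (p2 == NULL)
            return 0;                              /* ran out of fields */
        p1 = p2 + 1;                               /* step past the separator */
        index--;
    }
}
```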
That's a limitation of strtok. The designers had whitespace-separated tokens in mind. strtok doesn't do much anyway; just roll your own parser. The C FAQ has an example.
On a first call, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of last token as the new starting location for scanning.
To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token.
What this says is that strtok skips any '|' characters at the beginning of a token, which makes 5523 the 5th token, as you already found. I just thought I would explain why (I had to look it up myself). It also means that you will never get any empty tokens.
Since your data is set up this way, you have a couple of possible solutions:
1) find all occurrences of || and replace with | | (put a space in there)
2) do a strstr 5 times and find the beginning of the 5th element.
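#include <string.h>

/* Split on the single separator character c, keeping empty fields.
   *m is the caller-owned position pointer (it replaces strtok's hidden state). */
char *mystrtok(char **m, char *s, char c)
{
    char *p = s ? s : *m;    /* new string, or resume where the last call stopped */
    if (!*p)
        return 0;            /* nothing left */
    *m = strchr(p, c);
    if (*m)
        *(*m)++ = 0;         /* terminate the field and step past the separator */
    else
        *m = p + strlen(p);  /* last field: park *m on the terminating '\0' */
    return p;
}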
reentrant
threadsafe
strictly ANSI conforming
needs an unused helper pointer from the calling context
e.g.
char *p, *t, s[] = "2342|2sd45|dswer|2342||5523|||3654|Pswt";
for (t = mystrtok(&p, s, '|'); t; t = mystrtok(&p, 0, '|'))
    puts(t);
e.g.
char *p, *t, s[] = "2,3,4,2|2s,d4,5|dswer|23,42||5523|||3654|Pswt";
for (t = mystrtok(&p, s, '|'); t; t = mystrtok(&p, 0, '|'))
{
    char *p1, *t1;
    for (t1 = mystrtok(&p1, t, ','); t1; t1 = mystrtok(&p1, 0, ','))
        puts(t1);
}
Your work :)
Change parameter 3 to a char *c so that it accepts a set of separator characters.
Look into using strsep instead: strsep reference
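A short strsep sketch, with the caveat that strsep is a BSD extension (available in glibc and in the BSD and macOS C libraries, but not part of ISO C or POSIX). Unlike strtok, it returns an empty string for each empty field. The _DEFAULT_SOURCE define is what glibc needs to declare it; split_fields is a hypothetical helper.

```c
#define _DEFAULT_SOURCE   /* glibc: expose strsep (BSD extension) */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `rec` in place on '|', keeping empty fields.
   Stores up to `max` field pointers in `out` and returns how many it found. */
size_t split_fields(char *rec, char **out, size_t max)
{
    size_t n = 0;
    char *field;
    /* strsep advances rec past each separator and sets it to NULL at the end */
    while (n < max && (field = strsep(&rec, "|")) != NULL)
        out[n++] = field;
    return n;
}
```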
Use something other than strtok. It's simply not intended to do what you're asking for. When I've needed this, I usually used strcspn or strpbrk and handled the rest of the tokenizing myself. If you don't mind it modifying the input string like strtok does, it should be pretty simple. At least right off, something like this seems as if it should work:
// Warning: untested code. Should really use something with a less-ugly interface.
char *tokenize(char *input, char const *delim) {
    static char *current; // just as ugly as strtok!
    char *pos, *ret;
    if (input != NULL)
        current = input;
    if (current == NULL)
        return current;
    ret = current;
    pos = strpbrk(current, delim);
    if (pos == NULL)
        current = NULL;
    else {
        *pos = '\0';
        current = pos+1;
    }
    return ret;
}
Inspired by Patrick Schlüter's answer, I made this function. It is supposed to be thread-safe, support empty tokens, and leave the original string unchanged.
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'ed copy of the next token (the caller must free it), or
   NULL when the input is exhausted or malloc fails. *newString is advanced. */
char* strTok(const char** newString, const char* delimiter)
{
    const char* string = *newString;
    const char* delimiterFound;
    size_t tokLength;
    char* tok;
    if (!string) return NULL;
    delimiterFound = strstr(string, delimiter);
    if (delimiterFound) {
        tokLength = (size_t)(delimiterFound - string);
    } else {
        tokLength = strlen(string);
    }
    tok = malloc(tokLength + 1);
    if (!tok) return NULL;
    memcpy(tok, string, tokLength);
    tok[tokLength] = '\0';
    *newString = delimiterFound ? delimiterFound + strlen(delimiter) : NULL;
    return tok;
}
You can use it like this:
const char* input = "1,2,3,,5,";
const char** inputP = &input;
char* tok;
while ((tok = strTok(inputP, ","))) {
    printf("%s\n", tok);
    free(tok);   // each token is a separate allocation
}
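This should print 1, 2 and 3 on separate lines, then an empty line for the empty field between the two commas, then 5, then one more empty line for the empty field after the trailing comma.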
I have tested it with simple strings but have not used it in production yet. I also posted it on Code Review, so you can see what others think of it.
Below is the solution that is working for me now. Thanks to all of you who responded.
I am using LoadRunner. Hence, some unfamiliar commands, but I believe the flow can be understood easily enough.
char strAccInfo[1024], *p2;
int iLoop;
Action() { //This value would come from the wrsp call in the actual script.
    lr_save_string("323|90||95|95|null|80|50|105|100|45","test_Param");
    //Store the parameter into a string - saves memory.
    strcpy(strAccInfo,lr_eval_string("{test_Param}"));
    //Get the first instance of the separator "|" in the string
    p2 = (char *) strchr(strAccInfo,'|');
    //Start a loop - Set the max loop value to more than max expected.
    for (iLoop = 1;iLoop<200;iLoop++) {
        //Save parameter names in sequence.
        lr_param_sprintf("Param_Name","Parameter_%d",iLoop);
        //Get the first instance of the separator "|" in the string (within the loop).
        p2 = (char *) strchr(strAccInfo,'|');
        //Save the value for the parameters in sequence.
        lr_save_var(strAccInfo,p2 - strAccInfo,0,lr_eval_string("{Param_Name}"));
        //Save the string after the first instance of p2 as strAccInfo, for looping.
        //Source and destination overlap, so use memmove (strcpy is undefined here).
        memmove(strAccInfo, p2 + 1, strlen(p2 + 1) + 1);
        //Start conditional loop for checking for last value in the string.
        if (strchr(strAccInfo,'|')==NULL) {
            lr_param_sprintf("Param_Name","Parameter_%d",iLoop+1);
            lr_save_string(strAccInfo,lr_eval_string("{Param_Name}"));
            iLoop = 200;
        }
    }
}
I'm writing a function that gets the PATH environment variable, splits it into the individual paths, and then appends some extra characters to the end of each path.
Everything works fine until I use the strcat() function (see code below).
char* prependPath( char* exeName )
{
    char* path = getenv("PATH");
    char* pathDeepCopy = (char *)malloc(strlen(path) + 1);
    char* token[80];
    int j, i=0; // used to iterate through array
    strcpy(pathDeepCopy, path);
    //parse and split
    token[0] = strtok(pathDeepCopy, ":"); //get pointer to first token found and store in 0
    //place in array
    while(token[i]!= NULL) { //ensure a pointer was found
        i++;
        token[i] = strtok(NULL, ":"); //continue to tokenize the string
    }
    for(j = 0; j <= i-1; j++) {
        strcat(token[j], "/");
        //strcat(token[j], exeName);
        printf("%s\n", token[j]); //print out all of the tokens
    }
}
My shell output is like this (I'm concatenating "/which" onto everything):
...
/usr/local/applic/Maple/bin/which
which/which
/usr/local/applic/opnet/8.1.A.wdmguru/sys/unix/bin/which
which/which
Bus error (core dumped)
I'm wondering why strcat is displaying a new line and then repeating which/which.
I'm also wondering about the Bus error (core dumped) at the end.
Has anyone seen this before when using strcat()?
And if so, anyone know how to fix it?
Thanks
strtok() does not give you a new string.
It mutilates the input string by inserting the char '\0' where the split character was.
So your use of strcat(token[j],"/") will put the '/' character where the '\0' was.
Also the last token will start appending 'which' past the end of your allocated memory into uncharted memory.
You can use strtok() to split a string into chunks, but if you want to append anything to a token, you need to make a copy of the token; otherwise what you're appending will spill over onto the next token.
Also, you need to take more care with your memory allocation: you are leaking memory all over the place :-)
PS: If you must use C strings, use strdup() to copy the string.
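#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strdup: POSIX, and standard since C23 */

/* Prints each PATH entry with "/exeName" appended. It returns nothing,
   since the original char* version never returned a value. */
void prependPath(const char* exeName)
{
    const char* path = getenv("PATH");
    char* pathDeepCopy;
    char* token[80];
    int j, i = 0;          // i counts the tokens stored in token[]

    if (path == NULL || (pathDeepCopy = strdup(path)) == NULL)
        return;

    for (char* t = strtok(pathDeepCopy, ":"); t != NULL && i < 80; t = strtok(NULL, ":"))
        token[i++] = t;

    for (j = 0; j < i; ++j)
    {
        // room for the directory, '/', exeName and the terminating '\0'
        char* tmp = malloc(strlen(token[j]) + 1 + strlen(exeName) + 1);
        if (tmp == NULL)
            break;
        strcpy(tmp, token[j]);
        strcat(tmp, "/");
        strcat(tmp, exeName);
        printf("%s\n", tmp); //print out all of the tokens
        free(tmp);
    }
    free(pathDeepCopy);
}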
strtok does not duplicate the token; it just points to it within the string. So when you strcat '/' onto the end of a token, the '/' replaces the token's '\0', and the new '\0' lands either on the start of the next token or past the end of the buffer.
Also note that even if strtok did return copies of the tokens instead of the originals (which it doesn't), it wouldn't allocate the additional space for you to append characters, so it would still be a buffer overrun bug.
strtok() tokenizes in place. When you start appending characters to the tokens, you're overwriting the next token's data.
Also, in general it's not safe to simply concatenate to an existing string unless you know that the size of the buffer the string is in is large enough to hold the resulting string. This is a major cause of bugs in C programs (including the dreaded buffer overflow security bugs).
So even if strtok() returned brand-new strings unrelated to your original string (which it doesn't), you'd still be overrunning the string buffers when you concatenated to them.
Some safer alternatives to strcpy()/strcat() that you might want to look into (you may need to track down implementations for some of these - they're not all standard):
strncpy() - includes the target buffer size to avoid overruns. Has the drawback of not always terminating the result string
strncat()
strlcpy() - similar to strncpy(), but intended to be simpler to use and more robust (http://en.wikipedia.org/wiki/Strlcat)
strlcat()
strcpy_s() - Microsoft variants of these functions
strncat_s()
And the API you should strive to use if you can use C++: the std::string class. If you use the C++ std::string class, you pretty much do not have to worry about the buffer containing the string - the class manages all of that for you.
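snprintf is not in the list above, but it is standard C99 and covers the original case well: it never writes past the size you give it, always NUL-terminates (for a nonzero size), and returns the length it needed, so truncation can be detected. A sketch with a hypothetical join_path helper:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "dir/exe" into out without overrunning it.
   Returns 1 on success, 0 if out was too small (the result is truncated
   but still NUL-terminated) or on an encoding error. */
int join_path(char *out, size_t size, const char *dir, const char *exe)
{
    int needed = snprintf(out, size, "%s/%s", dir, exe);
    return needed >= 0 && (size_t)needed < size;
}
```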
OK, first of all, be careful: you are leaking memory.
strtok() returns a pointer to the next token, and you are storing it in an array of chars.
Instead of char token[80], it should be char *token.
Be careful also when using strtok: it effectively destroys the char array pathDeepCopy, because it replaces every occurrence of ":" with '\0', as Mike F told you above.
Be sure to initialize pathDeepCopy using memset or calloc.
So when you write token[i], there is no way of knowing what is being pointed at.
And since token holds no valid data, you are likely to get a core dump, because you are trying to concatenate onto a string (token) that holds no valid data.
Perhaps the thing you are looking for is an array of pointers to char in which to store all the pointers to the tokens that strtok returns, in which case token would be declared as char *token[];
Hope this helps a bit.
If you're using C++, consider boost::tokenizer as discussed over here.
If you're stuck in C, consider using strtok_r because it's re-entrant and thread-safe. Not that you need it in this specific case, but it's a good habit to establish.
Oh, and use strdup to create your duplicate string in one step.
Replace that with:
strcpy(pathDeepCopy, path);
//parse and split; i starts at 0 as declared above, and at most 80 pointers are stored
for (char *t = strtok(pathDeepCopy, ":"); t != NULL && i < 80; t = strtok(NULL, ":"))
    token[i++] = t; //get pointer to each token found and place it in the array
// use a new array for storing the new tokens
// pardon my C lang skills. It's been a "while" since I wrote device drivers in C.
char *newTokens[80];
int n = 0;
for (int k = 0; k < i; ++k) {
    size_t len = strlen(token[k]) + 2;   // room for the token, '/' and '\0'
    newTokens[k] = malloc(len);
    if (newTokens[k] == NULL)            // check malloc's result
        break;
    snprintf(newTokens[k], len, "%s/", token[k]);
    printf("%s\n", newTokens[k]); //print out all of the new tokens
    n++;
}
// ... use newTokens[0] .. newTokens[n - 1], then release them:
for (int k = 0; k < n; ++k)
    free(newTokens[k]);
This writes into separately allocated buffers instead of over the tokenized string, which prevents the core dump.
And don't forget to check whether malloc returns NULL!