I'm using strtok() to parse a line into individual words and check them for something. If that thing is found, another function is called which must also use strtok().
I know strtok() is not re-entrant. Does that mean that if I call it in the second function, my position in the string in the first function will be lost? If so, would using strtok() in the first function and strtok_r() in the second solve the problem? Is there another solution?
Edit:
Thanks. It is indeed not possible to interleave strtok calls across two functions, and apparently strtok_r is not standard. Redesign it is...
Since strtok internally uses a global variable to store how far it had advanced in the string, intermingling calls to strtok will fail, just like you suspect. Your options are:
switch to strtok_r, which has a similar API, but is not standard C (it is in POSIX, though);
avoid strtok altogether in favor of some other function that doesn't carry hidden global state, such as strsep (also non-standard);
make sure your first function fully exhausts strtok before calling another function that can call strtok.
All in all, strtok is a function best avoided.
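As a rough illustration of the first option, here is a minimal sketch of nested tokenizing with strtok_r, assuming a POSIX system. The function names count_fields and count_all_fields and the ';' and ',' delimiters are made up for this example. Each level keeps its own save pointer, so the inner loop cannot disturb the outer one.

```c
#define _POSIX_C_SOURCE 200809L /* asks strict compilers to declare strtok_r */
#include <string.h>

/* Hypothetical inner parser: counts the comma-separated fields in one record. */
static int count_fields(char *record)
{
    char *save = NULL; /* private state for this loop only */
    int n = 0;
    for (char *f = strtok_r(record, ",", &save); f != NULL;
         f = strtok_r(NULL, ",", &save))
        n++;
    return n;
}

/* Hypothetical outer parser: splits on ';' and hands each record to
   count_fields. The outer position lives in its own save pointer. */
static int count_all_fields(char *line)
{
    char *save = NULL;
    int total = 0;
    for (char *rec = strtok_r(line, ";", &save); rec != NULL;
         rec = strtok_r(NULL, ";", &save))
        total += count_fields(rec);
    return total;
}
```

Called on a writable buffer such as char line[] = "a,b;c;d,e,f", count_all_fields visits all three records. With plain strtok in both functions, the outer loop would lose its place after the first record.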
The library function strtok uses an internal static state for the current parsing position:
when called with strings, it starts a new parse,
when called with NULL as the first argument, it uses its internal state.
If you directly or indirectly call strtok from your parse loop, the internal state will be updated and the call with NULL from the outer scope will not continue from the previous state, possibly invoking undefined behavior.
Posix function strtok_r takes an explicit state argument, so it can be used in nested contexts. If this function is available on your system, use it in all places where you use strtok. Alternatively, you could use a different method based on strchr() or strcspn().
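One possible shape for the strcspn() approach is sketched below. The helper name next_token and its parameters are invented for this example. It only uses standard C, keeps its position in a caller-owned cursor, and does not modify the string, so tokens are reported as a start pointer plus a length.

```c
#include <string.h>

/* Hypothetical helper: returns the start of the next token at or after
   *cursor and stores its length in *len, or returns NULL when no token
   is left. The input string is never written to. */
static const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    const char *s = *cursor + strspn(*cursor, delims); /* skip delimiters */
    if (*s == '\0') {
        *cursor = s;
        return NULL;
    }
    *len = strcspn(s, delims); /* token runs up to the next delimiter */
    *cursor = s + *len;
    return s;
}
```

Because the token is not null-terminated in place, print it with printf("%.*s\n", (int)len, tok) or copy it out before passing it to functions that expect a C string.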
strtok_r is standardized in Posix. Depending on your target system, it may or may not be available. MacOS and most Unix systems are Posix compliant. Windows might have it under a different name. If it is not available, you can redefine it in your program and conditionally compile it.
Here is a simple implementation you can use (it needs #include <string.h> for strspn and strcspn):
char *strtok_r(char *s, const char *delim, char **context) {
char *token = NULL;
if (s == NULL)
s = *context;
/* skip initial delimiters */
s += strspn(s, delim);
if (*s != '\0') {
/* we have a token */
token = s;
/* skip the token */
s += strcspn(s, delim);
if (*s != '\0') {
/* cut the string to terminate the token */
*s++ = '\0';
}
}
*context = s;
return token;
}
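For the "different name" case, a small conditional mapping may be enough. The sketch below assumes that Microsoft's C runtime offers a three-argument strtok_s with the same parameters as strtok_r; check your compiler's documentation before relying on it. The helper count_words is made up to show the call site staying the same on both platforms.

```c
#define _POSIX_C_SOURCE 200809L /* asks strict compilers to declare strtok_r */
#include <string.h>

/* Assumption: MSVC's strtok_s(str, delim, context) matches strtok_r. */
#if defined(_MSC_VER)
#define strtok_r strtok_s
#endif

/* Hypothetical helper: counts whitespace-separated words in a writable buffer. */
static int count_words(char *s)
{
    char *context = NULL;
    int n = 0;
    for (char *w = strtok_r(s, " \t\n", &context); w != NULL;
         w = strtok_r(NULL, " \t\n", &context))
        n++;
    return n;
}
```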
Q1 :
does that mean that if i call it in the second function my position in the string in the first function will be lost?
A1 : Yes, it does. If you do that, then a new scanning sequence will be started with another string that you provided, and data for subsequent calls for your first string shall be lost.
Q2 :
If so, would using strtok() in the first function and strtok_r() in the second solve the problem?
A2 : The better approach would be to rework your program design.
Q3 :
Is there another solution?
A3 : If the cost of design changes is too high, I would suggest keeping a copy of first string and a pointer that contains last found token. This can allow you to continue from last position (by getting a start pointer with strstr for example) after you are done with your second string.
As indicated in http://en.cppreference.com/w/c/string/byte/strtok website:
Each call to strtok modifies a static variable: is not thread safe.
So yes, you can't call this function from two different functions at the same time (threading), nor can you call it like the following:
char input[] = "something that needs to be tokenize";
char *token = strtok(input, " ");
while(token) {
puts(token);
anotherfunction();
token = strtok(NULL, " ");
}
void anotherfunction()
{
char input[] = "another string needs to be tokenize";
char *tok = strtok(input, " ");
while(tok) {
puts(tok);
tok = strtok(NULL, " ");
}
}
I have the following code:
char inputs []="3,0,23.30,3,30/55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/1.5,0.5,0.2,0.2,0.3,0.1";
char parameters[18];
strcpy(parameters,strtok(inputs,"/"));
and then some code to transmit my characters through the UART and see them on a monitor. When I transmit inputs I see it fine, but when I transmit parameters I see nothing; the string is empty.
I have seen examples for strtok that use this kind of code to split strings. I have also tried this code in Visual Studio, and when I print the strings they show up fine. Is there any chance that strtok doesn't work well on a microprocessor?
When working with microcontrollers, you have to pay attention to which memory area your data lives in. In software running on a PC, everything is stored in and run from RAM. On a flash microcontroller, however, code runs from flash (also called program memory) while data is processed in RAM (also called data memory).
In your case, the inputs variable holds a hardcoded character array, which could be const, and we don't know in which area the compiler chose to put it. So we can rewrite your small program to make sure that all the data is stored in program memory, and use the "_P" functions to manipulate it.
#include <avr/pgmspace.h> // to play in program space
#include <string.h> // strcpy, strtok
const char inputs[] PROGMEM = "3,0,23.30,3,30/55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/1.5,0.5,0.2,0.2,0.3,0.1"; // Now, we are sure this is in program memory space
char buffer[200]; // should be long enough to contain a copy of inputs
char parameters[18];
int length = strlen_P(inputs); // Should contains string length of 186 (just to debug)
strcpy_P(buffer,inputs); // Copy the PROGMEM data in data memory space
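strcpy(parameters,strtok(buffer,"/")); // Parse the data now in data memory space; plain strtok is used because both buffer and "/" live in RAM (strtok_P expects its delimiter string in program space)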
For more info on program space with avr gcc : http://www.nongnu.org/avr-libc/user-manual/pgmspace.html
I don't use strtok much, if at all, but this appears to be the correct way to store the results of strtok() in an array of char[]:
const int NUM_PARAMS = 18;
const int MAX_CHARS = 64;
char parameters[NUM_PARAMS][MAX_CHARS];
char delims[] = "/";
char *result = NULL;
int count = 0;
result = strtok(inputs, delims);
while(result != NULL && count < NUM_PARAMS){
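strncpy(parameters[count], result, MAX_CHARS - 1);
parameters[count++][MAX_CHARS - 1] = '\0'; // strncpy does not terminate the copy when it truncates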
result = strtok(NULL, delims);
}
or this if you don't want to allocate unnecessary memory for smaller tokens:
const int NUM_PARAMS = 18;
char* parameters[NUM_PARAMS];
char delims[] = "/";
char *result = NULL;
int count = 0;
result = strtok(inputs, delims);
while(result != NULL && count < NUM_PARAMS){
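parameters[count] = malloc(strlen(result) + 1); // needs <stdlib.h>; exactly enough room for the token and its terminator
if (parameters[count] == NULL)
    break; // out of memory
strcpy(parameters[count++], result); // safe: the buffer was sized from strlen(result)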
result = strtok(NULL, delims);
}
parameters should now contain all of your tokens.
strtok() intentionally modifies the source string, replacing delimiters with terminators as it goes. There is no reason to store the content in auxiliary storage whatsoever so long as the source string is mutable.
Splitting the string into conjoined constants (which will be assembled by the compiler, so they ultimately will be a single terminated string):
char inputs []="3,0,23.30,3,30/"
"55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/"
"1.5,0.5,0.2,0.2,0.3,0.1";
Clearly the second and third sequences delimited by '/' are both longer than 17 chars (remember, your copy needs one more place for the terminator, so at most 17 chars can legally be copied into an 18-char buffer).
Unless there is a compelling reason to do otherwise, I see no reason you can't simply do this:
char *param = strtok(inputs, "/");
while (param != NULL)
{
// TODO: do something with param...
// ... then advance to next param
param = strtok(NULL, "/");
}
What you do with // TODO do something with param... is up to you. The length of the parameter can be retrieved by strlen(param), for example. You can make a copy, providing you have enough storage space as the destination (in the second and third cases, you don't provide enough storage with only an 18-char buffer in your example).
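If you do want a copy, checking the length first avoids the overflow described above. The following is only a sketch; copy_token is a name invented here, not a library function.

```c
#include <string.h>

/* Hypothetical helper: copies token into dst only when it fits together
   with its terminator. Returns 1 on success and 0 when the token is too
   long, in which case dst is left untouched. */
static int copy_token(char *dst, size_t dst_size, const char *token)
{
    size_t len = strlen(token);
    if (len >= dst_size)
        return 0;                /* no room for the '\0' */
    memcpy(dst, token, len + 1); /* copies the terminator too */
    return 1;
}
```

With the 18-char parameters buffer from the question, the first parameter fits, while the second and third are rejected instead of overrunning the array.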
Regardless, remember, strtok() modifies the source inputs[] array. If that is not acceptable an alternative would be something like:
char *tmp_inputs = strdup(inputs);
if (tmp_inputs != NULL)
{
char *param = strtok(tmp_inputs, "/");
while (param != NULL)
{
// TODO: do something with param...
// ... then advance to next param
param = strtok(NULL, "/");
}
// done with copy of inputs. free it.
free(tmp_inputs);
}
There are considerable threading implications to think about as well, which is the very reason strtok_r(), the version of strtok() that requires the caller (you) to tote around context between calls, was invented. See strtok_r() for more information.
You have to initialize parameters this way:
char parameters[18] = {'\0'};
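This way the array is filled with null characters instead of indeterminate values, so it always holds a valid (empty) string, even if nothing has been copied into it yet. Note that strcpy looks for the null character in the source string to find its end, and writes the terminator into the destination itself.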
Just to make it clear...
parameters after its initialisation : [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
parameters after strcpy of the first token : ['3',',','0',',','2','3','.','3','0',',','3',',','3','0',0,0,0,0]
I have tried using the strtok function, but I don't know how to use it.
This is the code I found on the net:
FILE *ptr = fopen("testdoc.txt", "r");
char nums[100];
fgets(nums,100,ptr);
const char s[2] = ",";
char *token;
token =strtok (nums, s);
while( token != NULL )
{
printf( " %s\n", token );
token = strtok(NULL, s);
}
Why do we have token = strtok(NULL, s) in the last line? And how do I store the numbers obtained via token in an array?
Thanks a lot, please explain in detail.
From strtok reference
On a first call, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of last token as the new starting location for scanning.
That is, strtok stores the position internally.
It's pretty simple to get the number of tokens obtained. There are no miracles. Just use a counter and increment it in the loop.
strtok changes its first argument (the contents of the char*/char[]). When it finds the first separator (any character from the second argument) in the char array, it changes that separator to '\0' and returns a char* to the start of the segment. After this, when you want the second segment, you pass NULL as the first argument (strtok has already remembered the array, so don't pass it again); strtok then finds the next separator, changes it to '\0', and returns a char* to the first char of that segment.
For the second question, convert each token from char* to int (check that strtok did not return NULL first):
int i = atoi(strtok(...));
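To store the numbers in an array, the loop and the conversion can be combined. The sketch below uses strtol instead of atoi so that non-numeric tokens can be detected. The name parse_ints and its parameters are made up for illustration, and it assumes the values fit in an int.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses up to max comma-separated integers from
   line into out and returns how many were stored. line is modified by
   strtok, so it must be a writable buffer. */
static size_t parse_ints(char *line, int *out, size_t max)
{
    size_t count = 0;
    for (char *tok = strtok(line, ",\n"); tok != NULL && count < max;
         tok = strtok(NULL, ",\n")) {
        char *end;
        errno = 0;
        long value = strtol(tok, &end, 10);
        if (end == tok || errno == ERANGE)
            continue;            /* skip tokens that are not numbers */
        out[count++] = (int)value; /* assumes the value fits in an int */
    }
    return count;
}
```

With the nums buffer filled by fgets in the question, a call would look like size_t n = parse_ints(nums, values, 100); where values is an int array with room for 100 entries.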
I am using strtok to extract 2 words from a string names[result]. I want to take the first value from strtok and store it in a char array named lastName, and the second value in a char array named firstName. However, I get an invalid initializer error for the 2 lines marked with arrows when I compile my code. How do I resolve this?
char *p = NULL;
p = strtok(names[result]," ");
char lastName[50] = p; <---
p = strtok(NULL, " ");
char firstName[50] = p; <---
printf("%s %s\n",firstName,lastName);
strtok gives the pointer to the tokenized string.
char lastName[50] = p; is not valid, because an array cannot be initialized from a pointer. Use strncpy() to copy the string, or, if you only want the pointer, store it in another pointer.
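A char array in C can only be initialized from a string literal or a brace-enclosed list of values, not from a pointer variable. That is why the compiler rejects your code with an invalid initializer error.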
You need to use the typical strcpy() function to copy the string, or one of the safer (and more modern) varieties, like strlcpy() or snprintf().
You could also do the parsing and copying in one call, using sscanf(), with proper size specifiers in the formatting string to avoid the risk of buffer overflow.
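A minimal sketch of the sscanf() variant, assuming the input has the form "Last First" and both destination buffers are 50 bytes as in the question. The helper name split_name is invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: splits "Last First" into two 50-byte buffers.
   Returns 1 when both words were found, 0 otherwise. */
static int split_name(const char *full, char last[50], char first[50])
{
    /* %49s stores at most 49 non-space characters plus the '\0',
       so neither buffer can overflow. */
    return sscanf(full, "%49s %49s", last, first) == 2;
}
```

Unlike strtok, this leaves the source string unmodified. Words longer than 49 characters are cut off rather than rejected.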
You can initialize a character array from a string literal, like char lastName[50] = "Sample";
In this case you are trying to initialize the character array from a pointer, 'char lastName[50] = p;', which is not valid.
Instead, use strcpy or memcpy to copy the string into the character array, or store the result in another pointer.
The other answers are all correct in that copying the string data out will make this program work, but the reason strtok is so dastardly (and using it is generally considered ill-advised) is that it changes your input by inserting null characters ('\0') into the original string. If you're going to use it anyway, you might as well take advantage of this and just use the pointers that strtok returns directly.
Of note, though, is that since the input is changed and maybe whoever passed that input into you is not expecting that, it might be better to copy the input to a separate string first before ever calling strtok on it.
Observe the output of this code to see what I mean:
int main(int argc, char *argv[]) {
char name[] = "Firstname Lastname";
printf("Name before strtok: %s\n", name);
char *first = strtok(name, " ");
char *last = strtok(NULL, " ");
printf("Token strings: first=%s last=%s\n", first, last);
printf("Name after strtok: %s\n", name);
}
Produces:
Name before strtok: Firstname Lastname
Token strings: first=Firstname last=Lastname
Name after strtok: Firstname
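The same effect can be checked with pointer arithmetic: the returned tokens are addresses inside the original buffer, not new strings. A small sketch follows, with second_token_offset being a name made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the second space-separated
   token inside buf, or -1 if there is none. buf is modified by strtok. */
static ptrdiff_t second_token_offset(char *buf)
{
    if (strtok(buf, " ") == NULL)
        return -1;
    char *second = strtok(NULL, " ");
    return second != NULL ? second - buf : -1;
}
```

For "Firstname Lastname" the second token starts 10 bytes into the buffer, and the space at index 9 has been overwritten with '\0', which is why printing name afterwards shows only the first word.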
I'm writing a function that gets the path environment variable of a system, splits up each path, then concats on some other extra characters onto the end of each path.
Everything works fine until I use the strcat() function (see code below).
char* prependPath( char* exeName )
{
char* path = getenv("PATH");
char* pathDeepCopy = (char *)malloc(strlen(path) + 1);
char* token[80];
int j, i=0; // used to iterate through array
strcpy(pathDeepCopy, path);
//parse and split
token[0] = strtok(pathDeepCopy, ":"); //get pointer to first token found and store in 0
//place in array
while(token[i]!= NULL) { //ensure a pointer was found
i++;
token[i] = strtok(NULL, ":"); //continue to tokenize the string
}
for(j = 0; j <= i-1; j++) {
strcat(token[j], "/");
//strcat(token[j], exeName);
printf("%s\n", token[j]); //print out all of the tokens
}
}
My shell output is like this (I'm concatenating "/which" onto everything):
...
/usr/local/applic/Maple/bin/which
which/which
/usr/local/applic/opnet/8.1.A.wdmguru/sys/unix/bin/which
which/which
Bus error (core dumped)
I'm wondering why strcat is displaying a new line and then repeating which/which.
I'm also wondering about the Bus error (core dumped) at the end.
Has anyone seen this before when using strcat()?
And if so, anyone know how to fix it?
Thanks
strtok() does not give you a new string.
It mutilates the input string by inserting the char '\0' where the split character was.
So your use of strcat(token[j],"/") will put the '/' character where the '\0' was.
Also the last token will start appending 'which' past the end of your allocated memory into uncharted memory.
You can use strtok() to split a string into chunks. But if you want to append anything onto a token, you need to make a copy of the token; otherwise what you're appending will spill over onto the next token.
Also, you need to take more care with your memory allocation: you are leaking memory all over the place :-)
PS. If you must use C strings, use strdup() to copy the string.
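#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void prependPath( const char* exeName )
{
    const char* path = getenv("PATH");
    if (path == NULL)
        return;
    char* pathDeepCopy = strdup(path); // tokenize a private copy, never getenv's own string
    if (pathDeepCopy == NULL)
        return;
    char* token[80];
    int j, i = 0; // i counts the tokens found
    // store each token pointer; stop at the end of the string or when the array is full
    for(char* t = strtok(pathDeepCopy, ":"); (t != NULL) && (i < 80); t = strtok(NULL, ":"))
    {
        token[i++] = t;
    }
    for(j = 0; j < i; ++j)
    {
        // room for the token, the '/', exeName and the terminating '\0'
        char* tmp = (char*)malloc(strlen(token[j]) + 1 + strlen(exeName) + 1);
        if (tmp == NULL)
            break;
        strcpy(tmp,token[j]);
        strcat(tmp,"/");
        strcat(tmp,exeName);
        printf("%s\n",tmp); //print out all of the tokens
        free(tmp);
    }
    free(pathDeepCopy);
}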
strtok does not duplicate the token but instead just points to it within the string. So when you cat '/' onto the end of a token, you're writing a '\0' either over the start of the next token, or past the end of the buffer.
Also note that even if strtok did return copies of the tokens instead of the originals (which it doesn't), it wouldn't allocate the additional space for you to append characters, so it'd still be a buffer overrun bug.
strtok() tokenizes in place. When you start appending characters to the tokens, you're overwriting the next token's data.
Also, in general it's not safe to simply concatenate to an existing string unless you know that the size of the buffer the string is in is large enough to hold the resulting string. This is a major cause of bugs in C programs (including the dreaded buffer overflow security bugs).
So even if strtok() returned brand-new strings unrelated to your original string (which it doesn't), you'd still be overrunning the string buffers when you concatenated to them.
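One way to build the combined string without overrunning anything is to format into a buffer whose size is passed explicitly and to check the result for truncation. This is only a sketch using standard snprintf; the helper name join_path is invented here.

```c
#include <stdio.h>

/* Hypothetical helper: writes "dir/exe" into out, which has room for
   out_size bytes. Returns 1 if the whole string fit and 0 if it had to
   be truncated (or formatting failed). */
static int join_path(char *out, size_t out_size, const char *dir, const char *exe)
{
    /* snprintf never writes more than out_size bytes and returns the
       length the full result would have needed. */
    int needed = snprintf(out, out_size, "%s/%s", dir, exe);
    return needed >= 0 && (size_t)needed < out_size;
}
```

Each token from strtok can be passed as dir, so the tokens themselves are only read and never appended to.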
Some safer alternatives to strcpy()/strcat() that you might want to look into (you may need to track down implementations for some of these - they're not all standard):
strncpy() - includes the target buffer size to avoid overruns. Has the drawback of not always terminating the result string
strncat()
strlcpy() - similar to strncpy(), but intended to be simpler to use and more robust (http://en.wikipedia.org/wiki/Strlcat)
strlcat()
strcpy_s() - Microsoft variants of these functions
strncat_s()
And the API you should strive to use if you can use C++: the std::string class. If you use the C++ std::string class, you pretty much do not have to worry about the buffer containing the string - the class manages all of that for you.
OK, first of all, be careful. You are losing memory.
strtok() returns a pointer to the next token, and you are storing it in an array of chars.
Instead of char token[80] it should be char *token.
Also be careful when using strtok. It practically destroys the char array called pathDeepCopy, because it replaces every occurrence of ":" with '\0', as Mike F told you above.
Be sure to initialize pathDeepCopy using memset or calloc.
Otherwise, when you access token[i] there is no way of knowing what it points at.
And as token has no valid data in it, it is likely to cause a core dump, because you are trying to concatenate a string onto another one that has no valid data (token).
Perhaps what you are looking for is an array of pointers to char in which to store all the pointers to the tokens that strtok returns, in which case token would be declared like char *token[];
Hope this helps a bit.
If you're using C++, consider boost::tokenizer as discussed over here.
If you're stuck in C, consider using strtok_r because it's re-entrant and thread-safe. Not that you need it in this specific case, but it's a good habit to establish.
Oh, and use strdup to create your duplicate string in one step.
replace that with
strcpy(pathDeepCopy, path);
//parse and split
token[0] = strtok(pathDeepCopy, ":");//get pointer to first token found and store in 0
//place in array
while(token[i]!= NULL) { //ensure a pointer was found
i++;
token[i] = strtok(NULL, ":"); //continue to tokenize the string
}
// use a new array for storing the new tokens (needs <stdio.h>, <stdlib.h>, <string.h>)
// pardon my C lang skills. It's been a "while" since I wrote device drivers in C.
char **newTokens = malloc(i * sizeof *newTokens);
for (int k = 0; newTokens != NULL && k < i; ++k) {
    size_t size = strlen(token[k]) + 2; // token + '/' + '\0'
    newTokens[k] = malloc(size);
    if (newTokens[k] == NULL)
        break;
    snprintf(newTokens[k], size, "%s/", token[k]);
    printf("%s\n", newTokens[k]); //print out all of the tokens
}
This avoids overwriting the contents of the original buffer and prevents the core dump. Remember to free each newTokens[k] and newTokens itself when you are done.
and don't forget to check if malloc returns NULL!