Let's say I'm using strtok() like this..
char *token = strtok(input, ";-/");
Is there a way to figure out which delimiter actually gets used? For instance, if the input was something like:
Hello there; How are you? / I'm good - End
Can I figure out which delimiter was used for each token? I need to be able to output a specific message, depending on the delimiter that followed the token.
Important: strtok is not re-entrant; you should use strtok_r instead.
You can do it by saving a copy of the original string, and looking into offsets of the current token into that copy:
char str[] = "Hello there; How are you? / I'm good - End";
char *copy = strdup(str);
char *delim = ";-/";
char *res = strtok( str, delim );
while (res) {
printf("%c\n", copy[res-str+strlen(res)]);
res = strtok( NULL, delim );
}
free(copy);
This prints
;
/
-
EDIT: Handling multiple delimiters
If you need to handle runs of several consecutive delimiters, determining the length of the current delimiter sequence becomes slightly harder: you need to find the next token before you can tell how long the sequence of delimiters is. The math is not complicated, as long as you remember that NULL requires special treatment:
char str[] = "(20*(5+(7*2)))+((2+8)*(3+6*9))";
char *copy = strdup(str);
char *delim = "*+()";
char *res = strtok( str, delim );
while (res) {
int from = res-str+strlen(res);
res = strtok( NULL, delim );
int to = res != NULL ? res-str : strlen(copy);
printf("%.*s\n", to-from, copy+from);
}
free(copy);
You can't. strtok overwrites the next separator character with a nul character (in order to terminate the token that it's returning this time), and it doesn't store the previous value that it overwrites. The first time you call strtok on your example string, the ; is gone forever.
You could do something if you keep an unmodified copy of the string you're modifying with strtok - given the index of the nul terminator for your current token (relative to the start of the string), you can look at the same index in the copy and see what was there.
That might be worse than just writing your own code to separate the string, of course. You can use strpbrk or strcspn, if you can live with the resulting token not being nul-terminated for you.
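As a rough sketch of the strcspn route mentioned above (the helper names next_token and print_tokens are made up for illustration), something like this reports the delimiter that followed each token without modifying the input at all. The tokens are not nul-terminated, so they are printed with a length:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: finds the next token at or after s[*pos].
 * On success *pos is the token start, *len its length, and the return
 * value is the delimiter that ended it ('\0' if the token ran to the end
 * of the string). Returns -1 when no tokens remain. s is never modified. */
int next_token(const char *s, const char *delims, size_t *pos, size_t *len)
{
    assert(s != NULL && delims != NULL);
    *pos += strspn(s + *pos, delims);   /* skip leading delimiters, as strtok does */
    if (s[*pos] == '\0')
        return -1;
    *len = strcspn(s + *pos, delims);   /* token runs up to the next delimiter */
    return (unsigned char)s[*pos + *len];
}

/* Prints every token together with the delimiter that followed it. */
void print_tokens(const char *s, const char *delims)
{
    size_t pos = 0, len = 0;
    int d;

    while ((d = next_token(s, delims, &pos, &len)) != -1) {
        if (d != '\0')
            printf("\"%.*s\" was followed by '%c'\n", (int)len, s + pos, d);
        else
            printf("\"%.*s\" ended the string\n", (int)len, s + pos);
        pos += len;   /* continue scanning at the delimiter */
    }
}
```

Calling print_tokens("Hello there; How are you? / I'm good - End", ";-/") walks the question's example string. Since nothing is overwritten, this also works on string literals.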
man 3 strtok
The strtok() and strtok_r() functions return a pointer to the
beginning of each subsequent token in the string, after replacing the
token itself with a NUL character. When no
more tokens remain, a null pointer is returned.
But with a little pointer arithmetic you can do something like:
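char string[] = "Hello,World!"; /* an array, because strtok writes to its argument */
char* dup = strdup(string);
char* hello = strtok(string, ",");
/* the delimiter sat where strtok wrote the NUL that ends the token */
char delim_used = dup[(hello - string) + strlen(hello)];
free(dup);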
As per this description, strtok() splits a string into tokens at the given delimiters and returns a pointer to the first token found in the string. All subsequent tokens have to be retrieved in a loop, like the example code given in the link.
Is each token automatically terminated with a NUL character? I.e., can I simply assign each token to a variable and use it, or does it need to be copied to allocated space with strncpy()?
For example, would this be valid?
char str[80] = "This is - www.tutorialspoint.com - website";
const char s[2] = "-";
char *token;
char *test[4];
int test_count = 0;
memset(test, 0x00, sizeof(test)); /* clear the whole array, not just 4 bytes */
/* get the first token */
token = strtok(str, s);
/* walk through other tokens */
while( token != NULL ) {
test[test_count] = token;
test_count++;
token = strtok(NULL, s);
}
strtok() works on your original input string, by replacing the first occurrence of a character in the list of delimiters with a '\0'. So yes, this is the intended usage as you describe it.
Side notes:
Don't write things like
const char s[2] = "-";
Just using
const char s[] = "-";
lets the compiler determine the correct size automatically.
In this special case, just passing "-" to strtok() (or a #define for "-") would do fine; a decent compiler recognizes identical string literals and creates only one instance of them.
just in case it's helpful to see some code, here's a simple strtok implementation I did myself a while back.
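For readers who just want to see the mechanism, here is a minimal sketch of how a strtok-like function could be written. To be clear, this is not the implementation referred to above and not any particular library's source; my_strtok is a made-up name, and the code only illustrates the "overwrite the delimiter with '\0' and remember where to resume" idea:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical re-implementation, for illustration only. */
char *my_strtok(char *str, const char *delims)
{
    static char *next = NULL;   /* where the following call resumes */
    char *token;

    assert(delims != NULL);
    if (str != NULL)
        next = str;             /* a non-NULL str starts a new scan */
    if (next == NULL)
        return NULL;

    next += strspn(next, delims);   /* skip leading delimiters */
    if (*next == '\0') {            /* nothing left but delimiters */
        next = NULL;
        return NULL;
    }

    token = next;
    next += strcspn(next, delims);  /* advance to the end of the token */
    if (*next != '\0')
        *next++ = '\0';             /* overwrite the delimiter, terminating the token */
    else
        next = NULL;                /* token ran to the end of the string */
    return token;
}
```

Each returned pointer points into the caller's buffer and is nul-terminated there, which is why the tokens can be stored directly in an array of char * as long as the buffer stays alive and unmodified.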
I am trying to tokenize a string but I need to know exactly when no data is seen between two tokens. E.g. when tokenizing the string "a,b,c,,,d,e" I need to know about the two empty slots between 'c' and 'd', which I am unable to find out simply using strtok(). My attempt is shown below:
char arr_fields[num_of_fields][64]; /* 64: assumed maximum field width */
char delim[]=",\n";
char *tok;
tok=strtok(line,delim);//line contains the data
for(i=0;i<num_of_fields;i++,tok=strtok(NULL,delim))
{
if(tok)
sprintf(arr_fields[i], "%s", tok);
else
sprintf(arr_fields[i], "%s", "-");
}
Executing the above code with the aforementioned example puts the characters a,b,c,d,e into the first five elements of arr_fields, which is not desirable. I need each character to go to a specific index of the array: i.e. if there is a character missing between two characters, that should be recorded as is.
7.21.5.8 the strtok function
The standard says the following regarding strtok:
[#3] The first call in the sequence searches the string
pointed to by s1 for the first character that is not
contained in the current separator string pointed to by s2.
If no such character is found, then there are no tokens in
the string pointed to by s1 and the strtok function returns
a null pointer. If such a character is found, it is the
start of the first token.
From the above quote we can see that you cannot use strtok as a solution to your specific problem, since it will treat any run of consecutive characters found in delims as a single delimiter.
Am I doomed to weep in silence, or can somebody help me out?
You can easily implement your own version of strtok that does what you want, see the snippets at the end of this post.
strtok_single makes use of strpbrk (char const* src, const char* delims) which will return a pointer to the first occurrence of any character in delims that is found in the null-terminated string src.
If no matching character is found the function will return NULL.
strtok_single
char *
strtok_single (char * str, char const * delims)
{
static char * src = NULL;
char * p, * ret = 0;
if (str != NULL)
src = str;
if (src == NULL)
return NULL;
if ((p = strpbrk (src, delims)) != NULL) {
*p = 0;
ret = src;
src = ++p;
} else if (*src) {
ret = src;
src = NULL;
}
return ret;
}
sample use
char delims[] = ",";
char data [] = "foo,bar,,baz,biz";
char * p = strtok_single (data, delims);
while (p) {
printf ("%s\n", *p ? p : "<empty>");
p = strtok_single (NULL, delims);
}
output
foo
bar
<empty>
baz
biz
You can't use strtok() if that's what you want. From the man page:
A sequence of two or more contiguous delimiter characters in the parsed
string is considered to be a single delimiter. Delimiter characters at
the start or end of the string are ignored. Put another way: the
tokens returned by strtok() are always nonempty strings.
Therefore it is just going to jump from c to d in your example.
You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.
Lately I was looking for a solution to the same problem and found this thread.
You can use strsep().
From the manual:
The strsep() function was introduced as a replacement for strtok(3),
since the latter cannot handle empty fields.
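A small sketch of what that could look like, assuming a platform that provides strsep() (it is a BSD/glibc extension, not ISO C). The helper name split_fields is made up for this example:

```c
#define _DEFAULT_SOURCE   /* asks glibc to declare strsep(); assumed harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place and stores up to max field
 * pointers in fields. Empty fields come back as "" rather than being
 * skipped. Returns the number of fields stored. */
size_t split_fields(char *line, const char *delims, char **fields, size_t max)
{
    size_t n = 0;
    char *rest = line;   /* strsep advances this past each field */
    char *field;

    assert(delims != NULL && fields != NULL);
    while (n < max && (field = strsep(&rest, delims)) != NULL)
        fields[n++] = field;
    return n;
}
```

For a writable buffer holding "a,b,c,,,d,e" and the delimiter string ",", this stores seven pointers, with the fourth and fifth pointing at empty strings. Like strtok, strsep overwrites the delimiters in the buffer it is given.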
As mentioned in this answer, you'll want to implement something like strtok yourself. I prefer using strcspn (as opposed to strpbrk), as it allows for fewer if statements:
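char arr_fields[num_of_fields][64]; /* 64: assumed maximum field width */
char delim[] = ",\n";
size_t current_token = 0;
size_t token_length;
for (i = 0; i < num_of_fields; i++)
{
/* length of the field starting at current_token; 0 means an empty field */
token_length = strcspn(line + current_token, delim);
if (token_length)
snprintf(arr_fields[i], sizeof arr_fields[i], "%.*s", (int)token_length, line + current_token);
else
snprintf(arr_fields[i], sizeof arr_fields[i], "%s", "-");
current_token += token_length;
if (line[current_token] != '\0')
current_token++; /* step over the delimiter that ended this field */
}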
Parse (for example, strtok)
Sort
Insert
Rinse and repeat as needed :)
You could try using strchr to find the locations of the , symbols. Manually copy out your string up to the delimiter you found (using memcpy or strncpy) and then call strchr again, starting just past it. This way you will be able to see whether two or more commas are next to each other (the pointers returned by consecutive strchr calls will differ by exactly 1), and you can write an if statement to handle that case.
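Here is one way the strchr idea might look in code. This is only a sketch: copy_field is a hypothetical helper, the "-" placeholder for empty fields is borrowed from the question, and delim is assumed not to be '\0':

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies field number `index` (0-based) of a
 * delimited line into out, writing "-" for an empty field.
 * Returns 0 on success, -1 if the line has fewer fields. */
int copy_field(const char *line, char delim, size_t index, char *out, size_t out_size)
{
    const char *start = line;
    const char *end;
    size_t len;

    assert(line != NULL && out != NULL && out_size > 0);
    while (index-- > 0) {                 /* walk past `index` delimiters */
        start = strchr(start, delim);
        if (start == NULL)
            return -1;                    /* fewer fields than requested */
        start++;                          /* first character after the delimiter */
    }
    end = strchr(start, delim);
    len = end ? (size_t)(end - start) : strlen(start);
    if (len == 0) {                       /* adjacent delimiters: empty field */
        start = "-";
        len = 1;
    }
    if (len >= out_size)
        len = out_size - 1;               /* truncate rather than overflow */
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Rescanning from the start for every field is wasteful for long lines; a loop that keeps the last strchr result would avoid that, but the per-field helper keeps the sketch short. The input line is never modified.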
I have a character like ';' or ',' used as a delimiter in a raw string. I need to split the string and iterate over each substring.
Ex: If,
char* str = "apples, mangoes , orang; ,ad";
And the delimiter is ',' then I need something like
while(substr!='\0') {
func(substr);
//some operation maybe like substr=strstr(substr)+1;
}
The function should be called 4 times with strings: "apples"," mangoes "," orang; ","ad".
In your case str points to a string literal, and you cannot use strtok on it because literals are typically stored in read-only memory.
strtok modifies the string it is given, so calling it on a literal is undefined behaviour and usually crashes at run time (segmentation fault).
To split the string, it must live in a modifiable buffer such as a char array. One option is to read it from user input, for example with fgets:
fgets (str, SIZE, stdin); // user input for str
Use strtok only on strings stored in writable memory.
char * strtok (char *string, const char *delimiter);
This is how you can use it.
char *buff = strtok (str, ",;");
while (buff != NULL)
{
printf ("%s\n", buff);
buff = strtok (NULL, ",;");
}
For more information on the string functions
man string
strtok is a handy tool for tokenizing a string in C. Also note that strtok modifies the original string.
In your case
char* str = "apples, mangoes , orang; ,ad";
This is a string literal, which is read-only, and it is undefined behaviour to use strtok on it. So it is better to use an array of predefined length, or to copy the string to a temporary buffer and then apply strtok to that buffer.
For example
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[] = "apples, mangoes , orang; ,ad";
char *token = strtok (str, ",;");
while (token != NULL)
{
printf ("%s ",token);
token = strtok (NULL, ",;");
}
return 0;
}
So I got my code to split the PATH environment variable, and I get all of the entries:
char *token;
char *path;
char copy[200];
char *search = ":";
char echo[] = "echo";
int main(){
path= getenv("PATH");
strncpy(copy,path,sizeof(copy)-1);
token = strtok (copy,":");
printf("%s\n",path);
while(token != NULL)
{
printf("%s\n",token);
token= strtok (NULL,":");
}
}
I get what I need:
/usr/lib64/qt-3.3/bin
/usr/NX/bin
/usr/local/bin
/usr/bin
/usr/divms/bin
/usr/local/sbin
/usr/sbin
/space/befox/bin
/space/befox/bin
Now I just need to concatenate a "/" to the end of each of those. I got it to work, BUT it only prints the first one.
So here is my code:
char *token;
char *path;
char copy[200];
char *search = ":";
char echo[] = "echo";
char *result;
int main(){
path= getenv("PATH");
strncpy(copy,path,sizeof(copy)-1);
token = strtok (copy,":");
printf("%s\n",path);
while(token != NULL)
{
result = strncat (token,"/",sizeof(token+1));
printf("%s\n",token);
token= strtok (NULL,":");
}
}
And now I just get:
/usr/lib64/qt-3.3/bin/
What do I need to fix so I get all of the lines with a "/" at the end of them?
You can't modify the values that strtok returns. You're lengthening them by 1 char, which means you're writing past the end of a string, which is undefined behavior. In all likelihood, strtok replaces the : with a \0 and saves a pointer to just past the \0, which should be the beginning of your second token. However, you replace that \0 with a / and put a \0 just past that point, and now when strtok goes to look for your next token, all it finds is that \0 and it assumes your string is done.
Don't modify the return value from strtok without copying it first.
If you just want to print, you might want to add the / in the format line:
printf("%s/\n",token);
You are getting only one line because you are modifying the buffer you are reading with the following line:
strncat(token, "/", sizeof(token+1));
As per documentation:
Appends the first num characters of source to destination, plus a terminating null-character.
You should copy the token and then add the trailing /.
You shouldn't attempt to modify the string you're passing to strtok(), you'll get highly unexpected behavior that way. You should set up a new string and copy the string pointed to by token to it, and do the concatenation there. sizeof(token+1) is also incorrect, both because you're just adding 1 to the pointer and not affecting the result of sizeof at all, and because you're just getting the size of the pointer this way. strlen() is what you're looking for.
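A sketch of the "copy first, then append" approach, using snprintf so that the size check and the concatenation happen in one call. The helper name with_trailing_slash and the buffer sizes are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes token followed by '/' into out.
 * Returns 0 on success, -1 if out is too small for the result.
 * The token itself (and so strtok's buffer) is left untouched. */
int with_trailing_slash(const char *token, char *out, size_t out_size)
{
    int n;

    assert(token != NULL && out != NULL);
    n = snprintf(out, out_size, "%s/", token);   /* copy and append in one step */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Inside the question's while loop this would be called once per token, for example with a local char dir[256], printing dir when the call returns 0, before asking strtok for the next token. Because only the separate buffer is written to, the \0 that strtok placed after the token stays intact and the remaining entries are still found.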