How to save the remaining string from strtok_r()? - c

I'm trying to figure out how to get at the remaining string that still needs to be parsed (the third parameter of strtok_r()), but am lost as to how to do so.
The initial input comes from a char pointer defined by malloc().
The code below is what I am trying to achieve.
num = strtok_r(raw_in, delim, &rest_of_nums);
while(rest_of_nums != NULL){
while(num != NULL){
//Compare num with first digit of rest_of_nums
num = strtok_r(NULL, delim, &rest_of_nums);
}
//Iterate to compare num to second digit of rest_of_nums
}

I think you are mixing up strtok() and strtok_r(). The syntax of strtok() is as follows:
char * strtok ( char * str, const char * delimiters );
and the syntax of strtok_r() is as follows:
char * strtok_r ( char * str, const char * delimiters, char **saveptr );
When we call strtok() for the first time, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of the last token as the new starting location for scanning. The point where the last token was found is kept internally by the function to be used on the next call.
However, in strtok_r(), the third argument saveptr is a pointer to a char * variable that is used internally by strtok_r() in order to maintain context between successive calls that parse the same string.
A sample example for strtok_r() is as follows:
char str[] = "sample strtok_r example gcc stack overflow";
char * token;
char * raw_in = str;
char * saveptr;
//delimiter is blank space in this example
token = strtok_r(raw_in, " ", &saveptr);
while (token != NULL) {
printf("%s\n", token);
printf("%s\n", saveptr);
token = strtok_r(NULL, " ", &saveptr);
}
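With glibc the output should be as follows (what saveptr holds between calls is implementation-specific, and the last printf of saveptr prints an empty line because nothing remains to parse):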
sample
strtok_r example gcc stack overflow
strtok_r
example gcc stack overflow
example
gcc stack overflow
gcc
stack overflow
stack
overflow
overflow
Source:
http://www.cplusplus.com/reference/cstring/strtok/
https://www.geeksforgeeks.org/strtok-strtok_r-functions-c-examples/
Questions are welcome.
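To tie this back to the nested comparison in the question, here is a minimal sketch, not a definitive implementation. It assumes the goal is to compare every token with each token that follows it. The helper name count_equal_pairs is hypothetical. It relies on saveptr pointing at the unparsed remainder, which holds on glibc but is not promised by POSIX, and it stops if an implementation sets saveptr to NULL after the last token.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r, strdup */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts pairs (i, j) with i < j whose tokens are equal.
   Returns -1 if a copy could not be allocated. */
int count_equal_pairs(const char *input, const char *delim)
{
    int pairs = 0;
    char *outer_buf = strdup(input);   /* strtok_r modifies its argument */
    if (outer_buf == NULL)
        return -1;

    char *outer_save = NULL;
    for (char *num = strtok_r(outer_buf, delim, &outer_save);
         num != NULL;
         num = strtok_r(NULL, delim, &outer_save)) {
        /* Assumption: outer_save points at the text not yet parsed.
           Some implementations store NULL once the string is exhausted. */
        if (outer_save == NULL)
            break;

        /* Tokenize a copy of the remainder with its own context, so the
           outer loop's string and saveptr are left untouched. */
        char *rest = strdup(outer_save);
        if (rest == NULL) {
            free(outer_buf);
            return -1;
        }
        char *inner_save = NULL;
        for (char *other = strtok_r(rest, delim, &inner_save);
             other != NULL;
             other = strtok_r(NULL, delim, &inner_save)) {
            if (strcmp(num, other) == 0)
                pairs++;
        }
        free(rest);
    }
    free(outer_buf);
    return pairs;
}
```

Copying the remainder costs an allocation per token, but it keeps the two parsing contexts fully independent.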

Related

issue with strtok to compare two words from a nested results of strtok function

I want to compare the words in one array with the dictionary words in another array, to find the maximum number of words matched.
I used strtok since the words in both are delimited by spaces, but it's not working. I need your help, please.
void chercherScoreMotDansDico(char msgBootforce [], int* maxCorrepondance, char* mot, char* dicoActuel, char* bonResultatBootforce) {
int i = 0;
char* motdico = NULL;
char tmpMsgBootForce [3000] = {0};
strcpy(tmpMsgBootForce, msgBootforce);
mot = strtok (tmpMsgBootForce, " ");
while (mot != NULL) {
motdico = strtok (dicoActuel, " ");
while (motdico != NULL) {
if (strcmp(mot,motdico) == 0) ++i;
motdico = strtok (NULL, " ");
}
mot = strtok (NULL," ");
}
if (i > *(maxCorrepondance)) {
*(maxCorrepondance) = i;
strcat(bonResultatBootforce, msgBootforce);
}
}
You can't have two uses of strtok() on two different strings in progress at the same time. strtok() has an internal pointer where it stores the address of the current string being processed. If you call strtok() with a string and then call strtok() with a different string, then when you do strtok(NULL, delim) it will continue with the last string that was specified.
See https://en.cppreference.com/w/c/string/byte/strtok
This function is destructive: it writes the '\0' characters in the
elements of the string str. In particular, a string literal cannot be
used as the first argument of strtok. Each call to strtok modifies a
static variable: is not thread safe. Unlike most other tokenizers,
the delimiters in strtok can be different for each subsequent token,
and can even depend on the contents of the previous tokens. The
strtok_s function differs from the POSIX strtok_r function by guarding
against storing outside of the string being tokenized, and by checking
runtime constraints.
There is a newer variant of strtok(), strtok_s(), which takes an additional argument: the address of a pointer variable to use instead of the internal pointer variable that strtok() uses.
You can't use strtok with two different strings at the same time.
strtok(string, delim) stores its position in string internally for future calls to strtok (NULL, delim). It can only remember one at a time. strtok (tmpMsgBootForce, " ") says to look through tmpMsgBootForce and then motdico = strtok (dicoActuel, " ") overwrites that with dicoActuel.
What to use instead depends on your compiler. The C standard defines strtok_s, but only in the optional Annex K of the 2011 standard, which has proven to be controversial. POSIX defines strtok_r, which most Unix compilers will understand. Finally, Visual Studio has its own slightly different strtok_s.
They all work basically the same way. You manually store the position in each string you're iterating through.
Here it is using strtok_r. next_tmpMsgBootforce and next_dicoActuel hold the position for parsing tmpMsgBootForce and dicoActuel respectively.
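char *next_tmpMsgBootforce;
char *next_dicoActuel;
strcpy(tmpMsgBootForce, msgBootforce);
mot = strtok_r(tmpMsgBootForce, " ", &next_tmpMsgBootforce);
while (mot != NULL) {
    // Caution: strtok_r writes '\0' into dicoActuel, so from the second
    // pass on only its first word is visible. Parse a fresh copy per pass
    // if every mot must be checked against the whole dictionary.
    motdico = strtok_r(dicoActuel, " ", &next_dicoActuel);
    while (motdico != NULL) {
        if (strcmp(mot,motdico) == 0) ++i;
        motdico = strtok_r(NULL, " ", &next_dicoActuel);
    }
    mot = strtok_r(NULL," ", &next_tmpMsgBootforce);
}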
Because this is all such a mess, I recommend using a library such as GLib to smooth out these incompatibilities and unsafe functions.
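If pulling in a library is not an option, one common workaround is a small compatibility macro. The sketch below assumes that Visual Studio's three-argument strtok_s takes the same arguments as POSIX strtok_r, which matches Microsoft's documentation as far as I know. The helper name count_words is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r on POSIX systems */
#include <stddef.h>
#include <string.h>

/* Assumption: MSVC's strtok_s(str, delim, context) mirrors strtok_r. */
#if defined(_MSC_VER)
#define strtok_r(str, delim, saveptr) strtok_s((str), (delim), (saveptr))
#endif

/* Hypothetical helper: counts the delimiter-separated words in buf.
   buf is modified in place by the tokenizer. */
size_t count_words(char *buf, const char *delim)
{
    size_t count = 0;
    char *position = NULL;  /* the manually stored parse position */
    for (char *tok = strtok_r(buf, delim, &position);
         tok != NULL;
         tok = strtok_r(NULL, delim, &position))
        count++;
    return count;
}
```

The rest of the code can then call strtok_r everywhere, and only this one spot knows about the platform difference.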
As a side note, strcpy and strcat are not safe. If their destination does not have enough space, they will write outside its memory bounds. As with strtok, the options for doing this safely are a mess. There's the non-standard but ubiquitous strlcpy and strlcat. There's the standard but not ubiquitous strcpy_s and strcat_s. Thankfully, for once Visual Studio follows the standard.
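If none of those are available, a bounded copy can also be sketched with plain snprintf, which is standard C99. This is only an illustration under that assumption. The helper name copy_bounded is hypothetical and is not one of the functions named above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst without ever writing more than
   dst_size bytes. The result is always null-terminated when dst_size > 0.
   Returns 0 if everything fit, -1 if the copy was truncated or failed. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return -1;
    /* snprintf returns the length it would have written without a limit. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return (needed >= 0 && (size_t)needed < dst_size) ? 0 : -1;
}
```

The return value lets the caller tell truncation apart from success, which strcpy cannot do.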
On POSIX systems you can use strdup to duplicate a string. It will handle the memory allocation for you.
char *tmpMsgBootForce = strdup(msgBootforce);
The caveat is you have to free this memory at the end of the function.
Doing a strcat safely gets complicated. Let's simplify this by splitting it into two functions. One to do the searching.
int theSearching(
    const char *msgBootforce,
    const char *dicoActuel
) {
    int i = 0;
    char *next_tmpMsgBootforce;
    char *next_dicoActuel;
    char *tmpMsgBootForce = strdup(msgBootforce);
    if (tmpMsgBootForce == NULL) {
        return 0;
    }
    char *mot = strtok_r(tmpMsgBootForce, " ", &next_tmpMsgBootforce);
    while (mot != NULL) {
        // strtok_r writes '\0' into the string it parses, so each pass
        // over the dictionary needs its own fresh copy.
        char *tmpDicoActuel = strdup(dicoActuel);
        if (tmpDicoActuel == NULL) {
            break;
        }
        char *motdico = strtok_r(tmpDicoActuel, " ", &next_dicoActuel);
        while (motdico != NULL) {
            if (strcmp(mot,motdico) == 0) {
                ++i;
            }
            motdico = strtok_r(NULL, " ", &next_dicoActuel);
        }
        free(tmpDicoActuel);
        mot = strtok_r(NULL," ", &next_tmpMsgBootforce);
    }
    // Release the copy made by strdup.
    free(tmpMsgBootForce);
    return i;
}
And one to do the appending. This function ensures there's enough space for the concatenation.
// dest must be a null-terminated string allocated with malloc (or strdup).
char *tryAppend( char *dest, const char *src, int *maxCorrepondance, const int numFound ) {
    char *new_dest = dest;
    if (numFound > *maxCorrepondance) {
        // Allocate enough memory for the concatenation.
        // Don't forget space for the null byte.
        new_dest = realloc( dest, strlen(dest) + strlen(src) + 1 );
        if (new_dest == NULL) {
            // realloc failed: dest is still valid, so return it unchanged.
            return dest;
        }
        *(maxCorrepondance) = numFound;
        strcat( new_dest, src);
    }
    // Return a pointer to the reallocated memory,
    // or just the old one if no reallocation was necessary.
    return new_dest;
}
Then use them together.
int numFound = theSearching(msgBootforce, dicoActuel);
bonResultatBootforce = tryAppend(bonResultatBootforce, msgBootforce, &maxCorrepondance, numFound);

Weird output from strtok

I was having some issues dealing with char*'s from an array of char*'s and used this for reference: Splitting C char array into words
So what I'm trying to do is read in char arrays and split them with a space delimiter so I can do stuff with it. For example if the first token in my char* is "Dog" I would send it to a different function that dealt with dogs. My problem is that I'm getting a strange output.
For example:
INPUT: *cmd = "Dog needs a vet appointment."
OUTPUT: (from print statements) "Doneeds a vet appntment."
I've checked for memory leaks using valgrind and I have none of them or other errors.
void parseCmd(char* cmd){ //passing in an individual char* from a char**
char** p_args = calloc(100, sizeof(char*));
int i = 0;
char* token;
token = strtok(cmd, " ");
while (token != NULL){
p_args[i++] = token;
printf("%s",token); //trying to debug
token = strtok(NULL, cmd);
}
free(p_args);
}
Any advice? I am new to C so please bear with me if I did something stupid. Thank you.
In your case,
token = strtok(NULL, cmd);
is not what you should be doing. You instead need:
token = strtok(NULL, " ");
As per the ISO standard:
char *strtok(char * restrict s1, const char * restrict s2);
A sequence of calls to the strtok function breaks the string pointed to by s1 into a sequence of tokens, each of which is delimited by a character from the string pointed to by s2.
The only difference between the first and subsequent calls (assuming, as per this case, you want the same delimiters) should be using NULL as the input string rather than the actual string. By using the input string as the delimiter list in subsequent calls, you change the behaviour.
You can see exactly what's happening if you try the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void parseCmd(char* cmd) {
char* token = strtok(cmd, " ");
while (token != NULL) {
printf("[%s] [%s]\n", cmd, token);
token = strtok(NULL, cmd);
}
}
int main(void) {
char x[] = "Dog needs a vet appointment.";
parseCmd(x);
return 0;
}
which outputs (first column will be search string to use next iteration, second is result of this iteration):
[Dog] [Dog]
[Dog] [needs a vet app]
[Dog] [intment.]
The first step worked fine since you were using space as the delimiter and it modified the string by placing a \0 at the end of Dog.
That means the next attempt (with the wrong separator) would use one of the letters from {D,o,g} to split. The first matching letter for that set is the o in appointment which is why you see needs a vet app. The third attempt finds none of the candidate letters so you just get back the rest of the string, intment..
token = strtok(NULL, cmd); should be token = strtok(NULL, " ");.
The second argument is the string of delimiter characters.
http://man7.org/linux/man-pages/man3/strtok.3.html
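Putting the fix together, a corrected loop might look like the sketch below. The function name split_words and the max_args bound are my own additions for illustration; the original used a fixed calloc of 100 pointers with no bounds check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits cmd on spaces, storing up to max_args token
   pointers in args. cmd is modified in place. Returns the token count. */
size_t split_words(char *cmd, char **args, size_t max_args)
{
    size_t count = 0;
    char *token = strtok(cmd, " ");
    while (token != NULL && count < max_args) {
        args[count++] = token;
        /* Same delimiter string on every call; only the first argument changes. */
        token = strtok(NULL, " ");
    }
    return count;
}
```

The stored pointers point into cmd itself, so they stay valid only as long as cmd does.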

Split a string with delimiters with support for missing values C99 [duplicate]

I am trying to tokenize a string, but I need to know exactly when no data is seen between two tokens. E.g., when tokenizing the string "a,b,c,,,d,e" I need to know about the two empty slots between 'c' and 'd', which I am unable to find out using strtok() alone. My attempt is shown below:
char arr_fields[num_of_fields];
char delim[]=",\n";
char *tok;
tok=strtok(line,delim);//line contains the data
for(i=0;i<num_of_fields;i++,tok=strtok(NULL,delim))
{
if(tok)
sprintf(arr_fields[i], "%s", tok);
else
sprintf(arr_fields[i], "%s", "-");
}
Executing the above code with the aforementioned examples put characters a,b,c,d,e into first five elements of arr_fields which is not desirable. I need the position of each character to go in specific indexes of array: i.e if there is a character missing between two characters, it should be recorded as is.
7.21.5.8 the strtok function
The standard says the following regarding strtok:
[#3] The first call in the sequence searches the string
pointed to by s1 for the first character that is not
contained in the current separator string pointed to by s2.
If no such character is found, then there are no tokens in
the string pointed to by s1 and the strtok function returns
a null pointer. If such a character is found, it is the
start of the first token.
The quote above shows that you cannot use strtok as a solution to your specific problem, since it treats any run of consecutive characters found in delims as a single delimiter.
Am I doomed to weep in silence, or can somebody help me out?
You can easily implement your own version of strtok that does what you want, see the snippets at the end of this post.
strtok_single makes use of strpbrk (char const* src, const char* delims) which will return a pointer to the first occurrence of any character in delims that is found in the null-terminated string src.
If no matching character is found the function will return NULL.
strtok_single
char *
strtok_single (char * str, char const * delims)
{
static char * src = NULL;
char * p, * ret = 0;
if (str != NULL)
src = str;
if (src == NULL)
return NULL;
if ((p = strpbrk (src, delims)) != NULL) {
*p = 0;
ret = src;
src = ++p;
} else if (*src) {
ret = src;
src = NULL;
}
return ret;
}
sample use
char delims[] = ",";
char data [] = "foo,bar,,baz,biz";
char * p = strtok_single (data, delims);
while (p) {
printf ("%s\n", *p ? p : "<empty>");
p = strtok_single (NULL, delims);
}
output
foo
bar
<empty>
baz
biz
You can't use strtok() if that's what you want. From the man page:
A sequence of two or more contiguous delimiter characters in the parsed
string is considered to be a single delimiter. Delimiter characters at
the start or end of the string are ignored. Put another way: the
tokens returned by strtok() are always nonempty strings.
Therefore it is just going to jump from c to d in your example.
You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.
Lately I was looking for a solution to the same problem and found this thread.
You can use strsep().
From the manual:
The strsep() function was introduced as a replacement for strtok(3),
since the latter cannot handle empty fields.
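A minimal sketch of how strsep() might be used for this, assuming a platform that provides it (glibc, musl, the BSDs and macOS do; it is not part of ISO C or POSIX). The helper name split_keep_empty is hypothetical.

```c
#define _DEFAULT_SOURCE 1  /* exposes strsep() on glibc */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line on any character in delims and stores up
   to max_fields pointers in fields. Empty fields are kept as empty strings.
   line is modified in place. Returns the number of fields stored. */
size_t split_keep_empty(char *line, const char *delims,
                        char **fields, size_t max_fields)
{
    size_t n = 0;
    char *cursor = line;  /* strsep advances this past each field */
    char *field;
    while (n < max_fields && (field = strsep(&cursor, delims)) != NULL)
        fields[n++] = field;
    return n;
}
```

An empty slot shows up as a field whose first character is '\0', so the caller can substitute "-" or any other marker.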
As mentioned in this answer, you'll want to implement something like strtok yourself. I prefer using strcspn (as opposed to strpbrk), as it allows for fewer if statements:
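char arr_fields[num_of_fields][32]; /* 32 is an assumed field width; the question does not give one */
char delim[]=",\n";
size_t current_token = 0;
size_t token_length;
for (i = 0; i < num_of_fields; i++)
{
    /* Length of the field that starts at current_token (0 if it is empty). */
    token_length = strcspn(line + current_token, delim);
    if(token_length)
        snprintf(arr_fields[i], sizeof arr_fields[i], "%.*s", (int)token_length, line + current_token);
    else
        snprintf(arr_fields[i], sizeof arr_fields[i], "%s", "-");
    /* Step over the field and, unless we are at the end, its delimiter. */
    current_token += token_length + (line[current_token + token_length] != '\0');
}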
Parse (for example, strtok)
Sort
Insert
Rinse and repeat as needed :)
You could try using strchr to find the locations of the , symbols. Manually copy your string up to the delimiter you found (using memcpy or strncpy) and then call strchr again. This way you can tell when two or more commas are next to each other (the pointers returned by consecutive strchr calls will differ by 1), and you can write an if statement to handle that case.
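The strchr approach could be sketched roughly as follows. The names split_on_commas and FIELD_LEN are hypothetical, the field width of 32 is an assumption, and only the comma is treated as a delimiter here. Empty fields are recorded as "-", as in the question.

```c
#include <stddef.h>
#include <string.h>

#define FIELD_LEN 32  /* assumed maximum field size, terminator included */

/* Hypothetical helper: copies each comma-separated field of line into
   fields[], writing "-" for empty ones. line itself is not modified.
   Returns the number of fields written. */
size_t split_on_commas(const char *line, char fields[][FIELD_LEN],
                       size_t max_fields)
{
    size_t count = 0;
    const char *start = line;
    while (count < max_fields) {
        const char *comma = strchr(start, ',');
        size_t len = (comma != NULL) ? (size_t)(comma - start) : strlen(start);
        if (len == 0) {
            strcpy(fields[count], "-");     /* nothing between two commas */
        } else {
            if (len >= FIELD_LEN)
                len = FIELD_LEN - 1;        /* truncate over-long fields */
            memcpy(fields[count], start, len);
            fields[count][len] = '\0';
        }
        count++;
        if (comma == NULL)
            break;                          /* that was the last field */
        start = comma + 1;
    }
    return count;
}
```

Because the input is never written to, this also works on string literals and const data.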

C - Determining which delimiter used - strtok()

Let's say I'm using strtok() like this..
char *token = strtok(input, ";-/");
Is there a way to figure out which delimiter actually gets used? For instance, if the input was something like:
Hello there; How are you? / I'm good - End
Can I figure out which delimiter was used for each token? I need to be able to output a specific message, depending on the delimiter that followed the token.
Important: strtok is not re-entrant; you should use strtok_r instead.
You can do it by saving a copy of the original string, and looking into offsets of the current token into that copy:
char str[] = "Hello there; How are you? / I'm good - End";
char *copy = strdup(str);
char *delim = ";-/";
char *res = strtok( str, delim );
while (res) {
printf("%c\n", copy[res-str+strlen(res)]);
res = strtok( NULL, delim );
}
free(copy);
This prints
;
/
-
Demo #1
EDIT: Handling multiple delimiters
If you need to handle multiple delimiters, determining the length of the current sequence of delimiters becomes slightly harder: now you need to find the next token before deciding how long the sequence of delimiters is. The math is not complicated, as long as you remember that NULL requires special treatment:
char str[] = "(20*(5+(7*2)))+((2+8)*(3+6*9))";
char *copy = strdup(str);
char *delim = "*+()";
char *res = strtok( str, delim );
while (res) {
int from = res-str+strlen(res);
res = strtok( NULL, delim );
int to = res != NULL ? res-str : strlen(copy);
printf("%.*s\n", to-from, copy+from);
}
free(copy);
Demo #2
You can't. strtok overwrites the next separator character with a nul character (in order to terminate the token that it's returning this time), and it doesn't store the previous value that it overwrites. The first time you call strtok on your example string, the ; is gone forever.
You could do something if you keep an unmodified copy of the string you're modifying with strtok - given the index of the nul terminator for your current token (relative to the start of the string), you can look at the same index in the copy and see what was there.
That might be worse than just writing your own code to separate the string, of course. You can use strpbrk or strcspn, if you can live with the resulting token not being nul-terminated for you.
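For the strcspn route, a minimal sketch could look like this. The helper name delimiters_used is hypothetical. It assumes you only want the delimiter that immediately follows each non-empty token, and it leaves the input string unmodified.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: walks s without modifying it and collects, in out,
   the delimiter that directly follows each non-empty token. out is always
   null-terminated when out_size > 0. Returns the number collected. */
size_t delimiters_used(const char *s, const char *delims,
                       char *out, size_t out_size)
{
    size_t n = 0;
    if (out_size == 0)
        return 0;
    while (*s != '\0') {
        size_t len = strcspn(s, delims);  /* length of the token at s */
        char following = s[len];          /* its delimiter, or '\0' at the end */
        if (following == '\0')
            break;                        /* last token has no delimiter */
        if (len > 0 && n + 1 < out_size)
            out[n++] = following;
        s += len + 1;                     /* continue after the delimiter */
    }
    out[n] = '\0';
    return n;
}
```

Inside the loop, s and len also identify the token itself, so a message per delimiter can be produced there without a second copy of the string.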
man 3 strtok
The strtok() and strtok_r() functions return a pointer to the
beginning of each subsequent token in the string, after replacing the
token itself with a NUL character. When no
more tokens remain, a null pointer is returned.
But with a little pointer arithmetic you can do something like:
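char string[] = "Hello,World!"; /* an array, because strtok writes into it */
char* dup = strdup(string);
char* hello = strtok(string, ",");
/* The delimiter sits right after the token, at the same offset in the copy. */
char delim_used = dup[(hello - string) + strlen(hello)];
free(dup);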

Resources