I misunderstand win32 (and maybe libc) strtok( ) - c

In some CGI code, I need to encode rarely-occurring '&', '<', and '>' chars. In the encoding function, I want to get out right away if there are no such chars in the input string. So, at entry, I try to use strtok( ) to find that out:
char *
encode_amp_lt_gt ( char *in ) {
...
if ( NULL == strtok( in, "&<>" )) {
return in;
}
...
}
But, even in the absence of any of the delimiters, strtok( ) returns a pointer to the first character of in.
I expected it to return NULL if no delims in the string.
Is my code wrong, or is my expectation wrong? I don't want to call strchr( ) three times just to eliminate the usual case.
Thanks!

You probably don't want strtok to begin with, as it leaves you no way of figuring out which character was removed (unless you keep a spare copy of the string).
strtok is not a straightforward API and is easy to misunderstand.
Quoting the manpage:
The strtok() and strtok_r() functions return a pointer to the beginning of
each subsequent token in the string, after replacing the token itself with
a NUL character. When no more tokens remain, a null pointer is returned.
Your problem probably comes from the algorithm's less obvious behavior. Suppose this string (it must be a writable array, because strtok modifies it):
char value[] = "foo < bar & baz > frob";
The first time you call strtok:
char* ptr = strtok(value, "<>&");
strtok will return you the value pointer, except that it will have modified the string to this:
"foo \0 bar & baz > frob"
As you may notice, it changed the < to a NUL. If you now use value, you'll get "foo ", since there's a NUL in the middle of the string.
Subsequent calls to strtok with NULL will proceed through the string, until you've reached the end of the string, at which point you'll get NULL.
char str[] = "foo < bar & frob > nicate"; // must be writable: strtok modifies it
printf("%s\n", strtok(str, "<>&")); // prints "foo "
printf("%s\n", strtok(NULL, "<>&")); // prints " bar "
printf("%s\n", strtok(NULL, "<>&")); // prints " frob "
printf("%s\n", strtok(NULL, "<>&")); // prints " nicate"
assert(strtok(NULL, "<>&") == NULL); // should be true
It would be fairly straightforward to write a function that replaces the contents without strtok, either dealing with the hard work yourself, or getting help from strpbrk and strcat.

The function you want is strpbrk, not strtok. The bigger question is - how is the string that is being returned being allocated when you're replacing things, and how does the calling function know if it should free it or not?
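A minimal sketch of the strpbrk approach, assuming one possible answer to the ownership question: the function always returns a newly allocated string that the caller must free. The name encode_amp_lt_gt comes from the question, but the const parameter and the allocation policy are assumptions; the replacements are the standard HTML entities.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated, encoded copy of `in`; the caller must free() it.
 * Returns NULL only if memory allocation fails. */
char *encode_amp_lt_gt(const char *in)
{
    size_t len = strlen(in);
    size_t extra = 0;
    const char *p;
    char *out, *w;

    /* Fast path: one strpbrk call answers "is any of &, <, > present?" */
    p = strpbrk(in, "&<>");
    if (p == NULL) {
        out = malloc(len + 1);
        if (out != NULL)
            memcpy(out, in, len + 1);
        return out;
    }

    /* Count the extra bytes: "&amp;" replaces 1 char with 5, "&lt;"/"&gt;" with 4. */
    for (; p != NULL; p = strpbrk(p + 1, "&<>"))
        extra += (*p == '&') ? 4 : 3;

    out = malloc(len + extra + 1);
    if (out == NULL)
        return NULL;

    for (w = out; *in != '\0'; in++) {
        switch (*in) {
        case '&': memcpy(w, "&amp;", 5); w += 5; break;
        case '<': memcpy(w, "&lt;", 4);  w += 4; break;
        case '>': memcpy(w, "&gt;", 4);  w += 4; break;
        default:  *w++ = *in;            break;
        }
    }
    *w = '\0';
    return out;
}
```

Always allocating keeps the rule for the caller simple (always free the result), at the cost of a copy in the common case. Returning the input pointer on the fast path would avoid the copy, but then the caller would have to compare pointers before deciding whether to free.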

Related

skip strtok's null terminators safely

I want to use strtok and then return the string after the null terminator that strtok has placed.
char *foo(char *bar)
{
strtok(bar, " ");
return after_strtok_null(bar);
}
/*
examples:
foo("hello world") = "world"
foo("remove only the first") = "only the first"
*/
My code is not just for skipping the first word (I know a simple while loop would do that); I do want to use strtok once and then return the part that was not tokenized.
I will describe what I am trying to do at the end of the question, although I don't think it's really necessary.
One solution that came to mind was to simply skip all the null terminators until I reach a non-null character:
char *foo(char *bar)
{
bar = strtok(bar, " ");
while(!(*(bar++)));
return bar;
}
This works fine for the examples shown above, but with a single word I may mistake the string's own null terminator for strtok's, and then access unallocated memory.
For example, if I try foo("demo") (that is, "demo" followed by its '\0'), the result of strtok will be "demo" followed by that same '\0', and if I then run the while loop I will access the memory past the end of "demo". Another solution I tried was to use strlen, but it has exactly the same problem.
I am trying to create a function that takes a sentence. Some sentences have their first word terminated by a colon, but not necessarily. If the first word is terminated by a colon, the function needs to insert that word (without the colon) into a global table, and then return the sentence without that word and without the spaces that follow it. Otherwise, it should just return the sentence without its leading spaces.
You could use strcspn and strspn instead:
#include <string.h>

char *foo(char *bar) {
    size_t pos = strcspn(bar, " ");     /* length of the first word */
    pos = strspn((bar += pos), " ");    /* number of spaces after it */
    // *bar = '\0'; // uncomment to mimic strtok
    return bar + pos;
}
For a single word such as "demo", you get an empty string (pointing at the original terminator) instead of running past the end.
A further advantage is that the original string is left unmodified, although mimicking strtok is trivial if you need to...
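The same strcspn/strspn idea can cover the full task described in the question. This is a sketch under assumptions: the helper name split_label is hypothetical, the label is returned through an output buffer instead of being inserted into the (unspecified) global table, and only spaces count as whitespace.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if the first word of `s` ends with ':', copy the word
 * (without the colon) into `label` and return a pointer just past the word and
 * the spaces after it. Otherwise set `label` to "" and return `s` with its
 * leading spaces skipped. `s` itself is never modified. */
const char *split_label(const char *s, char *label, size_t label_size)
{
    size_t word_len;

    label[0] = '\0';                 /* assumes label_size >= 1 */
    s += strspn(s, " ");             /* skip leading spaces */
    word_len = strcspn(s, " ");      /* length of the first word */

    if (word_len > 0 && s[word_len - 1] == ':') {
        size_t n = word_len - 1;     /* drop the trailing colon */
        if (n >= label_size)
            n = label_size - 1;      /* truncate rather than overflow */
        memcpy(label, s, n);
        label[n] = '\0';
        s += word_len;
        s += strspn(s, " ");         /* skip the spaces after the label */
    }
    return s;
}
```

Because nothing is written into the input, a single word like "demo" is handled without any risk of reading past its terminator.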

Tokenizing a string at newlines - newline is not getting recognized

I am trying to tokenize a string whenever a newline is encountered.
rest = strdup(value);
while ((token = strtok_r(rest,"\n", &rest))) {
snprintf(new_value, MAX_BANNER_LEN + 1, "%s\n", token);
}
where 'value' is a string say, "This is an example\nHere is a newline"
But the code above is not tokenizing 'value', and the 'new_value' variable comes out unchanged, i.e. "This is an example\nHere is a newline".
Any suggestions to overcome this?
Thanks,
Poornima
Several things are going on with your code:
strtok and strtok_r take the string to tokenize as first parameter. Subsequent tokenizations of the same string should pass NULL. (It is okay to tokenize the same string with different delimiters.)
The second parameter is a string of possible separators. In your case you should pass "\n". (strtok_r treats a run of separator characters as a single break, so tokenizing "a\n\n\nb" produces two tokens.)
The third parameter to strtok_r is an internal parameter to the function. It will mark where the next tokenization should start, but you need not use it. Just define a char * and pass its address.
Especially, don't repurpose the source string variable as state. In your example, you will lose the handle to the strduped string, so that you cannot free it later, as you should.
It is not clear how you determine that your tokenization "doesn't work". You print each token to the same char buffer, overwriting it every time. Do you want to keep only the part after the last newline? In that case, use strrchr(str, '\n'). If the result isn't NULL, your "tail" starts at the character after it; if it is NULL, the whole string is your tail.
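For the "keep only the last line" case, a minimal sketch with strrchr (the function name last_line is hypothetical). Nothing is copied or modified, and a string that ends in a newline yields an empty tail.

```c
#include <string.h>

/* Return the part of `str` after its last '\n', or the whole string
 * if it contains no newline. */
const char *last_line(const char *str)
{
    const char *nl = strrchr(str, '\n');
    return nl != NULL ? nl + 1 : str;
}
```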
Here's how tokenizing a string could work:
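#define _POSIX_C_SOURCE 200809L  /* for strdup and strtok_r */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void print_lines(const char *str)
{
    char *rest = strdup(str);  /* writable copy; keep the pointer so it can be freed */
    char *state;               /* strtok_r's position, separate from rest */
    char *token = rest ? strtok_r(rest, "\n", &state) : NULL;
    while (token) {
        printf("'%s'\n", token);
        token = strtok_r(NULL, "\n", &state);
    }
    free(rest);
}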
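If the lines need to be kept rather than printed, one option is to store the token pointers in an array. A sketch, assuming a writable buffer and a caller-chosen maximum (split_lines is a hypothetical name); it also shows that consecutive newlines produce no empty tokens.

```c
#define _POSIX_C_SOURCE 200809L   /* expose strtok_r from <string.h> */
#include <stddef.h>
#include <string.h>

/* Split `buf` in place at newlines, storing up to `max` token pointers in
 * `tokens`. Returns the number of tokens stored. Runs of consecutive '\n'
 * produce no empty tokens, because strtok_r treats them as one separator. */
size_t split_lines(char *buf, char **tokens, size_t max)
{
    char *state;                       /* strtok_r's private position */
    size_t n = 0;
    char *tok = strtok_r(buf, "\n", &state);

    while (tok != NULL && n < max) {
        tokens[n++] = tok;
        tok = strtok_r(NULL, "\n", &state);
    }
    return n;
}
```

The pointers refer into buf, so buf must outlive them; with the strdup copy from the question, free it only after the tokens are no longer needed.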

How do I use strtok to cut a string that varies in size?

So, I'm fairly new to C, and I get as input a string like a:name:number:number:number:number:name:name, and I'm using strtok to get all the names and numbers without the ":". The thing is, the size of the string can vary: it can have up to 2 more names (always at the end), like a:name:number:number:number:number:name:name:name or a:name:number:number:number:number:name:name:name:name.
Now, I'm using a struct and strcpy to put each name in the struct, but because the number of names at the end varies, I get segmentation faults when there are fewer than 4 names at the end (4 is the maximum; the minimum is 1). I think it's because I keep calling strtok after the end of the string.
Here's my code:
char *token;
structname a;
token = strtok(c,":"); //c is the input string
strcpy(a.name1,strtok(NULL,":"));
a.number1 = atoi(strtok(NULL,":"));
a.number2 = atoi(strtok(NULL,":"));
a.number3 = atoi(strtok(NULL,":"));
a.number4 = atoi(strtok(NULL,":"));
strcpy(a.name2,strtok(NULL,":"));
strcpy(a.name3,strtok(NULL,":"));
strcpy(a.name4,strtok(NULL,":"));
strcpy(a.name5,strtok(NULL,":"));
So I'm guessing the error occurs because strtok keeps being called even after the string is over, in the cases where there are fewer than 4 names at the end?
I want to know how I can, for example in a case where there are 2 names at the end, just set a.name4 and a.name5 to "" (empty strings), or simply leave those strings untouched.
Thanks for time and help!
Typically, strtok is used in a loop. For example:
char *token = strtok(input_string, ".");
while(token != NULL) {
//do code
token = strtok(NULL, ".");
}
This way, the loop ends at the first failed parse. strtok keeps returning NULL once the end of the input string has been reached, so calling it again is not in itself the problem; what you do with the NULL it returns is.
The most likely issue, in my opinion, is the way you pass strtok's result straight to strcpy: strcpy is probably failing because the source string is NULL. If you don't want to use the loop, I would do this instead:
token = strtok(NULL, ":");
if (token != NULL) strcpy(a.name2, token);
else a.name2[0] = '\0';   /* no more tokens: leave the name empty */
This way, you make sure the pointer isn't NULL before copying. strcpy works on an empty string (""), but not on a NULL pointer.
The other thing you may want to check is that the destination strings (a.name2, etc.) have enough space allocated. If not, this would also cause a segfault.
strtok() returns NULL if there are no more tokens, so you should check the return value, for example:
if ((token = strtok(c, ":")) != NULL) {
    strcpy(a.name1, token);
} else {
    /* token is NULL: there are no more tokens, the end of the string was reached.
       You can return or do something else here. */
}
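Combining both suggestions (a loop for the variable part, a NULL check for every field), a sketch might look like this. The question does not show its struct, so struct record, NAME_LEN, MAX_TAIL and the function names are hypothetical; it keeps the question's use of strtok and atoi.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 32   /* assumed size of each name field */
#define MAX_TAIL 4    /* names after the four numbers: 1 to 4 of them */

/* Hypothetical layout; the question's struct is not shown. */
struct record {
    char name1[NAME_LEN];
    int number[4];
    char tail[MAX_TAIL][NAME_LEN];   /* unused entries are left as "" */
    int tail_count;
};

/* Copy src into a fixed-size field, truncating if necessary. */
static void copy_field(char *dst, const char *src)
{
    strncpy(dst, src, NAME_LEN - 1);
    dst[NAME_LEN - 1] = '\0';
}

/* Parse "a:name:n:n:n:n:name[:name...]" in place (c must be writable).
 * Returns 0 on success, -1 if a required field is missing. */
int parse_record(char *c, struct record *r)
{
    char *tok;
    int i;

    memset(r, 0, sizeof *r);          /* every name starts out as "" */

    if (strtok(c, ":") == NULL)       /* leading "a" field, ignored */
        return -1;
    if ((tok = strtok(NULL, ":")) == NULL)
        return -1;
    copy_field(r->name1, tok);

    for (i = 0; i < 4; i++) {
        if ((tok = strtok(NULL, ":")) == NULL)
            return -1;
        r->number[i] = atoi(tok);
    }

    /* The variable part: stop as soon as strtok runs out of tokens. */
    while (r->tail_count < MAX_TAIL && (tok = strtok(NULL, ":")) != NULL)
        copy_field(r->tail[r->tail_count++], tok);

    return r->tail_count > 0 ? 0 : -1;   /* at least one trailing name */
}
```

Storing the trailing names in an array instead of name2..name5 lets the loop fill as many as are present, and tail_count records how many that was.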

Very basic strtok program misusing delimiters - C

Here is my program (written in C, compiled and run on Omega, if it makes any difference):
#include <stdio.h>
#include <string.h>
int main (void)
{
char string[] = " hello!how are you? I am fine.";
char *token = strtok(string,"!?.");
printf("Token points to '%c'.\n",*token);
return 0;
}
This is the output I'm expecting:
"Token points to '!'."
But the output I'm getting is:
"Token points to ' '."
From trial and error, I know this is referring to the first character in the string: the space before "hello!".
Why am I not getting the output I'm expecting, and how can I fix it? I do understand from what I've read on here already that strtok is better off buried in a ditch, but let's assume that (if it's possible) I have to use it here, and I have to make it work.
As per strtok man page description
The strtok() function parses a string into a sequence of tokens. On
the first call to strtok() the string to be parsed should be specified
in str. In each subsequent call that should parse the same string, str
should be NULL.
It splits the string at the delimiters and returns the pieces between them (the tokens), not the delimiters themselves.
In your case delimiters are "!?."
char string[] = " hello!how are you? I am fine.";
The first delimiter, "!", comes right after " hello", so strtok replaces the "!" with a NUL and returns " hello". Your output is simply the first character of " hello", which is ' '.
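If what you actually want is the delimiter itself, strpbrk is the better fit: it returns a pointer to the first character that belongs to the set and does not modify the string. A small sketch (first_delimiter is a hypothetical name):

```c
#include <string.h>

/* strtok returns the token *before* a delimiter; strpbrk returns a
 * pointer to the delimiter itself and leaves the string untouched.
 * Returns the first delimiter found in s, or '\0' if there is none. */
char first_delimiter(const char *s, const char *delims)
{
    const char *p = strpbrk(s, delims);
    return p != NULL ? *p : '\0';
}
```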
Someone just posted an answer. It worked for me and now I can't find it. Reposting as best I remember in case someone else has the same question.
char *token = strtok(string,"!?.");
token = strtok(NULL, "!?."); //<--THIS
token points to the first letter after the first delimiter, which is at least something I can work with. Thank you stranger!

Parsing a string

I have a string of the format "ABCDEFG,12:34:56:78:90:11". I want to split the two comma-separated values into two different strings. How do I do that in C with gcc?
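One possibility is something like this (sscanf parses a string already in memory; scanf with the same format would read from standard input):
char first[20], second[20];
sscanf(input, "%19[^,], %19[^\n]", first, second);  /* input holds the string */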
So many people are suggesting strtok... Why? strtok is a leftover from the stone age of programming and is good only for 20-line utilities!
Each call to strtok modifies strToken by inserting a null character after the token returned by that call. [...]
[F]unction uses a static variable for parsing the string into tokens. [...] Interleaving calls to this function is highly likely to produce data corruption and inaccurate results.
scanf, as in Jerry Coffin's answer, is a much better alternative. Or, you can do it manually: find the separator with strchr, then copy parts to separate buffers.
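The manual strchr route might look like the following sketch. The name split_at_comma and the convention of two equal-sized output buffers are assumptions for illustration; unlike strtok, it does not modify the input, so a string literal is fine.

```c
#include <stddef.h>
#include <string.h>

/* Split `in` at its first comma into two caller-supplied buffers, each of
 * `size` bytes. Returns 0 on success, -1 if there is no comma or either
 * part does not fit. */
int split_at_comma(const char *in, char *first, char *second, size_t size)
{
    const char *comma = strchr(in, ',');
    size_t len1, len2;

    if (comma == NULL)
        return -1;
    len1 = (size_t)(comma - in);
    len2 = strlen(comma + 1);
    if (len1 >= size || len2 >= size)
        return -1;

    memcpy(first, in, len1);
    first[len1] = '\0';
    memcpy(second, comma + 1, len2 + 1);   /* includes the terminator */
    return 0;
}
```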
char str[] = "ABCDEFG,12:34:56:78:90:11"; //[1]
char *first = strtok(str, ","); //[2]
char *second = strtok(NULL, ""); //[3]
[1] ABCDEFG,12:34:56:78:90:11
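[2] ABCDEFG<NUL>12:34:56:78:90:11
The comma is replaced with a null character, and first points to 'A'.
[3] Subsequent calls to `strtok` take `NULL` as the first argument. With "" as the delimiter set, this call returns the rest of the string, so second points to "12:34:56:78:90:11".
You can change the delimiters between calls, though.
Note: you cannot use a string literal here, because `strtok` modifies the string.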
You can use strtok which will allow you to specify the separator and generate the tokens for you.
You could use strtok:
Example from cppreference.com:
char str[] = "now # is the time for all # good men to come to the # aid of their country";
char delims[] = "#";
char *result = NULL;
result = strtok( str, delims );
while( result != NULL ) {
printf( "result is \"%s\"\n", result );
result = strtok( NULL, delims );
}
Try using the following regex; it will find a run of letters A-Z followed by a ",":
"[A-Z]+," If you need lowercase letters too, try "[a-zA-Z]+,"
To match the second part instead, you could try the following (the {2} repetition requires extended regular expressions):
",[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}"
There is an example of how to use regexes at
http://ddj.com/184404797
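On a POSIX system (this is not available with MSVC), the standard <regex.h> functions regcomp and regexec can match the whole string and extract both parts through capture groups. A sketch, assuming the exact "LETTERS,NN:NN:NN:NN:NN:NN" layout from the question; split_with_regex is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <string.h>

/* Match "LETTERS,NN:NN:NN:NN:NN:NN" with POSIX extended regular expressions
 * (needed for the {n} repetition) and copy the two capture groups out.
 * Returns 0 on a match, -1 otherwise. Each buffer has `size` bytes. */
int split_with_regex(const char *in, char *first, char *second, size_t size)
{
    regex_t re;
    regmatch_t m[3];           /* whole match + 2 capture groups */
    int rc;

    if (regcomp(&re, "^([A-Za-z]+),([0-9]{2}(:[0-9]{2}){5})$", REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, in, 3, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    for (int g = 1; g <= 2; g++) {
        size_t len = (size_t)(m[g].rm_eo - m[g].rm_so);
        char *dst = (g == 1) ? first : second;
        if (len >= size)
            return -1;
        memcpy(dst, in + m[g].rm_so, len);
        dst[len] = '\0';
    }
    return 0;
}
```

The inner group (:[0-9]{2}) is only there for the repetition; passing nmatch = 3 means regexec does not report it.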
Thanks,
V$h3r
