Very basic strtok program misusing delimiters - C

Here is my program (written in C, compiled and run on Omega, if it makes any difference):
#include <stdio.h>
#include <string.h>
int main (void)
{
char string[] = " hello!how are you? I am fine.";
char *token = strtok(string,"!?.");
printf("Token points to '%c'.\n",*token);
return 0;
}
This is the output I'm expecting:
"Token points to '!'."
But the output I'm getting is:
"Token points to ' '."
From trial and error, I know this is referring to the first character in the string: the space before "hello!".
Why am I not getting the output I'm expecting, and how can I fix it? I do understand from what I've read on here already that strtok is better off buried in a ditch, but let's assume that (if it's possible) I have to use it here, and I have to make it work.

As per strtok man page description
The strtok() function parses a string into a sequence of tokens. On
the first call to strtok() the string to be parsed should be specified
in str. In each subsequent call that should parse the same string, str
should be NULL.
It splits the string at the delimiters and returns each token, not the delimiter itself.
In your case the delimiters are "!?."
char string[] = " hello!how are you? I am fine.";
The first delimiter, "!", occurs right after " hello", so the first call to strtok returns " hello". Your output is simply the first character of that token, ' '.

Someone just posted an answer. It worked for me and now I can't find it. Reposting as best I remember in case someone else has the same question.
char *token = strtok(string,"!?.");
token = strtok(NULL, "!?."); //<--THIS
token points to the first letter after the first delimiter, which is at least something I can work with. Thank you stranger!

Related

Handling consecutive delimiters with strsep() in C

I am trying to read a string word by word in C using strsep() function, which can be also done using strtok(). When there are consecutive delimiters -in my case the empty space- the function does not ignore them. I am expected to use strsep() and couldn't figure out the solution. I'd appreciate it if one of you can help me.
#include <stdio.h>
#include <string.h>
int main(){
char newLine[256]= "scalar  i";
char *q;
char *token;
q = strdup(newLine);
const char delim[] = " ";
token = strsep(&q, delim);
printf("The token is: \"%s\"\n", token);
token = strsep(&q, delim);
printf("The token is: \"%s\"\n", token);
return 0;
}
Actual output is:
The token is: "scalar"
The token is: ""
What I expected is:
The token is: "scalar"
The token is: "i"
To do that I also tried to write a while loop so that I could continue until the token is non-empty.
But I cannot equate tokens with "", " ", NULL or "\n". Somehow the token is not equal to any of these.
First note that strsep(), while convenient, is not in the standard C library, and will only be available on Unix systems with BSD-4.4 C library support. That's most Unix'ish systems today, but still.
Anyway, strsep() supports empty fields. That means that if your string has consecutive delimiters, it will find empty, length-0, tokens between each of these delimiters. For example, the tokens for string "ab  cd" (two spaces) will be:
"ab"
""
"cd"
2 delimiters -> 3 tokens.
Now, you also said:
I cannot equate tokens with "", " ", NULL or "\n". Somehow the token is not equal to any of these.
I am guessing that what you were trying to perform is a simple comparison, e.g. if (my_token == "") { ... }. That won't work, because that is a comparison of pointers, not of the strings' contents. Two strings may have identical characters at different places in memory, and that is particularly likely with the example I just gave, since my_token will point into dynamically allocated memory, not at the static-storage-duration string "" used in the comparison.
Instead, you will need to use strcmp(my_token,""), or better yet, just check manually for the first char being '\0'.

Tokenizing a string when encountering a newline - not working, newline is not recognized

I am trying to tokenize a string whenever a newline is encountered.
rest = strdup(value);
while ((token = strtok_r(rest,"\n", &rest))) {
snprintf(new_value, MAX_BANNER_LEN + 1, "%s\n", token);
}
where 'value' is a string say, "This is an example\nHere is a newline"
But the above function is not tokenizing the 'value' and the 'new_value' variable comes as it is i.e. "This is an example\nHere is a newline".
Any suggestions to overcome this?
Thanks,
Poornima
Several things going on with your code:
strtok and strtok_r take the string to tokenize as first parameter. Subsequent tokenizations of the same string should pass NULL. (It is okay to tokenize the same string with different delimiters.)
The second parameter is a string of possible separators. In your case you should pass "\n". (strtok_r will treat stretches of the characters as single break. That means that tokenizing "a\n\n\nb" will produce two tokens.)
The third parameter to strtok_r is an internal parameter to the function. It will mark where the next tokenization should start, but you need not use it. Just define a char * and pass its address.
Especially, don't repurpose the source string variable as state. In your example, you will lose the handle to the strduped string, so that you cannot free it later, as you should.
It is not clear how you determine that your tokenization "doesn't work". You print the token to the same char buffer repeatedly. Do you want to keep only the part after the last newline? In that case, use strrchr(str, '\n'). If the result isn't NULL, your "tail" starts one character after it. If it is NULL, the whole string is your tail.
Here's how tokenizing a string could work:
char *rest = strdup(str);
char *state;
char *token = strtok_r(rest, "\n", &state);
while (token) {
printf("'%s'\n", token);
token = strtok_r(NULL, "\n", &state);
}
free(rest);

How do I cut up string using a delimiter (space) just like substr in php?

This is my Mystr value:
others:0.01 penalty:0.02 pdi:0.03 pdp:0.04 interest:0.05
principal:0.06 cbu:0.07 savings:0.08 bankcharge:0.09 grt:0.10
My desired output:
others:0.01
penalty:0.02
pdi:0.03
pdp:0.04
interest:0.05
principal:0.06
cbu:0.07
savings:0.08
bankcharge:0.09
grt:0.10
I want this to be assigned to a different variable. How do I do this?
The tool in C for this is strtok (GNU Manual, SUS V2 Spec). You call strtok the first time with your string and delimiter set. Then, for subsequent portions, call strtok with NULL and the delimiter set and it will keep searching from where it left off.
#include <string.h>
#include <stdio.h>
int main(void) {
char x[] = "others:0.01 penalty:0.02 pdi:0.03 pdp:0.04 interest:0.05 principal:0.06 cbu:0.07 savings:0.08 bankcharge:0.09 grt:0.10";
char toPrint[sizeof(x) * 2];
char *a;
strcpy(toPrint,strtok(x," "));
strcat(toPrint,"\n");
while ((a=strtok(NULL," ")) != NULL) {
strcat(toPrint,a);
strcat(toPrint,"\n");
}
fputs(toPrint,stdout);
}
Prints
others:0.01
penalty:0.02
pdi:0.03
pdp:0.04
interest:0.05
principal:0.06
cbu:0.07
savings:0.08
bankcharge:0.09
grt:0.10
Note that strtok modifies the original array: each delimiter that ends a token is overwritten with '\0', so at the end of the program x no longer holds the original space-separated string. Also note that strtok treats a run of consecutive delimiters as a single separator, so it never yields an empty string "" for a missing value.
If you were writing Python, you could simply use split().
In C, you can use strtok: http://www.cplusplus.com/reference/cstring/strtok/

I misunderstand win32 (and maybe libc) strtok( )

In some CGI code, I need to encode rarely-occurring '&', '<', and '>' chars. In the encoding function, I want to get out right away if there are no such chars in the input string. So, at entry, I try to use strtok( ) to find that out:
char *
encode_amp_lt_gt ( char *in ) {
...
if ( NULL == strtok( in, "&<>" )) {
return in;
}
...
}
But, even in the absence of any of the delimiters, strtok( ) returns a pointer to the first character of in.
I expected it to return NULL if no delims in the string.
Is my code wrong, or is my expectation wrong? I don't want to call strchr( ) three times just to eliminate the usual case.
Thanks!
You probably don't want strtok to begin with, as it leaves you no way of figuring what character was eliminated (except if you have a spare copy of the string).
strtok is not a straightforward API and is easy to misunderstand.
Quoting the manpage:
The strtok() and strtok_r() functions return a pointer to the beginning of
each subsequent token in the string, after replacing the token itself with
a NUL character. When no more tokens remain, a null pointer is returned.
Your problem probably means you've fallen victim to the obscurity of the algorithm. Suppose this string (an array, not a pointer to a string literal, because strtok writes into it):
char value[] = "foo < bar & baz > frob";
The first time you call strtok:
char* ptr = strtok(value, "<>&");
strtok will return you the value pointer, except that it will have modified the string to this:
"foo \0 bar & baz > frob"
As you may notice, it changed the < to a NUL. If you now use value, you'll get "foo ", since the string ends at the NUL in the middle.
Subsequent calls to strtok with NULL will proceed through the string, until you've reached the end of the string, at which point you'll get NULL.
char str[] = "foo < bar & frob > nicate"; // writable array; strtok on a string literal is undefined behavior
printf("%s\n", strtok(str, "<>&")); // prints "foo "
printf("%s\n", strtok(NULL, "<>&")); // prints " bar "
printf("%s\n", strtok(NULL, "<>&")); // prints " frob "
printf("%s\n", strtok(NULL, "<>&")); // prints " nicate"
assert(strtok(NULL, "<>&") == NULL); // should be true
It would be fairly straightforward to write a function that replaces the contents without strtok, either dealing with the hard work yourself, or getting help from strpbrk and strcat.
The function you want is strpbrk, not strtok. The bigger question is - how is the string that is being returned being allocated when you're replacing things, and how does the calling function know if it should free it or not?

strtok is using wrong delimiter

Why is my strtok breaking up my strings after space when I specified my delimiter as ","?
I can only suggest that you're doing something wrong though it's a little hard to tell exactly what (you should generally post your code when asking about specifics). Sample programs, like the following, seem to work fine:
#include <stdio.h>
#include <string.h>
int main (void) {
char *s;
char str[] =
"This is a string,"
" with both spaces and commas,"
" for testing.";
printf ("[%s]\n", str);
s = strtok (str, ",");
while (s != NULL) {
printf (" [%s]\n", s);
s = strtok (NULL, ",");
}
return 0;
}
It outputs:
[This is a string, with both spaces and commas, for testing.]
[This is a string]
[ with both spaces and commas]
[ for testing.]
The only possibility that springs to mind immediately is if you're using " ," instead of ",". In that case, you would get:
[This is a string, with both spaces and commas, for testing.]
[This]
[is]
[a]
[string]
[with]
[both]
[spaces]
[and]
[commas]
[for]
[testing.]
Thanks! I looked around and figured out that the problem was with my scanf, which doesn't read the whole line the user inputs. It seems that my strtok was working fine, but the value I am comparing against the return value of strtok is wrong.
For example, my strtok function takes "Jeremy Whitfield,Ronny Whitfield" and gives me "Jeremy Whitfield" and "Ronny Whitfield". In my program, I am using scanf to take in the user input "Ronny Whitfield", which actually only reads "Ronny". So it's a problem with my scanf, not strtok.
My virtual machine gets stuck every time I open it, so I am unable to access my code for now.
