I am using strtok to tokenise a string. Does strtok modify the original buffer? For example:
char buf[] = "This Is Start Of life";
char *pch = strtok(buf," ");
while(pch)
{
    printf("%s \n", pch);
    pch = strtok(NULL," ");
}
printf("Original Buffer:: %s ",buf);
Output is::
This
Is
Start
Of
life
Original Buffer:: This
I read that strtok returns a pointer to the next token, so how is buf getting affected? Is there a way to retain the original buffer (without the overhead of an extra copy)?
Follow-on question: from the answers so far, I gather there is no way to retain the buffer. So if I allocate the original buffer dynamically and strtok modifies it, will there be a memory leak when I free the original buffer, or does strtok take care of freeing the memory?
strtok() doesn't create a new string and return it; it returns a pointer to the token within the string you pass as argument to strtok(). Therefore the original string gets affected.
strtok() breaks the string, meaning it replaces the delimiter character with a null character ('\0') and returns a pointer to the beginning of that token. Therefore, after you run strtok(), the delimiter characters that ended each token will have been replaced by null characters. You can read link1 link2.
As you can see in the output of the example in link2, the output you are getting is as expected, since the delimiter character is replaced by strtok.
Each call to strtok, including strtok(NULL, " "), finds the next token and overwrites the delimiter that follows it with '\0', which modifies the string. So if you need the original string afterwards, make a copy of it before tokenizing.
Please try the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char buf[] = "This Is Start Of life";
    char *buf1;
    /* calloc() allocates the memory and zero-initializes it */
    buf1 = calloc(strlen(buf)+1, sizeof(char));
    if (buf1 == NULL)
        return 1;
    strcpy(buf1, buf);   /* keep an untouched copy of the original */
    char *pch = strtok(buf," ");
    while(pch)
    {
        printf("%s \n", pch);
        pch = strtok(NULL," ");
    }
    printf("Original Buffer:: %s ",buf1);
    free(buf1);
    return 0;
}
Related
I have a uni project in which I need to check whether the syntax is right. I get a pointer to a string and check whether the first token is acceptable. If it's OK, I move forward. If it's not OK, I need to print what is wrong.
What I did was create a buffer, since I can't change the original string.
After that I use strtok to cut up the buffer and check whether the token I got is acceptable.
char *str = "sz = 12345";
printf("The check of MACRO: %d\n", isMacro(str));
int isMacro(char *str)
{
char buf = NULL;
char *token;
strcpy(&buf,str);
token = strtok(&buf," ");
printf("You here, value token is %s\n",token);
}
I expected printf to print 'sz', but it prints:
You here, value str is sz<▒R
char buf = NULL;
This is a type error. buf is a single character, but NULL is a pointer value. You can't store a pointer in a char.
strcpy(&buf,str);
This code has undefined behavior (unless str happens to be an empty string). buf is not a buffer, it is a single char, so it does not have room to store a whole string.
If you want to make a copy of a string, you need to allocate enough memory for all of its characters:
You could use strdup (which is in POSIX and C23, but not in earlier C standards):
char *buf = strdup(str);
if (!buf) {
... handle error ...
}
...
free(buf);
You could replicate strdup manually:
char *buf = malloc(strlen(str) + 1);
if (!buf) {
... handle error ...
}
strcpy(buf, str);
...
free(buf);
You could use a variable-length array (but you're limited by the size of your stack and you have no way to check for errors):
char buf[strlen(str) + 1];
strcpy(buf, str);
...
buf is a single char, not a buffer. If you're planning to strcpy a string into it, you need to allocate memory first, for example with malloc. I'd suggest using strdup instead of malloc plus strcpy to create a copy of the original string that strtok can then modify. Remember to free the strdup'ed string afterwards.
Something like this:
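int isMacro(char *str)
{
    char *buf = NULL;
    char *token;
    buf = strdup(str);
    if (buf == NULL)
        return 0;   /* allocation failed */
    token = strtok(buf," ");
    /* token is NULL if str contains only spaces */
    printf("You here, value of token is %s\n", token ? token : "(none)");
    free(buf);
    return 0;   /* placeholder: the actual acceptability check is not shown */
}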
I typed up this block of code for an assignment:
char *tokens[10];
void parse(char* input);
void main(void)
{
char input[] = "Parse this please.";
parse(input);
for(int i = 2; i >= 0; i--) {
printf("%s ", tokens[i]);
}
}
void parse(char* input)
{
int i = 0;
tokens[i] = strtok(input, " ");
while(tokens[i] != NULL) {
i++;
tokens[i] = strtok(NULL, " ");
}
}
But, looking at it, I'm not sure how the memory allocation works. I didn't define the length of the individual strings as far as I know, just how many strings are in the string array tokens (10). Do I have this backwards? If not, then is the compiler allocating the length of each string dynamically? In need of some clarification.
strtok is a bad citizen.
For one thing, it retains state, as you've implicitly used when you call strtok(NULL,...) -- this state is stored in the private memory of the Standard C Library, which means strtok is not safe to call from several threads at once, and two tokenizations cannot be interleaved. Note that POSIX provides a reentrant version called strtok_r.
For another, and to answer your question, strtok modifies its input. It doesn't allocate space for the strings; it writes NUL characters in place of your delimiter in the input string, and returns a pointer into the input string.
You are correct that strtok can return more than 10 results. You should check for that in your code so you don't write beyond the end of tokens. A reliable program would either set an upper limit, like your 10, and check for it, reporting an error if it's exceeded, or dynamically allocate the tokens array with malloc, and realloc it if it gets too big. Then the error occurs only when you run out of memory.
Note that you can also work around the problem of strtok modifying your input string by strduping it before passing it to strtok. Then you'll have to free the new string once neither it nor tokens, which points into it, is needed any more.
tokens is an array of pointers.
The distinction between strings and pointers is often fuzzy. In some situations strings are better thought of as arrays, in other situations as pointers.
Anyway... in your example input is an array and tokens is an array of pointers to a place within input.
The data inside input is changed with each call to strtok().
So, step by step:
// input[] = "foo bar baz";
tokens[0] = strtok(input, " ");
// input[] = "foo\0bar baz";
//            ^-- tokens[0] points here
tokens[1] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
//                 ^-- tokens[1] points here
tokens[2] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
//                      ^-- tokens[2] points here
// next strtok returns NULL
I have a character like ';' or ',' used as a delimiter in a raw string. I need to split the string and iterate over each substring.
Ex: If,
char* str = "apples, mangoes , orang; ,ad";
And the delimiter is ',' then I need something like
while(substr!='\0') {
func(substr);
//some operation maybe like substr=strstr(substr)+1;
}
The function should be called 4 times with strings: "apples"," mangoes "," orang; ","ad".
In your case str points to a string literal, and you cannot use strtok on it because literals are typically stored in read-only memory.
strtok modifies its argument, and modifying a string literal is undefined behavior (in practice, often a segmentation fault).
To split the string, it must live in a modifiable buffer, for example a char array filled from user input with fgets:
fgets (str, SIZE, stdin); // user input for str
Use strtok only on a string that is stored in writable memory.
char * strtok (char *string, const char *delimiter);
This is how you can use it.
char *buff = strtok (str, ",;");
while (buff != NULL)
{
printf ("%s\n", buff);
buff = strtok (NULL, ",;");
}
For more information on the string functions
man string
strtok is a handy tool for tokenizing a string in C. Note also that strtok modifies the original string.
In your case
char* str = "apples, mangoes , orang; ,ad";
This is a string literal, which is read-only, and using strtok on it is undefined behaviour. So it is better to use an array of sufficient length, or to copy the string into a temporary buffer and apply strtok to that buffer.
For example
int main()
{
char str[] = "apples, mangoes , orang; ,ad";
char *token = strtok (str, ",;");
while (token != NULL)
{
printf ("%s ",token);
token = strtok (NULL, ",;");
}
return 0;
}
char *extractSubstring(char *str)
{
char temp[256];
char *subString; // the "result"
printf("%s\n", str); //prints #include "hello.txt"
strcpy(temp, str); //copies string before tokenizing
subString = strtok(str,"\""); // find the first double quote
subString = strtok(NULL,"\""); // find the second double quote
printf("%s\n", subString); //prints hello.txt
strcpy(str, temp); //<---- the problem
printf("%s", subString); //prints hello.txt"
return subString;
}
After I strcpy, why does it add a quotation? When I comment out the 2nd strcpy line, the program works. The printfs will be deleted out of my program. I was just using it to show what was happening with my program.
Can someone please explain to me what is going on? Thank you.
It is important to realize that strtok() modifies the source string in-place, and returns pointers into it.
Thus, the two calls to strtok() turn str into
#include \0hello.txt\0
           ^ subString points here
(For simplicity, I don't show the final terminating \0).
Now, the second ("problematic") strcpy() changes str back to:
#include "hello.txt"
          ^ subString still points here
This is what makes the " reappear in subString.
One way to fix it is by tokenizing a copy and keeping the original intact. Just make sure that your function doesn't return a pointer to an automatic variable (that would go out of scope the moment the function returns).
The first thing to know is that strtok modifies its first argument (str). If that argument is a string literal (such as when calling extractSubstring like so: extractSubstring("#include \"hello.txt\"");), this leads to undefined behaviour.
You already copy str into temp, so you should use temp in your calls to strtok. When the tokenizing is done, you should copy subString into a buffer that you either allocate on the heap (malloc) or pass to extractSubstring as an extra parameter. You can't return a pointer into a local array because the array goes out of scope when the function ends.
So in summary:
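subString = strtok(temp, "\"");
subString = strtok(NULL, "\"");
if (subString == NULL)
    return NULL;   /* no second token found */
char * ret = malloc(strlen(subString) + 1);   /* +1 for the terminating '\0' */
if (ret == NULL)
    return NULL;
strcpy(ret, subString);   /* strcpy also copies the terminator */
return ret;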
I got my code to split the PATH environment variable, and I get all of the entries:
char *token;
char *path;
char copy[200];
char *search = ":";
char echo[] = "echo";
int main(){
path= getenv("PATH");
strncpy(copy,path,sizeof(copy)-1);
token = strtok (copy,":");
printf("%s\n",path);
while(token != NULL)
{
printf("%s\n",token);
token= strtok (NULL,":");
}
}
I get what I need:
/usr/lib64/qt-3.3/bin
/usr/NX/bin
/usr/local/bin
/usr/bin
/usr/divms/bin
/usr/local/sbin
/usr/sbin
/space/befox/bin
/space/befox/bin
Now I just need to concatenate a "/" to the end of each of those. I got it to work, BUT it only prints the first one.
So here is my code:
char *token;
char *path;
char copy[200];
char *search = ":";
char echo[] = "echo";
char *result;
int main(){
path= getenv("PATH");
strncpy(copy,path,sizeof(copy)-1);
token = strtok (copy,":");
printf("%s\n",path);
while(token != NULL)
{
result = strncat (token,"/",sizeof(token+1));
printf("%s\n",token);
token= strtok (NULL,":");
}
}
and now I just get:
/usr/lib64/qt-3.3/bin/
What do I need to fix so I get all of the lines with a "/" at the end of them?
You can't lengthen the strings that strtok returns. You're making each one longer by 1 char, which means you're writing past the end of the token into the rest of the buffer (and, for the last token, possibly past the end of the buffer itself, which is undefined behavior). In all likelihood, strtok replaces the : with a \0 and saves a pointer to just past the \0, which should be the beginning of your second token. However, you replace that \0 with a / and put a \0 just past that point, and now when strtok goes to look for your next token, all it finds is that \0 and it assumes your string is done.
Don't extend the string that strtok returns without copying it first.
If you just want to print, you might want to add the / in the format string:
printf("%s/\n",token);
You are getting only one line because you are modifying the buffer you are reading with the following line:
strncat(token, "/", sizeof(token+1));
As per documentation:
Appends the first num characters of source to destination, plus a terminating null-character.
You should copy the token and then add the trailing /.
You shouldn't attempt to modify the string you're passing to strtok(), you'll get highly unexpected behavior that way. You should set up a new string and copy the string pointed to by token to it, and do the concatenation there. sizeof(token+1) is also incorrect, both because you're just adding 1 to the pointer and not affecting the result of sizeof at all, and because you're just getting the size of the pointer this way. strlen() is what you're looking for.