I am trying to access the data of *tkn from a different function in my program, for example: putchar(*tkn); It is a global variable, but it's not working correctly. Any ideas?
#define MAX 20
// globals
char *tkn;
char array[MAX];
...
void tokenize()
{
int i = 0, j = 0;
char *delim = " ";
tkn = strtok (str," "); // get token 1
if (tkn != NULL) {
printf("token1: ");
while ((*tkn != 0) && (tkn != NULL))
{
putchar(*tkn);
array[i] = *tkn;
*tkn++;
i++;
}
}
}
In this line:
while ((*tkn != 0) && (tkn != NULL))
you need to reverse the conditions. If tkn is a null pointer, the first term dereferences it, which is undefined behavior and will typically crash. If you need to check a pointer for validity, do so before dereferencing it.
while (tkn != NULL && *tkn != '\0')
The extra parentheses you added do no harm but are not necessary. And although 0 is a perfectly good zero, the '\0' emphasizes that *tkn is a character. Of course, given the prior tkn != NULL condition in the if statement, there is no real need to repeat the check in the while loop.
Working code based on yours - some work left to do for subsequent tokens in the string, for example...
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
enum { MAX = 20 };
char *tkn;
char array[MAX];
char str[2*MAX];
void tokenize(void)
{
int i = 0;
array[0] = '\0';
tkn = strtok(str, " "); // get token 1
if (tkn != NULL)
{
printf("token1: ");
while (tkn != NULL && *tkn != '\0' && i < MAX - 1)
{
putchar(*tkn);
array[i++] = *tkn++;
}
array[i] = '\0'; // terminate the copy in array, not the source token
putchar('\n');
}
}
int main(void)
{
strcpy(str, "abc def");
tokenize();
printf("token = <<%s>>\n", array);
strcpy(str, "abcdefghijklmnopqrstuvwxyz");
tokenize();
printf("token = <<%s>>\n", array);
return(0);
}
Sample output:
token1: abc
token = <<abc>>
token1: abcdefghijklmnopqrs
token = <<abcdefghijklmnopqrs>>
Asked:
But what if I am taking in a string 'abc 3fc ghi' and I want to use just '3fc' in another function that, say, converts it from ASCII to hex? How do I use, say, tkn2 for 3fc and get only that using a pointer? – patrick
That's where it gets trickier because strtok() has a moderately fiendish interface.
Leaving tokenize() unchanged, let's redefine str:
char *str;
Then, we can use (untested):
int main(void)
{
char buffer[2*MAX];
strcpy(buffer, "abc 3fc ghi");
str = buffer;
tokenize();
printf("token = <<%s>>\n", array); // "abc"
str = NULL;
tokenize();
printf("token = <<%s>>\n", array); // "3fc"
str = NULL;
tokenize();
printf("token = <<%s>>\n", array); // "ghi"
return(0);
}
Clearly, this relies on knowing that there are three tokens. To generalize, you'd need the tokenizer to tell you when there's nothing left to tokenize.
Note that having tkn as a global is really unnecessary - indeed, you should aim to avoid globals as much as possible.
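One way to get a tokenizer that reports when nothing is left is to have it return a status and copy the token into a caller-supplied buffer. The sketch below is only an illustration: next_token() is a hypothetical helper (not part of the code above), it assumes a single space as the delimiter, and it still relies on strtok()'s hidden state, so it needs neither tkn nor array as globals.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption, not from the original answer).
 * Pass the string on the first call and NULL afterwards, as with strtok().
 * Copies at most outsize - 1 characters of the next space-delimited token
 * into out and always NUL-terminates it.
 * Returns 1 if a token was found, 0 when there is nothing left. */
int next_token(char *str, char *out, size_t outsize)
{
    char *tok;
    size_t len;

    if (out == NULL || outsize == 0)
        return 0;

    tok = strtok(str, " ");
    if (tok == NULL)
        return 0;               /* nothing left to tokenize */

    len = strlen(tok);
    if (len >= outsize)
        len = outsize - 1;      /* truncate tokens that do not fit */
    memcpy(out, tok, len);
    out[len] = '\0';
    return 1;
}
```

A caller would pass the buffer on the first call and NULL on every later call, looping until the function returns 0, so the number of tokens no longer has to be known in advance.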
You should use
tkn++
rather than
*tkn++
Just use strlcpy(3) where it is available (it is a BSD extension, not part of ISO C) instead of hand-coding the copy (hint: you are forgetting the string's zero terminator):
strlcpy( array, tkn, MAX );
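strlcpy is not part of ISO C, so on a system that lacks it a similar truncating, always-terminating copy can be sketched with snprintf. The name copy_token() below is a hypothetical helper introduced only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper (an assumption): copy src into dst, which holds
 * dstsize bytes, truncating if necessary. For dstsize > 0 the result is
 * always NUL-terminated.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_token(char *dst, size_t dstsize, const char *src)
{
    /* snprintf returns the length the full string would have had */
    int n = snprintf(dst, dstsize, "%s", src);
    return n >= 0 && (size_t)n < dstsize;
}
```

With the globals from the question this would be called as copy_token(array, MAX, tkn).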
Although tkn itself is a global variable, you also have to make sure that what it points to (ie. *tkn) is still around when you try to use it.
When you set tkn with a line like:
tkn = strtok (str," ");
Then tkn is pointing to part of the string that str points to. So if str was pointing to a non-static array declared in a function, for example, and that function has exited - then *tkn isn't allowed any more. If str was pointing to a block of memory allocated by malloc(), and you've called free() on that memory - then accessing *tkn isn't allowed after that point.
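To make the lifetime issue concrete, here is a small sketch. first_token_copy() is a hypothetical function, and the 64-byte local buffer is an arbitrary assumption. The token pointer refers into a local array, so the function hands back a heap copy instead of the pointer itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an assumption): returns a malloc'd copy of the
 * first space-delimited token of text, or NULL if there is no token or
 * the allocation fails. The caller must free() the result. */
char *first_token_copy(const char *text)
{
    char local[64];             /* automatic storage: gone after return */
    char *tkn;
    char *copy = NULL;

    strncpy(local, text, sizeof local - 1);
    local[sizeof local - 1] = '\0';

    tkn = strtok(local, " ");   /* tkn points INTO local */
    if (tkn != NULL) {
        copy = malloc(strlen(tkn) + 1);
        if (copy != NULL)
            strcpy(copy, tkn);
    }
    /* Returning tkn here would hand the caller a dangling pointer;
     * the heap copy stays valid until it is freed. */
    return copy;
}
```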
Description of what my function attempts to do
My function gets a string for example "Ab + abc EF++aG hi jkL" and turns it into ["abc", "hi"]
In addition, the function only takes into account letters and the letters all have to be lowercase.
The problem is that
char* str1 = "Ab + abc EF++aG hi jkL";
char* str2 = "This is a very famous quote";
char** tokens1 = get_tokens(str1);
printf("%s", tokens1[0]); <----- prints out "abc" correct output
char** tokens2 = get_tokens(str2);
printf("%s", tokens1[0]); <----- prints out "s" incorrect output
get_tokens function (Returns the 2d array)
char** get_tokens(const char* str) {
// implement me
int num_tokens = count_tokens(str);
char delim[] = " ";
int str_length = strlen(str);
char* new_str = malloc(str_length);
strcpy(new_str, str);
char* ptr = strtok(new_str, delim);
int index = 0;
char** array_2d = malloc(sizeof(char*) *num_tokens);
while (ptr != NULL){
if (check_string(ptr) == 0){
array_2d[index] = ptr;
index++;
}
ptr = strtok(NULL, delim);
}
free(new_str);
new_str = NULL;
free(ptr);
ptr = NULL;
return array_2d;
}
count_tokens function (returns the number of valid strings)
for example count_tokens("AB + abc EF++aG hi jkL") returns 2 because only "abc" and "hi" are valid
int count_tokens(const char* str) {
// implement me
//Separate string using strtok
char delim[] = " ";
int str_length = strlen(str);
char* new_str = malloc(str_length);
strcpy(new_str, str);
char* ptr = strtok(new_str, delim);
int counter = 0;
while (ptr != NULL){
if (check_string(ptr) == 0){
counter++;
}
ptr = strtok(NULL, delim);
}
free(new_str);
return counter;
}
Lastly check_string() checks if a string is valid
For example check_string("Ab") is invalid because there is a A inside.
using strtok to split "Ab + abc EF++aG hi jkL" into separate parts
int check_string(char* str){
// 0 = false
// 1 = true
int invalid_chars = 0;
for (int i = 0; i<strlen(str); i++){
int char_int_val = (int) str[i];
if (!((char_int_val >= 97 && char_int_val <= 122))){
invalid_chars = 1;
}
}
return invalid_chars;
}
Any help would be much appreciated. Thank you for reading.
If you have any questions about how the code works please ask me. Also I'm new to stackoverflow, please tell me if I have to change something.
You have a few problems in your code. First I'll repeat what I've said in the comments:
Not allocating enough space for the string copies. strlen does not include the NUL terminator in its length, so when you do
char* new_str = malloc(str_length);
strcpy(new_str, str);
new_str overflows by 1 when strcpy adds the '\0', invoking undefined behavior. You need to allocate one extra:
char* new_str = malloc(str_length + 1);
strcpy(new_str, str);
You should not free any pointer returned from strtok. You only free memory that's been dynamically allocated using malloc and friends. strtok does no such thing, so it's incorrect to free the pointer it returns. Doing so also invokes UB.
Your final problem is because of this:
// copy str to new_str, that's correct because strtok
// will manipulate the string you pass into it
strcpy(new_str, str);
// get the first token and allocate size for the number of tokens,
// so far so good (but you should check that malloc succeeded)
char* ptr = strtok(new_str, delim);
char** array_2d = malloc(sizeof(char*) *num_tokens);
while (ptr != NULL){
if (check_string(ptr) == 0){
// whoops, this is where the trouble starts ...
array_2d[index] = ptr;
index++;
}
// get the next token, this is correct
ptr = strtok(NULL, delim);
}
// ... because you free new_str
free(new_str);
ptr is a pointer to some token in new_str. As soon as you free(new_str), any pointer into that now-deallocated memory is invalid. You've loaded up array_2d with pointers to memory that's no longer allocated, and trying to access those locations again invokes undefined behavior. There are two ways I can think of off the top of my head to solve this:
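Instead of saving pointers that are offsets into new_str, find the same tokens in str (the string from main) and point to those instead. Since those strings are defined in main, they will exist for as long as the program runs. Note that the tokens inside str are not NUL-terminated, so you would also have to keep each token's length.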
Allocate some more memory, and strcpy the token into array_2d[index]. I'll demonstrate this below:
while (ptr != NULL){
if (check_string(ptr) == false)
{
// allocate (enough) memory for the pointer at index
array_2d[index] = malloc(strlen(ptr) + 1);
// you should _always_ check that malloc succeeds
if (array_2d[index] != NULL)
{
// _copy_ the string pointed to by ptr into our new space rather
// than simply assigning the pointer
strcpy(array_2d[index], ptr);
}
else { /* handle no mem error how you want */ }
index++;
}
ptr = strtok(NULL, delim);
}
// now we can safely free new_str without invalidating anything in array_2d
free(new_str);
I have a working demonstration here. Note some other changes in the demo:
#include <stdbool.h> and used that instead of 0 and 1 ints.
Changed your get_tokens function a bit to "return" the number of tokens. This is useful in main for printing them out.
Replaced the ASCII magic numbers with their characters.
Removed the useless freedPointer = NULL lines.
Changed your ints to size_t types for everything involving a size.
One final note, while this is a valid implementation, it's probably doing a bit more work than it needs to. Rather than counting the number of tokens in a first pass, then retrieving them in a second pass, you can surely do everything you want in a single pass, but I'll leave that as an exercise to you if you're so inclined.
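For reference, a single-pass version could look roughly like the sketch below. It is only one possible approach, not the code from the demo: get_tokens_once() and is_lower_word() are hypothetical names, the array grows with realloc as tokens are found, and the letter test assumes the characters 'a' to 'z' are contiguous (true for ASCII).

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if s is non-empty and contains only 'a'..'z', else 0. */
static int is_lower_word(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (*s < 'a' || *s > 'z')
            return 0;
    return 1;
}

/* Hypothetical single-pass variant (an assumption, not the demo code).
 * Stores the number of valid tokens in *count and returns an array of
 * malloc'd strings, or NULL if there are none or allocation failed.
 * The caller frees each string and then the array. */
char **get_tokens_once(const char *str, size_t *count)
{
    char **tokens = NULL;
    size_t n = 0, cap = 0;
    char *copy = malloc(strlen(str) + 1);   /* +1 for the terminator */

    *count = 0;
    if (copy == NULL)
        return NULL;
    strcpy(copy, str);

    for (char *p = strtok(copy, " "); p != NULL; p = strtok(NULL, " ")) {
        if (!is_lower_word(p))
            continue;
        if (n == cap) {                     /* grow the pointer array */
            size_t new_cap = cap ? cap * 2 : 4;
            char **grown = realloc(tokens, new_cap * sizeof *tokens);
            if (grown == NULL)
                break;                      /* keep what we have so far */
            tokens = grown;
            cap = new_cap;
        }
        tokens[n] = malloc(strlen(p) + 1);
        if (tokens[n] == NULL)
            break;
        strcpy(tokens[n], p);               /* copy, do not alias copy[] */
        n++;
    }
    free(copy);                             /* safe: tokens own their data */
    *count = n;
    return tokens;
}
```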
I tried using strtok to split a string and store the individual words in a 2D array. I added printf statements and confirmed that the individual words are being stored.
But the problem is that the while loop does not terminate at all; when run, it never prints the exit statement "Hi".
I am unable to find the mistake here.
Can someone help me with this?
int wordBreak(char* str, char words[MAX_WORDS][MAX_WORD_LENGTH])
{
int x=0;
const char *delimiters = " :,;\n";
char* token = strtok(str,delimiters);
strcpy(words[x],token);
x++;
while(token!=NULL) {
token = strtok(NULL,delimiters);
strcpy(words[x],token);
x++;
}
printf("Hi\n");
return x;
}
You must always ensure the return of strtok is not NULL before calling strcpy to copy to your array. A simple rearrangement will ensure that is the case:
int wordBreak (char* str, char words[MAX_WORDS][MAX_WORD_LENGTH])
{
int x = 0;
const char *delimiters = " :,;\n";
char *token = strtok (str,delimiters);
while (x < MAX_WORDS && token != NULL) {
if (strlen (token) < MAX_WORD_LENGTH)
strcpy (words[x++], token);
else
fprintf (stderr, "warning: token '%s' exceeded MAX_WORD_LENGTH - skipped\n",
token);
token = strtok (NULL, delimiters); /* always advance, even after a skipped token */
}
return x;
}
You also must ensure x < MAX_WORDS by adding the additional condition as shown above and a test to ensure strlen(token) < MAX_WORD_LENGTH before copying to your array.
So I'm using strtok to split a char array by " ". Then I pass each word I split off to a function that determines a value for the word based on a list. However, every time I place the function call in the middle of the while loop that splits the char array, the loop stops.
Do I have to split the array, store it in another array and then go through the second array?
p = strtok(temp, " ");
while (p != NULL) {
value = get_score(score, scoresize, p);
points = points + value;
p = strtok(NULL, " ");
}
So as long as value = get_score(score, scoresize, p); is there, the while loop stops after the first word.
strtok() uses a hidden state variable to keep track of its position in the source string. If you use strtok again, directly or indirectly, in get_score(), this hidden state is overwritten, which makes the call p = strtok(NULL, " "); meaningless.
Do not use strtok() this way. Either use the improved version, strtok_r, which is standardized in POSIX and available on many systems, or re-implement it with strspn and strcspn:
#include <string.h>
char *my_strtok_r(char *s, char *delim, char **context) {
char *token = NULL;
if (s == NULL)
s = *context;
/* skip initial delimiters */
s += strspn(s, delim);
if (*s != '\0') {
/* we have a token */
token = s;
/* skip the token */
s += strcspn(s, delim);
if (*s != '\0') {
/* cut the string to terminate the token */
*s++ = '\0';
}
}
*context = s;
return token;
}
...
char *state;
p = my_strtok_r(temp, " ", &state);
while (p != NULL) {
value = get_score(score, scoresize, p);
points = points + value;
p = my_strtok_r(NULL, " ", &state);
}
I'm not good at using C language. Here is my dumb question. Now I am trying to get input from users, which may have spaces. And what I need to do is to split this sentence using space as delimiter and then put each fragment into char* array. Ex:
Assuming I have char* result[10]; and the input is: Good morning John. The output should be result[0]="Good"; result[1]="morning"; result[2]="John"; I have already tried scanf("%[^\n]",input); and gets(input); yet it is still hard to deal with strings in C. I have also tried strtok, but it seems that it only replaced each space with NULL, so the result is GoodNULLmorningNULLJohn. Obviously that's not what I want. Please help with my dumb question. Thanks.
Edit:
This is what I don't understand when using strtok. Here is a test code.
The substr still displayed Hello there. It seems strtok only puts a null at the space position. Thus, I can't use substr in an if statement.
int main()
{
int i=0;
char* substr;
char str[] = "Hello there";
substr = strtok(str," ");
if(substr=="Hello"){
printf("YES!!!!!!!!!!");
}
printf("%s\n",substr);
for(i=0;i<11;i++){
printf("%c", substr[i]);
}
printf("\n");
system("pause");
return 0;
}
Never use gets; it was deprecated in C99 and removed in C11.
IMO, scanf is not a good function to use when you don't know the number of elements before-hand, I suggest fgets:
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[128];
char *ptr;
fgets(str, sizeof str, stdin);
/* Remove trailing newline */
ptr = strchr(str, '\n');
if (ptr != NULL) {
*ptr = '\0';
}
/* Tokens */
ptr = strtok(str, " ");
while (ptr != NULL) {
printf("%s\n", ptr);
ptr = strtok(NULL, " ");
}
return 0;
}
gets is not recommended, as there is no way to tell it the size of the buffer. fgets is OK here because it takes the buffer size and stops reading at the first newline. You could use strtok to store all the split words in an array of strings, for example:
#include <stdio.h>
#include <string.h>
int main(void) {
char s[256];
char *result[10];
fgets(s, sizeof(s), stdin);
char *p = strtok(s, " \n");
int cnt = 0;
while (cnt < (sizeof result / sizeof result[0]) && p) {
result[cnt++] = p;
p = strtok(NULL, " \n");
}
for (int i = 0; i < cnt; i++)
printf("%s\n", result[i]);
return 0;
}
As most of the other answers haven't covered another thing you were asking:
strtok does not allocate temporary memory; it works on the string you give it and overwrites the separator that ends each token with a zero terminator. This is why Good morning John becomes GoodNULLmorningNULLJohn. If it didn't do this, each token would print with the whole rest of the string on its tail, like:
result[0] = Good morning John
result[1] = morning John
result[2] = John
So if you want to keep your original input and an array of char* per word, you need 2 buffers. There is no other way around that. You also need the token buffer to stay in scope as long as you use the result array of char* pointers, else that one points to invalid memory and will cause undefined behavior.
So this would be a possible solution:
int main()
{
const unsigned int resultLength = 10;
char* result[resultLength];
memset(result, 0, sizeof result); // we should also zero the result array to avoid access violations later on
// Read the input from the console
char input[256];
fgets(input, sizeof input, stdin);
// Get rid of the newline char, if there is one (also safe for empty input)
input[strcspn(input, "\n")] = 0;
// Copy the input string to another buffer for your tokens to work as expected
char tokenBuffer[256];
strcpy(tokenBuffer, input);
// Setting of the pointers per word
char* token = strtok(tokenBuffer, " ");
for (unsigned int i = 0; token != NULL && i < resultLength; i++)
{
result[i] = token;
token = strtok(NULL, " ");
}
// Print the result
for (unsigned int i = 0; i < resultLength; i++)
{
printf("result[%u] = %s\n", i, result[i] != NULL ? result[i] : "NULL");
}
printf("The input is: %s\n", input);
return 0;
}
It prints:
result[0] = Good
result[1] = morning
result[2] = John
result[3] = NULL
result[4] = NULL
result[5] = NULL
result[6] = NULL
result[7] = NULL
result[8] = NULL
result[9] = NULL
The input is: Good morning John
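On the if (substr == "Hello") test from the question's edit: == compares the two addresses, not the characters, so it will not match even when the token is "Hello". strcmp compares the contents. Also, after strtok the space at index 5 has become '\0', which is why printf("%s") prints only Hello while a loop that prints 11 characters runs past the terminator. A minimal sketch, where first_token_is() is a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper (an assumption): returns 1 if the first
 * space-delimited token of str equals word, 0 otherwise.
 * Like strtok itself, this modifies str. */
int first_token_is(char *str, const char *word)
{
    char *substr = strtok(str, " ");
    /* compare contents with strcmp; == would only compare addresses */
    return substr != NULL && strcmp(substr, word) == 0;
}
```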
I'm trying to tokenize a string in C based upon \r\n delimiters, and want to print out each string after subsequent calls to strtok(). In a while loop I have, there is processing done to each token.
When I include the processing code, the only output I receive is the first token, however when I take the processing code out, I receive every token. This doesn't make sense to me, and am wondering what I could be doing wrong.
Here's the code:
#include <stdio.h>
#include <time.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
int main()
{
int c = 0, c2 = 0;
char *tk, *tk2, *tk3, *tk4;
char buf[1024], buf2[1024], buf3[1024];
char host[1024], path[1024], file[1024];
strcpy(buf, "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n");
tk = strtok(buf, "\r\n");
while(tk != NULL)
{
printf("%s\n", tk);
/*
if(c == 0)
{
strcpy(buf2, tk);
tk2 = strtok(buf2, "/");
while(tk2 != NULL)
{
if(c2 == 1)
strcpy(path, tk2);
else if(c2 == 2)
{
tk3 = strtok(tk2, " ");
strcpy(file, tk3);
}
++c2;
tk2 = strtok(NULL, "/");
}
}
else if(c == 1)
{
tk3 = strtok(tk, " ");
while(tk3 != NULL)
{
if(c2 == 1)
{
printf("%s\n", tk3);
// strcpy(host, tk2);
// printf("%s\n", host);
}
++c2;
tk3 = strtok(NULL, " ");
}
}
*/
++c;
tk = strtok(NULL, "\r\n");
}
return 0;
}
Without those if else statements, I receive the following output...
GET /~yourloginid/index.htm HTTP/1.1
Host: remote.cba.csuohio.edu
...however, with those if else statements, I receive this...
GET /~yourloginid/index.htm HTTP/1.1
I'm not sure why I can't see the other token, because the program ends, which means that the loop must occur until the end of the entire string, right?
strtok stores "the point where the last token was found" :
"The point where the last token was found is kept internally by the function to be used on the next call (particular library implementations are not required to avoid data races)."
-- reference
That's why you can call it with NULL the second time.
So calling it again with a different pointer inside your loop makes you lose the state of the initial call (meaning tk = strtok(NULL, "\r\n") will be NULL by the end of the while, because it will be using the state of the inner loops).
So the solution is probably to change the last line of the while from:
tk = strtok(NULL, "\r\n");
to something like (please check the bounds first, it should not go after buf + strlen(buf)):
tk = strtok(tk + strlen(tk) + 1, "\r\n");
Or use strtok_r, which stores the state externally (like in this answer).
// first call
char *saveptr1;
tk = strtok_r(buf, "\r\n", &saveptr1);
while(tk != NULL) {
//...
tk = strtok_r(NULL, "\r\n", &saveptr1);
}
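The bounds check mentioned for the tk + strlen(tk) + 1 approach can be sketched as follows. This is only an illustration in standard C: count_words_per_line() is a hypothetical function that counts the space-separated words in each line, standing in for whatever inner processing calls strtok. The end of the buffer and the resume position are both computed before the inner strtok calls write any '\0' bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example (an assumption): splits buf into lines on "\r\n"
 * and, for each line, counts its space-separated words with a nested
 * strtok loop. Stores the counts in counts[] (up to max_lines entries)
 * and returns the number of lines processed. Modifies buf. */
size_t count_words_per_line(char *buf, size_t *counts, size_t max_lines)
{
    char *end = buf + strlen(buf);  /* measured before strtok edits buf */
    size_t lines = 0;
    char *line = strtok(buf, "\r\n");

    while (line != NULL && lines < max_lines) {
        /* One past this line's terminator; must be computed before the
         * inner loop shortens the line by inserting more '\0' bytes. */
        char *next = line + strlen(line) + 1;
        size_t words = 0;

        /* The inner tokenization overwrites strtok's hidden state. */
        for (char *w = strtok(line, " "); w != NULL; w = strtok(NULL, " "))
            words++;
        counts[lines++] = words;

        /* Restart the outer scan explicitly, but only inside the buffer. */
        line = (next < end) ? strtok(next, "\r\n") : NULL;
    }
    return lines;
}
```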
strtok stores the state of the last token in a global variable, so that the next call to strtok knows where to continue. So when you call strtok(buf2, "/"); in the if, it clobbers the saved state of the outer tokenization.
The fix is to use strtok_r instead of strtok. This function takes an extra argument that is used to store the state:
char *save1, *save2, *save3;
tk = strtok_r(buf, "\r\n", &save1);
while(tk != NULL) {
printf("%s\n", tk);
if(c == 0) {
strcpy(buf2, tk);
tk2 = strtok_r(buf2, "/", &save2);
while(tk2 != NULL) {
if(c2 == 1)
strcpy(path, tk2);
else if(c2 == 2) {
tk3 = strtok_r(tk2, " ", &save3);
strcpy(file, tk3); }
++c2;
tk2 = strtok_r(NULL, "/", &save2); }
} else if(c == 1) {
tk3 = strtok_r(tk, " ", &save2);
while(tk3 != NULL) {
if(c2 == 1) {
printf("%s\n", tk3);
// strcpy(host, tk2);
// printf("%s\n", host);
}
++c2;
tk3 = strtok_r(NULL, " ", &save2); } }
++c;
tk = strtok_r(NULL, "\r\n", &save1); }
return 0;
}
One thing that stands out to me is that unless you are doing something else with the string buffer, there is no need to copy each token to its own buffer. The strtok function returns a pointer to the beginning of the token, so you can use the token in place. The following code may work better and be easier to understand:
#define MAX_PTR 4
char buff[] = "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n";
char *ptr[MAX_PTR];
int i;
for (i = 0; i < MAX_PTR; i++)
{
if (i == 0) ptr[i] = strtok(buff, "\r\n");
else ptr[i] = strtok(NULL, "\r\n");
if (ptr[i] != NULL) printf("%s\n", ptr[i]);
}
The way that I defined the buffer is something that I call a pre-loaded buffer. You can use an array that is set equal to a string to initialize the array. The compiler will size it for you without you needing to do anything else. Now inside the for loop, the if statement determines which form of strtok is used. So if i == 0, then we need to initialize strtok. Otherwise, we use the second form for all subsequent tokens. Then the printf just prints the different tokens. Remember, strtok returns a pointer to a spot inside the buffer.
If you really are doing something else with the data and you really do need the buffer for other things, then the following code will work as well. This uses malloc to allocate blocks of memory from the heap.
#define MAX_PTR 4
char buff[] = "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n";
char *ptr[MAX_PTR];
char *bptr; /* buffer pointer */
int i;
for (i = 0; i < MAX_PTR; i++)
{
if (i == 0) bptr = strtok(buff, "\r\n");
else bptr = strtok(NULL, "\r\n");
if (bptr != NULL)
{
ptr[i] = malloc(strlen(bptr) + 2);
if (ptr[i] == NULL)
{
/* Malloc error check failed, exit program */
printf("Error: Memory Allocation Failed. i=%d\n", i);
exit(1);
}
strncpy(ptr[i], bptr, strlen(bptr) + 1);
ptr[i][strlen(bptr) + 1] = '\0';
printf("%s\n", ptr[i]);
}
else ptr[i] = NULL;
}
This code does pretty much the same thing, except that we are copying the token strings into buffers. Note that we use an array of char pointers to do this. The malloc call allocates memory, and then we check whether it failed: if malloc returns NULL, we exit the program. I prefer strncpy to strcpy here. strcpy does not let you limit the copy to the size of the target buffer, so a malicious user can execute a buffer overflow attack on code that copies into a buffer that is too small. The malloc was given strlen(bptr) + 2, which guarantees that the buffer is big enough to hold the token. The strlen(bptr) + 1 expressions make sure that the copied data doesn't overrun the buffer. As an added precaution, the last byte in the buffer is set to 0x00. Then we print the string. Note the if (bptr != NULL) test: the main block of code is executed only if strtok returns a pointer to a valid string; otherwise we set the corresponding pointer entry in the array to NULL.
Don't forget to free() the pointers in the array when you are done with them.
In your code, you are placing things in named buffers, which can be done, but it's not really good practice because then if you try to use the code somewhere else, you have to make extensive modifications to it.
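As a rough illustration of the cleanup step and of keeping the code reusable, the copying loop and the matching free() calls can be wrapped in two small functions. split_lines() and free_lines() are hypothetical names, and the sketch copies with memcpy because each buffer is sized from the token's own length.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_PTR 4

/* Hypothetical helper (an assumption): fills ptr[] with heap copies of
 * the "\r\n"-delimited tokens in buff; unused entries are set to NULL.
 * Returns the number of tokens copied, or -1 if malloc fails (entries
 * copied so far stay allocated; call free_lines() to release them). */
int split_lines(char *buff, char *ptr[MAX_PTR])
{
    int count = 0;

    for (int i = 0; i < MAX_PTR; i++)
        ptr[i] = NULL;

    for (char *t = strtok(buff, "\r\n");
         t != NULL && count < MAX_PTR;
         t = strtok(NULL, "\r\n")) {
        size_t len = strlen(t);
        ptr[count] = malloc(len + 1);
        if (ptr[count] == NULL)
            return -1;
        memcpy(ptr[count], t, len + 1);     /* includes the terminator */
        count++;
    }
    return count;
}

/* Releases every entry and resets it to NULL. */
void free_lines(char *ptr[MAX_PTR])
{
    for (int i = 0; i < MAX_PTR; i++) {
        free(ptr[i]);                       /* free(NULL) is a no-op */
        ptr[i] = NULL;
    }
}
```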