I'm using strtok() in C to parse a CSV string. First I tokenize it just to find out how many tokens there are, so I can allocate an array of the correct size. Then I tokenize it again, reusing the same variable as before. On the second pass, though, strtok(NULL, ",") returns NULL even though there are still more tokens to parse. Can somebody tell me what I'm doing wrong?
char* tok;
int count = 0;
tok = strtok(buffer, ",");
while(tok != NULL) {
count++;
tok = strtok(NULL, ",");
}
//allocate array
tok = strtok(buffer, ",");
while(tok != NULL) {
//do other stuff
tok = strtok(NULL, ",");
}
So on that second while loop it always ends after the first token is found even though there are more tokens. Does anybody know what I'm doing wrong?
strtok() modifies the string it operates on, replacing delimiter characters with nulls. So if you want to use it more than once, you'll have to make a copy.
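One way to follow this advice is to run the counting pass on a scratch copy, so the original string is still intact for the real parsing pass. A minimal sketch, assuming a hypothetical helper named count_tokens (it is not part of the question's code):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in s without modifying s.
   Returns 0 if the scratch copy cannot be allocated. */
size_t count_tokens(const char *s, const char *delims)
{
    size_t count = 0;
    char *copy = malloc(strlen(s) + 1);   /* scratch copy for strtok to chop up */

    if (copy == NULL)
        return 0;
    strcpy(copy, s);

    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;

    free(copy);                           /* the original s is untouched */
    return count;
}
```

After calling count_tokens(buffer, ",") you can allocate the array and then run a single strtok(buffer, ",") loop on the real buffer.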
There's not necessarily a need to make a copy - strtok() does modify the string it's tokenizing, but in most cases that simply means the string is already tokenized if you want to deal with the tokens again.
Here's your program modified a bit to process the tokens after your first pass:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int i;
char buffer[] = "some, string with , tokens";
char* tok;
int count = 0;
tok = strtok(buffer, ",");
while(tok != NULL) {
count++;
tok = strtok(NULL, ",");
}
// walk through the tokenized buffer again
tok = buffer;
for (i = 0; i < count; ++i) {
printf( "token %d: \"%s\"\n", i+1, tok);
tok += strlen(tok) + 1; // get the next token by skipping past the '\0'
tok += strspn(tok, ","); // then skipping any starting delimiters
}
return 0;
}
Note that this is unfortunately trickier than I first posted - the call to strspn() needs to be performed after skipping the '\0' placed by strtok() since strtok() will skip any leading delimiter characters for the token it returns (without replacing the delimiter character in the source).
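Use strsep - it actually updates your pointer. With strtok you have to keep passing NULL after the first call; with strsep you pass the address of your string pointer on every call. The only catch is that strsep moves the pointer you give it, so if the string was allocated on the heap, keep a separate pointer to the beginning and free that later. Note that strsep is a BSD/glibc extension, not standard C.
char *strsep(char **stringp, const char *delim);
char *string = buffer; /* buffer is the writable string to parse */
char *token;
token = strsep(&string, ",");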
strtok is used in your normal intro to C course - use strsep, it's much better. :-)
No getting confused on "oh shit - i have to pass in NULL still cuz strtok screwed up my positioning."
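For reference, a minimal sketch of a complete strsep() loop. Unlike strtok(), strsep() returns an empty string for each pair of adjacent delimiters, which is usually what you want for CSV. split_fields is a hypothetical helper, and the _DEFAULT_SOURCE line is an assumption about building with glibc; other platforms may not need it.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes strsep() in <string.h> */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place on commas and stores up to max
   field pointers in fields. Returns the number of fields stored. */
size_t split_fields(char *line, char **fields, size_t max)
{
    size_t n = 0;
    char *cursor = line;   /* strsep advances this; line keeps the start */
    char *field;

    while (n < max && (field = strsep(&cursor, ",")) != NULL)
        fields[n++] = field;   /* empty fields come back as "" */

    return n;
}
```

If line came from malloc(), free line rather than cursor: once the last field has been returned, cursor is NULL.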
I am getting this error:
Error in `./sorter': double free or corruption (!prev): 0x0000000000685010
and then a bunch of numbers which is the memory map.
My program reads a CSV file of movies and their attributes from stdin and tokenizes it. Movie titles that contain commas are surrounded by quotes, so I split each line into 3 tokens and then tokenize the front and back tokens again using the comma as the delimiter. I free all my mallocs at the end of the code but I still get this error. The CSV is scanned until the end, but then I get the error message. If I don't free the mallocs at all I don't get an error message, but I highly doubt that is right. This is my main():
char* CSV = (char*)malloc(sizeof(char)*500);
char* fronttoken = (char*)malloc(sizeof(char)*500);
char* token = (char*)malloc(sizeof(char)*500);
char* backtoken = (char*)malloc(sizeof(char)*500);
char* title = (char*)malloc(sizeof(char)*100);
while(fgets(CSV, sizeof(CSV)*500,stdin))
{
fronttoken = strtok(CSV, "\""); //store token until first quote, if no quote, store whole line
title = strtok(NULL,"\""); //store token after first quote until 2nd quote
if(title != NULL) //if quotes in line, consume comma meant to delim title
{
token = strtok(NULL, ","); //eat comma
}
backtoken = strtok(NULL,"\n"); //tokenize from second quote to \n character (remainder of line)
printf("Front : %s\nTitle: %s\nBack: %s\n", fronttoken, title, backtoken); //temp print statement to see front,back,title components
token = strtok(fronttoken, ","); //tokenizing front using comma delim
while (token != NULL)
{
printf("%s\n", token);
token = strtok(NULL, ",");
}
if (title != NULL) //print if there is a title with comma
{
printf("%s\n",title);
}
token = strtok(backtoken,","); //tokenizing back using comma delim
while (token != NULL)
{
printf("%s\n", token);
token = strtok(NULL, ",");
}
}
free(CSV);
free(token);
free(fronttoken);
free(backtoken);
free(title);
return 0;
Focus here:
char* title = (char*)malloc(sizeof(char)*100);
title = strtok(NULL,"\"");
You dynamically allocate the memory that title points to.
You then assign the return value of strtok() to title, losing any reference to the memory allocated with malloc()! This means that you will definitely have a memory leak, since you will never be able to de-allocate that memory.
The reference example for strtok() is very informative:
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
As a result, there is no need to allocate memory for what strtok() returns - it's actually bad as I explained before.
Back to your code:
free(title);
does nothing only when title is NULL at that point, i.e. when the last line read contained no quoted title. Otherwise title points inside CSV and must not be passed to free().
free(token); does nothing, since token is NULL at that point (the while loop only ends once strtok() has returned NULL).
Furthermore, fronttoken and backtoken also result in memory leaks, since they are assigned the return value of strtok() after malloc() has been called. Their free() is problematic too (in contrast with the de-allocation of token), since they point within the memory originally allocated for CSV.
So, when free(fronttoken); or free(backtoken); is called after free(CSV);, a double free or memory corruption occurs.
Moreover, change this:
while(fgets(CSV, sizeof(CSV)*500,stdin))
to this:
while(fgets(CSV, sizeof(*CSV)*500,stdin))
since you want the size of where CSV points to (that's the size of the memory you dynamically allocated).
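Putting these points together, here is a minimal sketch of the ownership rule: one malloc(), one free(), and plain pointers for whatever strtok() returns. count_fields and LINE_CAP are hypothetical names, and the sketch only counts comma-separated fields; it does not handle the quoted titles from the question.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_CAP 500   /* hypothetical buffer size, mirrors the 500 in the question */

/* Hypothetical helper: reads every line from fp and returns the total number
   of comma-separated fields, or -1 if the buffer cannot be allocated.
   Lines longer than LINE_CAP - 1 characters are read in several pieces. */
long count_fields(FILE *fp)
{
    char *line = malloc(LINE_CAP);   /* the only allocation */
    long total = 0;

    if (line == NULL)
        return -1;

    /* Pass the real buffer size, not sizeof(line), which is the pointer size. */
    while (fgets(line, LINE_CAP, fp) != NULL) {
        /* field points into line; it is never malloc'd and never freed */
        for (char *field = strtok(line, ",\n"); field != NULL;
             field = strtok(NULL, ",\n"))
            total++;
    }

    free(line);                      /* the only free */
    return total;
}
```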
I have a problem with regards to separating the contents of a string passed to a function. The function is called with a string like this:
ADD:Nathaniel:50
Where ADD will be the protocol name, Nathaniel will be the key, and 50 will be the value, all separated with a :.
My code looks like this:
bool add_to_list(char* buffer){
char key[40];
char value[40];
int* token;
char buffer_copy[1024];
const char delim[2] = ":";
strcpy(buffer_copy, buffer);
token = strtok(NULL, delim);
//strcpy(key, token);
printf("%d",token);
printf("%p",token);
while(token != NULL){
token = strtok (NULL, delim);
}
//strcpy(value, token);
printf("%s", key);
printf("%s", value);
push(key, value);
return true;
}
What I am trying to do is store each key and value in a separate variable, using strtok(). Note that I am trying to store the second and third values (Nathaniel and 50) not the first bit (ADD).
When I run the code, it gives me a segmentation fault, so I am guessing that I am trying to access an invalid memory address rather than a value. I just need to store the second and third bit of the string. Can anyone help please?
EDIT:
I have changed the code to look like this:
bool add_to_list(char* buffer){
char *key, *value, *token;
const char *delim = ":";
token = strtok(buffer, delim);
//printf("%d",token);
printf("%s",token);
key = strtok(NULL, delim);
value = strtok(NULL, delim);
printf("%s", key);
printf("%s", value);
//push(key, value);
return true;
}
But I am still getting the same segmentation fault (core dumped) error
The first call to strtok() needs to provide the string to scan. You only pass NULL on the repeated calls, so that it keeps processing the rest of the string. So the first call should be:
token = strtok(buffer_copy, delim);
Then when you want to get the key and value, you need to copy them to the arrays:
token = strtok(NULL, delim);
strcpy(key, token);
token = strtok(NULL, delim);
strcpy(value, token);
You don't need a loop, since you just want to extract these two values.
Actually, you don't need to declare key and value as arrays, you could use pointers:
char *key, *value;
Then you can do:
token = strtok(buffer_copy, delim);
key = strtok(NULL, delim);
value = strtok(NULL, delim);
Your main problem is that when you first call strtok, the first parameter should be the string you want to parse, so not:
strcpy(buffer_copy, buffer);
token = strtok(NULL, delim);
but
strcpy(buffer_copy, buffer);
token = strtok(buffer_copy, delim);
Additionally when you detect the tokens in your while loop, you are throwing them away. You want to do something at that point (or simply unroll the loop and call strtok three times).
Also:
const char* delim = ":";
would be a more conventional way of ensuring a NUL terminated string than:
const char delim[2] = ":";
Also consider using strtok_r not strtok as strtok is not thread-safe and horrible. Whilst you are not using threads here (it seems), you might as well get into good practice.
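The advice in these answers can be combined into one defensive sketch. parse_command is a hypothetical name. It checks every strtok() result before use, since passing NULL to strcpy() or to printf("%s") is undefined behavior and a common cause of a segmentation fault. It also works on a copy, so it is safe even if the caller passes a string literal, which strtok() must never modify. The question does not show the caller, so that is only one possible cause of the remaining crash.

```c
#include <string.h>

/* Hypothetical helper: extracts the 2nd and 3rd ':'-separated fields of
   buffer into key and value. Returns 0 on success, -1 on malformed or
   oversized input. */
int parse_command(const char *buffer, char *key, size_t key_size,
                  char *value, size_t value_size)
{
    char copy[1024];
    char *k, *v;

    if (strlen(buffer) >= sizeof copy)
        return -1;                       /* would not fit in the scratch copy */
    strcpy(copy, buffer);                /* strtok modifies copy, not buffer */

    if (strtok(copy, ":") == NULL)       /* protocol name, e.g. "ADD" */
        return -1;
    k = strtok(NULL, ":");
    v = strtok(NULL, ":");
    if (k == NULL || v == NULL)          /* fewer than three fields */
        return -1;
    if (strlen(k) >= key_size || strlen(v) >= value_size)
        return -1;                       /* would overflow the caller's arrays */

    strcpy(key, k);                      /* copy out before copy goes away */
    strcpy(value, v);
    return 0;
}
```

With char key[40], value[40]; the call parse_command("ADD:Nathaniel:50", key, sizeof key, value, sizeof value) returns 0 and leaves "Nathaniel" in key and "50" in value.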
I have this code in my program:
char* tok = NULL;
char move[100];
if (fgets(move, 100, stdin) != NULL)
{
/* then split into tokens using strtok */
tok = strtok(move, " ");
while (tok != NULL)
{
printf("Element: %s\n", tok);
tok = strtok(NULL, " ");
}
}
I have tried adding printf statements before and after fgets, and the one before gets printed, but the one after does not.
I cannot see why this fgets call is causing a segmentation failure.
If someone has any idea, I would much appreciate it.
Thanks
Corey
The strtok() runtime function works like this:
The first time you call strtok() you provide the string that you want to tokenize:
char s[] = "this is a string";
In the above string, the space character is a good delimiter between words, so let's use that:
char* p = strtok(s, " ");
What happens now is that s is searched until a space character is found; that space is overwritten with '\0', and p points to the first token ("this").
To get the next token from the same string, pass NULL as the first argument, since strtok() keeps a static pointer to its position in the string you passed previously:
p = strtok(NULL," ");
p now points to "is",
and so on until no more spaces can be found; the rest of the string is then returned as the last token, "string".
More conveniently, you could write it like this to print out all the tokens:
for (char *p = strtok(s," "); p != NULL; p = strtok(NULL, " "))
{
puts(p);
}
EDITED HERE:
If you want to store the values returned from strtok() you need to copy each token to another buffer, e.g. with strdup(p), since the returned pointers point into the original string and stop being usable as soon as that buffer is overwritten (for example by the next fgets()) or goes out of scope.
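A minimal sketch of storing tokens this way, assuming a hypothetical helper named copy_tokens. It uses malloc() plus strcpy() instead of strdup(), because strdup() is POSIX and was not part of ISO C before C23.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: tokenizes s in place and stores a malloc'd copy of
   each token in out (at most max). Returns the number of copies stored;
   the caller must free() each one. */
size_t copy_tokens(char *s, const char *delims, char **out, size_t max)
{
    size_t n = 0;

    for (char *p = strtok(s, delims); p != NULL && n < max;
         p = strtok(NULL, delims)) {
        char *dup = malloc(strlen(p) + 1);
        if (dup == NULL)
            break;                 /* out of memory: return what we have */
        strcpy(dup, p);
        out[n++] = dup;            /* independent of s from here on */
    }
    return n;
}
```

The copies stay valid even after s is overwritten or reused for the next line of input.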
I am trying to work with strtok and strcat but the second printf never shows up. Here is the code:
int i = 0;
char *token[128];
token[i] = strtok(tmp, "/");
printf("%s\n", token[i]);
i++;
while ((token[i] = strtok(NULL, "/")) != NULL) {
strcat(token[0], token[i]);
printf("%s", token[i]);
i++;
}
If my input is 1/2/3/4/5/6 for tmp then the console output would be 13456. The 2 is always missing. Does anyone know how to fix this?
The two is always missing because on the first iteration of your loop you overwrite it with the call to strcat.
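On entry to the loop your buffer contains "1\02\03/4/5/6"; strtok's internal pointer is pointing to "3" and token[1] points to "2".
You then call strcat, which turns the buffer into "12\0\03/4/5/6", so token[1] now points to "\0" and the printf in the loop prints nothing. (Strictly speaking, strcat on overlapping strings is undefined behavior, so even this outcome is not guaranteed.)
Subsequent calls are OK because the null characters written by strcat no longer land on a token that has yet to be printed.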
To fix it you should build up your output string into a second buffer, not the one you are parsing.
A working(?) version:
#include <stdio.h>
#include <string.h>
int main(void)
{
int i = 0;
char *token[128];
char tmp[128];
char removed[128] = {0};
strcpy(tmp, "1/2/3/4/5/6");
token[i] = strtok(tmp, "/");
strcat(removed, token[i]);
printf("%s\n", token[i]);
i++;
while ((token[i] = strtok(NULL, "/")) != NULL) {
strcat(removed, token[i]);
printf("%s", token[i]);
i++;
}
return (0);
}
strtok modifies the input string in place and returns pointers to that string. You then take one of those pointers (token[0]) and pass it to another operation (strcat) that writes to that pointer. The writes are clobbering each other.
If you want to concatenate all the tokens, you should allocate a separate buffer and strcat the tokens into that instead.
I've been reading up on strtok and thought it would be the best way for me to compare two files word by word. So far I can't really figure out how I would do it, though.
Here is my function that performs it:
int wordcmp(FILE *fp1, FILE *fp2)
{
char *s1;
char *s2;
char *tok;
char *tok2;
char line[BUFSIZE];
char line2[BUFSIZE];
char comp1[BUFSIZE];
char comp2[BUFSIZE];
char temp[BUFSIZE];
int word = 1;
size_t i = 0;
while((s1 = fgets(line,BUFSIZE, fp1)) && (s2 = fgets(line2,BUFSIZE, fp2)))
{
;
}
tok = strtok(line, " ");
tok2 = strtok(line, " ");
while(tok != NULL)
{
tok = strtok (NULL, " ");
}
return 0;
}
Don't mind the unused variables; I've been at this for 3 hours and have tried every way I can think of to compare the values of the first and second strtok. I would also like to know how I would check which file reaches EOF first.
When I tried
if(s1 == EOF && s2 != EOF)
{
return -1;
}
It returns -1 even when the files are the same! Is it because in order for it to reach the if statement outside of the loop both files have reached EOF which makes the program always go to this if statement?
Thanks in advance!
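If you just want to check whether the files are identical, compare them character by character. Note that s1 and s2 must be declared as int here, not char*, because fgetc() returns an int so that EOF can be told apart from real characters: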
do {
s1 = fgetc(fp1);
s2 = fgetc(fp2);
if (s1 == s2) {
if (s1 == EOF) {
return 1; // RETURN TRUE
}
continue;
}
else {
return -1; // RETURN FALSE
}
} while (1);
Good Luck :)
When you use strtok() you typically use code like this:
tok = strtok(line, " ");
while (NULL != tok)
{
tok = strtok(NULL, " ");
}
The NULL in the call in the loop tells strtok to continue from after the previously found token until it finds the null terminating character in the value you originally passed (line) or until there are no more tokens. The current pointer is stored in the run time library, and once strtok() returns NULL to indicate no more tokens any more calls to strtok() using NULL as the first parameter (to continue) will result in NULL. You need to call it with another value (e.g. another call to strtok(line, " ")) to get it to start again.
What this means is that to use strtok on two different strings at the same time you need to manually update the string position and pass in a modified value on each call.
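char *end = line + strlen(line); /* remember where each line really ends */
char *end2 = line2 + strlen(line2);
tok = strtok(line, " ");
tok2 = strtok(line2, " ");
while (NULL != tok && NULL != tok2)
{
/* Do stuff with tok and tok2 here */
if (strcmp(tok, tok2) != 0) { /* the words differ */ }
/* Move to the '\0' that ends each token */
tok += strlen(tok);
tok2 += strlen(tok2);
/* Get next token, but never step past the end of the original line */
tok = (tok < end) ? strtok(tok + 1, " ") : NULL;
tok2 = (tok2 < end2) ? strtok(tok2 + 1, " ") : NULL;
}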
You'll still need to add logic for determining whether lines are different - you've not said whether the files are equivalent if a line break occurs at different position but the words surrounding it are the same. I assume it should be, given your description, but it makes the logic more awkward as you only need to perform the initial fgets() and strtok() for a file if you don't already have a token. You also need to look at how files are read in. Currently your first while loop just reads lines until the end of the file without processing them.
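If strtok_r() is available, it avoids the manual pointer bookkeeping, because each string gets its own saved position. A minimal sketch for comparing two lines word by word, assuming a POSIX system (strtok_r() is not standard C) and a hypothetical helper named same_words:

```c
#define _POSIX_C_SOURCE 200809L   /* assumption: POSIX system; exposes strtok_r() */
#include <string.h>

/* Hypothetical helper: returns 1 if lines a and b contain the same words in
   the same order (ignoring the amount of whitespace), 0 otherwise.
   Both buffers are modified. */
int same_words(char *a, char *b)
{
    const char *delims = " \t\n";
    char *save_a, *save_b;             /* one saved position per string */
    char *wa = strtok_r(a, delims, &save_a);
    char *wb = strtok_r(b, delims, &save_b);

    while (wa != NULL && wb != NULL) {
        if (strcmp(wa, wb) != 0)
            return 0;                  /* first differing word */
        wa = strtok_r(NULL, delims, &save_a);
        wb = strtok_r(NULL, delims, &save_b);
    }
    return wa == NULL && wb == NULL;   /* equal only if both ran out together */
}
```

This compares one pair of lines only; treating a line break in one file as equivalent to a space in the other would still need the extra read-ahead logic described above.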