I was having a problem with my function yesterday that turned out to be a simple fix, and now I am having a different issue with it that is baffling me. The function is a tokenizer:
void tokenizer(FILE *file, struct Set *set) {
int nbytes = 100;
int bytesread;
char *buffer;
char *token;
buffer = (char *) malloc(nbytes + 1);
while((bytesread = getLine(&buffer,&nbytes,file)) != -1) {
token = strtok(buffer," ");
while(token != NULL) {
add(set,token);
token = strtok(NULL," ");
}
}
}
I know that the input(a text file) is getting broken into tokens correctly because in the second while loop I can add printf("%s",token) to display each token. However, the problem is with add. It is only adding the first token to my list, but at the same time still breaks every token down correctly. For example, if my input text was "blah herp derp" I would get
token = blah
token = herp
token = derp
but the list would only contain the first token, blah. I don't believe that the problem is with add because it works on its own, i.e., I could use
add(set,"blah");
add(set,"herp");
add(set,"derp");
and the result will be a list that contains all three words.
Thank you for any help!
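strtok() returns a pointer into the string buffer, and that buffer is reused for the next line, so every token you store ends up pointing into the same memory. You need to strdup() each token and add the copy to your set instead. Don't forget to free() the copies when you clean up the set.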
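As a rough sketch of that advice, the container can take a private copy of each token as it is added and release the copies on cleanup. The asker's Set is not shown, so `struct WordNode`, `push_copy` and `free_words` below are hypothetical stand-ins rather than the original code, and the copy is written out with malloc and strcpy so that the sketch relies only on standard C instead of POSIX strdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the asker's Set: a singly linked list that owns
   a private copy of every word it stores. */
struct WordNode {
    char *word;
    struct WordNode *next;
};

/* Copies word onto the heap and pushes it at the front of the list.
   Returns 1 on success, 0 if an allocation fails. */
int push_copy(struct WordNode **head, const char *word) {
    assert(head != NULL && word != NULL);
    struct WordNode *node = malloc(sizeof *node);
    if (node == NULL)
        return 0;
    node->word = malloc(strlen(word) + 1);
    if (node->word == NULL) {
        free(node);
        return 0;
    }
    strcpy(node->word, word);   /* the list no longer depends on the caller's buffer */
    node->next = *head;
    *head = node;
    return 1;
}

/* Frees every node together with the copy it owns. */
void free_words(struct WordNode *head) {
    while (head != NULL) {
        struct WordNode *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}
```

Inside the tokenizer loop, add(set, token) would then correspond to something like push_copy(&head, token), and the stored words stay valid after the line buffer is overwritten.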
Hi, I'm trying to use strtok to separate data read from a file.
The text file just has a list of names, which is first read into one char array, data.
I've removed the file-reading part and shown the array for simplicity.
int main()
{
struct Node* head = NULL;
char data[128] = "john smith\nbob jones\nrobert brown";
char *argv [50];
char * token = strtok(data, "\n"); // separates data into lines
while( token != NULL )
{
insertAtBeginning(&head, token); //LL the data gets stored in
token = strtok(NULL, "\n");
}
}
I have managed to split the data into lines; however, I want to split each line (token) into an array on the " " character.
So I want argv[0] = "john" and argv[1] = "smith".
This argv array then gets stored in the linked list instead of token at that line.
Thanks, any help will be much appreciated.
I have created a program that requires reading a CSV file that contains bank accounts and transaction history. To access certain information, I have a function getfield which reads each line token by token:
const char* getfield(char* line, int num)
{
const char *tok;
for (tok = strtok(line, ",");
tok && *tok;
tok = strtok(NULL, ",\n"))
{
if (!--num)
return tok;
}
return NULL;
}
I use this later on in my code to access the account number (at position 2) and the transaction amount(position 4):
...
while (fgets(line, 1024, fp))
{
char* tmp = strdup(line);
//check if account number already exists
char *acc = (char*) getfield(tmp, 2);
char *txAmount = (char*)getfield(tmp, 4);
printf("%s\n", txAmount);
//int n =1;
if (acc!=NULL && atoi(acc)== accNum && txAmount !=NULL){
if(n<fileSize)
{
total[n]= (total[n-1]+atof(txAmount));
printf("%f", total[n]);
n++;
}
}
free(tmp1); free(tmp2);
}
...
No issue seems to arise with char *acc = (char*) getfield(tmp, 2), but when I use getfield for char *txAmount = (char*)getfield(tmp, 4) the print statement that follows shows me that I always have NULL. For context, the file currently reads as (first line is empty):
AC,1024,John Doe
TX,1024,2020-02-12,334.519989
TX,1024,2020-02-12,334.519989
TX,1024,2020-02-12,334.519989
I had previously asked if it was required to use free(acc) in a separate part of my code (Free() pointer error while casting from const char*) and the answer seemed to be no, but I'm hoping this question gives better context. Is this a problem with not freeing up txAmount? Any help is greatly appreciated !
(Also, if anyone has a better suggestion for the title, please let me know how I could have better worded it, I'm pretty new to stack overflow)
Your getfield function modifies its input. So when you call getfield on tmp again, you aren't calling it on the right string.
For convenience, you may want to make a getfield function that doesn't modify its input. It will be inefficient, but I don't think performance or efficiency are particularly important to your code. The getfield function would call strdup on its input, extract the string to return, call strdup on that, free the duplicate of the original input, and then return the pointer to the duplicate of the found field. The caller would have to free the returned pointer.
The issue is that strtok replaces the found delimiters with '\0'. You'll need to get a fresh copy of the line.
Or continue where you left off, using getfield (NULL, 2).
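A minimal sketch of the non-modifying variant described above, under a few assumptions: `getfield_copy` and `copy_string` are hypothetical names, the copy helper is written out instead of calling POSIX strdup(), and the same delimiter set ",\n" is used for every strtok call. The function still uses strtok internally, so it should not be called in the middle of another strtok sequence.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap-allocated copy of s, or NULL on failure. */
char *copy_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Returns a malloc'd copy of the num-th (1-based) non-empty field of line,
   or NULL if there is no such field. line is left untouched because strtok
   only ever sees a private duplicate. The caller frees the result. */
char *getfield_copy(const char *line, int num) {
    assert(line != NULL);
    char *scratch = copy_string(line);
    if (scratch == NULL)
        return NULL;

    char *result = NULL;
    for (char *tok = strtok(scratch, ",\n"); tok != NULL; tok = strtok(NULL, ",\n")) {
        if (--num == 0) {
            result = copy_string(tok);  /* copy out before scratch is freed */
            break;
        }
    }
    free(scratch);
    return result;
}
```

With this shape, calling getfield_copy(tmp, 2) and then getfield_copy(tmp, 4) on the same line both see the full, unmodified text, at the cost of two extra allocations per call.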
I was having some issues dealing with char*'s from an array of char*'s and used this for reference: Splitting C char array into words
So what I'm trying to do is read in char arrays and split them with a space delimiter so I can do stuff with it. For example if the first token in my char* is "Dog" I would send it to a different function that dealt with dogs. My problem is that I'm getting a strange output.
For example:
INPUT: *cmd = "Dog needs a vet appointment."
OUTPUT: (from print statements) "Doneeds a vet appntment."
I've checked for memory leaks using valgrind and I have none of them or other errors.
void parseCmd(char* cmd){ //passing in an individual char* from a char**
char** p_args = calloc(100, sizeof(char*));
int i = 0;
char* token;
token = strtok(cmd, " ");
while (token != NULL){
p_args[i++] = token;
printf("%s",token); //trying to debug
token = strtok(NULL, cmd);
}
free(p_args);
}
Any advice? I am new to C so please bear with me if I did something stupid. Thank you.
In your case,
token = strtok(NULL, cmd);
is not what you should be doing. You instead need:
token = strtok(NULL, " ");
As per the ISO standard:
char *strtok(char * restrict s1, const char * restrict s2);
A sequence of calls to the strtok function breaks the string pointed to by s1 into a sequence of tokens, each of which is delimited by a character from the string pointed to by s2.
The only difference between the first and subsequent calls (assuming, as per this case, you want the same delimiters) should be using NULL as the input string rather than the actual string. By using the input string as the delimiter list in subsequent calls, you change the behaviour.
You can see exactly what's happening if you try the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void parseCmd(char* cmd) {
char* token = strtok(cmd, " ");
while (token != NULL) {
printf("[%s] [%s]\n", cmd, token);
token = strtok(NULL, cmd);
}
}
int main(void) {
char x[] = "Dog needs a vet appointment.";
parseCmd(x);
return 0;
}
which outputs (first column will be search string to use next iteration, second is result of this iteration):
[Dog] [Dog]
[Dog] [needs a vet app]
[Dog] [intment.]
The first step worked fine since you were using space as the delimiter and it modified the string by placing a \0 at the end of Dog.
That means the next attempt (with the wrong separator) would use one of the letters from {D,o,g} to split. The first matching letter for that set is the o in appointment, which is why you see needs a vet app. The third attempt finds none of the candidate letters so you just get back the rest of the string, intment..
token = strtok(NULL, cmd); should be token = strtok(NULL, " ");.
The second argument is for delimiter.
http://man7.org/linux/man-pages/man3/strtok.3.html
I'm having a very hard time tracing down a strange error in my pebble watch app. I suspect it's a memory error but I can't find my mistake. I have an array of strings, and when I call menu_layer = menu_layer_create(bounds); it somehow seems to be corrupting my string. I explain more fully below. The description of the error is bolded.
I have a header file which declares three variables as extern so that they're global.
//externs.h
#ifndef EXTERNS_H
#define EXTERNS_H
// These global variables are accessible by all source files. Modified in main.c
extern int str_count;
extern char **str_titles;
extern char **str_teasers;
#endif
These three variables are modified in main.c, but are used in other c files. Below is a sample of my main.c file where I set the array of strings str_titles and str_teasers. The structure info contains two strings which are very long but are separated with the delimiter |. I copy these strings into a temporary buffer s_buffer and parse them apart with strtok, saving each new string into my array of strings.
This seems to work well, I've checked each string in a for loop and the last character is always the null-terminated byte and the one previous to it is a period (end of the sentence). I don't modify these values anywhere else, and I don't free them until the very very end of my program (because they should exist for the program's life).
I create a menu with a dynamic number of entries (in this case 6), and each menu has the title of one of the 6 strings within str_titles. There are no issues here. I can iterate through and APP_LOG this array of strings with no problem at any time in my program.
When each menu item is pressed, it should display the longer string from str_teasers in a scroll layer. It does this reliably only for the first three menu items. For the last three it is always blank. Trying to iterate and print the array of strings here with APP_LOG produces a litany of errors in the python framework used with the pebble logs and always ends with something like:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x98 in position 0: invalid start byte
The last part, invalid start byte, is sometimes something different, and the decode bytes and position change based on the string. Note that logging always produces some error for the last three blank menu items, and only sometimes produces the error for the first three, even when the scroll layer is not blank and displays the correct text (for the first three it sometimes prints to the logs without issue).
I've tried APP_LOG on str_teasers at various points, and when it fails, it does so after I call menu_layer = menu_layer_create(bounds); to create my menu. Before I make that call I can print all the strings with APP_LOG with no issue at all. I thought it was a heap corruption error, but I have available heap memory (~8100 Bytes) before and after the creation of the layer and my app doesn't crash.
Maybe I'm missing something very simple, but I can't find my mistake. I've allocated the memory for str_teasers correctly I believe, so I don't see why it should be modified at all. I've included a modified sample code below for reference.
//main.c
#include "strtok.h"
#include "externs.h"
int str_count;
char **str_titles;
char **str_teasers;
char *s_buffer;
const char delim[1] = "|";
char *token;
typedef struct {
int s_count;
char* s_titles;
char* s_teasers;
} s_info;
s_info info;
// Sample code
str_count = info.s_count;
// Declare arrays of appropriate size
str_titles = malloc(str_count * sizeof(char*));
str_teasers = malloc(str_count * sizeof(char*));
// This creates a copy of the entire string s_titles into s_buffer
int len = strlen(info.s_titles) + 1;
s_buffer = (char *)malloc(len);
strcpy(s_buffer, info.s_titles);
token = strtok(s_buffer, delim); // Get the first token for the titles
// Walk through the other tokens
int counter = 0;
while(token != NULL) {
*(str_titles + counter) = malloc((strlen(token) + 1) * sizeof(char));
*(str_titles + counter) = token;
token = strtok(NULL, delim);
counter++;
}
// This creates a copy of the entire string s_teasers into s_buffer
len = strlen(info.s_teasers) + 1;
s_buffer = (char *)realloc(s_buffer, len);
strcpy(s_buffer, info.s_teasers);
token = strtok(s_buffer, delim); // Get the first token for the teasers
// Walk through the other tokens
counter = 0;
while(token != NULL) {
*(str_teasers + counter) = malloc((strlen(token) + 1) * sizeof(char));
*(str_teasers + counter) = token;
token = strtok(NULL, delim);
counter++;
}
free(s_buffer);
In this code block
// Walk through the other tokens
int counter = 0;
while(token != NULL) {
*(str_titles + counter) = malloc((strlen(token) + 1) * sizeof(char));
*(str_titles + counter) = token;
token = strtok(NULL, delim);
counter++;
}
you allocate memory but immediately overwrite the pointer with the token pointer, so the allocation is leaked and each entry points into s_buffer. Then, in the other, similar code block, you overwrite the string into which these pointers point. You might have been "lucky": the buffer was reallocated, and the old string contents happened to survive until other allocations reused that memory. And indeed, you end by freeing this string buffer anyway, without having copied the tokens.
I think you need a strcpy here
// Walk through the other tokens
int counter = 0;
while(token != NULL) {
*(str_titles + counter) = malloc((strlen(token) + 1) * sizeof(char));
strcpy(*(str_titles + counter), token); // <<--- here
token = strtok(NULL, delim);
counter++;
}
Similarly for the teasers code block.
I also notice that you index by counter but have not checked it against the str_count you were supplied with, which was used to allocate memory for the array of string pointers.
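Both points, copying each token and checking the counter against the allocated count, can be sketched in one helper. `split_copy` is a hypothetical function, not part of the asker's code or of the Pebble SDK; it works on a private scratch copy of the source string so that freeing the scratch buffer afterwards cannot invalidate the stored strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits src on the characters in delim and stores an
   independent heap copy of each token in out[0..max-1]. Returns the number
   of tokens stored, which never exceeds max. src itself is not modified. */
int split_copy(const char *src, const char *delim, char **out, int max) {
    assert(src != NULL && delim != NULL && out != NULL);

    /* strtok writes into its input, so work on a scratch duplicate. */
    char *scratch = malloc(strlen(src) + 1);
    if (scratch == NULL)
        return 0;
    strcpy(scratch, src);

    int counter = 0;
    char *token = strtok(scratch, delim);
    while (token != NULL && counter < max) {     /* never write past out[max-1] */
        out[counter] = malloc(strlen(token) + 1);
        if (out[counter] == NULL)
            break;
        strcpy(out[counter], token);             /* copy the text, not the pointer */
        token = strtok(NULL, delim);
        counter++;
    }

    free(scratch);   /* safe: every out[i] owns its own memory */
    return counter;
}
```

In the question's terms this would be called once as split_copy(info.s_titles, "|", str_titles, str_count) and once for the teasers, with each stored string freed at program exit.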
My application produces strings like the one below. I need to parse values between the separator into individual values.
2342|2sd45|dswer|2342||5523|||3654|Pswt
I am using strtok to do this in a loop. For the fifth token, I am getting 5523. However, I need to account for the empty value between the two separators || as well. 5523 should be the sixth token, as per my requirement.
token = (char *)strtok(strAccInfo, "|");
for (iLoop=1;iLoop<=106;iLoop++) {
token = (char *)strtok(NULL, "|");
}
Any suggestions?
In that case I often prefer a p2 = strchr(p1, '|') loop with a memcpy(s, p1, p2-p1) inside. It's fast, does not destroy the input buffer (so it can be used with const char *) and is really portable (even on embedded).
It's also reentrant; strtok isn't. (BTW: reentrant has nothing to do with multi-threading. strtok breaks already with nested loops. One can use strtok_r but it's not as portable.)
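The strchr/memcpy idea can be sketched as follows. `get_field` is a hypothetical helper with a 0-based index; it reads from a const string, keeps empty fields, and holds no static state, so it can be nested freely. The separator is assumed to be a non-zero character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the index-th (0-based) sep-delimited field of
   src into dst (capacity dstsize), keeping empty fields. Returns 1 on
   success, 0 if the field does not exist or does not fit. src is not
   modified. */
int get_field(const char *src, char sep, size_t index, char *dst, size_t dstsize) {
    assert(src != NULL && dst != NULL);
    const char *p1 = src;   /* start of the current field */
    const char *p2;         /* separator that ends it, or NULL for the last field */

    while ((p2 = strchr(p1, sep)) != NULL && index > 0) {
        p1 = p2 + 1;        /* step over the separator to the next field */
        index--;
    }
    if (index > 0)
        return 0;           /* ran out of fields before reaching index */

    size_t len = (p2 != NULL) ? (size_t)(p2 - p1) : strlen(p1);
    if (len >= dstsize)
        return 0;           /* no room for the field plus its terminator */
    memcpy(dst, p1, len);
    dst[len] = '\0';
    return 1;
}
```

For the sample string in the question, index 4 yields an empty string and index 5 yields 5523, which matches the numbering the asker wants.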
That's a limitation of strtok. The designers had whitespace-separated tokens in mind. strtok doesn't do much anyway; just roll your own parser. The C FAQ has an example.
On a first call, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of last token as the new starting location for scanning.
To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token.
What this says is that it will skip any '|' characters at the beginning of a token, making 5523 the 5th token, which you already knew. Just thought I would explain why (I had to look it up myself). This also says that you will not get any empty tokens.
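A small way to see that skipping behaviour for yourself, assuming a hypothetical helper `nth_strtok_token` that simply counts the tokens strtok hands back (1-based):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the n-th (1-based) token that strtok produces
   for s, or NULL if strtok yields fewer than n tokens. Modifies s, exactly
   as strtok does. */
char *nth_strtok_token(char *s, const char *delim, size_t n) {
    assert(s != NULL && delim != NULL);
    char *tok = strtok(s, delim);
    while (tok != NULL && n > 1) {
        tok = strtok(NULL, delim);  /* runs of delimiters are skipped as a whole */
        n--;
    }
    return (n == 1) ? tok : NULL;
}
```

Applied to a writable copy of the sample string, the 5th token is 5523 and there are only 7 tokens in total, even though the string contains 10 fields when the empty ones are counted.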
Since your data is set up this way you have a couple of possible solutions:
1) find all occurrences of || and replace with | | (put a space in there)
2) do a strstr 5 times and find the beginning of the 5th element.
char *mystrtok(char **m,char *s,char c)
{
char *p=s?s:*m;
if( !*p )
return 0;
*m=strchr(p,c);
if( *m )
*(*m)++=0;
else
*m=p+strlen(p);
return p;
}
reentrant
threadsafe
strictly ANSI conforming
needs an unused helper pointer from the calling context
e.g.
char *p,*t,s[]="2342|2sd45|dswer|2342||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
puts(t);
e.g.
char *p,*t,s[]="2,3,4,2|2s,d4,5|dswer|23,42||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
{
char *p1,*t1;
for(t1=mystrtok(&p1,t,',');t1;t1=mystrtok(&p1,0,','))
puts(t1);
}
Left as your work :)
Change parameter 3 to a char *c so that a set of delimiter characters can be passed.
Look into using strsep instead: strsep reference
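A minimal sketch of the strsep approach, for platforms that provide it (strsep is a BSD/glibc extension, not ISO C). `split_keep_empty` is a hypothetical wrapper, and the feature-test macro at the top is an assumption about glibc builds that use a strict -std mode; it is harmless elsewhere.

```c
#define _DEFAULT_SOURCE   /* assumption: exposes strsep on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: splits s in place on any character in delim and
   stores a pointer to each field, including empty ones, in fields[0..max-1].
   Returns the number of fields stored. The pointers refer into s. */
size_t split_keep_empty(char *s, const char *delim, char **fields, size_t max) {
    assert(s != NULL && delim != NULL && fields != NULL);
    size_t n = 0;
    char *rest = s;     /* strsep advances this; it becomes NULL after the last field */
    char *field;
    while (n < max && (field = strsep(&rest, delim)) != NULL)
        fields[n++] = field;
    return n;
}
```

Unlike strtok, strsep returns an empty string for each pair of adjacent delimiters, so the sample string splits into 10 fields with 5523 at index 5 (the sixth field).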
Use something other than strtok. It's simply not intended to do what you're asking for. When I've needed this, I usually used strcspn or strpbrk and handled the rest of the tokenizing myself. If you don't mind it modifying the input string like strtok, it should be pretty simple. At least right off, something like this seems as if it should work:
// Warning: untested code. Should really use something with a less-ugly interface.
char *tokenize(char *input, char const *delim) {
static char *current; // just as ugly as strtok!
char *pos, *ret;
if (input != NULL)
current = input;
if (current == NULL)
return current;
ret = current;
pos = strpbrk(current, delim);
if (pos == NULL)
current = NULL;
else {
*pos = '\0';
current = pos+1;
}
return ret;
}
Inspired by Patrick Schlüter's answer, I made this function. It is supposed to be thread safe, support empty tokens, and leave the original string unchanged. Each returned token is a malloc'd copy that the caller must free.
char* strTok(char** newString, char* delimiter)
{
char* string = *newString;
char* delimiterFound = (char*) 0;
int tokLenght = 0;
char* tok = (char*) 0;
if(!string) return (char*) 0;
delimiterFound = strstr(string, delimiter);
if(delimiterFound){
tokLenght = delimiterFound-string;
}else{
tokLenght = strlen(string);
}
tok = malloc(tokLenght + 1);
memcpy(tok, string, tokLenght);
tok[tokLenght] = '\0';
*newString = delimiterFound ? delimiterFound + strlen(delimiter) : (char*)0;
return tok;
}
You can use it like this:
char* input = "1,2,3,,5,";
char** inputP = &input;
char* tok;
while( (tok=strTok(inputP, ",")) ){
printf("%s\n", tok);
free(tok); // each token is a malloc'd copy owned by the caller
}
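This is supposed to output six lines, two of them empty (one for the empty token between the two adjacent commas, one for the empty token after the trailing comma):
1
2
3
(empty line)
5
(empty line)
I tested it for simple strings but haven't used it in production yet. I also posted it on Code Review, so you can see what others think about it.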
Below is the solution that is working for me now. Thanks to all of you who responded.
I am using LoadRunner. Hence, some unfamiliar commands, but I believe the flow can be understood easily enough.
char strAccInfo[1024], *p2;
int iLoop;
Action() { //This value would come from the wrsp call in the actual script.
lr_save_string("323|90||95|95|null|80|50|105|100|45","test_Param");
//Store the parameter into a string - saves memory.
strcpy(strAccInfo,lr_eval_string("{test_Param}"));
//Get the first instance of the separator "|" in the string
p2 = (char *) strchr(strAccInfo,'|');
//Start a loop - Set the max loop value to more than max expected.
for (iLoop = 1;iLoop<200;iLoop++) {
//Save parameter names in sequence.
lr_param_sprintf("Param_Name","Parameter_%d",iLoop);
//Get the first instance of the separator "|" in the string (within the loop).
p2 = (char *) strchr(strAccInfo,'|');
//Save the value for the parameters in sequence.
lr_save_var(strAccInfo,p2 - strAccInfo,0,lr_eval_string("{Param_Name}"));
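//Shift the text after the separator to the front of strAccInfo for the next pass.
//memmove is used because source and destination overlap, which strcpy does not permit.
memmove(strAccInfo,p2+1,strlen(p2+1)+1);
//Check whether the remaining string is the last value (no separator left).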
if (strchr(strAccInfo,'|')==NULL) {
lr_param_sprintf("Param_Name","Parameter_%d",iLoop+1);
lr_save_string(strAccInfo,lr_eval_string("{Param_Name}"));
iLoop = 200;
}
}
}