I have created a program that requires reading a CSV file that contains bank accounts and transaction history. To access certain information, I have a function getfield which reads each line token by token:
const char* getfield(char* line, int num)
{
const char *tok;
for (tok = strtok(line, ",");
tok && *tok;
tok = strtok(NULL, ",\n"))
{
if (!--num)
return tok;
}
return NULL;
}
I use this later on in my code to access the account number (at position 2) and the transaction amount (position 4):
...
while (fgets(line, 1024, fp))
{
char* tmp = strdup(line);
//check if account number already exists
char *acc = (char*) getfield(tmp, 2);
char *txAmount = (char*)getfield(tmp, 4);
printf("%s\n", txAmount);
//int n =1;
if (acc!=NULL && atoi(acc)== accNum && txAmount !=NULL){
if(n<fileSize)
{
total[n]= (total[n-1]+atof(txAmount));
printf("%f", total[n]);
n++;
}
}
free(tmp1); free(tmp2);
}
...
No issue seems to arise with char *acc = (char*) getfield(tmp, 2), but when I use getfield for char *txAmount = (char*)getfield(tmp, 4) the print statement that follows shows me that I always have NULL. For context, the file currently reads as (first line is empty):
AC,1024,John Doe
TX,1024,2020-02-12,334.519989
TX,1024,2020-02-12,334.519989
TX,1024,2020-02-12,334.519989
I had previously asked if it was required to use free(acc) in a separate part of my code (Free() pointer error while casting from const char*) and the answer seemed to be no, but I'm hoping this question gives better context. Is this a problem with not freeing up txAmount? Any help is greatly appreciated !
(Also, if anyone has a better suggestion for the title, please let me know how I could have better worded it, I'm pretty new to stack overflow)
Your getfield function modifies its input: strtok overwrites each delimiter it finds with '\0'. So when you call getfield on tmp a second time, the string it sees ends right after the first field, and field 4 is never found.
For convenience, you may want to make a getfield function that doesn't modify its input. It will be inefficient, but I don't think performance or efficiency are particularly important to your code. The getfield function would call strdup on its input, extract the string to return, call strdup on that, free the duplicate of the original input, and then return the pointer to the duplicate of the found field. The caller would have to free the returned pointer.
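The non-modifying variant described above could look roughly like this. It is only a sketch: the name getfield_copy is made up for illustration, strdup is POSIX rather than ISO C, and the helper assumes comma-separated fields as in the question.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup, which is POSIX, not ISO C */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the question): returns a malloc'ed copy of
   field `num` (1-based), or NULL if the line has fewer fields.
   The input line itself is never modified. The caller frees the result. */
char *getfield_copy(const char *line, int num)
{
    char *copy = strdup(line);    /* work on a private duplicate */
    char *result = NULL;

    if (copy == NULL)
        return NULL;

    for (char *tok = strtok(copy, ",\n"); tok != NULL; tok = strtok(NULL, ",\n")) {
        if (--num == 0) {
            result = strdup(tok); /* duplicate the field before freeing the copy */
            break;
        }
    }
    free(copy);
    return result;
}
```

With the line "TX,1024,2020-02-12,334.519989\n", getfield_copy(line, 2) gives "1024" and getfield_copy(line, 4) gives "334.519989", in any order and as often as you like. Like the original, it is built on strtok, so empty fields are still skipped.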
The issue is that strtok replaces the found delimiters with '\0'. You'll need to get a fresh copy of the line.
Or continue where you left off, using getfield (NULL, 2).
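To illustrate the "continue where you left off" option, here is a small sketch that reuses the getfield from the question unchanged. The wrapper split_tx is a hypothetical name added only for this example.

```c
#include <string.h>

/* getfield exactly as posted in the question */
const char *getfield(char *line, int num)
{
    const char *tok;
    for (tok = strtok(line, ",");
         tok && *tok;
         tok = strtok(NULL, ",\n"))
    {
        if (!--num)
            return tok;
    }
    return NULL;
}

/* Hypothetical wrapper: fetches field 2 and field 4 in a single pass.
   Returns 1 if both were found, 0 otherwise. `line` is modified by strtok. */
int split_tx(char *line, const char **acc, const char **amount)
{
    *acc = getfield(line, 2);                   /* consumes fields 1 and 2 */
    *amount = *acc ? getfield(NULL, 2) : NULL;  /* two more: fields 3 and 4 */
    return *acc != NULL && *amount != NULL;
}
```

The second call passes NULL, so strtok resumes after field 2 and the count of 2 lands on field 4. For the "AC,1024,John Doe" line there is no fourth field, so split_tx returns 0.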
I'm stumped in a rather trivial thing...
So, basically I want the "words" between the first one and the last one to go to data and the last one to go to key.
C-POSIX only, pls.
Is strtok_r the way to go or I'm way off on this? Something else?
char *key = NULL, *data=NULL, *save=NULL;
char comando[1024];
fgets(comando, 512, stdin);
strtok_r(comando, " ",&save);
while(strcmp(save,"\n")){
strcat(data,strtok_r(NULL," ",&save));
}
key = strtok_r(NULL, "\n",&save);
P.S: comando is 1024 as memory is not a problem and better safe than sorry. fgets reads 512 'cause that's the char line limit on standard unix terminal.
Your code will crash on this line:
strcat(data,strtok_r(NULL," ",&save));
Because you never reserved space for data. strcat will try to write to a NULL memory address.
Another thing to note is that you shouldn't rely on save to check for the end of the line. According to strtok's manpage:
The saveptr argument is a pointer to a char * variable that is used
internally by strtok_r() in order to maintain context between
successive calls that parse the same string.
Relying on the value of saveptr outside of strtok_r breaks the abstraction layer. You shouldn't assume anything about how strtok_r uses saveptr; doing so is bad practice.
A slightly better approach is to keep a pointer to the previous token returned by strtok, and a pointer to the current token. When strtok returns NULL, meaning there are no more tokens, then prev will point to the last token, which is your key. Here's some code:
char *key = NULL, *save=NULL;
char *prev, *curr;
char comando[1024];
char data[1024];
data[0] = '\0';
fgets(comando, 512, stdin);
prev = curr = strtok_r(comando, " ",&save);
while (curr != NULL) {
prev = curr;
curr = strtok_r(NULL, " ", &save);
if (curr != NULL)
strcat(data, prev);
}
key = prev;
Note that I allocated space for data by declaring it as array instead of pointer. The instruction
data[0] = '\0';
is there to make sure that strcat finds the null terminating byte in the first call.
You can replace the use of prev directly by key, I left it that way to make the code more readable.
A word of advice: always remember that strtok modifies its argument destructively (you lose the identity of the delimiting bytes), and that you can't call it with constant strings.
Note: data will contain every word concatenated. You lose the spaces. I'm not sure if this is what you want. If it's not, you might want to use something better than strcat (which is not very efficient, btw). For example, you could use sprintf to print the token into data with a leading space, and keep a pointer to the next free position in data.
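A sketch of that last idea, under a few assumptions: the helper name split_command is invented here, it follows the question's wording (the first word is dropped, the middle words go to data, the last word is the key), it uses snprintf rather than sprintf so the buffer cannot overflow, and it tracks an offset instead of a raw pointer.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: skips the first word of `line`, joins the middle
   words into `data` separated by single spaces, and returns the last word.
   Returns NULL if there are fewer than two words or `data` is too small. */
char *split_command(char *line, char *data, size_t cap)
{
    char *save = NULL;
    char *key = NULL;
    char *tok;
    size_t used = 0;              /* next free position in data */

    if (cap == 0)
        return NULL;
    data[0] = '\0';

    if (strtok_r(line, " \n", &save) == NULL)   /* skip the first word */
        return NULL;

    while ((tok = strtok_r(NULL, " \n", &save)) != NULL) {
        if (key != NULL) {
            /* key turned out not to be the last word: append it to data */
            int n = snprintf(data + used, cap - used, "%s%s",
                             used ? " " : "", key);
            if (n < 0 || (size_t)n >= cap - used)
                return NULL;      /* data buffer too small */
            used += (size_t)n;
        }
        key = tok;
    }
    return key;
}
```

For the input "one two three four five\n" this leaves "two three four" in data and returns "five". Including '\n' in the delimiter set keeps the newline out of the key.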
I would suggest replacing your loop with the following code (printf() is used just for testing):
strtok_r(comando, " ", &save);
char *res = NULL;
while (NULL != (res = strtok_r(NULL, " ", &save))) {
if (key != NULL) {
//strcat(data, key); // FIXME
printf("data = %s\n", key);
}
key = res;
}
printf("key = %s\n", key);
Also, strcat() must not be called with a NULL argument; that leads to a crash. So the data pointer should point to an actual array. Output of running the code:
┌─(16:08:22)─(michael#lorry)─(~/tmp/strtok)
└─► gcc -o main main.c; echo "one two three four five" | ./main
data = two
data = three
data = four
key = five
Lots wrong with your code
char *key = NULL, *data=NULL, *save=NULL;
Later on, you are using strcat to add strings to data but you have allocated no storage to data. That will cause a segmentation fault.
fgets(comando, 512, stdin);
fgets will read at most one less than the number passed to it. So, if the user does type in 512 characters, the string will have no terminating \n. Also, the only way to detect an error or end of file is to check the return result of fgets. If it's NULL, either you have reached end of file (the user has hit ctrl-d) or there is an error. At end of file with nothing read the buffer is left unchanged, and after an error its contents are indeterminate, so in either case you must not use the buffer.
while(strcmp(save,"\n"))
I don't think you are allowed to rely on the assumption that your save pointer will point to the rest of the unconsumed string.
strtok_r(comando, " ",&save);
strtok_r signals that it has reached the end of the data by returning a NULL pointer. You can't throw away the return result without looking at it. Also, this will consume the trailing \n as part of the last token.
strcat(data,strtok_r(NULL," ",&save));
As I said before, data is a null pointer. Also, strtok_r can return NULL, and passing NULL to strcat is undefined behaviour.
I would do something more like:
char* currentTok = strtok_r(comando, " \n", &save); // separator is space or \n
char* previousTok = NULL;
while (currentTok != NULL)
{
if (previousTok != NULL)
{
// save previousTok in data unless it's the first token
}
previousTok = currentTok;
currentTok = strtok_r(NULL, " \n", &save);
}
char* key = previousTok;
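The point about checking the return value of fgets can be sketched like this. read_line is a hypothetical helper name; the strcspn call strips the newline only if one was actually read.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf and strips the newline.
   Returns 1 on success, 0 on end of file or read error. */
int read_line(FILE *fp, char *buf, int cap)
{
    if (fgets(buf, cap, fp) == NULL)
        return 0;                     /* EOF or error: do not use buf */
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if there is one */
    return 1;
}
```

Called as read_line(stdin, comando, sizeof comando), it lets the caller stop cleanly at end of input instead of tokenizing a stale buffer.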
I am pretty confused with pointers in C. I am finding it hard to wrap my mind around creating them and passing them around. I get a "Segmentation Fault: 11" error after adding some code that I needed; before that, the program worked. This is part of the code:
char *token2;
char *line2;
char comma_loc = 0;
int num_of_commas = 0;
char *line2[1];
while(token != NULL) { //lets make sure token has a string token
//printf("Wats in token: %s\n", token);
if(key==true) {
//printf("This should be an identifier: %s\n", token);
if(comma != true) { //added if statement, just take away if it fails, the first case is the original
int len = strlen(token);
iden_holder[iden_holder_count] = (char *)malloc(sizeof(char) * (len +1));
memcpy(iden_holder[iden_holder_count], token, len +1);
iden_holder_count++;
key = false;
} else {
int len2 = strlen(token);
line2[0] = (char *)malloc(sizeof(char) * (len2 + 1));
memcpy(line2[0], token, len2 + 1);
token2 = strtok(line2[0],",");
while(token2 != NULL) {
int len = strlen(token2);
iden_holder[iden_holder_count] = (char *)malloc(sizeof(char) * (len +1));
memcpy(iden_holder[iden_holder_count], token, len +1);
iden_holder_count++;
token2 = strtok(line2[0],",");
}
key = false;
}
The point of this code is to take the string within token and copy it into another token, in my case token2. I decided to use memcpy, but I am confused about how to use it because of my confusion with pointers. I should also note that I used strtok before this, and the code here runs inside that strtok loop. Could it be that using strtok again overrides the earlier one?
Read this completely. It will help you with your basics. It did to me. :)
Pointers are exactly that: pointers. They're meant to point to something. The vast majority of problems people have with pointers is that they're not pointing anywhere intelligent :-)
Consider the following code:
char xyzzy[] = "hello";
char *pch;
In a stack-based C implementation, this will probably give you a stack containing the string and a pointer set to an arbitrary value.
The pointer exists on the stack like any other variable but it could point to anywhere.
If you then execute:
pch = xyzzy;
it's set to point to the first character of xyzzy (the h).
Arrays and pointers are very different beasts. For example, you cannot increment xyzzy to point to the second character of that string but you can increment pch.
The confusion arises because, in quite a lot of circumstances, arrays will decay to a pointer to the first element of that array.
That's basically the reason why you don't need [] for pointers, because they're not arrays. They do not know, and do not care, about how many things may exist at the memory they point at, their only concern is the one thing they currently point to.
Moving the pointer throughout the array, and ensuring you don't go off the ends, is extra management that you have to do as a programmer.
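As a small illustration of that "extra management", here is a sketch of walking a pointer through a string. steps_to_end is a made-up name; the loop condition is the bounds check that the pointer itself cannot do for you.

```c
#include <stddef.h>

/* Hypothetical helper: advances a pointer one character at a time and
   returns how many steps it took to reach the terminating '\0'. */
size_t steps_to_end(const char *p)
{
    size_t steps = 0;
    while (*p != '\0') {   /* the programmer, not the pointer, checks for the end */
        p++;               /* legal for a pointer; an array name cannot be incremented */
        steps++;
    }
    return steps;
}
```

With char xyzzy[] = "hello"; and char *pch = xyzzy;, steps_to_end(pch) returns 5, while sizeof xyzzy is 6 (five letters plus the terminator) and sizeof pch is just the size of a pointer.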
I am trying to tokenize a string but I need to know exactly when no data is seen between two tokens. e.g. when tokenizing the following string "a,b,c,,,d,e" I need to know about the two empty slots between 'c' and 'd'... which I am unable to find out simply using strtok(). My attempt is shown below:
char arr_fields[num_of_fields];
char delim[]=",\n";
char *tok;
tok=strtok(line,delim);//line contains the data
for(i=0;i<num_of_fields;i++,tok=strtok(NULL,delim))
{
if(tok)
sprintf(arr_fields[i], "%s", tok);
else
sprintf(arr_fields[i], "%s", "-");
}
Executing the above code with the aforementioned examples put characters a,b,c,d,e into first five elements of arr_fields which is not desirable. I need the position of each character to go in specific indexes of array: i.e if there is a character missing between two characters, it should be recorded as is.
7.21.5.8 the strtok function
The standard says the following regarding strtok:
[#3] The first call in the sequence searches the string
pointed to by s1 for the first character that is not
contained in the current separator string pointed to by s2.
If no such character is found, then there are no tokens in
the string pointed to by s1 and the strtok function returns
a null pointer. If such a character is found, it is the
start of the first token.
From the above quote we can see that you cannot use strtok as a solution to your specific problem, since it treats any run of consecutive characters found in delims as a single delimiter.
Am I doomed to weep in silence, or can somebody help me out?
You can easily implement your own version of strtok that does what you want, see the snippets at the end of this post.
strtok_single makes use of strpbrk (char const* src, const char* delims) which will return a pointer to the first occurrence of any character in delims that is found in the null-terminated string src.
If no matching character is found the function will return NULL.
strtok_single
char *
strtok_single (char * str, char const * delims)
{
static char * src = NULL;
char * p, * ret = 0;
if (str != NULL)
src = str;
if (src == NULL)
return NULL;
if ((p = strpbrk (src, delims)) != NULL) {
*p = 0;
ret = src;
src = ++p;
} else if (*src) {
ret = src;
src = NULL;
}
return ret;
}
sample use
char delims[] = ",";
char data [] = "foo,bar,,baz,biz";
char * p = strtok_single (data, delims);
while (p) {
printf ("%s\n", *p ? p : "<empty>");
p = strtok_single (NULL, delims);
}
output
foo
bar
<empty>
baz
biz
You can't use strtok() if that's what you want. From the man page:
A sequence of two or more contiguous delimiter characters in the parsed
string is considered to be a single delimiter. Delimiter characters at
the start or end of the string are ignored. Put another way: the
tokens returned by strtok() are always nonempty strings.
Therefore it is just going to jump from c to d in your example.
You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.
Lately I was looking for a solution to the same problem and found this thread.
You can use strsep().
From the manual:
The strsep() function was introduced as a replacement for strtok(3),
since the latter cannot handle empty fields.
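A minimal sketch of the strsep approach, assuming a platform that provides it (it is a BSD/glibc extension, not ISO C or POSIX). split_fields is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose strsep, which is not ISO C */
#include <string.h>

/* Hypothetical helper: splits `line` in place on commas and stores a
   pointer to every field, including empty ones, in `fields`.
   Returns the number of fields stored (at most max_fields). */
int split_fields(char *line, char **fields, int max_fields)
{
    int n = 0;
    char *rest = line;      /* strsep advances this pointer */
    char *tok;

    while (n < max_fields && (tok = strsep(&rest, ",")) != NULL)
        fields[n++] = tok;  /* an empty field comes back as "" */
    return n;
}
```

For "a,b,c,,,d,e" this yields 7 fields, with fields[3] and fields[4] being empty strings and fields[5] being "d". Like strtok, it overwrites the delimiters in the input.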
As mentioned in this answer, you'll want to implement something like strtok yourself. I prefer using strcspn (as opposed to strpbrk), as it allows for fewer if statements:
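char arr_fields[num_of_fields][64]; /* one buffer per field; 64 is an assumed maximum length */
char delim[] = ",\n";
int current_token = 0;
int token_length;
int i;
for (i = 0; i < num_of_fields; i++)
{
    /* length of the field that starts at current_token (0 if it is empty) */
    token_length = strcspn(line + current_token, delim);
    if (token_length)
        snprintf(arr_fields[i], sizeof arr_fields[i], "%.*s", token_length, line + current_token);
    else
        sprintf(arr_fields[i], "%s", "-");
    current_token += token_length;
    if (line[current_token] != '\0')
        current_token++; /* step over the delimiter itself */
}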
Parse (for example, strtok)
Sort
Insert
Rinse and repeat as needed :)
You could try using strchr to find the locations of the , symbols. Copy the text up to each delimiter you find yourself (using memcpy or strncpy), then call strchr again starting from the character after it. This also shows when two or more commas are next to each other: the pointers returned by two successive strchr calls will differ by exactly 1, and you can write an if statement to handle that case.
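Roughly, the strchr idea could be written as below. copy_field is a hypothetical helper; it never writes to the input, so it also works on string literals.

```c
#include <string.h>

/* Hypothetical helper: copies field `index` (0-based) of `s` into `out`.
   Returns the field length (0 for an empty field), or -1 if there is no
   such field or it does not fit into `cap` bytes. `s` is not modified. */
int copy_field(const char *s, char delim, int index, char *out, size_t cap)
{
    const char *start = s;
    const char *end;
    size_t len;

    while (index > 0) {                 /* skip `index` delimiters */
        start = strchr(start, delim);
        if (start == NULL)
            return -1;
        start++;                        /* first character after the delimiter */
        index--;
    }
    end = strchr(start, delim);         /* end of this field */
    if (end == NULL)
        end = start + strlen(start);    /* last field runs to the end */
    len = (size_t)(end - start);
    if (len >= cap)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return (int)len;
}
```

On "a,b,c,,,d,e", field 3 comes back with length 0, field 5 is "d", and asking for field 7 returns -1. Rescanning from the start for every field is wasteful on long lines, but it keeps the sketch short.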
I was having a problem with my function yesterday that turned out to be a simple fix, and now I am having a different issue with it that is baffling me. The function is a tokenizer:
void tokenizer(FILE *file, struct Set *set) {
int nbytes = 100;
int bytesread;
char *buffer;
char *token;
buffer = (char *) malloc(nbytes + 1);
while((bytesread = getLine(&buffer,&nbytes,file)) != -1) {
token = strtok(buffer," ");
while(token != NULL) {
add(set,token);
token = strtok(NULL," ");
}
}
}
I know that the input(a text file) is getting broken into tokens correctly because in the second while loop I can add printf("%s",token) to display each token. However, the problem is with add. It is only adding the first token to my list, but at the same time still breaks every token down correctly. For example, if my input text was "blah herp derp" I would get
token = blah
token = herp
token = derp
but the list would only contain the first token, blah. I don't believe that the problem is with add because it works on its own, ie I could use
add(set,"blah");
add(set,"herp");
add(set,"derp");
and the result will be a list that contains all three words.
Thank you for any help!
strtok() returns a pointer into the string buffer. You need to strdup() that string and add the result to your tree instead. Don't forget to free() it when you clean up the tree.
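To make that concrete, here is a sketch with a deliberately simple container. struct WordList, add_copy and free_words are stand-ins invented for this example, since the asker's struct Set and add are not shown; strdup is POSIX.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup */
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the asker's container */
struct WordList {
    char *words[64];
    size_t count;
};

/* Stores a private copy of `token`, so the caller's buffer can be reused.
   Returns 1 on success, 0 if the list is full or memory runs out. */
int add_copy(struct WordList *list, const char *token)
{
    char *copy;

    if (list->count >= sizeof list->words / sizeof list->words[0])
        return 0;
    copy = strdup(token);
    if (copy == NULL)
        return 0;
    list->words[list->count++] = copy;
    return 1;
}

/* Releases every copy made by add_copy. */
void free_words(struct WordList *list)
{
    while (list->count > 0)
        free(list->words[--list->count]);
}
```

Because each stored word is its own allocation, overwriting the line buffer on the next read no longer changes what is in the list.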
My application produces strings like the one below. I need to parse values between the separator into individual values.
2342|2sd45|dswer|2342||5523|||3654|Pswt
I am using strtok to do this in a loop. For the fifth token, I am getting 5523. However, I need to account for the empty value between the two separators || as well. 5523 should be the sixth token, as per my requirement.
token = (char *)strtok(strAccInfo, "|");
for (iLoop=1;iLoop<=106;iLoop++) {
token = (char *)strtok(NULL, "|");
}
Any suggestions?
In that case I often prefer a p2 = strchr(p1, '|') loop with a memcpy(s, p1, p2-p1) inside. It's fast, does not destroy the input buffer (so it can be used with const char *) and is really portable (even on embedded).
It's also reentrant; strtok isn't. (BTW: reentrant has nothing to do with multi-threading. strtok breaks already with nested loops. One can use strtok_r but it's not as portable.)
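For the nested-loop case mentioned above, a sketch with strtok_r (POSIX) might look like this. split_nested is a hypothetical name; each loop level gets its own save pointer, which is exactly what plain strtok cannot offer.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r */
#include <string.h>

/* Hypothetical helper: splits `s` on '|' and each piece on ',', storing
   pointers to the innermost tokens in `out`. Returns how many were stored.
   Empty fields are still skipped, as with strtok. */
int split_nested(char *s, char **out, int max_out)
{
    char *outer_save = NULL;
    char *inner_save = NULL;
    char *rec;
    char *item;
    int n = 0;

    for (rec = strtok_r(s, "|", &outer_save); rec != NULL;
         rec = strtok_r(NULL, "|", &outer_save)) {
        /* the inner loop keeps its own state, so the outer one survives */
        for (item = strtok_r(rec, ",", &inner_save); item != NULL;
             item = strtok_r(NULL, ",", &inner_save)) {
            if (n == max_out)
                return n;
            out[n++] = item;
        }
    }
    return n;
}
```

With plain strtok the inner call would overwrite the single hidden state and the outer loop would stop after the first record.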
That's a limitation of strtok. The designers had whitespace-separated tokens in mind. strtok doesn't do much anyway; just roll your own parser. The C FAQ has an example.
On a first call, the function expects
a C string as argument for str, whose
first character is used as the
starting location to scan for tokens.
In subsequent calls, the function
expects a null pointer and uses the
position right after the end of last
token as the new starting location for
scanning.
To determine the beginning and the end
of a token, the function first scans
from the starting location for the
first character not contained in
delimiters (which becomes the
beginning of the token). And then
scans starting from this beginning of
the token for the first character
contained in delimiters, which becomes
the end of the token.
What this says is that it will skip any '|' characters at the beginning of a token, making 5523 the 5th token, which you already knew. Just thought I would explain why (I had to look it up myself). This also says that you will not get any empty tokens.
Since your data is setup this way you have a couple of possible solutions:
1) find all occurrences of || and replace with | | (put a space in there)
2) do a strstr 5 times and find the beginning of the 5th element.
char *mystrtok(char **m,char *s,char c)
{
char *p=s?s:*m;
if( !*p )
return 0;
*m=strchr(p,c);
if( *m )
*(*m)++=0;
else
*m=p+strlen(p);
return p;
}
reentrant
threadsafe
strictly ANSI conforming
needs an unused helper pointer from the calling context
e.g.
char *p,*t,s[]="2342|2sd45|dswer|2342||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
puts(t);
e.g.
char *p,*t,s[]="2,3,4,2|2s,d4,5|dswer|23,42||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
{
char *p1,*t1;
for(t1=mystrtok(&p1,t,',');t1;t1=mystrtok(&p1,0,','))
puts(t1);
}
your work :)
implement char *c as parameter 3
Look into using strsep instead.
Use something other than strtok. It's simply not intended to do what you're asking for. When I've needed this, I usually used strcspn or strpbrk and handled the rest of the tokenizing myself. If you don't mind it modifying the input string like strtok, it should be pretty simple. At least right off, something like this seems as if it should work:
// Warning: untested code. Should really use something with a less-ugly interface.
char *tokenize(char *input, char const *delim) {
static char *current; // just as ugly as strtok!
char *pos, *ret;
if (input != NULL)
current = input;
if (current == NULL)
return current;
ret = current;
pos = strpbrk(current, delim);
if (pos == NULL)
current = NULL;
else {
*pos = '\0';
current = pos+1;
}
return ret;
}
Inspired by Patrick Schlüter's answer, I made this function. It is supposed to be thread safe, support empty tokens, and leave the original string unchanged. The caller has to free each returned token.
char* strTok(char** newString, char* delimiter)
{
char* string = *newString;
char* delimiterFound = (char*) 0;
int tokLenght = 0;
char* tok = (char*) 0;
if(!string) return (char*) 0;
delimiterFound = strstr(string, delimiter);
if(delimiterFound){
tokLenght = delimiterFound-string;
}else{
tokLenght = strlen(string);
}
tok = malloc(tokLenght + 1);
memcpy(tok, string, tokLenght);
tok[tokLenght] = '\0';
*newString = delimiterFound ? delimiterFound + strlen(delimiter) : (char*)0;
return tok;
}
you can use it like
char* input = "1,2,3,,5,";
char** inputP = &input;
char* tok;
while( (tok=strTok(inputP, ",")) ){
    printf("%s\n", tok);
    free(tok); // each token is malloc'ed by strTok
}
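This is supposed to output the following, where <empty> stands for the blank line printed for an empty field:
1
2
3
<empty>
5
<empty>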
I tested it for simple strings but didn't use it in production yet, and posted it on code review too, so you can see what do others think about it
Below is the solution that is working for me now. Thanks to all of you who responded.
I am using LoadRunner. Hence, some unfamiliar commands, but I believe the flow can be understood easily enough.
char strAccInfo[1024], *p2;
int iLoop;
Action() { //This value would come from the wrsp call in the actual script.
lr_save_string("323|90||95|95|null|80|50|105|100|45","test_Param");
//Store the parameter into a string - saves memory.
strcpy(strAccInfo,lr_eval_string("{test_Param}"));
//Get the first instance of the separator "|" in the string
p2 = (char *) strchr(strAccInfo,'|');
//Start a loop - Set the max loop value to more than max expected.
for (iLoop = 1;iLoop<200;iLoop++) {
//Save parameter names in sequence.
lr_param_sprintf("Param_Name","Parameter_%d",iLoop);
//Get the first instance of the separator "|" in the string (within the loop).
p2 = (char *) strchr(strAccInfo,'|');
//Save the value for the parameters in sequence.
lr_save_var(strAccInfo,p2 - strAccInfo,0,lr_eval_string("{Param_Name}"));
//Save string after the first instance of p2, as strAccInfo - for looping.
//The source and destination overlap, so use memmove rather than strcpy.
memmove(strAccInfo,p2+1,strlen(p2+1)+1);
//Start conditional loop for checking for last value in the string.
if (strchr(strAccInfo,'|')==NULL) {
lr_param_sprintf("Param_Name","Parameter_%d",iLoop+1);
lr_save_string(strAccInfo,lr_eval_string("{Param_Name}"));
iLoop = 200;
}
}
}