I have created a program that requires reading a CSV file that contains bank accounts and transaction history. To access certain information, I have a function getfield which reads each line token by token:
const char* getfield(char* line, int num)
{
const char *tok;
for (tok = strtok(line, ",");
tok && *tok;
tok = strtok(NULL, ",\n"))
{
if (!--num)
return tok;
}
return NULL;
}
I use this later on in my code to access the account number (position 2) and the transaction amount (position 4):
...
while (fgets(line, 1024, fp))
{
char* tmp = strdup(line);
//check if account number already exists
char *acc = (char*) getfield(tmp, 2);
char *txAmount = (char*)getfield(tmp, 4);
printf("%s\n", txAmount);
//int n =1;
if (acc!=NULL && atoi(acc)== accNum && txAmount !=NULL){
if(n<fileSize)
{
total[n]= (total[n-1]+atof(txAmount));
printf("%f", total[n]);
n++;
}
}
free(tmp1); free(tmp2);
}
...
No issue seems to arise with char *acc = (char*) getfield(tmp, 2), but when I use getfield for char *txAmount = (char*)getfield(tmp, 4) the print statement that follows shows me that I always have NULL. For context, the file currently reads as (first line is empty):
AC,1024,John Doe
TX,1024,2020-02-12,334.519989
TX,1024,2020-02-12,334.519989
TX,1024,2020-02-12,334.519989
I had previously asked if it was required to use free(acc) in a separate part of my code (Free() pointer error while casting from const char*) and the answer seemed to be no, but I'm hoping this question gives better context. Is this a problem with not freeing up txAmount? Any help is greatly appreciated !
(Also, if anyone has a better suggestion for the title, please let me know how I could have better worded it, I'm pretty new to stack overflow)
Your getfield function modifies its input. So when you call getfield on tmp again, you aren't calling it on the right string.
For convenience, you may want to make a getfield function that doesn't modify its input. It will be inefficient, but I don't think performance or efficiency are particularly important to your code. The getfield function would call strdup on its input, extract the string to return, call strdup on that, free the duplicate of the original input, and then return the pointer to the duplicate of the found field. The caller would have to free the returned pointer.
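A minimal sketch of the non-modifying variant described above. The name getfield_copy is hypothetical, and malloc plus memcpy stands in for strdup so the code stays within standard C; it assumes fields are separated by commas and the line may end in a newline.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original code): returns a heap-allocated
   copy of the num-th comma-separated field (1-based), or NULL if the field is
   missing or an allocation fails. The input line is never modified.
   The caller must free() the result. */
char *getfield_copy(const char *line, int num)
{
    char *result = NULL;
    size_t len = strlen(line) + 1;
    char *work = malloc(len);          /* private copy for strtok to chop up */
    if (work == NULL)
        return NULL;
    memcpy(work, line, len);

    for (char *tok = strtok(work, ",\n"); tok != NULL; tok = strtok(NULL, ",\n")) {
        if (--num == 0) {
            size_t n = strlen(tok) + 1;
            result = malloc(n);        /* duplicate of the found field */
            if (result != NULL)
                memcpy(result, tok, n);
            break;
        }
    }
    free(work);                        /* the working copy is no longer needed */
    return result;
}
```

Because each call works on its own copy, getfield_copy(tmp, 2) followed by getfield_copy(tmp, 4) both see the full line. The price is two allocations per call, which matches the "inefficient but convenient" trade-off mentioned above.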
The issue is that strtok replaces the found delimiters with '\0'. You'll need to get a fresh copy of the line.
Or continue where you left off, using getfield (NULL, 2).
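To illustrate the "continue where you left off" option, here is a sketch that reuses getfield as posted in the question. The wrapper account_and_amount is a hypothetical name introduced only for this example.

```c
#include <string.h>

/* getfield as posted in the question, unchanged in behaviour. */
const char *getfield(char *line, int num)
{
    const char *tok;
    for (tok = strtok(line, ","); tok && *tok; tok = strtok(NULL, ",\n")) {
        if (!--num)
            return tok;
    }
    return NULL;
}

/* Hypothetical wrapper: fetches fields 2 and 4 of one line in a single pass.
   Returns 1 when both are present, 0 otherwise. `line` is modified. */
int account_and_amount(char *line, const char **acc, const char **amount)
{
    *acc = getfield(line, 2);                   /* strtok stops just after field 2 */
    *amount = *acc ? getfield(NULL, 2) : NULL;  /* two more tokens: fields 3 and 4 */
    return *acc != NULL && *amount != NULL;
}
```

After the first call the buffer reads as just "TX", because strtok has overwritten the first comma with '\0'. That is why a second getfield(tmp, 4) on the same buffer finds only one token and returns NULL.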
I am trying to tokenize a string but I need to know exactly when no data is seen between two tokens. e.g when tokenizing the following string "a,b,c,,,d,e" I need to know about the two empty slots between 'c' and 'd'... which I am unable to find out simply using strtok(). My attempt is shown below:
char arr_fields[num_of_fields];
char delim[]=",\n";
char *tok;
tok=strtok(line,delim);//line contains the data
for(i=0;i<num_of_fields;i++,tok=strtok(NULL,delim))
{
if(tok)
sprintf(arr_fields[i], "%s", tok);
else
sprintf(arr_fields[i], "%s", "-");
}
Executing the above code with the aforementioned examples put characters a,b,c,d,e into first five elements of arr_fields which is not desirable. I need the position of each character to go in specific indexes of array: i.e if there is a character missing between two characters, it should be recorded as is.
7.21.5.8 the strtok function
The standard says the following regarding strtok:
[#3] The first call in the sequence searches the string
pointed to by s1 for the first character that is not
contained in the current separator string pointed to by s2.
If no such character is found, then there are no tokens in
the string pointed to by s1 and the strtok function returns
a null pointer. If such a character is found, it is the
start of the first token.
The quote above shows that strtok cannot solve your specific problem, since it treats any run of consecutive characters found in delims as a single delimiter.
Am I doomed to weep in silence, or can somebody help me out?
You can easily implement your own version of strtok that does what you want, see the snippets at the end of this post.
strtok_single makes use of strpbrk (char const* src, const char* delims) which will return a pointer to the first occurrence of any character in delims that is found in the null-terminated string src.
If no matching character is found the function will return NULL.
strtok_single
char *
strtok_single (char * str, char const * delims)
{
static char * src = NULL;
char * p, * ret = 0;
if (str != NULL)
src = str;
if (src == NULL)
return NULL;
if ((p = strpbrk (src, delims)) != NULL) {
*p = 0;
ret = src;
src = ++p;
} else if (*src) {
ret = src;
src = NULL;
}
return ret;
}
sample use
char delims[] = ",";
char data [] = "foo,bar,,baz,biz";
char * p = strtok_single (data, delims);
while (p) {
printf ("%s\n", *p ? p : "<empty>");
p = strtok_single (NULL, delims);
}
output
foo
bar
<empty>
baz
biz
You can't use strtok() if that's what you want. From the man page:
A sequence of two or more contiguous delimiter characters in the parsed
string is considered to be a single delimiter. Delimiter characters at
the start or end of the string are ignored. Put another way: the
tokens returned by strtok() are always nonempty strings.
Therefore it is just going to jump from c to d in your example.
You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.
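A small sketch that makes the difference measurable. Both function names are hypothetical; the first counts tokens the way strtok reports them, the second counts fields including empty ones.

```c
#include <string.h>

/* Counts tokens as strtok sees them; modifies s. */
size_t count_strtok_tokens(char *s, const char *delims)
{
    size_t n = 0;
    for (char *tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}

/* Counts fields including empty ones: one more than the number of delimiters. */
size_t count_fields(const char *s, const char *delims)
{
    size_t n = 1;
    for (const char *p = strpbrk(s, delims); p != NULL; p = strpbrk(p + 1, delims))
        n++;
    return n;
}
```

For "a,b,c,,,d,e" with "," as the delimiter, count_fields reports 7 while count_strtok_tokens reports 5: the two empty fields are silently dropped.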
Lately I was looking for a solution to the same problem and found this thread.
You can use strsep().
From the manual:
The strsep() function was introduced as a replacement for strtok(3),
since the latter cannot handle empty fields.
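A minimal sketch of strsep in use, assuming a platform that provides it (glibc and the BSDs do; it is not part of standard C). The helper name split_fields and the feature-test macro are assumptions for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* asks glibc to declare strsep() under strict -std modes */
#endif
#include <string.h>

/* Hypothetical helper: splits `line` in place into at most `max` fields and
   stores pointers to them in `fields`. Empty fields come back as "".
   Returns the number of fields stored. */
size_t split_fields(char *line, const char *delims, char **fields, size_t max)
{
    size_t n = 0;
    char *rest = line;   /* strsep advances this past each delimiter */
    char *tok;
    while (n < max && (tok = strsep(&rest, delims)) != NULL)
        fields[n++] = tok;
    return n;
}
```

With "a,b,c,,,d,e" this yields seven fields, and the fourth and fifth are empty strings, so each value keeps its position.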
As mentioned in this answer, you'll want to implement something like strtok yourself. I prefer using strcspn (as opposed to strpbrk), as it allows for fewer if statements:
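char arr_fields[num_of_fields][MAX_FIELD_LEN]; /* MAX_FIELD_LEN: assumed per-field capacity */
char delim[]=",\n";
size_t current_token = 0;
size_t token_length;
for (i = 0; i < num_of_fields; i++)
{
    /* length of the field starting at current_token (0 for an empty field) */
    token_length = strcspn(line + current_token, delim);
    if (token_length)
        snprintf(arr_fields[i], sizeof arr_fields[i], "%.*s", (int)token_length, line + current_token);
    else
        strcpy(arr_fields[i], "-");
    current_token += token_length;
    if (line[current_token] != '\0')
        current_token++; /* step over the delimiter itself */
}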
Parse (for example, strtok)
Sort
Insert
Rinse and repeat as needed :)
You could try using strchr to find the locations of the , symbols. Manually copy your string up to the comma you found (using memcpy or strncpy), then call strchr again starting from the character after it. This lets you see whether two or more commas are next to each other (the pointers returned by two successive strchr calls will differ by exactly 1), and you can write an if statement to handle that case.
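A sketch of the strchr approach, under the assumption that the comma is the only delimiter. The helper name copy_field is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: copies the index-th (0-based) comma-separated field of
   `line` into `out` (capacity `outsz`, truncating if needed).
   Returns 1 if the field exists (even if empty), 0 otherwise. */
int copy_field(const char *line, size_t index, char *out, size_t outsz)
{
    const char *start = line;
    const char *comma = strchr(start, ',');

    while (index > 0) {
        if (comma == NULL)
            return 0;                /* ran out of fields */
        start = comma + 1;           /* next field begins after the comma */
        comma = strchr(start, ',');
        index--;
    }

    if (outsz == 0)
        return 0;
    /* An empty field shows up as comma == start, i.e. a length of 0. */
    size_t len = comma ? (size_t)(comma - start) : strlen(start);
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

For "a,b,c,,,d,e", indexes 3 and 4 succeed with an empty string in out, and index 5 gives "d", so the caller can store "-" or any other placeholder for the empty slots.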
I have the following code:
char inputs []="3,0,23.30,3,30/55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/1.5,0.5,0.2,0.2,0.3,0.1";
char parameters[18];
strcpy(parameters,strtok(inputs,"/"));
This is followed by some code that transmits my characters over UART so I can see them on a monitor. When I transmit inputs I see it fine, but when I transmit parameters I see nothing; the string is empty.
I have seen examples for strtok that use this kind of code to split strings. I also tried this code in Visual Studio, and when I print the strings they show up fine. Is there any chance that strtok doesn't work properly on a microprocessor?
When working with microcontrollers, you have to pay attention to which memory area you are working in. In software running on a PC, everything is stored in and run from RAM. On a flash microcontroller, however, code runs from flash (also called program memory) while data is processed in RAM (also called data memory).
In your case, the inputs variable holds a hardcoded character array, which could be const, and we don't know which memory area the compiler chose to put it in. So we can rewrite your small program to make sure the data is stored in program memory, and use the "_P" functions to manipulate it.
#include <avr/pgmspace.h> // to work with program space
#include <string.h>
const char inputs[] PROGMEM = "3,0,23.30,3,30/55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/1.5,0.5,0.2,0.2,0.3,0.1"; // Now we are sure this is in program memory space
char buffer[200]; // should be long enough to contain a copy of inputs
char parameters[18];
int length = strlen_P(inputs); // Should contain the string length of 186 (just to debug)
strcpy_P(buffer,inputs); // Copy the PROGMEM data into data memory space
strcpy(parameters,strtok(buffer,"/")); // buffer and "/" are both in data memory now, so plain strtok applies (strtok_P expects its delimiter in program space)
For more info on program space with avr gcc : http://www.nongnu.org/avr-libc/user-manual/pgmspace.html
I don't use strtok much, if at all, but this appears to be the correct way to store the result of strtok() in a char[]:
const int NUM_PARAMS = 18;
const int MAX_CHARS = 64;
char parameters[NUM_PARAMS][MAX_CHARS];
char delims[] = "/";
char *result = NULL;
int count = 0;
result = strtok(inputs, delims);
while(result != NULL && count < NUM_PARAMS){
    strncpy(parameters[count], result, MAX_CHARS - 1);
    parameters[count][MAX_CHARS - 1] = '\0'; // strncpy does not terminate a truncated copy
    count++;
    result = strtok(NULL, delims);
}
or this if you don't want to allocate unnecessary memory for smaller tokens:
const int NUM_PARAMS = 18;
char* parameters[NUM_PARAMS];
char delims[] = "/";
char *result = NULL;
int count = 0;
result = strtok(inputs, delims);
while(result != NULL && count < NUM_PARAMS){
    parameters[count] = malloc(strlen(result) + 1);
    if (parameters[count] == NULL)
        break; // out of memory
    strcpy(parameters[count++], result); // the buffer is exactly large enough
    result = strtok(NULL, delims);
}
parameters should now contain all of your tokens. With the malloc version, free() each entry once you are done with it.
strtok() intentionally modifies the source string, replacing delimiters with terminators as it goes. There is no reason to copy the content into auxiliary storage so long as the source string is mutable.
Splitting the string into conjoined constants (which will be assembled by the compiler, so they ultimately will be a single terminated string):
char inputs []="3,0,23.30,3,30/"
"55,55,55,55,55,55,55,55,55,55,55,55,55,64,64,64,100,100,100,100,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,55,55,70/"
"1.5,0.5,0.2,0.2,0.3,0.1";
Clearly the second and third sequences delimited by '/' are far longer than 17 chars (remember, your copy needs one more place for the terminator, so only 17 chars can legally be copied into an 18-char buffer).
Unless there is a compelling reason to do otherwise, I see no reason you can't simply do this:
char *param = strtok(inputs, "/");
while (param != NULL)
{
// TODO: do something with param...
// ... then advance to next param
param = strtok(NULL, "/");
}
What you do with // TODO do something with param... is up to you. The length of the parameter can be retrieved by strlen(param), for example. You can make a copy, providing you have enough storage space as the destination (in the second and third cases, you don't provide enough storage with only an 18-char buffer in your example).
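If a copy is needed, one way to avoid overrunning the destination is to check the length first. This is only a sketch; copy_token is a hypothetical helper, not something from the original code.

```c
#include <string.h>

/* Hypothetical helper: copies tok into dst only if it fits, terminator included.
   Returns 1 on success, 0 if the token is too long (dst is left untouched). */
int copy_token(char *dst, size_t dstsz, const char *tok)
{
    size_t len = strlen(tok);
    if (len + 1 > dstsz)
        return 0;
    memcpy(dst, tok, len + 1);   /* len characters plus the '\0' */
    return 1;
}
```

With an 18-char parameters buffer, the first token "3,0,23.30,3,30" (14 chars) is copied, while "1.5,0.5,0.2,0.2,0.3,0.1" (23 chars) is rejected instead of overflowing the buffer.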
Regardless, remember, strtok() modifies the source inputs[] array. If that is not acceptable an alternative would be something like:
char *tmp_inputs = strdup(inputs);
if (tmp_inputs != NULL)
{
char *param = strtok(tmp_inputs, "/");
while (param != NULL)
{
// TODO: do something with param...
// ... then advance to next param
param = strtok(NULL, "/");
}
// done with copy of inputs. free it.
free(tmp_inputs);
}
There are considerable threading implications to weigh as well, which is the very reason the version of strtok() that requires the caller (you) to tote around context between calls was invented. See strtok_r() for more information.
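A sketch of strtok_r, assuming a POSIX system. Because the caller holds the context pointer, two tokenizations can be in progress at once, shown here by splitting on '/' and then on ',' inside each group. The function name count_group_values is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* asks for the strtok_r() declaration under strict -std modes */
#endif
#include <string.h>

/* Hypothetical helper: splits `inputs` on '/' and counts the comma-separated
   values in each group. Stores up to max_groups counts and returns the number
   of groups seen. `inputs` is modified. */
size_t count_group_values(char *inputs, size_t *counts, size_t max_groups)
{
    size_t groups = 0;
    char *outer_ctx = NULL;   /* context for the '/' pass */
    for (char *group = strtok_r(inputs, "/", &outer_ctx);
         group != NULL && groups < max_groups;
         group = strtok_r(NULL, "/", &outer_ctx)) {
        size_t n = 0;
        char *inner_ctx = NULL;   /* separate context for the ',' pass */
        for (char *val = strtok_r(group, ",", &inner_ctx);
             val != NULL;
             val = strtok_r(NULL, ",", &inner_ctx))
            n++;
        counts[groups++] = n;
    }
    return groups;
}
```

The same nesting with plain strtok would not work, since the inner loop would overwrite the single hidden state the outer loop depends on.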
You have to initialize parameters this way:
char parameters[18] = {'\0'};
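This fills the array with null characters instead of leaving indeterminate values. Note that strcpy finds the end of the string from the null terminator of its source, not its destination, so the initialization does not change the copy itself; its benefit is that parameters reads as an empty string if nothing is ever copied into it.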
Just to make it clear...
parameters after its initialisation : [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
parameters after strcpy of the first strtok token : ['3',',','0',',','2','3','.','3','0',',','3',',','3','0',0,0,0,0]