CSV Parsing - Second Field Returning Null Values - c

I use the following code to parse this CSV
me;val1;val2;val3;val4;val5;
me;val1;val2;val3;val4;val5;
void readcsv()
{
FILE* stream = fopen("input.csv", "r");
char line[1024];
while (fgets(line, 1024, stream))
{
char* tmp = strdup(line);
// printf("Field 1 would be %s\n", getcsvfield(tmp, 1));
printf("Field 1 would be %s\n", getcsvfield(tmp, 1));
printf("Field 2 would be %s\n", getcsvfield(tmp, 2));
// NOTE strtok clobbers tmp
free(tmp);
}
}
//Used for parsing CSV
const char* getcsvfield(char* line, int num)
{
const char* tok;
for (tok = strtok(line, ";");
tok && *tok;
tok = strtok(NULL, ";\n"))
{
if (!--num)
return tok;
}
return NULL;
}
But I keep getting a NULL value for the second field.
Output:
Field 1 would be me
Field 2 would be (null)
Field 1 would be me
Field 2 would be (null)
What am I doing wrong?

strtok(line, ";");
strtok modifies the string passed to it (here, line): it overwrites the delimiter that ends each token with a '\0'. After the first call to getcsvfield, the buffer therefore effectively ends after the first field, so you should not pass that same buffer to getcsvfield a second time.
This is not a problem inside getcsvfield itself: when you pass NULL to strtok on the subsequent calls within that function, strtok continues from its saved position in the modified string.
From the manual, on the strtok parameters:
Notice that this string is modified by being broken into smaller
strings (tokens).
Something like this should do the trick. This is the most basic approach; you can try others too. Leave the getcsvfield function as it was in your code, and on the caller side do:
char line[1024];
char buffer[1024];
while (fgets(line, 1024, stream))
{
// char* tmp = strdup(line); not necessary in this case
strcpy(buffer, line);
printf("Field 1 would be %s\n", getcsvfield(buffer, 1));
strcpy(buffer, line);
printf("Field 2 would be %s\n", getcsvfield(buffer, 2));
// free(tmp);
}
As the code above stands, every call to getcsvfield returns a pointer into the same memory, buffer. That works for printing (each printf runs while buffer still holds that field), but if you want to store the result of each call for later use, copy it to a different memory location each time.
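One way to keep each field for later use is to copy it out of the scratch buffer right away. The following is a minimal sketch, not the only way to do it: getcsvfield is the function from the question, while copy_csv_field, its parameters, and the 1024-byte scratch size are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Same logic as getcsvfield from the question. */
const char *getcsvfield(char *line, int num)
{
    const char *tok;
    for (tok = strtok(line, ";"); tok && *tok; tok = strtok(NULL, ";\n")) {
        if (!--num)
            return tok;
    }
    return NULL;
}

/* Hypothetical helper: copies field `num` of `line` into `out`.
 * Returns 1 on success, 0 if the field is missing or does not fit. */
int copy_csv_field(const char *line, int num, char *out, size_t outsz)
{
    char scratch[1024];
    const char *field;

    if (strlen(line) >= sizeof scratch)
        return 0;
    strcpy(scratch, line);              /* strtok only ever touches this copy */
    field = getcsvfield(scratch, num);
    if (field == NULL || strlen(field) >= outsz)
        return 0;
    strcpy(out, field);                 /* the caller now owns its own copy */
    return 1;
}
```

Because each call works on a fresh copy of the line, the fields can be requested in any order, and the caller's line is never modified.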

Related

Seg Fault when reading simple CSV file - C

I am reading a 2 columned csv file into an array of structs:
struct unused_s{
char col1[MAX_ARG_LENGTH];
char col2[MAX_ARG_LENGTH];
};
struct unused_s unused[MAX_USEABLE];
But I am getting a "Segmentation fault: 11" during execution. I have tried my best to debug this myself through reallocation of memory but I'm afraid my abilities are not up to the task. I have, however, pinpointed that the error is occurring somewhere in this section of code:
void readCSV(FILE *file){
int i = 0;
char line[MAX_LINE_LENGTH];
while (fgets(line, 1024, file))
{
char* tmp = strdup(line);
strcpy(unused[i].col1, getunused(tmp, FIRST_COLUMN));
strcpy(unused[i].col2, getunused(tmp, SECOND_COLUMN));
free(tmp);
i++;
}
fclose(file);
}
const char* getunused(char* line, int n)
{
const char* tok;
for (tok = strtok(line, ";");
tok && *tok;
tok = strtok(NULL, ";\n"))
{
if (!--n)
return tok;
}
return NULL;
}
Any help solving this/pointing me in the right direction to solve this myself would be greatly appreciated!
As noted in the comments by John3136, you are returning NULL from getunused(), e.g.
const char* getunused(char* line, int n)
{
const char* tok;
for (tok = strtok(line, ";");
tok && *tok;
tok = strtok(NULL, ";\n"))
{
if (!--n)
return tok;
}
return NULL;
}
From your calls to strtok, it appears you have an input file that will result in tmp similar to:
tmp = "somevalue; othervalue\n"
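After your 1st call to getunused(), strtok will have replaced the first delimiter in tmp with a nul-character in order to tokenize the string, so tmp will now contain:
tmp = "somevalue\0 othervalue\n"
When you call getunused(tmp, SECOND_COLUMN) (where SECOND_COLUMN is presumably 2), strtok sees only "somevalue" and returns it as the first token again, so --n leaves n at 1 and nothing is returned from inside the loop. The next strtok(NULL, ";\n") finds no further token, the loop ends, and NULL is returned. Passing that NULL to strcpy is undefined behavior and is the likely source of the segmentation fault.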
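The behavior described above can be reproduced in isolation. The sketch below reuses getunused from the question. copy_column is a hypothetical helper, not part of the original code, that checks for NULL (and for length) before calling strcpy, so a missing column is reported instead of crashing.

```c
#include <stdio.h>
#include <string.h>

/* Same logic as getunused from the question. */
const char *getunused(char *line, int n)
{
    const char *tok;
    for (tok = strtok(line, ";"); tok && *tok; tok = strtok(NULL, ";\n")) {
        if (!--n)
            return tok;
    }
    return NULL;
}

/* Hypothetical helper: copies column `n` of `line` into `dst`.
 * Returns 1 on success, 0 if the column was not found or does not fit. */
int copy_column(char *dst, size_t dstsz, char *line, int n)
{
    const char *field = getunused(line, n);

    if (field == NULL || strlen(field) >= dstsz)
        return 0;                   /* never hand NULL to strcpy */
    strcpy(dst, field);
    return 1;
}
```

Called twice on the same buffer holding "somevalue; othervalue\n", the first call succeeds and the second returns 0, because the buffer now ends after "somevalue".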
Why Tokenize?
Rarely will you need to tokenize fields from a .csv file (or, in your case, a semicolon-separated file). Why? That is the whole purpose of a separated-values file: you can read each line and use a formatted input function to separate the fields rather than tokenizing on delimiters (which you can do, it's just not generally necessary). In your case, if your .csv file format is as set out above, you can eliminate getunused entirely and simply use sscanf to separate the fields, e.g.
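void readCSV (FILE *file) {
int i = 0;
char line[MAX_LINE_LENGTH];
/* stop when the array is full; the literal ';' consumes the separator */
while (i < MAX_USEABLE && fgets(line, sizeof line, file))
if (sscanf (line, "%49[^;]; %49[^;\n]", unused[i].col1, unused[i].col2) == 2)
i++;
fclose(file);
}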
(note: as in my comment, you should include the field-width modifier of MAX_ARG_LENGTH-1 (the number) as part of your format-specifier -- as edited above after your last comment)
Also, if your second value is terminated by a '\n', then drop the ';' from the character class, e.g. %49[^\n] will do for the 2nd value.
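For reference, here is a self-contained sketch of the sscanf approach on a single line. It assumes MAX_ARG_LENGTH is 50 (hence the 49 in the format), and struct pair and parse_pair are hypothetical names used only for this example. Note the literal ';' between the two scansets: %49[^;] stops in front of the separator, so the format has to consume it before the second field can match.

```c
#include <stdio.h>

#define MAX_ARG_LENGTH 50   /* assumed value; the format uses MAX_ARG_LENGTH - 1 */

struct pair {
    char col1[MAX_ARG_LENGTH];
    char col2[MAX_ARG_LENGTH];
};

/* Hypothetical helper: splits "first; second\n" into p.
 * Returns 1 if both fields were read, 0 otherwise. */
int parse_pair(const char *line, struct pair *p)
{
    /* %49[^;]    up to 49 characters that are not ';'
     * ;          the separator itself
     * (space)    skips optional whitespace after the separator
     * %49[^;\n]  up to 49 characters that are neither ';' nor newline */
    return sscanf(line, "%49[^;]; %49[^;\n]", p->col1, p->col2) == 2;
}
```

A line with only one field makes sscanf return 1 rather than 2, so parse_pair reports failure instead of leaving col2 unset.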

strtok changing value of pointer

I have the following code:
char* pathTokens;
char* paths;
paths = getFilePaths();
//printf("%s", paths);
pathTokens = strtok(paths, "\n");
updateFile(pathTokens, argv[1]);
and these variables in the same file as updateFile():
static FILE* file;
static char content[1024];
static char* token;
static int numChanges = 0;
static char newContent[1024];
Here is updateFile():
void updateFile(char pathTokens[], char searchWord[]) {
while(pathTokens != NULL) {
printf("Token: %s\n", pathTokens);
updateNewContent(pathTokens, searchWord);
pathTokens = strtok(NULL, "\n");
}
}
and updateNewContent():
static void updateNewContent(char fileName[], char searchWord[]) {
if(searchWord == NULL) {
printf("Please enter a word\n");
return;
}
numChanges = 0;
file = fopen(fileName, "r");
if(file == NULL) {
printf("Error opening file\n");
return;
}
while(fgets(content, 1024, file) != NULL) {
token = strtok(content, " ");
}
fclose(file);
}
Whenever token = strtok(content, " "); is called, the value of pathTokens changes. If I comment it out, pathTokens keeps its original values. I don't want pathTokens to change, so why is strtok modifying it?
You are nesting strtok calls, and strtok doesn't work like that. For nested
calls you have to use strtok_r.
Also, when calling strtok, the source argument must be passed only on the
first call; all subsequent calls have to pass NULL. When you call strtok
again with a non-NULL argument, strtok "forgets" its previous state and
"restarts" parsing the new content.
In updateNewContent you are doing:
while(fgets(content, 1024, file) != NULL) {
token = strtok(content, " ");
}
strtok will forget about paths (the very first call). Also, this loop is
pointless: you read a line, split off its first token, then read the
next line, split off its first token, and so on. You are doing nothing with
token. When the loop ends, token will point to the first word of the last line.
And then the function returns and you do
pathTokens = strtok(NULL, "\n");
Because you call it with NULL, it will continue parsing the contents
of content, which is a static array at file scope.
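The shared state can be seen in a few lines. In this sketch the function name and the sample strings are made up for illustration; the three strtok calls mirror the order in which the question's code ends up calling it.

```c
#include <string.h>

/* Hypothetical demo: returns what the "outer" strtok(NULL, "\n") yields
 * after an unrelated strtok call on another buffer. Both buffers are
 * modified. */
const char *outer_token_after_inner_call(char *paths, char *content)
{
    strtok(paths, "\n");        /* outer tokenization starts on paths */
    strtok(content, " ");       /* inner call: strtok now tracks content */
    return strtok(NULL, "\n");  /* continues in content, not in paths */
}
```

With paths holding "a.txt\nb.txt" and content holding "hello world", the returned token is "world" (a pointer into content), not "b.txt".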
whenever token = strtok(content, " "); is called, the value of pathTokens changes
Of course it does, after updateNewContent returns, you assign a new value to
it. What else did you expect?
I really don't know what you are trying to do here; to me that makes no sense.
If you need to run strtok on a token that was previously returned by another
strtok, then you have to use strtok_r.
Here is an example of how to nest strtok:
char line[] = "a:b:c,d:e:f,x:y:z";
char *s1, *s2, *token1, *token2, *in1, *in2;
in1 = line;
while(token1 = strtok_r(in1, ",", &s1))
{
in1 = NULL; // for subsequent calls
in2 = token1;
printf("First block: %s\n", token1);
while(token2 = strtok_r(in2, ":", &s2))
{
in2 = NULL; // for subsequent calls
printf(" val: %s\n", token2);
}
}
Output:
First block: a:b:c
 val: a
 val: b
 val: c
First block: d:e:f
 val: d
 val: e
 val: f
First block: x:y:z
 val: x
 val: y
 val: z
Using strtok() means that you want to divide your input into tokens. strtok keeps a single internal position, so a call such as strtok(content, " ") starts dividing the new string into tokens and the earlier position is lost, even though pathTokens is a separate pointer variable.

Segmentation fault (core dumped) c

Here is a weird problem:
token = strtok(NULL, s);
printf(" %s\n", token); // these two lines can read the token and print
However!
token = strtok(NULL, s);
printf("%s\n", token); // these two lines give me a segmentation fault
I don't know what happened: all I did was add a space before %s\n, and then I could see the value of token.
my code:
int main() {
FILE *bi;
struct _record buffer;
const char s[2] = ",";
char str[1000];
const char *token;
bi = fopen(DATABASENAME, "wb+");
/* get strings from input and divide them into separate struct fields */
while(fgets(str, sizeof(str), stdin)!= NULL) {
printf("%s\n", str); // can print string line by line
token = strtok(str, s);
strcpy(buffer.id, token);
printf("%s\n", buffer.id); //can print the value in the struct
while(token != NULL){
token = strtok(NULL, s);
printf("%s\n", token); // problem starts here
/*strcpy(buffer.lname, token);
printf("%s\n", buffer.lname); // cant do anything with token */
}}
fclose(bi);
return 1;}
Here is an example of a string I read from stdin, followed by the parsed output (I only tried to strtok the first two elements to see if it works):
<15322101,MOZNETT,JOSE,n/a,n/a,2/23/1943,MALE,824-75-8088,42 SMITH AVENUE,n/a,11706,n/a,n/a,BAYSHORE,NY,518-215-5848,n/a,n/a,n/a
<
< 15322101
< MOZNETT
In the version without the leading space, printf("%s\n", token), your
compiler transforms the printf() call into puts(token). puts does not
allow null pointers, because internally it invokes strlen() to determine
the length of the string, so the program crashes once strtok returns
NULL at the end of the line.
In the version with a space in front of the format specifier, the format
is no longer exactly "%s\n", so the compiler cannot call puts without
appending the two strings together. It therefore invokes the actual
printf() function, which on many implementations (glibc, for example)
prints "(null)" for a NULL pointer, and your code appears to work.
Your problem reduces to the following question: "What is the behavior of printing NULL with printf's %s specifier?"
In short, passing NULL as the argument for %s in printf is undefined behavior, so you need to check for NULL, as suggested by #kninnug.
You need to change your printf as follows:
token = strtok(NULL, s);
if (token != NULL) printf("%s\n", token);
Or else
printf ("%s\n", token == NULL ? "" : token);
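Another option is to restructure the loop so that the NULL test happens before the token is used at all. A minimal sketch, assuming a comma-separated record like the one in the question; print_fields is a hypothetical name, and the function modifies the string it is given.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every comma-separated field of str on its
 * own line and returns how many fields were printed. Modifies str. */
int print_fields(char *str)
{
    int count = 0;
    char *token;

    /* The loop condition is checked right after every strtok call,
     * so the body never sees a NULL token. */
    for (token = strtok(str, ","); token != NULL; token = strtok(NULL, ",")) {
        printf("%s\n", token);
        count++;
    }
    return count;
}
```

The same shape also handles the first field, so the separate strtok/strcpy pair before the loop in the question would need the same NULL check if it is kept.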

Get each word from a line in a text file

I am trying to read a txt file. I can get the line I want, but I cannot print the words in this line one by one.
For example, the line looks like:
hello world 1 2 3
and I need to print the words one by one, like this:
hello
world
1
2
3
I get a "Segmentation fault (core dumped)" error:
char temp[256];
while(fgets(temp, 256, fp) != NULL) {
...
int tempLength = strlen(temp);
char *tempCopy = (char*) calloc(tempLength + 1, sizeof(char));
strncpy(tempCopy, temp, tempLength); // segmentation fault core dumped here;
// works fine with temp as "name country"
name = strtok_r(tempCopy, delimiter, &context);
country = strtok_r(NULL, delimiter, &context);
printf("%s\n", name);
printf("%s\n", country);
}
Can anyone help me fix the code?
Thanks!
Implemented with strtok():
char *p;
char temp[256];
while(fgets(temp,256,fp) != NULL){
p = strtok (temp," ");
while (p != NULL)
{
printf ("%s\n",p);
p = strtok (NULL, " ");
}
}
If you read man strtok, you will find:
BUGS
Be cautious when using these functions. If you do use them, note that:
* These functions modify their first argument.
* These functions cannot be used on constant strings.
* The identity of the delimiting character is lost.
* The strtok() function uses a static buffer while parsing, so it's not thread safe. Use strtok_r() if this matters to you.
If that matters in your program, try using strtok_r() instead.
After reading a line from the file, you can tokenize it like this:
if( fgets (str, 60, fp)!=NULL ) {
puts(str);
token = strtok(str," ");
while(token != NULL)
{
printf("%s\n",token);
token = strtok(NULL," ");
}
}
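If modifying the line or relying on strtok's hidden state is a concern, the words can also be walked with strspn and strcspn from standard C. This is a different technique from the strtok answers above, shown only as a sketch; print_words and the choice of space, tab and newline as delimiters are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each whitespace-separated word of line on
 * its own line and returns the number of words. Does not modify line. */
int print_words(const char *line)
{
    const char *delims = " \t\n";
    int count = 0;
    const char *p = line + strspn(line, delims);    /* skip leading delimiters */

    while (*p != '\0') {
        size_t len = strcspn(p, delims);            /* length of the current word */
        printf("%.*s\n", (int)len, p);              /* print exactly len characters */
        count++;
        p += len;
        p += strspn(p, delims);                     /* move to the next word */
    }
    return count;
}
```

Including '\n' in the delimiter set also keeps the newline that fgets leaves at the end of the line from being printed as part of the last word.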

Strtok and Strcat conflict

I am trying to work with strtok and strcat but the second printf never shows up. Here is the code:
int i = 0;
char *token[128];
token[i] = strtok(tmp, "/");
printf("%s\n", token[i]);
i++;
while ((token[i] = strtok(NULL, "/")) != NULL) {
strcat(token[0], token[i]);
printf("%s", token[i]);
i++;
}
If my input is 1/2/3/4/5/6 for tmp then the console output would be 13456. The 2 is always missing. Does anyone know how to fix this?
The 2 is always missing because on the first iteration of your loop you overwrite it with the call to strcat.
On entry to the loop your buffer contains "1\02\03/4/5/6"; strtok's internal pointer is pointing to "3", and token[1] points to "2".
You then call strcat, which leaves "12\0\03/4/5/6", so token[1] now points to "\0" and the first printf inside the loop prints nothing. (Source and destination overlap here, which is undefined behavior for strcat.)
Subsequent calls are OK because the null characters do not overwrite input data that has yet to be printed.
To fix it you should build up your output string in a second buffer, not the one you are parsing.
A working(?) version:
#include <stdio.h>
#include <string.h>
int main(void)
{
int i = 0;
char *token[128];
char tmp[128];
char removed[128] = {0};
strcpy(tmp, "1/2/3/4/5/6");
token[i] = strtok(tmp, "/");
strcat(removed, token[i]);
printf("%s\n", token[i]);
i++;
while ((token[i] = strtok(NULL, "/")) != NULL) {
strcat(removed, token[i]);
printf("%s", token[i]);
i++;
}
return (0);
}
strtok modifies the input string in place and returns pointers into that string. You then take one of those pointers (token[0]) and pass it to another operation (strcat) that writes through that pointer. The writes are clobbering each other.
If you want to concatenate all the tokens, you should allocate a separate buffer and copy them into that.
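A bounds-checked sketch of that idea follows. join_tokens and its parameters are hypothetical names; it appends each token to a separate output buffer and stops if the buffer would overflow, which plain strcat does not do.

```c
#include <string.h>

/* Hypothetical helper: joins the '/'-separated tokens of src into out.
 * Modifies src (strtok writes into it), never writes into src's tokens.
 * Returns 1 on success, 0 if out is too small. */
int join_tokens(char *src, char *out, size_t outsz)
{
    size_t used = 0;
    char *tok;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    for (tok = strtok(src, "/"); tok != NULL; tok = strtok(NULL, "/")) {
        size_t len = strlen(tok);
        if (used + len >= outsz)
            return 0;                       /* no room for token plus '\0' */
        memcpy(out + used, tok, len + 1);   /* copy the token and its terminator */
        used += len;
    }
    return 1;
}
```

Since out is a different array from src, the tokens strtok hands back stay intact while the result is built.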

Resources