I'm trying to read CSV from a text file in C. The text file format is:
1,Bob,bob#gmail.com
2,Daniel,daniel#gmail.com
3,John,john#gmail.com
When I run the program, the number displays fine, but the name and email are displayed as garbage. Here is my program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
int number;
char* name;
char* email;
} Owner;
Owner owners[100];
int load(char* filename)
{
char buffer[200];
char token[50];
Owner* owner;
int owners_size = 0;
FILE* file = fopen(filename, "r");
while(fgets(buffer, 200, file) != NULL)
{
owner = (Owner*)malloc(sizeof(Owner));
owner->number = atoi(strtok(buffer, ","));
owner->name = strtok(NULL, ",");
owner->email = strtok(NULL, ",");
owners[owners_size++] = *owner;
}
fclose(file);
return owners_size;
}
int main()
{
int choise, owners_size, index;
char* owners_filename = "owners2.txt";
owners_size = load(owners_filename);
if(owners_size)
{
printf("owners size: %d\n\n", owners_size);
for(index = 0; index < owners_size; index++)
printf("%d, %s %s\n", owners[index].number, owners[index].name, owners[index].email);
}
}
Can anyone tell me what the reason is? I appreciate your help.
Two problems:
You didn't allocate space for the strings in the structure:
typedef struct
{
int number;
char *name;
char *email;
} Owner;
Those pointers need to point at storage of their own that holds the names and email addresses.
You keep on supplying pointers to the buffer which is reused for each line of input:
while(fgets(buffer, 200, file) != NULL)
{
owner = (Owner*)malloc(sizeof(Owner));
owner->number = atoi(strtok(buffer, ","));
owner->name = strtok(NULL, ",");
owner->email = strtok(NULL, ",");
owners[owners_size++] = *owner;
}
The first line gets stored as some pointers into the buffer. The next line then overwrites the buffer and chops the line up again, trampling all over the original input.
Consider using strdup():
while (fgets(buffer, 200, file) != NULL)
{
owner = (Owner *)malloc(sizeof(Owner));
owner->number = atoi(strtok(buffer, ","));
owner->name = strdup(strtok(NULL, ","));
owner->email = strdup(strtok(NULL, ","));
owners[owners_size++] = *owner;
}
This is slightly dangerous code (I'd not use it in production code) because it doesn't check that strtok() found a token when expected, or that strdup() succeeded. Then again, I wouldn't use strtok() in production code either; I'd use POSIX strtok_r() or Microsoft's strtok_s() if they were available, or some alternative technique, probably using strspn() and strcspn(). If strdup() is not available, you can write your own, with the same or a different name:
char *strdup(const char *str)
{
size_t len = strlen(str) + 1;
char *dup = malloc(len);
if (dup != 0)
memmove(dup, str, len); // Or memcpy() - that is safe in this context
return(dup);
}
You might note that your code is only suitable for simple CSV files. If you encountered a line like this (which is legitimate CSV), you'd have problems (with quotes in your values, and mis-splitting because of the comma inside the quoted string):
1,"Bob ""The King"" King","Bob King, Itinerant Programmer <bob#gmail.com>"
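If you do need to cope with a line like that, strtok() can't help, because it has no notion of quotes. Here is a minimal sketch of a quote-aware field splitter, assuming RFC 4180-style rules (a field may be wrapped in double quotes, "" inside quotes stands for one quote character, and commas inside quotes are data). The name csv_next_field() is hypothetical, and the sketch does not handle quoted fields that span several lines.

```c
#include <string.h>   /* size_t */

/* Hypothetical helper: copies the next CSV field starting at *pos into out
   (at most outsz - 1 chars, always NUL-terminated) and advances *pos past
   the field and its comma. Returns 1 if a field was read, 0 when the line
   is used up (*pos is set to NULL after the last field). */
int csv_next_field(const char **pos, char *out, size_t outsz)
{
    const char *p = *pos;
    size_t n = 0;
    int quoted = 0;

    if (p == NULL)                    /* previous call returned the last field */
        return 0;
    if (*p == '"') {                  /* field starts with a quote */
        quoted = 1;
        p++;
    }
    for (;;) {
        char c = *p;
        if (c == '\0' || (!quoted && (c == '\n' || c == '\r'))) {
            *pos = NULL;              /* end of line: this is the last field */
            break;
        }
        if (quoted && c == '"') {
            if (p[1] == '"') {        /* "" inside quotes is one literal quote */
                p++;
            } else {                  /* closing quote */
                quoted = 0;
                p++;
                continue;
            }
        } else if (!quoted && c == ',') {
            *pos = p + 1;             /* next field starts after the comma */
            break;
        }
        if (n + 1 < outsz)
            out[n++] = c;             /* overlong fields are silently truncated */
        p++;
    }
    if (outsz > 0)
        out[n] = '\0';
    return 1;
}
```

Because each field is copied into a caller-supplied buffer, the result stays valid after the line buffer is reused for the next line.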
The pointer returned by strtok() points to an address within the buffer it is parsing, in this case the local variable buffer. When load() returns, buffer goes out of scope (and even if it didn't, every element of owners would point into the same buffer). You need to copy the string returned by strtok(). You could use strdup() if available, or use malloc() and strcpy().
There is no need to malloc() new instances of Owner, as an array of them already exists (the code as it stands has a memory leak).
Note there is no protection against going beyond the bounds of the owners array. If the file has more than 100 entries, the loop will write past the end of the array. Extend the terminating condition of the while loop to prevent this:
while (owners_size < sizeof(owners) / sizeof(owners[0]) &&
       fgets(buffer, 200, file) != NULL)
{
    /* ... parse the line as before ... */
}
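Putting those three points together, load() could look something like this sketch: it fills the existing owners array directly (no malloc() of Owner), stores copies of the fields, and stops at the array bound. The FILE * parameter, the name load_file() and the helper copy_str() are illustrative assumptions, so the function doesn't depend on a particular file name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int number;
    char *name;
    char *email;
} Owner;

#define MAX_OWNERS 100                 /* same size as the question's owners[100] */
Owner owners[MAX_OWNERS];

/* malloc() + strcpy() copy of s, or NULL if memory runs out */
static char *copy_str(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Hypothetical variant of load() that reads from an already-open file. */
int load_file(FILE *file)
{
    char buffer[200];
    int owners_size = 0;

    while (owners_size < MAX_OWNERS && fgets(buffer, sizeof buffer, file) != NULL) {
        char *num = strtok(buffer, ",");
        char *name = strtok(NULL, ",");
        char *email = strtok(NULL, ",\n");   /* '\n' so the email doesn't keep fgets()'s newline */
        if (num == NULL || name == NULL || email == NULL)
            continue;                        /* skip malformed lines */
        name = copy_str(name);               /* the copies outlive buffer */
        email = copy_str(email);
        if (name == NULL || email == NULL) {
            free(name);
            free(email);
            break;                           /* out of memory: keep what was loaded */
        }
        owners[owners_size].number = atoi(num);
        owners[owners_size].name = name;
        owners[owners_size].email = email;
        owners_size++;
    }
    return owners_size;
}
```

The caller still opens and closes the file, and eventually frees each name and email.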
You just stored pointers into a local buffer. When you leave load() this buffer is gone and not accessible anymore.
You must allocate memory for name and email before you can copy them into the Owner struct.
char *tok;
size_t len;
tok = strtok(NULL, ",");
len = strlen(tok);
owner->name = malloc(len + 1);   /* + 1 for the terminating '\0' */
strcpy(owner->name, tok);
...
[EDIT: you need to allocate len+1 bytes so you have space for the terminating NUL character. -Zack]
You've only got one line buffer. Every cycle of the loop in load clobbers the text from the previous cycle. And if that wasn't bad enough, the buffer is destroyed when load returns.
The quick fix is to change
owner->name = strtok(NULL, ",");
owner->email = strtok(NULL, ",");
to
owner->name = strdup(strtok(NULL, ","));
owner->email = strdup(strtok(NULL, ","));
(If you don't have strdup, get a real computer. Failing that, it's very simple to write.)
If I were reviewing your code, though, I would ding you for the fixed-size line buffer, the fixed-size owners array, the memory leak, using atoi instead of strtol, using strtok instead of strsep, and the absence of quote handling and parse error recovery, and point out that it would be more efficient to allocate each line as a unit and then save pointers into it.
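The last two suggestions, allocating each line as a unit and saving pointers into it, plus strtol() instead of atoi(), might look like this minimal sketch. The Record type and parse_record() are hypothetical names, and it assumes exactly three comma-separated fields with no quoting.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    long number;
    char *line;    /* the single heap block that owns all text of this record */
    char *name;    /* points into line */
    char *email;   /* points into line */
} Record;

/* Hypothetical helper: parses "number,name,email".
   Returns 0 on success, -1 on a malformed line or allocation failure. */
int parse_record(const char *text, Record *r)
{
    char *end;
    size_t len = strlen(text) + 1;

    r->line = malloc(len);                  /* one allocation per line ... */
    if (r->line == NULL)
        return -1;
    memcpy(r->line, text, len);

    errno = 0;
    r->number = strtol(r->line, &end, 10);  /* end shows where the number stopped */
    if (end == r->line || *end != ',' || errno == ERANGE)
        goto fail;
    r->name = end + 1;
    end = strchr(r->name, ',');
    if (end == NULL)
        goto fail;
    *end = '\0';                            /* ... and pointers into it per field */
    r->email = end + 1;
    r->email[strcspn(r->email, "\r\n")] = '\0';   /* drop the line terminator */
    return 0;
fail:
    free(r->line);
    r->line = NULL;
    return -1;
}
```

Releasing a record is then a single free(r.line), and a non-numeric first field is reported instead of silently becoming 0 as it would with atoi().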
Related
I am reading a two-column CSV file into an array of structs:
struct unused_s{
char col1[MAX_ARG_LENGTH];
char col2[MAX_ARG_LENGTH];
};
struct unused_s unused[MAX_USEABLE];
But I am getting a "Segmentation fault: 11" during execution. I have tried my best to debug this myself by reallocating memory, but I'm afraid my abilities are not up to the task. I have, however, pinpointed that the error is occurring somewhere in this section of code:
void readCSV(FILE *file){
int i = 0;
char line[MAX_LINE_LENGTH];
while (fgets(line, 1024, file))
{
char* tmp = strdup(line);
strcpy(unused[i].col1, getunused(tmp, FIRST_COLUMN));
strcpy(unused[i].col2, getunused(tmp, SECOND_COLUMN));
free(tmp);
i++;
}
fclose(file);
}
const char* getunused(char* line, int n)
{
const char* tok;
for (tok = strtok(line, ";");
tok && *tok;
tok = strtok(NULL, ";\n"))
{
if (!--n)
return tok;
}
return NULL;
}
Any help solving this/pointing me in the right direction to solve this myself would be greatly appreciated!
As noted in the comments by John3136, you are returning NULL from getunused():
const char* getunused(char* line, int n)
{
const char* tok;
for (tok = strtok(line, ";");
tok && *tok;
tok = strtok(NULL, ";\n"))
{
if (!--n)
return tok;
}
return NULL;
}
From your calls to strtok, it appears you have an input file that will result in tmp similar to:
tmp = "somevalue; othervalue\n"
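After your first call to getunused() (with FIRST_COLUMN presumably 1), strtok will have replaced the first ';' in tmp with a nul-character in order to terminate the token, so tmp will now contain:
tmp = "somevalue\0 othervalue\n"
When you call getunused(tmp, SECOND_COLUMN) (where SECOND_COLUMN is presumably 2), strtok(line, ";") stops at that embedded nul-character and returns "somevalue" again, so !--n tests false. The next strtok(NULL, ";\n") finds nothing left, the loop ends, and NULL is returned. Passing that NULL to strcpy() is what causes the segmentation fault.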
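You can reproduce this without any input file. The sketch below keeps getunused() as in the question and adds a hypothetical helper, second_call_is_null(), that replays the two calls readCSV() makes on one line; it returns 1 because the second call yields NULL.

```c
#include <string.h>

/* getunused() from the question, unchanged apart from layout */
const char *getunused(char *line, int n)
{
    const char *tok;
    for (tok = strtok(line, ";"); tok && *tok; tok = strtok(NULL, ";\n")) {
        if (!--n)
            return tok;
    }
    return NULL;
}

/* Hypothetical helper: replays readCSV()'s two calls on one line. */
int second_call_is_null(void)
{
    char tmp[] = "somevalue; othervalue\n";
    const char *first = getunused(tmp, 1);   /* "somevalue"; the ';' is now '\0' */
    const char *second = getunused(tmp, 2);  /* strtok stops at that '\0' */
    return first != NULL && strcmp(first, "somevalue") == 0 && second == NULL;
}
```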
Why Tokenize?
Rarely will you need to tokenize fields from a .csv file (or, in your case, a semicolon-separated file). Why? That is the whole purpose of a separated-values file: you can read it with a formatted input function that separates the fields, rather than tokenizing on delimiters (which you can do; it's just not generally necessary). In your case, if your file format is as set out above, you can eliminate getunused entirely and simply use sscanf to separate the input strings, e.g.
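void readCSV (FILE *file) {
    int i = 0;
    char line[MAX_LINE_LENGTH];
    /* 49 == MAX_ARG_LENGTH - 1, assuming MAX_ARG_LENGTH is 50 */
    while (i < MAX_USEABLE && fgets(line, sizeof line, file))
        if (sscanf (line, "%49[^;]; %49[^;\n]", unused[i].col1, unused[i].col2) == 2)
            i++;
    fclose(file);
}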
(note: as in my comment, you should include the field-width modifier of MAX_ARG_LENGTH-1 (the number) as part of your format-specifier -- as edited above after your last comment)
Also, if your second value is terminated by a '\n', then drop the ';' from the character class, e.g. %49[^\n] will do for the 2nd value.
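As a self-contained sketch of that sscanf approach: MAX_ARG_LENGTH is assumed to be 50 (the question doesn't show it), so the field widths are 49, and parse_unused() is a hypothetical helper name. The literal ';' in the format consumes the separator, and the following space skips any blanks before the second value.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ARG_LENGTH 50   /* assumed value; not shown in the question */

struct unused_s {
    char col1[MAX_ARG_LENGTH];
    char col2[MAX_ARG_LENGTH];
};

/* Hypothetical helper: returns 1 if both columns were read from line. */
int parse_unused(const char *line, struct unused_s *u)
{
    memset(u, 0, sizeof *u);   /* a failed parse leaves empty strings, not garbage */
    return sscanf(line, "%49[^;\n]; %49[^\n]", u->col1, u->col2) == 2;
}
```

Checking the return value against 2 is what replaces the NULL checks you would otherwise need after each strtok() call.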
I am trying to read a file line by line and split it into words. Those words should be saved into an array. However, the program only gets the first line of the text file, and when it tries to read the next line, the program crashes.
FILE *inputfile = fopen("file.txt", "r");
char buf [1024];
int i=0;
char fileName [25];
char words [100][100];
char *token;
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {
token = strtok(NULL, " ");
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}
After the good answer from xing, I decided to write a full, simple program that does your task and to say a few things about my solution. My program reads the file given as its command-line argument line by line and appends each line to a buffer.
Code:
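#include <assert.h>
#include <errno.h>
#define _WITH_GETLINE             /* FreeBSD: declare getline() in <stdio.h> */
#define _POSIX_C_SOURCE 200809L   /* glibc and other POSIX systems: same effect */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>            /* ssize_t */
#define assert_msg(x) for ( ; !(x) ; assert(x) )
int
main(int argc, char **argv)
{
    FILE *file;
    char *buf = NULL, *token = NULL, *tmp;  /* realloc() and getline() need NULL to start */
    size_t length = 0, size = 1;            /* size starts at 1: room for the '\0' */
    ssize_t read;                           /* getline() returns -1 at EOF */
    assert(argc == 2);
    file = fopen(argv[1], "r");
    assert_msg(file != NULL) {
        fprintf(stderr, "Error occurred: %s\n", strerror(errno));
    }
    while ((read = getline(&token, &length, file)) != -1) {
        if (token[read - 1] == '\n')
            token[read - 1] = ' ';          /* the last line may have no newline */
        size += read;
        tmp = realloc(buf, size);
        assert(tmp != NULL);
        if (buf == NULL)
            tmp[0] = '\0';                  /* strncat() needs a string to append to */
        buf = tmp;
        (void)strncat(buf, token, read);
    }
    printf("%s\n", buf != NULL ? buf : "");
    fclose(file);
    free(buf);
    free(token);
    return (EXIT_SUCCESS);
}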
For file file.txt:
that is a
text
which I
would like to
read
from file.
I got a result:
$ ./program file.txt
that is a text which I would like to read from file.
A few things worth saying about this solution:
Instead of fgets(3) I used getline(3), because it makes it easy to know the length of each line (the read variable) and it allocates memory for the string automatically (token). It is important to remember to free(3) it. getline(3) comes from POSIX.1-2008 rather than standard C, so not every system declares it by default: on FreeBSD, defining the _WITH_GETLINE macro before including <stdio.h> makes it available, and on glibc-based systems defining _POSIX_C_SOURCE as 200809L does the same.
buf holds only as much space as is needed to store the string: after each line is read, buf is extended by the required amount with realloc(3). It is a somewhat more "universal" solution. Remember to free objects allocated on the heap.
I also used strncat(3), which ensures that no more than read characters (the length of token) are appended to buf. That is still not the ideal way to use strncat(3), because we should also test for string truncation, but in general it is better than plain strcat(3), which is not recommended because a buffer overflow can let malicious users arbitrarily change a running program's behaviour. Both strcat(3) and strncat(3) add the terminating \0.
getline(3) returns token with its trailing newline character, so I replace the newline with a space (since the goal is to build sentences from the words in the file). I should also remove the final space, but I didn't want to complicate the source code.
Optionally, I also defined my own macro assert_msg(x), which prints an error message and then runs assert(3). It is only a convenience, but thanks to it we can see the reason when opening the file fails. (Note that if NDEBUG is defined, assert() does nothing and that loop never terminates.)
The problem is getting the next token in the inner while loop and passing the result to strcpy without any check for a NULL result.
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {//not at the end of the line. yet!
token = strtok(NULL, " ");//get next token. but token == NULL at end of line
//passing NULL to strcpy is a problem
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}
By incorporating the check into the while condition, passing NULL as the second argument to strcpy is avoided.
while ( ( token = strtok ( NULL, " ")) != NULL) {//get next token != NULL
//if token == NULL the while block is not executed
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
Sanitize your loops, and don't repeat yourself:
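#include <stdio.h>
#include <string.h>
int main(void)
{
    FILE *inputfile = fopen("file.txt", "r");
    char buf[1024];
    char words[100][100];
    char *token;
    int i, n;
    if (inputfile == NULL)
        return 1;
    for (i = 0; i < 100 && fgets(buf, sizeof(buf), inputfile); ) {
        /* '\n' is a delimiter too, so the last word on a line doesn't keep it */
        for (token = strtok(buf, " \n"); token != NULL && i < 100; token = strtok(NULL, " \n")) {
            snprintf(words[i++], sizeof(words[0]), "%s", token); /* truncates overlong words */
        }
    }
    fclose(inputfile);
    for (n = 0; n < i; n++)
        printf("%s\n", words[n]);
    return 0;
}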
I have a line-reading function (readline() in the code below) that returns a char * (it detects '\n'), or NULL on EOF.
In main() I'm trying to recognize the individual words in that line.
I used strtok:
int main(int argc, char **argv)
{
char *line, *ptr;
FILE *infile;
FILE *outfile;
char **helper = NULL;
int strtoks = 0;
void *temp;
infile=fopen(argv[1],"r");
outfile=fopen(argv[2],"w");
while(((line=readline(infile))!=NULL))
{
ptr = strtok(line, " ");
temp = realloc(helper, (strtoks)*sizeof(char *));
if(temp == NULL) {
printf("Bad alloc error\n");
free(helper);
return 0;
} else {
helper=temp;
}
while (ptr != NULL) {
strtoks++;
fputs(ptr, outfile);
fputc(' ', outfile);
ptr = strtok(NULL, " ");
helper[strtoks-1] = ptr;
}
/*fputs(line, outfile);*/
free(line);
}
fclose(infile);
fclose(outfile);
return 0;
}
Now I have no idea how to put each of the tokenized words into an array (I created char **helper for that purpose), so that it can be used in qsort, like qsort(helper, strtoks, sizeof(char*), compare_string);.
Second, even if it worked, I don't know how to clear that line and proceed to sorting the next one. How do I do that?
I even crashed valgrind (with the code presented above): "valgrind: the 'impossible' happened: Killed by fatal signal"
Where is the mistake?
The most obvious problem (there may be others) is that you're reallocating helper to the value of strtoks at the beginning of the line, but then incrementing strtoks and adding to the array at higher values of strtoks. For instance, on the first line, strtoks is 0, so temp = realloc(helper, (strtoks)*sizeof(char *)); leaves helper as NULL, but then you try to add every word on that line to the helper array.
I'd suggest an entirely different approach which is conceptually simpler:
char buf[1000]; // or big enough to be bigger than any word you'll encounter
char **helper = NULL, **tmp;
int i, ch = 0, numwords = 0;
while (ch != EOF) { // test the value fgetc() returned; while (!feof(infile)) would run one extra time
    // get chars one at a time until we hit a space, a newline or EOF
    for (i = 0; i < (int)sizeof(buf) - 1 && (ch = fgetc(infile)) != EOF && ch != ' ' && ch != '\n'; i++) {
        buf[i] = ch; // add char to buffer
    }
    buf[i] = '\0'; // terminate with null byte
    if (i == 0)
        continue; // nothing collected (e.g. two separators in a row)
    tmp = realloc(helper, (numwords + 1) * sizeof(char *)); // expand helper to fit one more word
    if (tmp == NULL)
        break;
    helper = tmp;
    helper[numwords++] = strdup(buf); // copy current contents of buf to the just-created element of helper
}
I haven't tested this so let me know if it's not correct or there's anything you don't understand. I've left out the opening and closing of files and the freeing at the end (remember you have to free every element of helper before you free helper itself).
As you can see in strtok's prototype:
char * strtok ( char * str, const char * delimiters );
...str is not const. What strtok actually does is replace the delimiters it finds with null bytes (\0) in your str and return a pointer to the beginning of the token.
For example:
char in[] = "foo bar baz";
char *toks[3];
toks[0] = strtok(in, " ");
toks[1] = strtok(NULL, " ");
toks[2] = strtok(NULL, " ");
printf("%p %s\n%p %s\n%p %s\n", (void *)toks[0], toks[0], (void *)toks[1], toks[1],
       (void *)toks[2], toks[2]);
printf("%p %s\n%p %s\n%p %s\n", (void *)&in[0], &in[0], (void *)&in[4], &in[4],
       (void *)&in[8], &in[8]);
Now look at the results:
0x7fffd537e870 foo
0x7fffd537e874 bar
0x7fffd537e878 baz
0x7fffd537e870 foo
0x7fffd537e874 bar
0x7fffd537e878 baz
As you can see, toks[1] and &in[4] point to the same location: the original str has been modified, and in reality all tokens in toks point to somewhere in str.
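The same experiment can be expressed as a check instead of comparing printed addresses. This sketch (the function name is hypothetical) returns 1 when each token is a pointer into in[] and the separating spaces have been overwritten with '\0'.

```c
#include <string.h>

/* Hypothetical helper: 1 if strtok()'s tokens are just pointers into in[] */
int tokens_point_into_input(void)
{
    char in[] = "foo bar baz";
    char *toks[3];

    toks[0] = strtok(in, " ");
    toks[1] = strtok(NULL, " ");
    toks[2] = strtok(NULL, " ");
    /* the spaces at in[3] and in[7] have been replaced by '\0' */
    return toks[0] == &in[0] && toks[1] == &in[4] && toks[2] == &in[8]
        && in[3] == '\0' && in[7] == '\0';
}
```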
In your case your problem is that you free line:
free(line);
...invalidating all your pointers in helper. If you (or qsort) try to access helper[0] after freeing line, you end up accessing freed memory.
You should copy the tokens instead, e.g.:
ptr = strtok(NULL, " ");
if (ptr != NULL) {
    helper[strtoks-1] = malloc(strlen(ptr) + 1);
    strcpy(helper[strtoks-1], ptr);
}
Obviously, you will need to free each element of helper afterwards (in addition to helper itself).
You should be getting a 'Bad alloc' error because:
char **helper = NULL;
int strtoks = 0;
...
while ((line = readline(infile)) != NULL) /* Fewer, but sufficient, parentheses */
{
ptr = strtok(line, " ");
temp = realloc(helper, (strtoks)*sizeof(char *));
if (temp == NULL) {
printf("Bad alloc error\n");
free(helper);
return 0;
}
This is because the value of strtoks is zero and helper is a null pointer, so realloc(helper, 0) behaves like malloc(0). One outside chance is that your library crashes on realloc(0, 0), which it shouldn't, but it is a curious edge case that might have been overlooked. The other possibility is that realloc(0, 0) returns a non-null pointer to 0 bytes of data, which you are not allowed to dereference; when your code writes through it, it crashes. Both returning NULL and returning non-NULL are allowed by the C standard, so don't write code that crashes depending on which behaviour realloc() shows. (If your implementation of realloc() does not return a non-NULL pointer for realloc(0, 0), then I'm suspicious that you aren't showing us exactly the code that managed to crash valgrind (which is a fair achievement; congratulations), because you aren't seeing the program terminate under control, as it should if realloc(0, 0) returns NULL.)
You should be able to avoid that problem if you use:
temp = realloc(helper, (strtoks+1) * sizeof(char *));
Don't forget to increment strtoks itself at some point.
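Combining the fixes from these answers (grow the array by one before each store, and store a copy of each token rather than a pointer into line) gives something like the sketch below, which also does the qsort() the question is after. sorted_words() and free_words() are hypothetical helper names, and compare_string() is written here because the question doesn't show its version.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of char *: each element arrives as a char ** */
int compare_string(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Frees every element first, then the array itself. */
void free_words(char **words, int n)
{
    for (int i = 0; i < n; i++)
        free(words[i]);
    free(words);
}

/* Hypothetical helper: splits line on spaces/newlines, keeps a private copy
   of each word, sorts them and returns the count (-1 if memory runs out).
   The result in *out must be released with free_words(). */
int sorted_words(char *line, char ***out)
{
    char **helper = NULL;
    int strtoks = 0;

    for (char *ptr = strtok(line, " \n"); ptr != NULL; ptr = strtok(NULL, " \n")) {
        char **temp = realloc(helper, (strtoks + 1) * sizeof *helper); /* room for one more */
        if (temp == NULL)
            goto fail;
        helper = temp;
        helper[strtoks] = malloc(strlen(ptr) + 1);  /* copy: line may be freed later */
        if (helper[strtoks] == NULL)
            goto fail;
        strcpy(helper[strtoks], ptr);
        strtoks++;
    }
    if (strtoks > 1)
        qsort(helper, strtoks, sizeof *helper, compare_string);
    *out = helper;
    return strtoks;
fail:
    free_words(helper, strtoks);
    *out = NULL;
    return -1;
}
```

Because every word is copied, line can be freed (or the next line read) as soon as sorted_words() returns; calling free_words() on the result is what "clears" one line before moving on to the next.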
I am writing a program that reads and parses a CSV file, populates a struct with the contents of the CSV file, and then writes the struct to a file in binary mode. I am parsing the CSV file by tokenizing it and writing each token to the struct.
The problem is that when I write this struct to the data file, the file contents show some special characters, i.e. it seems to write random values. I am attaching my output.dat file.
Can anyone please help me find out where I am going wrong? Thanks.
Here is my code:
typedef struct
{
int AccountNumber;
char *AccountName;
double AccountBalance;
double LastPaymentAmount;
char *LastPaymentDate;
} Person;
FILE *fpData;
Person temp = {0,"",0,0,0};
if ( ( fpData = fopen( "input.csv", "r" ) ) == NULL ) //Reading a file
{
printf( "File could not be opened.\n" );
}
while(fgets(buf, BUFFER_SIZE, fpData) != NULL)
{
/* Here we tokenize our string and scan for " \n" characters */
// for(tok = strtok(buf,"\n");tok;tok=strtok(NULL,"\n"))
// {
tok = strtok(buf, ",");
temp.AccountNumber = atoi(tok);
printf(" %i ",temp.AccountNumber );
tok = strtok(NULL, ",");
temp.AccountName = tok;
printf("%s ",temp.AccountName );
tok = strtok(NULL, ",");
temp.AccountBalance = atof(tok);
printf("temp.AccountBalance = %f ",temp.AccountBalance );
tok = strtok(NULL, ",");
temp.LastPaymentAmount = atof(tok);
printf("temp.LastPaymentAmount = %f ",temp.LastPaymentAmount );
tok = strtok(NULL, ",");
temp.LastPaymentDate = tok;
printf("temp.LastPaymentDate = %s ",temp.LastPaymentDate );
tok = strtok(NULL, ",");
printf("\n");
// }
}
if ( ( fpData = fopen( "output.dat", "wb" ) ) == NULL )
{
printf( "File could not be opened.\n" );
}
else
{
printf("\nFileName is:%s\n",argv[2]);
printf("File will be overwritten. Do you want to continue?\nPress Y if yes, N if no");
printf("\n?\n");
scanf("%c", &choice);
if(choice=='Y')
{
for(i=0;i<10;i++)
{
fwrite(&temp, sizeof(temp), 10, fpData);
}
}
}
fclose(fpData);
You're not allocating any memory for the array of structures you're trying to copy into your output file, or for the character array members in your structure. Simply copying the pointer returned from strtok() will not work, because that pointer points into buf, the line buffer that the next fgets() call overwrites. So after each pass through your while-loop, temp.AccountName and temp.LastPaymentDate point at whatever the latest line left in buf, not at the data you parsed earlier. Furthermore, as chemuduguntar pointed out above, when you write out the structure you're only writing out the memory pointers; there is no actual string data in your structure for what you're assuming are character arrays.
You have two choices: either declare your structure with fixed-size character arrays to store the strings and use strcpy() to copy the data from strtok() into those arrays, or use malloc() to allocate memory for your pointers (just remember you'll have to free those pointers later unless you want memory leaks). For writing the struct to a binary file, only the first choice helps, since with pointers fwrite() would still write just the addresses.
So for instance, you could do something like this:
#define MAXBUFSIZE 511
typedef struct
{
int AccountNumber;
char AccountName[MAXBUFSIZE + 1];
double AccountBalance;
double LastPaymentAmount;
char LastPaymentDate[MAXBUFSIZE + 1];
} Person;
Then inside your while-loop, when you call strtok() you can do this:
tok = strtok(NULL, ",");
strncpy(temp.AccountName, tok, MAXBUFSIZE);
temp.AccountName[MAXBUFSIZE] = '\0'; //safety NUL termination
//...more code
tok = strtok(NULL, ",");
strncpy(temp.LastPaymentDate, tok, MAXBUFSIZE);
temp.LastPaymentDate[MAXBUFSIZE] = '\0'; //safety NUL termination
Now with this approach there is actual data inside your structures that is not pointing to some temporary storage somewhere. The only downside is that a field longer than 511 characters (MAXBUFSIZE) will be clipped, and for shorter fields strncpy() pads the rest of the character array with zeros.
Next, somewhere you need to declare either:
Person myarray[10];
or
Person* myarray = calloc(10, sizeof(Person));
because right now, every time you go through your while-loop, you overwrite the previous value of temp. So at some point you need to copy your temp structure into more permanent storage. For instance, at the end of your while-loop you should call:
memcpy(&myarray[LOOPNUMBER], &temp, sizeof(Person));
Finally, I would change the call to fwrite() so that it writes the whole array once, instead of writing 10 records on each of 10 loop iterations:
fwrite(myarray, sizeof(Person), 10, fpData);
And again, if you use pointers with malloc(), calloc(), etc., be sure to free that storage afterwards by a call to free().
Hope this helps,
Jason
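A self-contained sketch of the fixed-size-array approach, including the binary round trip: parse_person() and copy_field() are hypothetical helpers, and the field order is taken from the question's code. Zeroing the struct first makes the padding bytes that fwrite() writes deterministic. A file written this way is only readable by a program built with the same struct layout (padding, sizeof(double), byte order).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXBUFSIZE 511

typedef struct {
    int AccountNumber;
    char AccountName[MAXBUFSIZE + 1];
    double AccountBalance;
    double LastPaymentAmount;
    char LastPaymentDate[MAXBUFSIZE + 1];
} Person;

/* Copies src into a fixed-size field, always NUL-terminating. */
static void copy_field(char *dst, const char *src)
{
    strncpy(dst, src, MAXBUFSIZE);
    dst[MAXBUFSIZE] = '\0';
}

/* Hypothetical helper: parses "number,name,balance,payment,date" from buf
   (which strtok() modifies). Returns 1 if all five fields were present. */
int parse_person(char *buf, Person *p)
{
    char *tok[5];

    memset(p, 0, sizeof *p);
    for (int i = 0; i < 5; i++) {
        tok[i] = strtok(i == 0 ? buf : NULL, ",\n");
        if (tok[i] == NULL)
            return 0;
    }
    p->AccountNumber = atoi(tok[0]);
    copy_field(p->AccountName, tok[1]);      /* the struct now owns the text */
    p->AccountBalance = atof(tok[2]);
    p->LastPaymentAmount = atof(tok[3]);
    copy_field(p->LastPaymentDate, tok[4]);
    return 1;
}
```

Once every record is parsed into an array, a single fwrite(people, sizeof(Person), count, fp) writes them all, and fread() with the same arguments reads them back.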
I don't think this will quite work without defining fixed-size arrays in your structure for storing the strings; currently, when you write the structure to disk, only the pointers are written.
I'm using strtok() in C to parse a CSV string. First I tokenize it just to find out how many tokens there are, so I can allocate an array of the correct size. Then I go through it again using the same variable I used the first time. Every time I do the second pass, though, strtok(NULL, ",") returns NULL even though there are still more tokens to parse. Can somebody tell me what I'm doing wrong?
char* tok;
int count = 0;
tok = strtok(buffer, ",");
while(tok != NULL) {
count++;
tok = strtok(NULL, ",");
}
//allocate array
tok = strtok(buffer, ",");
while(tok != NULL) {
//do other stuff
tok = strtok(NULL, ",");
}
So on that second while loop it always ends after the first token is found even though there are more tokens. Does anybody know what I'm doing wrong?
strtok() modifies the string it operates on, replacing delimiter characters with nulls. So if you want to use it more than once, you'll have to make a copy.
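A minimal sketch of that: run the counting pass on a private copy, so the original string is still intact for the second strtok() pass. count_tokens() is a hypothetical name, and malloc() plus memcpy() is used in place of strdup() for portability.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts tokens in a private copy of s, leaving s
   untouched. Returns -1 if the copy can't be allocated. */
int count_tokens(const char *s, const char *delim)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    int count = 0;

    if (copy == NULL)
        return -1;
    memcpy(copy, s, len);
    for (char *tok = strtok(copy, delim); tok != NULL; tok = strtok(NULL, delim))
        count++;
    free(copy);   /* only the copy was cut up by strtok() */
    return count;
}
```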
There's not necessarily a need to make a copy - strtok() does modify the string it's tokenizing, but in most cases that simply means the string is already tokenized if you want to deal with the tokens again.
Here's your program modified a bit to process the tokens after your first pass:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int i;
char buffer[] = "some, string with , tokens";
char* tok;
int count = 0;
tok = strtok(buffer, ",");
while(tok != NULL) {
count++;
tok = strtok(NULL, ",");
}
// walk through the tokenized buffer again
tok = buffer + strspn(buffer, ","); // skip any leading delimiters, as strtok() did
for (i = 0; i < count; ++i) {
printf( "token %d: \"%s\"\n", i+1, tok);
tok += strlen(tok) + 1; // get the next token by skipping past the '\0'
tok += strspn(tok, ","); // then skipping any starting delimiters
}
return 0;
}
Note that this is unfortunately trickier than I first posted - the call to strspn() needs to be performed after skipping the '\0' placed by strtok() since strtok() will skip any leading delimiter characters for the token it returns (without replacing the delimiter character in the source).
Use strsep: it actually updates your pointer. Instead of passing NULL on later calls, you keep passing in the address of your string pointer. The only catch is that if the string was allocated on the heap, you need to keep a separate pointer to its beginning so you can free it later, because strsep moves the pointer you pass it.
char *strsep(char **stringp, const char *delim);
char *string;
char *token;
token = strsep(&string, ",");
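strtok is what gets taught in a typical intro to C course; use strsep, it's much better. :-) (Note that strsep is a BSD extension, also available in glibc, rather than standard C.)
No more getting confused with "oh shit, I still have to pass in NULL because strtok screwed up my positioning."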
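A small sketch of that pattern, assuming a system that provides strsep() (the BSDs, macOS, glibc; it is not in ISO C). count_fields() is a hypothetical name. One behavioural difference from strtok() is worth knowing: strsep() does not merge adjacent delimiters, so "a,,b" yields an empty middle field.

```c
#define _DEFAULT_SOURCE   /* glibc: declare strsep(); BSD/macOS declare it by default */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts comma-separated fields using strsep(). */
int count_fields(const char *csv)
{
    char *start = malloc(strlen(csv) + 1);   /* keep this pointer for free() */
    char *string = start;                    /* strsep() advances this one */
    int count = 0;

    if (start == NULL)
        return -1;
    strcpy(start, csv);
    while (strsep(&string, ",") != NULL)     /* string becomes NULL after the last field */
        count++;
    free(start);
    return count;
}
```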