Seg Fault when reading simple CSV file - C

I am reading a two-column CSV file into an array of structs:
struct unused_s{
char col1[MAX_ARG_LENGTH];
char col2[MAX_ARG_LENGTH];
};
struct unused_s unused[MAX_USEABLE];
But I am getting a "Segmentation fault: 11" during execution. I have tried my best to debug this myself through reallocation of memory, but I'm afraid my abilities are not up to the task. I have, however, pinpointed that the error is occurring somewhere in this section of code:
void readCSV(FILE *file){
int i = 0;
char line[MAX_LINE_LENGTH];
while (fgets(line, 1024, file))
{
char* tmp = strdup(line);
strcpy(unused[i].col1, getunused(tmp, FIRST_COLUMN));
strcpy(unused[i].col2, getunused(tmp, SECOND_COLUMN));
free(tmp);
i++;
}
fclose(file);
}
const char* getunused(char* line, int n)
{
const char* tok;
for (tok = strtok(line, ";");
tok && *tok;
tok = strtok(NULL, ";\n"))
{
if (!--n)
return tok;
}
return NULL;
}
Any help solving this/pointing me in the right direction to solve this myself would be greatly appreciated!

As noted in the comments by John3136, you are returning NULL from getunused(), e.g.
const char* getunused(char* line, int n)
{
const char* tok;
for (tok = strtok(line, ";");
tok && *tok;
tok = strtok(NULL, ";\n"))
{
if (!--n)
return tok;
}
return NULL;
}
From your calls to strtok, it appears you have an input file that will result in tmp similar to:
tmp = "somevalue; othervalue\n"
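After your 1st call to getunused(), strtok will have replaced the first delimiter in tmp with a nul-character in order to tokenize the string, so tmp will now contain:
tmp = "somevalue\0 othervalue\n"
When you call getunused(tmp, SECOND_COLUMN) (where SECOND_COLUMN is presumably 2), strtok sees only "somevalue" in tmp. The first pass through the loop leaves n at 1, so !--n tests false; the next strtok call finds no more tokens, the loop ends, and NULL is returned. Passing that NULL to strcpy is what causes the segmentation fault.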
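If you want to keep tokenizing, one way around the problem is to split the line in a single pass and check both tokens before copying. The following is only a sketch: split_two is a made-up helper name, and it assumes lines shorter than 1024 bytes and the "value;value" layout shown above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split "a;b\n" into two caller-supplied buffers of
 * `cap` bytes each. Works on a private copy, so `line` is left untouched.
 * Returns 0 on success, -1 if either field is missing. */
int split_two(const char *line, char *col1, char *col2, size_t cap)
{
    char copy[1024];                              /* assumed maximum line length */
    snprintf(copy, sizeof copy, "%s", line);

    char *first = strtok(copy, ";\n");
    char *second = first ? strtok(NULL, ";\n") : NULL;
    if (first == NULL || second == NULL)
        return -1;                                /* never hand NULL to a copy routine */

    snprintf(col1, cap, "%s", first);             /* bounded copy, always terminated */
    snprintf(col2, cap, "%s", second);
    return 0;
}
```

In readCSV you would then increment i only when the helper returns 0, so malformed lines are skipped instead of crashing the program.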
Why Tokenize?
Rarely will you need to tokenize fields from a .csv file (or in your case a semicolon-separated file). Why? That is the whole purpose of a separated-values file: you can read the file with a formatted input function to separate the fields rather than tokenizing on delimiters (which you can do, it's just not generally necessary). In your case, if your .csv file format is as set out above, then you can eliminate getunused entirely and simply use sscanf to separate the input strings, e.g.
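void readCSV (FILE *file) {
    char line[MAX_LINE_LENGTH];
    int i = 0;
    /* stop when the array is full; 49 assumes MAX_ARG_LENGTH is 50 */
    while (i < MAX_USEABLE && fgets(line, sizeof line, file))
        if (sscanf (line, "%49[^;]; %49[^;\n]", unused[i].col1, unused[i].col2) == 2)
            i++;
    fclose(file);
}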
(note: as in my comment, you should include the field-width modifier of MAX_ARG_LENGTH-1 (the number) as part of your format-specifier -- as edited above after your last comment)
Also, if your second value is terminated by a '\n', then drop the ';' from the character class, e.g. %49[^\n] will do for the 2nd value.
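To see how the sscanf return value protects you from malformed lines, here is a small self-contained sketch. The name parse_pair and the FIELD_LEN of 50 are assumptions for illustration; the 49 in the format string must always be the buffer size minus one.

```c
#include <stdio.h>

#define FIELD_LEN 50   /* assumed field size; the format below hard-codes FIELD_LEN - 1 */

/* Hypothetical helper: returns 1 if both fields were read, 0 otherwise.
 * "%49[^;]" reads up to 49 characters that are not ';', the literal ';'
 * consumes the separator, and the space skips leading whitespace before
 * the second field. */
int parse_pair(const char *line, char col1[FIELD_LEN], char col2[FIELD_LEN])
{
    return sscanf(line, "%49[^;]; %49[^;\n]", col1, col2) == 2;
}
```

A line with no separator, or one that starts with the separator, makes sscanf convert fewer than two fields, so the helper reports failure rather than leaving a buffer unset.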

Related

Splitting Strings from file and putting them into array causes program crash

I am trying to read a file line by line and split it into words. Those words should be saved into an array. However, the program only gets the first line of the text file and when it tries to read the new line, the program crashes.
FILE *inputfile = fopen("file.txt", "r");
char buf [1024];
int i=0;
char fileName [25];
char words [100][100];
char *token;
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {
token = strtok(NULL, " ");
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}
After the good answer from xing I decided to write a full, simple program that performs your task and to explain my solution. My program reads the file given as its argument line by line and appends each line to a buffer.
Code:
#include <assert.h>
#include <errno.h>
#define _WITH_GETLINE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define assert_msg(x) for ( ; !(x) ; assert(x) )
int
main(int argc, char **argv)
{
    FILE *file;
    char *buf, *token;
    size_t length, size;
    ssize_t read;   /* getline(3) returns ssize_t, -1 at end of file */
    assert(argc == 2);
    file = fopen(argv[1], "r");
    assert_msg(file != NULL) {
        fprintf(stderr, "Error occurred: %s\n", strerror(errno));
    }
    token = NULL;
    length = 0;
    size = 1;   /* room for the terminating '\0' */
    buf = calloc(size, 1);   /* start from an empty string so strncat(3) is safe */
    assert(buf != NULL);
    while ((read = getline(&token, &length, file)) != -1) {
        if (token[read - 1] == '\n')
            token[read - 1] = ' ';
        size += read;
        buf = realloc(buf, size);
        assert(buf != NULL);
        (void)strncat(buf, token, read);
    }
    printf("%s\n", buf);
    fclose(file);
    free(buf);
    free(token);
    return (EXIT_SUCCESS);
}
For file file.txt:
that is a
text
which I
would like to
read
from file.
I got a result:
$ ./program file.txt
that is a text which I would like to read from file.
A few things worth saying about this solution:
Instead of fgets(3) I used getline(3), because it reports the length of the line it read (the read variable) and allocates memory for the string automatically (token). It is important to remember to free(3) that memory. On some systems getline(3) is not declared by default, to avoid clashes with older programs that define their own getline. The #define _WITH_GETLINE before the <stdio.h> header makes the function available on FreeBSD; with glibc, defining _POSIX_C_SOURCE as 200809L serves the same purpose.
buf holds only the space needed to store the string. After each line is read from the file, buf is extended by the required amount with realloc(3). This makes the solution a bit more "universal". It is important to remember to free objects allocated on the heap.
I also used strncat(3), which ensures that no more than read characters (the length of token) are appended to buf. It is still not the best way of using strncat(3), because we should also check for string truncation. In general, though, it is better than plain strcat(3), which is not recommended because it lets malicious users change a running program's behavior through a buffer overflow attack. Both strcat(3) and strncat(3) add the terminating \0.
getline(3) returns the line including its newline character, so I decided to replace the newline with a space (in the context of building sentences from the words in the file). I should also remove the last space, but I did not want to complicate the source code.
As an optional extra I defined my own macro assert_msg(x), which prints a text message before assert(3) fires. It is only a convenience, but thanks to it we can see the error message when an attempt to open the file fails.
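The trailing-space problem mentioned above can be avoided by writing the separator before each line rather than after it. Here is a minimal sketch of that idea; append_line is a hypothetical helper, not part of the program above, and it uses strcspn to find the newline instead of overwriting the last character.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append `line` (without its trailing newline) to the
 * heap string *buf of length *len, inserting one space between lines.
 * Returns 0 on success, -1 if realloc fails (*buf and *len are unchanged). */
int append_line(char **buf, size_t *len, const char *line)
{
    size_t n = strcspn(line, "\n");        /* length up to, not including, '\n' */
    size_t sep = (*len > 0) ? 1 : 0;       /* separator only between lines */
    char *grown = realloc(*buf, *len + sep + n + 1);
    if (grown == NULL)
        return -1;
    if (sep)
        grown[(*len)++] = ' ';
    memcpy(grown + *len, line, n);
    *len += n;
    grown[*len] = '\0';                    /* keep the result a valid C string */
    *buf = grown;
    return 0;
}
```

Start with a NULL buffer and a length of 0, call the helper once per line returned by getline(3) or fgets(3), and free the buffer when done.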
The problem is getting the next token in the inner while loop and passing the result to strcpy without any check for a NULL result.
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {//not at the end of the line. yet!
token = strtok(NULL, " ");//get next token. but token == NULL at end of line
//passing NULL to strcpy is a problem
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}
By incorporating the check into the while condition, passing NULL as the second argument to strcpy is avoided.
while ( ( token = strtok ( NULL, " ")) != NULL) {//get next token != NULL
//if token == NULL the while block is not executed
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
Sanitize your loops, and don't repeat yourself:
#include <stdio.h>
#include <string.h>
int main(void)
{
FILE *inputfile = fopen("file.txt", "r");
char buf [1024];
int i=0;
char fileName [25];
char words [100][100];
char *token;
for(i=0; fgets(buf,sizeof(buf),inputfile); ) {
for(token = strtok(buf, " "); token != NULL; token = strtok(NULL, " ")){
strcpy(words[i++], token);
}
}
return 0;
}
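None of the loops above stop when words is full or when a word is longer than a row of the array. A bounded variant might look like the sketch below; split_words and WORD_LEN are names chosen for illustration, and whitespace other than the space character is added to the delimiter set so the newline kept by fgets does not end up in the last word.

```c
#include <stdio.h>
#include <string.h>

#define WORD_LEN 100   /* assumed row size, matching char words[100][100] */

/* Hypothetical helper: split `line` in place and copy at most `max_words`
 * words into `words`. Returns the number of words stored. */
int split_words(char *line, char words[][WORD_LEN], int max_words)
{
    int n = 0;
    char *token;
    for (token = strtok(line, " \t\r\n");
         token != NULL && n < max_words;
         token = strtok(NULL, " \t\r\n")) {
        snprintf(words[n], WORD_LEN, "%s", token);   /* truncates overlong words */
        n++;
    }
    return n;
}
```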

Reading CSV from text file in C

I'm trying to read CSV from a text file in C. The text file format is
1,Bob,bob#gmail.com
2,Daniel,daniel#gmail.com
3,John,john#gmail.com
When I run the program, the number displays fine but the name and email are being displayed as garbage. Here is my program...
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
int number;
char* name;
char* email;
} Owner;
Owner owners[100];
int load(char* filename)
{
char buffer[200];
char token[50];
Owner* owner;
int owners_size = 0;
FILE* file = fopen(filename, "r");
while(fgets(buffer, 200, file) != NULL)
{
owner = (Owner*)malloc(sizeof(Owner));
owner->number = atoi(strtok(buffer, ","));
owner->name = strtok(NULL, ",");
owner->email = strtok(NULL, ",");
owners[owners_size++] = *owner;
}
fclose(file);
return owners_size;
}
int main()
{
int choise, owners_size, index;
char* owners_filename = "owners2.txt";
owners_size = load(owners_filename);
if(owners_size)
{
printf("owners size: %d\n\n", owners_size);
for(index = 0; index < owners_size; index++)
printf("%d, %s %s\n", owners[index].number, owners[index].name, owners[index].email);
}
}
Can anyone tell me what the reason is? I appreciate your help.
Two problems:
You didn't allocate space for the strings in the structure:
typedef struct
{
int number;
char *name;
char *email;
} Owner;
You need to provide space for those pointers to point at to hold the names.
You keep on supplying pointers to the buffer which is reused for each line of input:
while(fgets(buffer, 200, file) != NULL)
{
owner = (Owner*)malloc(sizeof(Owner));
owner->number = atoi(strtok(buffer, ","));
owner->name = strtok(NULL, ",");
owner->email = strtok(NULL, ",");
owners[owners_size++] = *owner;
}
The first line gets stored as some pointers into the buffer. The next line then overwrites the buffer and chops the line up again, trampling all over the original input.
Consider using strdup():
while (fgets(buffer, 200, file) != NULL)
{
owner = (Owner *)malloc(sizeof(Owner));
owner->number = atoi(strtok(buffer, ","));
owner->name = strdup(strtok(NULL, ","));
owner->email = strdup(strtok(NULL, ","));
owners[owners_size++] = *owner;
}
This is slightly dangerous code (I'd not use it in production code) because it doesn't check that strtok() found a token when expected (or that strdup() was successful). There again, I wouldn't use strtok() in production code either; I'd use POSIX strtok_r() or Microsoft's strtok_s() if they were available, or some alternative technique, probably using strspn() and strcspn(). If strdup() is not available, you can write your own, with the same or a different name:
char *strdup(const char *str)
{
size_t len = strlen(str) + 1;
char *dup = malloc(len);
if (dup != 0)
memmove(dup, str, len); // Or memcpy() - that is safe in this context
return(dup);
}
You might note that your code is only suitable for simple CSV files. If you encountered a line like this (which is legitimate CSV), you'd have problems (with quotes in your values, and mis-splitting because of the comma inside the quoted string):
1,"Bob ""The King"" King","Bob King, Itinerant Programmer <bob#gmail.com>"
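For simple, unquoted lines, the missing checks mentioned above could be added along these lines. This is a sketch only: parse_owner and dup_string are hypothetical names, the Owner type is repeated so the block stands alone, and strtol replaces atoi so that a non-numeric first field is rejected.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int number;
    char *name;
    char *email;
} Owner;

/* Local stand-in for strdup(), which is POSIX rather than ISO C. */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Hypothetical helper: parse "number,name,email\n" in place into *out.
 * Returns 0 on success, -1 on a malformed line or allocation failure.
 * On success the caller owns out->name and out->email and must free them. */
int parse_owner(char *line, Owner *out)
{
    char *num = strtok(line, ",");
    char *name = num ? strtok(NULL, ",") : NULL;
    char *email = name ? strtok(NULL, ",\r\n") : NULL;
    char *end;
    long n;

    if (email == NULL)
        return -1;                         /* fewer than three fields */
    n = strtol(num, &end, 10);
    if (end == num || *end != '\0')
        return -1;                         /* first field is not a number */

    out->number = (int)n;
    out->name = dup_string(name);
    out->email = dup_string(email);
    if (out->name == NULL || out->email == NULL) {
        free(out->name);                   /* free(NULL) is harmless */
        free(out->email);
        return -1;
    }
    return 0;
}
```

In load() you could call it as parse_owner(buffer, &owners[owners_size]) and increment owners_size only when it returns 0, which also removes the need to malloc an Owner per line.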
The pointer returned by strtok() points to an address within the buffer it is parsing, in this case the local variable buffer. When load() returns, that variable is out of scope (and even if it weren't, all instances of owners would be pointing into the same buffer). You need to copy the string returned by strtok(). You could use strdup() if available, or malloc() and strcpy().
There is no need to malloc() new instances of Owner, as an array of them already exists (the code as it stands has a memory leak).
Note there is no protection against going beyond the bounds of the owners array. If the file has more than 100 entries then the loop will go beyond the bounds of the array. Extend the terminating condition of the while to prevent this:
while(owners_size < sizeof(owners) / sizeof(owners[0]) &&
fgets(buffer, 200, file) != NULL)
{
}
You just stored pointers into a local buffer. When you leave load() this buffer is gone and not accessible anymore.
You must allocate memory for name and email before you can copy it into the Owner struct.
char *tok;
size_t len;
tok = strtok(NULL, ",");
len = strlen(tok);
owner->name = malloc(len + 1);
strcpy(owner->name, tok);
...
[EDIT: you need to allocate len+1 bytes so you have space for the terminating NUL character. -Zack]
You've only got one line buffer. Every cycle of the loop in load clobbers the text from the previous cycle. And if that wasn't bad enough, the buffer is destroyed when load returns.
The quick fix is to change
owner->name = strtok(NULL, ",");
owner->email = strtok(NULL, ",");
to
owner->name = strdup(strtok(NULL, ","));
owner->email = strdup(strtok(NULL, ","));
(If you don't have strdup, it's very simple to write.)
If I were reviewing your code, though, I would ding you for the fixed-size line buffer, the fixed-size owners array, the memory leak, using atoi instead of strtol, using strtok instead of strsep, and the absence of quote handling and parse error recovery, and point out that it would be more efficient to allocate each line as a unit and then save pointers into it.

How would i Use strtok to compare word by word

I've been reading up on strtok and thought it would be the best way for me to compare two files word by word. So far I can't really figure out how I would do it, though.
Here is my function that performs it:
int wordcmp(FILE *fp1, FILE *fp2)
{
char *s1;
char *s2;
char *tok;
char *tok2;
char line[BUFSIZE];
char line2[BUFSIZE];
char comp1[BUFSIZE];
char comp2[BUFSIZE];
char temp[BUFSIZE];
int word = 1;
size_t i = 0;
while((s1 = fgets(line,BUFSIZE, fp1)) && (s2 = fgets(line2,BUFSIZE, fp2)))
{
;
}
tok = strtok(line, " ");
tok2 = strtok(line, " ");
while(tok != NULL)
{
tok = strtok (NULL, " ");
}
return 0;
}
Don't mind the unused variables; I've been at this for 3 hours and have tried all possible ways I can think of to compare the values of the first and second strtok. Also, I would like to know how I would check which file reaches EOF first.
When I tried
if(s1 == EOF && s2 != EOF)
{
return -1;
}
It returns -1 even when the files are the same! Is it because in order for it to reach the if statement outside of the loop both files have reached EOF which makes the program always go to this if statement?
Thanks in advance!
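If you want to check whether the files are the same, try this (with s1 and s2 declared as int, since fgetc returns an int so that EOF can be told apart from data):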
do {
s1 = fgetc(fp1);
s2 = fgetc(fp2);
if (s1 == s2) {
if (s1 == EOF) {
return 1; // RETURN TRUE
}
continue;
}
else {
return -1; // RETURN FALSE
}
} while (1);
Good Luck :)
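Wrapped up as a function, the character-by-character comparison could look like the sketch below. files_identical is a hypothetical name; note that it compares bytes, not words, so files that differ only in whitespace count as different.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if both streams hold identical bytes from
 * their current positions to end of file, 0 otherwise. */
int files_identical(FILE *fp1, FILE *fp2)
{
    int c1, c2;   /* int, not char, so EOF can be told apart from data */
    do {
        c1 = fgetc(fp1);
        c2 = fgetc(fp2);
        if (c1 != c2)
            return 0;   /* a differing byte, or one file ended first */
    } while (c1 != EOF);
    return 1;
}
```

If you need to know which file ended first, test c1 == EOF and c2 == EOF separately at the point where the two values differ.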
When you use strtok() you typically use code like this:
tok = strtok(line, " ");
while (NULL != tok)
{
tok = strtok(NULL, " ");
}
The NULL in the call in the loop tells strtok to continue from after the previously found token until it finds the null terminating character in the value you originally passed (line) or until there are no more tokens. The current pointer is stored in the run time library, and once strtok() returns NULL to indicate no more tokens any more calls to strtok() using NULL as the first parameter (to continue) will result in NULL. You need to call it with another value (e.g. another call to strtok(line, " ")) to get it to start again.
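What this means is that to use strtok on two different strings at the same time you need to manually update the string position and pass in a modified value on each call, taking care not to step past the end of either string.
char *end = line + strlen(line);      /* terminators of the original lines */
char *end2 = line2 + strlen(line2);
tok = strtok(line, " ");
tok2 = strtok(line2, " ");
while (NULL != tok && NULL != tok2)
{
    /* Do stuff with tok and tok2 here */
    if (strcmp(tok, tok2) != 0) { /* the words differ */ }
    /* Update strtok pointers: skip the '\0' strtok wrote, but never go past the end of the line */
    tok += strlen(tok);
    if (tok < end) tok++;
    tok2 += strlen(tok2);
    if (tok2 < end2) tok2++;
    /* Get next token */
    tok = strtok(tok, " ");
    tok2 = strtok(tok2, " ");
}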
You'll still need to add logic for determining whether lines are different - you've not said whether the files are equivalent if a line break occurs at different position but the words surrounding it are the same. I assume it should be, given your description, but it makes the logic more awkward as you only need to perform the initial fgets() and strtok() for a file if you don't already have a token. You also need to look at how files are read in. Currently your first while loop just reads lines until the end of the file without processing them.
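An alternative that sidesteps strtok's hidden state altogether is to scan each line with strspn and strcspn, which do not modify the strings. The sketch below compares two lines word by word; next_word and same_words are made-up names, and any run of spaces, tabs, or line endings is treated as one separator.

```c
#include <string.h>

/* Hypothetical helper: find the next word at or after *cursor, store its
 * length in *len, and advance *cursor past it. Returns NULL when no words
 * remain. The string itself is never modified. */
static const char *next_word(const char **cursor, size_t *len)
{
    const char *start = *cursor + strspn(*cursor, " \t\r\n");
    if (*start == '\0')
        return NULL;
    *len = strcspn(start, " \t\r\n");
    *cursor = start + *len;
    return start;
}

/* Returns 1 if both lines contain the same words in the same order. */
int same_words(const char *a, const char *b)
{
    size_t la, lb;
    const char *wa, *wb;
    for (;;) {
        wa = next_word(&a, &la);
        wb = next_word(&b, &lb);
        if (wa == NULL || wb == NULL)
            return wa == NULL && wb == NULL;   /* equal only if both ran out together */
        if (la != lb || memcmp(wa, wb, la) != 0)
            return 0;
    }
}
```

Because each string carries its own cursor, the two scans cannot interfere with each other.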

tokenizing a string twice in c with strtok()

I'm using strtok() in C to parse a CSV string. First I tokenize it just to find out how many tokens there are so I can allocate an array of the correct size. Then I go through it again using the same variable I used last time for tokenization. Every time I do it a second time, though, strtok(NULL, ",") returns NULL even though there are still more tokens to parse. Can somebody tell me what I'm doing wrong?
char* tok;
int count = 0;
tok = strtok(buffer, ",");
while(tok != NULL) {
count++;
tok = strtok(NULL, ",");
}
//allocate array
tok = strtok(buffer, ",");
while(tok != NULL) {
//do other stuff
tok = strtok(NULL, ",");
}
So on that second while loop it always ends after the first token is found even though there are more tokens. Does anybody know what I'm doing wrong?
strtok() modifies the string it operates on, replacing delimiter characters with nulls. So if you want to use it more than once, you'll have to make a copy.
There's not necessarily a need to make a copy - strtok() does modify the string it's tokenizing, but in most cases that simply means the string is already tokenized if you want to deal with the tokens again.
Here's your program modified a bit to process the tokens after your first pass:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int i;
char buffer[] = "some, string with , tokens";
char* tok;
int count = 0;
tok = strtok(buffer, ",");
while(tok != NULL) {
count++;
tok = strtok(NULL, ",");
}
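// walk through the tokenized buffer again
tok = buffer + strspn(buffer, ","); // the first token starts after any leading delimiters
for (i = 0; i < count; ++i) {
printf( "token %d: \"%s\"\n", i+1, tok);
if (i + 1 < count) { // after the last token there is nothing left to skip to
tok += strlen(tok) + 1; // get the next token by skipping past the '\0'
tok += strspn(tok, ","); // then skipping any starting delimiters
}
}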
return 0;
}
Note that this is unfortunately trickier than I first posted - the call to strspn() needs to be performed after skipping the '\0' placed by strtok() since strtok() will skip any leading delimiter characters for the token it returns (without replacing the delimiter character in the source).
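If the first pass exists only to count tokens, you can also count them without modifying the string at all, and then run strtok once afterwards. A minimal sketch, assuming strtok's rule that runs of delimiters separate tokens and never produce empty ones (count_tokens is a hypothetical name):

```c
#include <string.h>

/* Hypothetical helper: count the tokens strtok would return for `s` with
 * the delimiter set `delims`, without writing to `s`. */
size_t count_tokens(const char *s, const char *delims)
{
    size_t count = 0;
    s += strspn(s, delims);          /* skip leading delimiters */
    while (*s != '\0') {
        count++;
        s += strcspn(s, delims);     /* skip over the token itself */
        s += strspn(s, delims);      /* skip the delimiters that follow */
    }
    return count;
}
```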
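Use strsep where it is available (BSD and glibc provide it; it is not part of ISO C) - it actually updates your pointer. With strtok you have to keep passing NULL on later calls; with strsep you pass the address of your string pointer every time. The only issue with strsep is that if the string was allocated on the heap, you must keep a pointer to the beginning so you can free it later.
char *strsep(char **stringp, const char *delim);
char *string = buffer; /* must point at a writable string, here the CSV buffer from the question */
char *token;
token = strsep(&string, ",");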
strtok is used in your normal intro to C course - use strsep, it's much better. :-)
No getting confused on "oh shit - i have to pass in NULL still cuz strtok screwed up my positioning."

Search for a string in a text file and parse that line (Linux, C)

This is a "how to parse a config file" question.
Basically I have a text file (/etc/myconfig) that has all kinds of settings. I need to read that file and search for the string:
wants_return=yes
Once I locate that string I need to parse it and return only whatever is after the equal sign.
I've tried using a combination of fgets and strtok but I'm getting confused here.
In any case, does anyone know a function that can perform this?
Code is appreciated.
thanks
This works: (note: fgets keeps the newline character at the end of the returned string when the whole line fits in the buffer, hence the check for it below)
#include <stdio.h>
#include <string.h>
const unsigned MAXLINE=9999;
char const* FCFG="/etc/myconfig";
char const* findkey="wants_return=";
char * skip_ws(char *line)
{
return line+strspn(line," \t");
}
char * findval(char *line,char const* prefix,int prelen)
{
char *p;
p=skip_ws(line);
if (strncmp(p,prefix,prelen)==0)
return p+prelen;
else
return NULL;
}
char *findval_slow(char *line,char const* prefix)
{
return findval(line,prefix,strlen(prefix));
}
int main() {
FILE *fcfg;
char line[MAXLINE];
char *p,*pend;
int findlen;
findlen=strlen(findkey);
fcfg=fopen(FCFG,"r");
if (!fcfg) { perror(FCFG); return 1; } /* don't read from a file that failed to open */
while ((p=fgets(line,MAXLINE,fcfg))) {
printf("Looking at %s\n",p);
if (p=findval(line,findkey,findlen)) {
pend=p+strlen(p)-1; /* check last char for newline terminator */
if (*pend=='\n') *pend=0;
printf("Found %s\n",p); /* process/parse the value */
}
}
return 0;
}
Here's a quick example using strtok:
const int linelen = 256;
char line[linelen];
FILE* fp = fopen(argv[1], "r");
if (fp == NULL) {
    perror("Error opening file");
} else {
    /* fgets returns NULL at end of file, so no feof() test is needed */
    while (fgets(line, linelen, fp)) {
        const char* name = strtok(line, "= \r\n");
        const char* value = name ? strtok(NULL, "= \r\n") : NULL;
        if (name && value) /* skip blank lines and keys without a value */
            printf("%s => %s\n", name, value);
    }
    fclose (fp);
}
Note, you'll need to put some additional error checking around it, but this works to parse the files I threw at it.
From your comment, it looks like you're already getting the appropriate line from the text file using fgets and loading it into a character buffer. You can use strtok to parse the tokens from the line.
If you run it with the string buffer as the first argument, it will return the first token from that string. If you run the same command with the first argument set to NULL it will return subsequent tokens from the same original string.
A quick example of how to retrieve multiple tokens:
#include <stdio.h>
#include <string.h>
int main() {
char buffer[17]="wants_return=yes";
char* tok;
tok = strtok(buffer, "=");
printf("%s\n", tok); /* tok points to "wants_return" */
tok = strtok(NULL, "=");
printf("%s\n", tok); /* tok points to "yes" */
return 0;
}
For the second strtok call, you can replace the "=" with "" to return everything to the end of the string, instead of breaking off at the next equal sign.
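Putting the pieces from these answers together, a lookup function for one key might look like the following sketch. find_setting is a hypothetical name, the 256-byte line limit is an assumption, and the file format is taken to be plain "key=value" lines with optional spaces or tabs around the equal sign.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan `fp` for a line of the form "key=value" and copy
 * the value into `out` (at most cap - 1 characters). Returns 1 if the key
 * was found, 0 otherwise. */
int find_setting(FILE *fp, const char *key, char *out, size_t cap)
{
    char line[256];                         /* assumed maximum line length */
    size_t klen = strlen(key);

    while (fgets(line, sizeof line, fp)) {
        char *p = line + strspn(line, " \t");   /* skip leading whitespace */
        if (strncmp(p, key, klen) != 0)
            continue;
        p += klen;
        p += strspn(p, " \t");
        if (*p != '=')                      /* e.g. a longer key sharing the prefix */
            continue;
        p++;
        p += strspn(p, " \t");
        p[strcspn(p, "\r\n")] = '\0';       /* drop the line ending kept by fgets */
        snprintf(out, cap, "%s", p);
        return 1;
    }
    return 0;
}
```

Called as find_setting(fp, "wants_return", value, sizeof value) on the file from the question, it would leave "yes" in value.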
With a POSIX shell, I'd use something like:
answer=`grep -E 'wants_return[ ]*=' /etc/myconfig | sed 's/^.*=[ ]*//'`
Of course, if you're looking for an answer that uses the C STDIO library, then you really need to review the STDIO documentation.

Resources