Splitting Strings from file and putting them into array causes program crash - c

I am trying to read a file line by line and split it into words. Those words should be saved into an array. However, the program only gets the first line of the text file and when it tries to read the new line, the program crashes.
FILE *inputfile = fopen("file.txt", "r");
char buf [1024];
int i=0;
char fileName [25];
char words [100][100];
char *token;
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {
token = strtok(NULL, " ");
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}

After the good answer from xing, I decided to write a FULL simple program for this task and explain my solution. The program reads the file given as its command-line argument line by line and appends each line to a buffer.
Code:
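#define _POSIX_C_SOURCE 200809L /* expose getline(3) under strict ISO modes on glibc */
#define _WITH_GETLINE           /* needed by older FreeBSD releases */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>          /* ssize_t */
/* print a message, then let assert(3) abort; do not define NDEBUG with this,
   or assert(x) does nothing and the loop never ends */
#define assert_msg(x) for ( ; !(x) ; assert(x) )
int
main(int argc, char **argv)
{
    FILE *file;
    char *buf, *token;
    size_t length, size;
    ssize_t read;               /* getline(3) returns -1 on EOF or error */
    assert(argc == 2);
    file = fopen(argv[1], "r");
    assert_msg(file != NULL) {
        fprintf(stderr, "Error occurred: %s\n", strerror(errno));
    }
    token = NULL;
    length = 0;
    size = 1;                   /* room for the terminating '\0' */
    buf = calloc(1, size);      /* strncat(3) needs buf to be a valid (empty) string */
    assert(buf != NULL);
    while ((read = getline(&token, &length, file)) != -1) {
        if (token[read - 1] == '\n')
            token[read - 1] = ' ';  /* turn the newline into a word separator */
        size += read;
        buf = realloc(buf, size);
        assert(buf != NULL);
        (void)strncat(buf, token, read);
    }
    printf("%s\n", buf);
    fclose(file);
    free(buf);
    free(token);
    return (EXIT_SUCCESS);
}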
For file file.txt:
that is a
text
which I
would like to
read
from file.
I got a result:
$ ./program file.txt
that is a text which I would like to read from file.
A few things worth noting about this solution:
Instead of fgets(3) I used getline(3), because it reports the length of the line it read (the read variable) and automatically allocates memory for the string (token). Remember to free(3) that memory. getline(3) is a POSIX.1-2008 function, not part of ISO C, so some systems hide its declaration: older FreeBSD releases require the _WITH_GETLINE macro to be defined before <stdio.h>, while glibc declares it by default but hides it in strict modes such as -std=c11 unless _POSIX_C_SOURCE is defined as 200809L or higher before the first #include.
buf holds only as much space as the string needs: after each line is read from the file, realloc(3) grows buf by exactly the required amount. This is a somewhat more general solution than a fixed-size array. Remember to free objects allocated on the heap.
I also used strncat(3), which appends at most read characters (the length of token) to buf. That is still not an ideal use of strncat(3), because the code should also test for truncation, but in general it is better than plain strcat(3), which is discouraged because an unchecked copy can overflow the destination buffer and let a malicious user change a running program's behavior. Both strcat(3) and strncat(3) add a terminating \0, and both require the destination to already hold a \0-terminated string, so buf must start out as an empty string rather than as uninitialized memory.
getline(3) returns the line including its newline character, so I replace the newline with a space (the goal is to build a sentence from the words in the file). I should also remove the final trailing space, but I did not want to complicate the source code.
Optionally, I also defined my own macro, assert_msg(x), which prints an error message before assert(3) fails. It is only a convenience, but thanks to it we can see why an attempt to open a file failed.
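The trailing space and the repeated strncat(3) scans mentioned above can both be avoided by tracking the used length yourself. The following is a minimal sketch, not the author's code: join_lines is a hypothetical helper name, and it assumes a POSIX system where getline() is available. It writes each line with memcpy at a known offset and inserts a space only between lines.

```c
#define _POSIX_C_SOURCE 200809L /* getline() is POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: join all lines of fp into one heap-allocated string,
   one space between lines, no trailing space. Empty lines are skipped.
   Returns NULL on allocation failure; the caller must free() the result. */
char *join_lines(FILE *fp)
{
    char *line = NULL;   /* buffer owned and grown by getline() */
    size_t cap = 0;
    ssize_t n;
    size_t used = 0;     /* characters in out, not counting '\0' */
    char *out = malloc(1);

    if (out == NULL)
        return NULL;
    out[0] = '\0';
    while ((n = getline(&line, &cap, fp)) != -1) {
        if (n > 0 && line[n - 1] == '\n')
            line[--n] = '\0';               /* drop the newline */
        if (n == 0)
            continue;                       /* nothing to append */
        /* room for a separator, the new text and the terminator */
        char *tmp = realloc(out, used + 1 + (size_t)n + 1);
        if (tmp == NULL) {
            free(out);
            free(line);
            return NULL;
        }
        out = tmp;
        if (used > 0)
            out[used++] = ' ';              /* separator only between lines */
        memcpy(out + used, line, (size_t)n);
        used += (size_t)n;
        out[used] = '\0';
    }
    free(line);
    return out;
}
```

Because the offset is kept in used, each append costs time proportional to the new line only, whereas strncat(3) rescans the whole buffer to find its end on every call.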

The problem is getting the next token in the inner while loop and passing the result to strcpy without any check for a NULL result.
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {//not at the end of the line. yet!
token = strtok(NULL, " ");//get next token. but token == NULL at end of line
//passing NULL to strcpy is a problem
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}
By incorporating the check into the while condition, passing NULL as the second argument to strcpy is avoided.
while ( ( token = strtok ( NULL, " ")) != NULL) {//get next token != NULL
//if token == NULL the while block is not executed
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
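The same fix, packaged as a reusable function, might look like the sketch below. split_words, MAX_WORDS and WORD_LEN are hypothetical names chosen to match the question's char words[100][100]. The strtok() result is tested in the loop condition so nothing ever copies from NULL, '\n' is treated as a delimiter, and both the number of words and the length of each word are bounded.

```c
#include <string.h>

#define MAX_WORDS 100
#define WORD_LEN  100

/* Sketch: split one line into words, storing at most max_words of them.
   Words longer than WORD_LEN - 1 characters are truncated.
   Note: strtok() modifies line. Returns the number of words stored. */
size_t split_words(char *line, char words[][WORD_LEN], size_t max_words)
{
    size_t n = 0;
    char *token;

    for (token = strtok(line, " \t\n"); token != NULL && n < max_words;
         token = strtok(NULL, " \t\n")) {
        strncpy(words[n], token, WORD_LEN - 1);
        words[n][WORD_LEN - 1] = '\0';   /* strncpy may not terminate */
        n++;
    }
    return n;
}
```

To accumulate words across several lines, as the question's running index i does, a caller could write i += split_words(buf, words + i, MAX_WORDS - i); inside the fgets loop.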

Sanitize your loops, and don't repeat yourself:
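#include <stdio.h>
#include <string.h>
int main(void)
{
    FILE *inputfile = fopen("file.txt", "r");
    char buf[1024];
    int i;
    char words[100][100];
    char *token;
    if (inputfile == NULL) {
        perror("file.txt");
        return 1;
    }
    /* stop reading once all 100 slots are used */
    for (i = 0; i < 100 && fgets(buf, sizeof(buf), inputfile); ) {
        /* '\n' is a delimiter too, so the last word of a line carries no newline */
        for (token = strtok(buf, " \n"); token != NULL && i < 100; token = strtok(NULL, " \n")) {
            /* snprintf truncates words longer than 99 chars instead of overflowing */
            snprintf(words[i++], sizeof words[0], "%s", token);
        }
    }
    fclose(inputfile);
    return 0;
}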

Related

File parsing in c, extract specific line if it exists

I have a dynamically updated text file with names of people, I want to parse the file to extract "Caleb" and the string that follows his name. However, his name may not always be in the list and I want to account for that.
I could do it in Java, but I'm not even sure what to do in C. I could start by reading in the text file line by line, but then how would I check if "Caleb" is a substring of the string I just read in and handle the case when he isn't? I want to do this without using external libraries. What would be the best method?
Barnabas: Followed by a string
Bart: Followed by a string
Becky: Followed by a string
Bellatrix: Followed by a string
Belle: Followed by a string
Caleb: I want this string
Benjamin: Followed by a string
Beowul: Followed by a string
Brady: Followed by a string
Brick: Followed by a string
returns: "Caleb: I want this string" or "Name not found"
but then how would I check if "Caleb" is a substring of the string
The heart of the question as I read it. strstr does the job.
char *matchloc;
if ((matchloc = strstr(line, "Caleb:")) != NULL) {
// You have a match. Code here.
}
However, in this particular case you really want "starts with Caleb", so we do better with strncmp:
if (!strncmp(line, "Caleb:", 6)) {
// You have a match. Code here.
}
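Building on the strncmp idea, a small sketch of two helpers follows. Both names, starts_with and value_for, are hypothetical. The first avoids hard-coding the 6 by using strlen(prefix). The second also returns the text after "Name:", which is the part the question wants to extract, and rejects lines such as "Calebina: ..." whose name merely begins with "Caleb".

```c
#include <stdbool.h>
#include <string.h>

/* Sketch: true if s begins with prefix. */
bool starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical helper: if line is "<name>: text", return a pointer to
   "text" with leading spaces skipped; otherwise return NULL. */
const char *value_for(const char *line, const char *name)
{
    size_t len = strlen(name);

    /* strncmp()==0 guarantees line has at least len chars, so line[len] is valid */
    if (strncmp(line, name, len) != 0 || line[len] != ':')
        return NULL;
    line += len + 1;
    while (*line == ' ')
        line++;
    return line;
}
```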
So if you want to check whether the user Caleb exists, you can simply call strstr on each line, and if it matches, use strtok to get only the string!
I don't know how you are opening the file, but you can use getline to read it line by line!
You can do something like this:
#define _POSIX_C_SOURCE 200809L /* getline() is POSIX */
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>          /* ssize_t */
int main(void){
    FILE *file;
    char *fich = "FILE.TXT";
    char *line = NULL;
    char StringFile[1024];      /* a real buffer to copy the matching line into */
    size_t len = 0;
    ssize_t stringLength;
    const char s[2] = ":";      //Divide string for this
    char *token;
    int check = 0;
    file = fopen(fich, "r");
    if (file == NULL){
        fprintf(stderr, "[ERROR]: cannot open file <%s> ", fich);
        perror("");
        exit(1);
    }
    while ((stringLength = getline(&line, &len, file)) != -1){
        if (line[stringLength - 1] == '\n'){
            line[stringLength - 1] = '\0'; //Removing \n if exists
        }
        if (strstr(line, "Caleb:") != NULL){
            check = 1;
            snprintf(StringFile, sizeof StringFile, "%s", line);
            strtok(StringFile, s);       /* first token: the name */
            token = strtok(NULL, s);     /* second token: the text after ':' */
            if (token != NULL){
                printf("%s\n", token);
            }
            break;
        }
    }
    if (check == 0){
        printf("Name not found\n");
    }
    free(line);
    fclose(file);
    return 0;
}
The code may have some errors, but that's the idea: when it finds the name, it copies the line into an array and then splits it.
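If you would rather stay within ISO C (no getline()) and return the whole line, as in the question's expected result "Caleb: I want this string" or "Name not found", a sketch might look like this. find_entry is a hypothetical name, and the 256-byte line buffer is an arbitrary assumption; longer lines are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: scan fp for a line starting with "<name>:". On a match, copy the
   line (newline removed) into out and return 1; return 0 if none matches. */
int find_entry(FILE *fp, const char *name, char *out, size_t outsz)
{
    char line[256];
    size_t len = strlen(name);

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip the newline, if any */
        if (strncmp(line, name, len) == 0 && line[len] == ':') {
            snprintf(out, outsz, "%s", line);
            return 1;
        }
    }
    return 0;
}
```

A caller could then write puts(find_entry(fp, "Caleb", buf, sizeof buf) ? buf : "Name not found"); to cover both cases.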

C: strtok delivers segmentation fault

I am trying to read a file line by line and tokenize each line, which has strings separated by spaces and tabs. However, when I run my program, I get a segmentation fault when I try to print the token. I don't understand why this is happening, as I am using a buffer as the string to tokenize and checking if the token is null. Below is my code:
#include <stdio.h>
#include <stdlib.h>
#define MAX_LINE_LENGTH 70
int main(void)
{
FILE * testFile;
char buf[MAX_LINE_LENGTH];
testFile = fopen("test_file.txt", "r");
if (testFile == NULL)
{
printf("Cannot open test_file.txt.\n");
exit(0);
}
while (fgets(buf, sizeof(buf), testFile) != NULL) {
char *token = strtok(buf," \t");
while (token != NULL)
{
token = strtok(NULL, " \t");
if (token != NULL) {
printf("%s\n", token);
}
}
}
exit(1);
}
Below is the contents of test_file.txt:
String1 String2 String3
String4 String5 String6
String7 String8 String9
Two helpful tips: (1) enable compiler warnings (at minimum -Wall -Wextra -pedantic for gcc/clang, or /W3 for VS; other compilers have similar options) and do not accept code until it compiles without warnings; (2) #include <string.h>, where strtok is declared.
In addition to the lack of validation pointed out by #dreamer, you must be using an implicit declaration of strtok. Without a prototype, an older C compiler assumes strtok returns int, so on a typical 64-bit platform the returned pointer can be truncated to 32 bits, and dereferencing it (as printf does) crashes. You should receive a compiler warning along those lines. Don't ignore any warning; fix it instead. It will generally tell you the exact line the problem is on.
Next, don't hardcode filenames. It is just as simple to pass the filename as the first argument to your program (or read from stdin by default). Your second option is to prompt for the filename and read it as input to your program.
Putting those together, you could do something simple like:
#include <stdio.h>
#include <string.h>
#define MAX_LINE_LENGTH 70
#define DELIM " \t\n"
int main (int argc, char **argv) {
char buf[MAX_LINE_LENGTH];
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (buf, sizeof buf, fp))
for (char *p = strtok(buf, DELIM); p; p = strtok(NULL, DELIM))
puts (p);
if (fp != stdin) /* close file if not stdin */
fclose (fp);
return 0;
}
(note: you need to include '\n' as a delimiter character to prevent the additional '\n' from being part of the last token in each line)
Example Use/Output
$ ./bin/strtokfile test_file.txt
String1
String2
String3
String4
String5
String6
String7
String8
String9
Look things over and let me know if you have questions.
Looks like you are printing without checking the token pointer for NULL.
If you need to print all tokens, print each one right after the strtok call that returned it (including the first call, before the while loop), after checking that the token is not NULL.

Previously stored strings are overwritten by fgets

I'm reading records from a CSV file using fgets() to read the file one line at a time, and strtok() to parse the fields in each line. I'm encountering a problem where fgets() overwrites a string that was previously written, in favor of the new string.
Here's an example of what I mean by that:
record.csv (This is the file I'm reading in)
John,18
Johann,29
main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct customer {
char *name;
int age;
} Customer;
int main(void)
{
FILE *csv_data;
char line[100], *token;
Customer newData[2];
csv_data = fopen("record.csv", "r");
// Index 0 for John's data, index 1 for Johann's data
int i = 0;
/* loops until end of file */
while(fgets(line, 100, csv_data)) {
/* name field */
token = strtok(line, ",");
if (token != NULL) {
newData[i].name = token;
}
/* age field */
token = strtok(NULL, ",");
if (token != NULL) {
// atoi() converts ascii char to integer
newData[i].age = atoi(token);
}
i++;
}
/* print John's records */
printf("%s\n", newData[0].name);
printf("%d\n", newData[0].age);
/* print Johann's records */
printf("%s\n", newData[1].name);
printf("%d\n", newData[1].age);
return 0;
}
When we compile and execute this, it prints out:
Johann
18
Johann
29
"John" in newData[0].name gets overwritten with "Johann" during the second iteration of the while loop. Notice however that only the strings get mixed up, but not the integers. I suspect this has to do with fgets because when I modified the above source to only run fgets once, the output for "John" was as it should be.
Maybe I'm misusing fgets (or perhaps my assumption is wrong), but could someone give me some pointers on why the strings are being overwritten with each call to fgets?
Second update: Thank you very much again to all the commenters and answerers. It was good to learn things I was not aware of. The source works perfectly now.
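You are not copying the string, only the pointer to it. strtok returns pointers into line, so every newData[i].name points into the same buffer, which the next fgets call overwrites. The ages survive because atoi copies the value into an int.
A very simple way to copy the string is shown below, but note that it limits the string to 99 characters.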
typedef struct customer {
char name[100];
int age;
} Customer;
strcpy(newData[i].name, token);
Do:
newData[i].name = malloc( strlen( token ) + 1 );
strcpy( newData[i].name, token );
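Alternatively, define the name member as char name[64]; and then use strcpy( newData[i].name, token ); without malloc. The 64 bytes can be more or less, but the array must hold the longest possible name plus the terminating '\0'. With the malloc version, check the result for NULL and free each name when you are done.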
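Putting the malloc approach into a function makes the ownership explicit. The sketch below reuses the question's Customer type; parse_customer is a hypothetical helper name. Because the name is copied into its own allocation, reusing line for the next record no longer changes earlier entries.

```c
#include <stdlib.h>
#include <string.h>

typedef struct customer {
    char *name;
    int age;
} Customer;

/* Sketch: fill c from a "name,age" line, copying the name into its own
   malloc'd buffer. Returns 0 on success, -1 on a malformed line or if
   malloc fails. The caller must free(c->name). strtok() modifies line. */
int parse_customer(char *line, Customer *c)
{
    char *name = strtok(line, ",");
    char *age = strtok(NULL, ",\n");

    if (name == NULL || age == NULL)
        return -1;
    c->name = malloc(strlen(name) + 1);
    if (c->name == NULL)
        return -1;
    strcpy(c->name, name);    /* safe: the buffer was sized from strlen(name) */
    c->age = atoi(age);
    return 0;
}
```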

C - Read non-alphabetic chars as word boundary

I'm trying to parse in a text file, and add each distinct word into a hashtable, with the words as keys, and their frequencies as values. The problem is proving to be the reading part: the file is a very large file of "normal" text, in that it has punctuation and special characters. I want to treat all non-alphabetical chars read in as word-boundaries. I have something basic going with this:
char buffer[128];
while(fscanf(fp, "%127[A-Za-z]%*c", buffer) == 1) {
printf("%s\n", buffer);
memset(buffer, 0, 128);
}
However, that chokes whenever it actually hits a non-alphabetical char preceded by whitespace (e.g., "the,cat was (brown)" would be read in as "the cat was"). I know what the issue is with that code, but I'm not sure how to get around it. Would I be better off just reading in an entire line and doing the parsing manually? I'm trying scanf because I felt that this was a pretty good candidate for the mini-regex thing that you can do with the format string.
I suggest using isalpha(), fgetc() and a simple state machine.
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
int AdamRead(FILE *inf, char *dest, size_t n) {
int ch;
do {
ch = fgetc(inf);
if (ch == EOF) return EOF;
} while (!isalpha(ch));
assert(n > 1);
n--; // save room for \0
while (n-- > 0) {
*dest++ = ch;
ch = fgetc(inf);
if (!isalpha(ch)) break;
}
ungetc(ch, inf); // Add this if something else may need to parse `inf`.
*dest = '\0';
return 1;
}
char buffer[128];
while(AdamRead(fp, buffer, sizeof buffer) == 1) {
printf("%s\n", buffer);
}
Note: If you want to go the "%127[A-Za-z]%*[^A-Za-z]" route, the code may need to start with a one-time fscanf(fp, "%*[^A-Za-z]"); to deal with leading non-letters.
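A sketch of that fscanf route follows; print_words is a hypothetical name. It does the one-time skip of leading non-letters and then alternates between reading a run of letters and discarding a run of non-letters. One caveat: ranges such as A-Z inside %[...] are implementation-defined in ISO C, although glibc and other common C libraries accept them.

```c
#include <stdio.h>

/* Sketch: print each run of letters in fp on its own line and return how
   many runs were found. "%*[^A-Za-z]" discards a run of non-letters; if the
   next character is a letter it simply fails and leaves the input alone. */
size_t print_words(FILE *fp)
{
    char word[128];
    size_t count = 0;

    fscanf(fp, "%*[^A-Za-z]");                  /* one-time skip of leading non-letters */
    while (fscanf(fp, "%127[A-Za-z]", word) == 1) {
        puts(word);
        count++;
        fscanf(fp, "%*[^A-Za-z]");              /* skip the separators after the word */
    }
    return count;
}
```

With the question's input "the,cat was (brown)" this prints the, cat, was and brown on separate lines.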
There's another way apart from the one mentioned in the comment. I don't know if it's better though. You can read lines from the file using fgets and then tokenize the line using strtok_r POSIX function. Here, r means the function is reentrant which makes it thread-safe. However, you must know the maximum length a line can have in the file.
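#define _POSIX_C_SOURCE 200809L /* strtok_r is POSIX */
#include <stdio.h>
#include <string.h>
#define MAX_LEN 100
int main(void)
{
    char line[MAX_LEN];
    char *str, *token;
    /* whitespace, digits and ASCII punctuation all act as word boundaries */
    const char *delim = " \t\r\n0123456789!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
    char *saveptr; // for strtok_r
    FILE *fp = fopen("myfile.txt", "r");
    if (fp == NULL) {
        perror("myfile.txt");
        return 1;
    }
    while (fgets(line, MAX_LEN, fp) != NULL) {
        /* pass the buffer on the first call, NULL afterwards
           (line itself is an array and cannot be assigned NULL) */
        for (str = line; ; str = NULL) {
            token = strtok_r(str, delim, &saveptr);
            if (token == NULL)
                break;
            // token is a string.
            // process it
        }
    }
    fclose(fp);
    return 0;
}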
strtok_r modifies its first argument, line, so keep a copy of it if it is needed for other purposes.
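Beyond threads, the explicit save pointer also lets you nest tokenizers, which plain strtok() cannot do because it keeps a single hidden position. The sketch below is illustrative only: sum_values and the "key=value;key=value" input format are hypothetical. An outer strtok_r() loop splits on ';' and an inner one splits each pair on '=', each with its own save pointer.

```c
#define _POSIX_C_SOURCE 200809L /* strtok_r is POSIX, not ISO C */
#include <stdlib.h>
#include <string.h>

/* Sketch: sum the values in a string like "a=1;b=2". Pairs without '='
   are ignored. s is modified by the tokenizer. */
long sum_values(char *s)
{
    char *outer_save, *inner_save;
    long sum = 0;

    for (char *pair = strtok_r(s, ";", &outer_save); pair != NULL;
         pair = strtok_r(NULL, ";", &outer_save)) {
        /* the inner loop has its own save pointer, so the outer one is untouched */
        char *key = strtok_r(pair, "=", &inner_save);
        char *value = strtok_r(NULL, "=", &inner_save);
        if (key != NULL && value != NULL)
            sum += strtol(value, NULL, 10);
    }
    return sum;
}
```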

Search for a string in a text file and parse that line (Linux, C)

This is "how to parse a config file" question.
Basically, I have a text file (/etc/myconfig) that has all kinds of settings. I need to read that file and search for the string:
wants_return=yes
Once I locate that string, I need to parse it and return only whatever comes after the equal sign.
I've tried using a combination of fgets and strtok, but I'm getting confused here.
In any case, does anyone know a function that can do this?
Code is appreciated.
thanks
This works. (Note: fgets keeps the newline character in the returned string when the whole line fits in the buffer, so the code strips it before using the value.)
#include <stdio.h>
#include <string.h>             /* strspn, strncmp, strlen */
#define MAXLINE 9999
char const* FCFG="/etc/myconfig";
char const* findkey="wants_return=";
char * skip_ws(char *line)
{
    return line+strspn(line," \t");
}
char * findval(char *line,char const* prefix,size_t prelen)
{
    char *p;
    p=skip_ws(line);
    if (strncmp(p,prefix,prelen)==0)
        return p+prelen;
    else
        return NULL;
}
char *findval_slow(char *line,char const* prefix)
{
    return findval(line,prefix,strlen(prefix));
}
int main() {
    FILE *fcfg;
    char line[MAXLINE];
    char *p;
    size_t findlen, vlen;
    findlen=strlen(findkey);
    fcfg=fopen(FCFG,"r");
    if (fcfg==NULL) {
        perror(FCFG);
        return 1;
    }
    while ((p=fgets(line,MAXLINE,fcfg)) != NULL) {
        printf("Looking at %s\n",p);
        if ((p=findval(line,findkey,findlen)) != NULL) {
            vlen=strlen(p);
            if (vlen>0 && p[vlen-1]=='\n') p[vlen-1]=0; /* strip the newline fgets keeps */
            printf("Found %s\n",p); /* process/parse the value */
        }
    }
    fclose(fcfg);
    return 0;
}
Here's a quick example using strtok:
char line[256];
FILE* fp = fopen(argv[1], "r");
if (fp == NULL) {
    perror("Error opening file");
} else {
    /* loop on fgets itself: feof() only becomes true after a read has failed */
    while (fgets(line, sizeof line, fp)) {
        const char* name = strtok(line, "= \r\n");
        const char* value = strtok(NULL, "= \r\n");
        if (name != NULL && value != NULL)  /* skip blank or malformed lines */
            printf("%s => %s\n", name, value);
    }
    fclose (fp);
}
Note, you'll need to put some additional error checking around it, but this works to parse the files I threw at it.
From your comment, it looks like you're already getting the appropriate line from the text file using fgets and loading it into a character buffer. You can use strtok to parse the tokens from the line.
If you call it with the string buffer as the first argument, it returns the first token from that string. If you call it again with the first argument set to NULL, it returns subsequent tokens from the same original string.
A quick example of how to retrieve multiple tokens:
#include <stdio.h>
#include <string.h>
int main() {
char buffer[17]="wants_return=yes";
char* tok;
tok = strtok(buffer, "=");
printf("%s\n", tok); /* tok points to "wants_return" */
tok = strtok(NULL, "=");
printf("%s\n", tok); /* tok points to "yes" */
return 0;
}
For the second strtok call, you can replace the "=" with "" to return everything to the end of the string, instead of breaking off at the next equal sign.
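The empty-delimiter trick can be wrapped in a small sketch; split_first_eq is a hypothetical name. With an empty delimiter set, the second strtok() call returns everything that remains, so a value that itself contains '=' is kept whole. If there is nothing after the first '=', that call returns NULL.

```c
#include <string.h>

/* Sketch: split "key=value" at the first '=' only. Stores the key in *key
   and returns the value, or NULL if there is no value.
   Note: strtok() modifies line. */
char *split_first_eq(char *line, char **key)
{
    *key = strtok(line, "=");
    if (*key == NULL)
        return NULL;
    return strtok(NULL, "");   /* empty delimiter set: the rest of the string */
}
```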
With a POSIX shell, I'd use something like:
answer=$(grep -E 'wants_return[ ]*=' /etc/myconfig | sed 's/^.*=[ ]*//')
Of course, if you're looking for an answer that uses the C STDIO library, then you really need to review the STDIO documentation.

Resources