Reading a text file into 2 separate arrays of characters (in C)

For a class I have to write a program to read in a text file in the format of:
T A E D Q Q
Z H P N I U
C K E W D I
V U X O F C
B P I R G K
N R T B R B
EXIT
THE
QUICK
BROWN
FOX
I'm trying to get the characters into an array of chars, each line being its own array.
I'm able to read from the file okay, and this is the code I use to parse the file:
char** getLinesInFile(char *filepath)
{
FILE *file;
const char mode = 'r';
file = fopen(filepath, &mode);
char **textInFile;
/* Reads the number of lines in the file. */
int numLines = 0;
char charRead = fgetc(file);
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
numLines++;
}
charRead = fgetc(file);
}
fseek(file, 0L, SEEK_SET);
textInFile = (char**) malloc(sizeof(char*) * numLines);
/* Sizes the array of text lines. */
int line = 0;
int numChars = 1;
charRead = fgetc(file);
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
textInFile[line] = (char*) malloc(sizeof(char) * numChars);
line++;
numChars = 0;
}
else if(charRead != ' ')
{
numChars++;
}
charRead = fgetc(file);
}
/* Fill the array with the characters */
fseek(file, 0L, SEEK_SET);
charRead = fgetc(file);
line = 0;
int charNumber = 0;
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
line++;
charNumber = 0;
}
else if(charRead != ' ')
{
textInFile[line][charNumber] = charRead;
charNumber++;
}
charRead = fgetc(file);
}
return textInFile;
}
This is a run of my program:
Welcome to Word search!
Enter the file you would like us to parse:testFile.txt
TAEDQQ!ZHPNIU!CKEWDI!VUXOFC!BPIRGK!NRTBRB!EXIT!THE!QUICK!BROWN!FOX
Segmentation fault
What's going on? A) Why are the exclamation marks there, and B) why do I get a seg fault at the end? The last thing I do in main is iterate through the array of pointers.

1) In the first part of your program, you are miscounting the number of lines in the file. The actual number of lines in the file is 11, but your program gets 10. You need to start counting from 1, as there will always be at least one line in the file. So change
int numLines = 0;
to
int numLines = 1;
2) In the second part of the program you are miscounting the number of characters on each line. You need to keep your counter initializations the same. At the start of the segment you initialize numChars to 1. In that case you need to reset your counter to 1 after each iteration, so change:
numChars = 0;
to
numChars = 1;
This should provide enough space for all the non-space characters and for the terminating null character ('\0'). Keep in mind that C strings are always null-terminated.
3) Your program also does not account for differences in line termination, but under my test environment that is not a problem -- fgetc returns only one character for the line terminator, even though the file is saved with \r\n terminators.
4) In the second part of your program, you are also not allocating memory for the very last line. This causes your segfault in the third part of your program when you try to access the unallocated space.
Note that your code only allocates a line when it sees \r or \n. EOF, which effectively ends the last line, does not qualify, so your second loop never allocates the last line of the array.
To fix this, add this after the second part:
textInFile[line] = (char*) malloc(sizeof(char) * numChars);
5) In your program output you are seeing those weird exclamation points because you are not null-terminating your strings. So you need to add the line marked as NULL termination below:
if(charRead == '\n' || charRead == '\r')
{
textInFile[line][charNumber] = 0; // NULL termination
line++;
charNumber = 0;
}
6) Because the loop stops at EOF, the last line has the same problem in your third loop, so you must add this before the return:
textInFile[line][charNumber] = 0; // NULL termination
7) I am also getting some headaches because of the whole program structure. You read the same file character by character 3 times! This is extremely slow and inefficient.
Fixed code follows below:
char** getLinesInFile(char *filepath)
{
FILE *file;
file = fopen(filepath, "r"); /* the mode must be a string, not the address of a single char */
char **textInFile;
/* Reads the number of lines in the file. */
int numLines = 1;
int charRead = fgetc(file); /* int, so EOF can be told apart from real characters */
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
numLines++;
}
charRead = fgetc(file);
}
fseek(file, 0L, SEEK_SET);
textInFile = (char**) malloc(sizeof(char*) * numLines);
/* Sizes the array of text lines. */
int line = 0;
int numChars = 1;
charRead = fgetc(file);
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
textInFile[line] = (char*) malloc(sizeof(char) * numChars);
line++;
numChars = 1;
}
else if(charRead != ' ')
{
numChars++;
}
charRead = fgetc(file);
}
textInFile[line] = (char*) malloc(sizeof(char) * numChars);
/* Fill the array with the characters */
fseek(file, 0L, SEEK_SET);
charRead = fgetc(file);
line = 0;
int charNumber = 0;
while (charRead != EOF)
{
if(charRead == '\n' || charRead == '\r')
{
textInFile[line][charNumber] = 0; // NULL termination
line++;
charNumber = 0;
}
else if(charRead != ' ')
{
textInFile[line][charNumber] = charRead;
charNumber++;
}
charRead = fgetc(file);
}
textInFile[line][charNumber] = 0; // NULL termination
return textInFile;
}
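The three passes over the file can probably be folded into one. The following is only a sketch of that idea, not the original poster's code: read_lines and free_lines are hypothetical names, and it assumes that spaces and carriage returns should be dropped, as in the code above. Each line grows on demand, so nothing has to be counted in advance.

```c
#include <stdio.h>
#include <stdlib.h>

/* Frees an array returned by read_lines (hypothetical helper). */
void free_lines(char **lines, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(lines[i]);
    free(lines);
}

/* Reads every line of fp in one pass, dropping spaces and carriage returns.
   Stores the number of lines in *count. Returns NULL with *count == 0 for an
   empty file or when an allocation fails. */
char **read_lines(FILE *fp, size_t *count)
{
    char **lines = NULL;
    char *cur = NULL;               /* line currently being built */
    size_t n = 0, len = 0, cap = 0;

    *count = 0;
    for (;;) {
        int ch = fgetc(fp);         /* int, so EOF is distinguishable */

        if (ch == EOF && len == 0)
            break;                  /* no unfinished line left */
        if (ch == ' ' || ch == '\r')
            continue;
        if (len + 1 >= cap) {       /* keep room for one char plus '\0' */
            size_t new_cap = cap ? cap * 2 : 16;
            char *tmp = realloc(cur, new_cap);
            if (tmp == NULL) {
                free(cur);
                free_lines(lines, n);
                return NULL;
            }
            cur = tmp;
            cap = new_cap;
        }
        if (ch != '\n' && ch != EOF) {
            cur[len++] = (char)ch;
            continue;
        }
        cur[len] = '\0';            /* end of line: terminate and store */
        char **grown = realloc(lines, (n + 1) * sizeof *lines);
        if (grown == NULL) {
            free(cur);
            free_lines(lines, n);
            return NULL;
        }
        lines = grown;
        lines[n++] = cur;
        cur = NULL;
        len = cap = 0;
        if (ch == EOF)
            break;
    }
    *count = n;
    return lines;
}
```

The caller would open the file with fopen(path, "r"), pass the stream to read_lines, iterate over exactly count entries, and finish with free_lines. Returning the count avoids walking past the end of the array, which is one way to hit a segfault like the one in the question.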

You aren't null terminating your arrays. This probably explains both problems. Be sure to allocate an extra character for the null terminator.

Do This:
if(charRead == '\n')
{
textInFile[line] = (char*) malloc(sizeof(char) * (numChars+1));
line++;
numChars = 0;
}
Then:
if(charRead == '\n')
{
textInFile[line][charNumber]='\0';
line++;
charNumber = 0;
}
Also you are reading the file 3 times! This thread has some good explanation on how to read a file efficiently.

Related

Having trouble checking for newline when reading file

I'm trying to write a program to read a file of an unknown size / line size but I'm having some issues detecting the new line character.
When I run the program, it never reaches the end-of-line point within the while loop in readFile and just runs forever. If I print each character, it prints out some unknown char.
I've tried setting ch to be an int value and typecasting to char for \n comparison. It's not reaching the EOF condition either so I'm not sure what is going on.
code:
void readFile(FILE* file)
{
int endOfFile = 0;
while (endOfFile != 1)
{
endOfFile = readLine(file);
printf("%d\n", endOfFile);
}
}
int readLine(FILE* file)
{
static int maxSize = LINE_SIZE;
int currentIndex = 0;
int endOfFile = 0;
char* buffer = (char*) malloc(sizeof(char) * maxSize);
char ch;
do
{
ch = fgetc(file);
if ((ch != EOF) || (ch != '\n'))
{
buffer[currentIndex] = (char) ch;
currentIndex += 1;
}
if (currentIndex == maxSize)
{
printf("Reallocating string buffer");
maxSize *= 2;
buffer = (char*) realloc(buffer, maxSize);
}
} while ((ch != EOF) || (ch != '\n'));
if (ch == EOF)
{
endOfFile = 1;
}
parseLine(buffer);
free(buffer);
return endOfFile;
}
If someone could help me that would be greatly appreciated because I have been stuck on this issue for quite some time. Thanks in advance.
(ch != EOF) || (ch != '\n')
This is always true.
You want an && (AND) here, both in your if and while, otherwise it will never stop.
Just use this boilerplate standard construct
int ch; /* important, EOF is -1, not in the range 0-255 */
FILE *fp;
/* double brackets prevent warnings about assignment in if */
while ( (ch = fgetc(fp)) != EOF)
{
/* we now have a valid character */
/* usually */
if(ch == endofinputIlike)
break;
}
/* here you have either read all the input up to what you like
or skip because of EOF. usually you will set N or something,
or if N == 0 it was EOF, or we can test ch for EOF */
Generally assignment in if is a bad idea, but this particular snippet is so idiomatic that every experienced C programmer will instantly recognise it.
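Applied to the readLine function from the question, the idiom might look like the sketch below. read_line is a hypothetical name and the starting capacity of 16 is an arbitrary assumption; the two points it illustrates are that ch is an int and that the two tests are joined with && rather than ||.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line of any length. Returns a malloc'd string without the
   newline, or NULL at end of file or on allocation failure. */
char *read_line(FILE *fp)
{
    size_t cap = 16, len = 0;       /* 16 is an arbitrary starting size */
    char *buf = malloc(cap);
    int ch;                         /* int, not char, so EOF is detectable */

    if (buf == NULL)
        return NULL;
    /* Continue only while BOTH hold: not EOF AND not a newline. */
    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (len + 1 == cap) {       /* keep one byte for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {    /* nothing read: end of file */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A caller can loop with while ((line = read_line(file)) != NULL), process the line, and free it, which replaces the separate endOfFile flag.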

how to read input string until a blank line in C?

First of all, I'm new to coding in C.
I tried to read a string of unknown size from the user until a blank line is given, then save it to a file, and after that read the file.
I've only managed to do it until a new line is given, and I don't know how to look for a blank line.
#include <stdio.h>
#include <stdlib.h>
char *input(FILE* fp, size_t size) {
char *str;
int ch;
size_t len = 0;
str = realloc(NULL, sizeof(char)*size);
if (!str)return str;
while (EOF != (ch = fgetc(fp)) && ch != '\n') {
str[len++] = ch;
if (len == size) {
str = realloc(str, sizeof(char)*(size += 16));
if (!str)return str;
}
}
str[len++] = '\0';
return realloc(str, sizeof(char)*len);
}
int main(int argc, const char * argv[]) {
char *istr;
printf("input string : ");
istr = input(stdin, 10);
//write to file
FILE *fp;
fp = fopen("1.txt", "w+");
fprintf(fp, istr);
fclose(fp);
//read file
char c;
fp = fopen("1.txt", "r");
while ((c = fgetc(fp)) != EOF) {
printf("%c", c);
}
printf("\n");
fclose(fp);
free(istr);
return 0;
}
Thanks!
I would restructure your code a little. I would change your input() function to be a function (readline()?) that reads a single line. In main() I would loop reading line by line via readline().
If the line is empty (only has a newline -- use strcmp(istr, "\n")), then free the pointer, and exit the loop. Otherwise write the line to the file and free the pointer.
If your concept of an empty line includes " \n" (prefixed spaces), then write a function is_only_spaces() that returns a true value for a string that looks like that.
While you could handle the empty line in input(), there is value in abstracting the line reading from the input termination conditions.
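A rough sketch of that structure follows. It is an illustration under assumptions rather than the answerer's code: copy_until_blank is a hypothetical name, and fgets with a fixed 256-byte buffer stands in for the suggested readline(), so a blank line is only recognised when it fits in one chunk.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if s holds nothing but whitespace (including the newline). */
int is_only_spaces(const char *s)
{
    for (; *s != '\0'; s++) {
        if (!isspace((unsigned char)*s))
            return 0;
    }
    return 1;
}

/* Copies lines from in to out until a blank line or end of file.
   Returns the number of lines copied. */
int copy_until_blank(FILE *in, FILE *out)
{
    char buf[256];      /* assumed chunk size; longer lines arrive in pieces */
    int lines = 0;
    int at_start = 1;   /* does the next chunk begin a new line? */

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t n = strlen(buf);
        int ends_line = (n > 0 && buf[n - 1] == '\n');

        if (at_start && ends_line && is_only_spaces(buf))
            break;                  /* blank line: stop reading */
        if (at_start)
            lines++;
        fputs(buf, out);
        at_start = ends_line;
    }
    return lines;
}
```

In main this would be called as copy_until_blank(stdin, fp) after opening the output file. Writing with fputs also avoids passing user input as a format string, as fprintf(fp, istr) does.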
Why not use a flag or a counter? With a counter, you simply increment it for each character found. If a newline is found and the counter is 0, it must be a blank line. If a newline is found and the counter is not 0, it must be the end of a line, so reset the counter to 0 and continue.
Something like this:
int count = 0;
while ((ch = fgetc(fp)) != EOF)
{
if(ch == '\n')
{
if(count == 0)
{
break;
}
count = 0;
str[len++] = ch;
}
else
{
str[len++] = ch;
count++;
}
}
Another way would be to simply check whether the last character stored in the string was a newline.
while ((ch = fgetc(fp)) != EOF)
{
if(ch == '\n' && len > 0 && str[len - 1] == '\n')
{
break;
}
str[len++] = ch;
}
A blank line is a line which contains only a newline, right? So you can simply keep track of the last 2 characters you read. If both are '\n', then you have detected a blank line: the first '\n' is the end of the previous line, the second one is the end of the current line (which is a blank line).
char *input(FILE* fp, size_t size) {
char *str;
int ch, prev_ch = '\n'; /* start of input counts as the start of a line */
size_t len = 0;
str = realloc(NULL, sizeof(char)*size);
if (!str)return str;
while (EOF != (ch = fgetc(fp)) && !(ch == '\n' && prev_ch == '\n')) {
str[len++] = ch;
if (len == size) {
str = realloc(str, sizeof(char)*(size += 16));
if (!str)return str;
}
prev_ch = ch;
}
str[len++] = '\0';
return realloc(str, sizeof(char)*len);
}
Note that the loop stops only when both the current and the previous character are '\n'. prev_ch starts as '\n' so that a blank first line also ends the input.
To improve this, you can keep your function that reads only a line and test if the line returned is empty (it contains only a '\n').

C program to count total words in an input file

The input file contains a completely empty line at line 2 and an unnecessary white space after the final full stop of the text. With this input file I am getting 48 words while I was supposed to get 46 words.
My input file contains:
"Opening from A Tale of Two Cities by Charles Darwin
It was the best of times, it was the worst of times. It was the age
of wisdom, it was the age of foolishness. It was the epoch of
belief, it was the epoch of incredulity. "
Here's how I tried:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define max_story_words 1000
#define max_word_length 80
int main (int argc, char **argv)
{
char story[max_story_words][max_word_length] = {{0}};
char line[max_story_words] = {0};
char *p;
char ch = 0;
char *punct="\n ,!.:;?-";
int num_words = 1;
int i = 0;
FILE *file_story = fopen ("TwoCitiesStory.txt", "r");
if (file_story==NULL) {
printf("Unable to open story file '%s'\n","TwoCitiesStory.txt");
return (EXIT_FAILURE);
}
/* count words */
while ((ch = fgetc (file_story)) != EOF) {
if (ch == ' ' || ch == '\n')
num_words++;
}
rewind (file_story);
i = 0;
/* read each line in file */
while (fgets (line, max_word_length, file_story) != NULL)
{
/* tokenize line into words removing punctuation chars in punct */
for (p = strtok (line, punct); p != NULL; p = strtok (NULL, punct))
{
/* convert each char in p to lower-case with tolower */
char *c = p;
for (; *c; c++)
*c = tolower (*c);
/* copy token (word) to story[i] */
strncpy ((char *)story[i], p, strlen (p));
i++;
}
}
/* output array */
for(i = 0; i < num_words; i++)
printf ("story[%d]: %s\n", i, story[i]);
printf("\ntotal words: %d\n\n",num_words);
return (EXIT_SUCCESS);
}
Your num_words takes account of the two extra whitespaces, that's why you get 48.
You should simply print i immediately after the fgets-strtok loop, if I'm not mistaken.
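As a sketch of that suggestion, the token count itself can serve as the word count. count_words is a hypothetical name; the separator set is copied from the question, and the code assumes no line is longer than the buffer (a longer line would be split and a word at the split point counted twice).

```c
#include <stdio.h>
#include <string.h>

/* Counts the words in fp by counting strtok tokens, line by line. */
int count_words(FILE *fp)
{
    const char *punct = "\n ,!.:;?-";   /* same separators as the question */
    char line[1000];                    /* assumed maximum line length */
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        for (char *p = strtok(line, punct); p != NULL; p = strtok(NULL, punct))
            count++;
    }
    return count;
}
```

Because strtok skips runs of separators, the empty second line and the trailing space contribute no tokens, so the sample text from the question comes out at 46.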
Something along these lines:
while ((ch = fgetc (file_story)) != EOF) {
if (ch == ' ') {
num_words++;
while ((ch = fgetc (file_story)) == ' ')
; /* skip the rest of a run of spaces */
}
if (ch == '\n') {
num_words++;
while ((ch = fgetc (file_story)) == '\n')
; /* skip consecutive newlines */
}
}
Though I wonder why you are only taking whitespace and newline characters into account when counting new words. Two words separated by some other punctuation mark are definitely not accounted for in your code.
My suggestion is to change the words counting loop as follows:
/* count words */
num_words = 0;
int flag = 0; // set 1 when word starts and 0 when word ends
while ((ch = fgetc (file_story)) != EOF) {
if ( isalpha(ch) )
{
if( 0 == flag ) // if it is a first letter of word ...
{
num_words++; // ... add to word count
flag = 1; // and set flag to skip not first letters
}
continue;
}
if ( isspace(ch) || ispunct(ch) ) // if word separator ...
{
flag = 0; // ... reset flag
}
}

fgetc read file line by line

So I'm working on a function that will use fgetc to read a line into a buffer, so I can use that buffer as I please and then refill it with the next line. My function works; however, I have to repeat code outside of the for loop to process the last line, as shown here:
for(i = 0, c = 1; ch != EOF; i++)
{
ch = fgetc(grab);
if(ch == 0x0A)
{
/*Process Line*/
c = 1;
}
else
{
linetmp = realloc(line, (c + 1) * sizeof(char));
if(!linetmp)
{
free(line);
free(url);
printf("\nError! Memory allocation failed!");
return 1;
}
line = linetmp;
line[c - 1] = ch;
line[c] = 0x00;
c++;
}
}
/*repeat if(ch == 0x0A) statement*/
I would rather do this all in the same loop but am not sure on how I would go about doing this. Any help would be greatly appreciated!
I would recommend that you instead use getline() if you're on a POSIX system.
Also, your logic is strange since you check for EOF in the loop header only, but update ch inside the loop. That means it will run through with ch == EOF, before the loop condition is re-evaluated.
You should try putting the updating and the check together, making the loop header read like this:
for(i = 0, c = 1; (ch = fgetc(grab)) != EOF; i++)
Also, you need to think about line separators; both '\r' (carriage return) and '\n' (line feed) can occur.
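One way to keep the line processing in a single place is to let EOF go through the same branch as the newline. The sketch below is only an illustration: for_each_line and add_length are hypothetical names, and the callback stands in for the "Process Line" step in the question.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Example callback (hypothetical): adds the line length to a size_t total. */
void add_length(const char *line, void *ctx)
{
    *(size_t *)ctx += strlen(line);
}

/* Calls handle(line, ctx) once per line, including a final line that has no
   trailing newline. Returns the number of lines, or -1 on allocation failure. */
int for_each_line(FILE *fp, void (*handle)(const char *line, void *ctx), void *ctx)
{
    size_t cap = 64, len = 0;
    char *line = malloc(cap);
    int ch, count = 0;

    if (line == NULL)
        return -1;
    do {
        ch = fgetc(fp);
        if (ch == '\n' || ch == EOF) {
            /* EOF right after a newline (or on an empty file) is not a line. */
            if (ch == '\n' || len > 0) {
                line[len] = '\0';
                handle(line, ctx);
                count++;
            }
            len = 0;
        } else if (ch != '\r') {        /* drop carriage returns (CRLF files) */
            if (len + 1 == cap) {       /* keep one byte for the terminator */
                char *tmp = realloc(line, cap * 2);
                if (tmp == NULL) {
                    free(line);
                    return -1;
                }
                line = tmp;
                cap *= 2;
            }
            line[len++] = (char)ch;
        }
    } while (ch != EOF);
    free(line);
    return count;
}
```

The do/while form lets the EOF value reach the branch that finishes a line, so the last line needs no repeated code after the loop. The buffer doubles when full instead of growing by one byte per character.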
I don't think you should reallocate after each character. If you want the buffer to end up at the smallest size needed, you can reallocate once at the end with strlen() + 1. Also, there is a function fgets() which reads a line.
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
int somefunc(FILE *grab)
{
int current_size = 100;
int data_size = current_size - 1;
char *url = malloc(current_size);
char *line = malloc(current_size);
char *linetmp;
int ch;
ch = fgetc(grab);
int i = 0;
int c = 0;
while (ch != EOF && ch != 0x0A )
{
i++;
if ( i > data_size )
{
current_size = current_size * 2;
data_size = current_size - 1;
linetmp = realloc(line, current_size);
if (!linetmp)
{
free(line);
free(url);
printf("\nError! Memory allocation failed!");
return 1;
}
line = linetmp;
}
line[c] = ch;
c++;
ch = fgetc(grab);
}
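line[c] = '\0';
if (ch == EOF && c == 0)
{
/* nothing left to read: tell the caller to stop */
free(line);
free(url);
return EOF;
}
linetmp = realloc(line, strlen(line) + 1);
if (linetmp)
line = linetmp;
printf("we just read line->%s\n",line);
free(line);
free(url);
return 0;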
}
int main(void)
{
char *cpFilename = "somefile.txt";
FILE *fp = fopen(cpFilename,"r");
if ( fp == NULL )
{
printf("ERROR: could not open %s\n",cpFilename);
printf("Error code: %d\n",errno);
perror("ERROR:");
return 1;
}
int return_code = somefunc(fp);
while (return_code != EOF && return_code != 1)
{
return_code = somefunc(fp);
}
fclose(fp);
}

Invalid write error from valgrind on this piece of code

Valgrind detects an invalid write of size 1 in this piece of code.
In it, I read a file where the first line is something I don't need, and the following lines define 3 strings and 1 int (total_space) that I need to put in this struct:
typedef struct
{
char username[40];
char password[40];
char token[40];
pid_t logged_pid;
int total_space;
int used_space;
} User;
The file is this (each word on a new line, sorry but I still didn't understand how to format text and code):
pass
username1
password1
token1delczzzzozoc
4500000
username2
pasword2222
token2efwerfg
trg
1000000
Here is the code: valgrind yells only in the first 4 lines! And in the first one on the character "e": what's wrong with it?
User *user = NULL;
int n = 0;
int k = 0;
char input;
FILE *file;
if(!(file = fopen(USERS, "r"))) logMmboxd("opening USERS failed\n", 1);
else logMmboxd("opened USERS\n", 0);
/* file pointer at the second line, since the first has nothing i need now */
while((input = fgetc(file)) != EOF && input != '\n') {}
/* read 4 lines every loop from the second line to the EOF */
while((input = fgetc(file)) != EOF)
{
/* rewind the pointer to the previous character (the one I read to see if the file ended) */
if(fseek(file, -1, SEEK_CUR) == -1) logMmboxd("failed seeking USERS\n", 1);
/* expand the array of 1 user */
users = realloc(users, n+1);
n++;
for(k=0; (input=fgetc(file)) != '\n' && input != EOF; k++) users[n-1].username[k] = input;
users[n-1].username[k+1] = '\0';
for(k=0; (input=fgetc(file)) != '\n' && input != EOF; k++) users[n-1].password[k] = input;
users[n-1].password[k+1] = '\0';
for(k=0; (input=fgetc(file)) != '\n' && input != EOF; k++) users[n-1].token[k] = input;
users[n-1].token[k+1] = '\0';
users[n-1].logged_pid = 0;
for(k=0; (input=fgetc(file)) != '\n' && input != EOF; k++) line[k] = input;
line[k+1] = '\0';
users[n-1].total_space = atoi(line);
users[n-1].used_space = usedSpace(users[n-1].username);
}
This code:
/* expand the array of 1 user */
users = realloc(users, n+1);
Expands users by one byte, not one User.
char input;
should be:
int input;
otherwise EOF may not be detected correctly. And whenever I see a call to realloc() I always shudder.
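Putting both points together, a hedged sketch of the loading loop could look like this. read_field, append_user and load_users are hypothetical names, and the sketch assumes a clean file of four lines per record after the header. It also writes the terminator at index k rather than k + 1, which the question's code does not, and it bounds every copy to the 40-byte fields.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>   /* pid_t (POSIX) */

typedef struct {
    char username[40];
    char password[40];
    char token[40];
    pid_t logged_pid;
    int total_space;
    int used_space;
} User;

/* Reads one line into dst (at most size - 1 characters, always terminated;
   size must be at least 1). Returns 0 if end of file was hit before any
   character was read, 1 otherwise. */
int read_field(FILE *fp, char *dst, size_t size)
{
    size_t k = 0;
    int ch;                         /* int, so EOF is distinguishable */

    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (k + 1 < size)
            dst[k++] = (char)ch;    /* excess characters are dropped */
    }
    dst[k] = '\0';                  /* index k, not k + 1 */
    return !(ch == EOF && k == 0);
}

/* Grows *users by one element (not one byte) and returns the new slot,
   or NULL if realloc fails, in which case the old array is untouched. */
User *append_user(User **users, size_t *n)
{
    User *tmp = realloc(*users, (*n + 1) * sizeof **users);
    if (tmp == NULL)
        return NULL;
    *users = tmp;
    memset(&tmp[*n], 0, sizeof tmp[*n]);
    return &tmp[(*n)++];
}

/* Skips the first line, then reads records of four lines each.
   Returns the number of users stored in *users. */
size_t load_users(FILE *fp, User **users)
{
    char line[32];
    size_t n = 0;

    *users = NULL;
    read_field(fp, line, sizeof line);      /* header line, not needed */
    for (;;) {
        User u = {0};
        if (!read_field(fp, u.username, sizeof u.username) ||
            !read_field(fp, u.password, sizeof u.password) ||
            !read_field(fp, u.token, sizeof u.token) ||
            !read_field(fp, line, sizeof line))
            break;
        u.total_space = atoi(line);
        User *slot = append_user(users, &n);
        if (slot == NULL)
            break;
        *slot = u;
    }
    return n;
}
```

Assigning realloc's result to a temporary first keeps the old array reachable if the call fails, and detecting EOF inside read_field removes the need for the fseek(-1) rewind.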
