Detecting EOF of a txt file in C

I wrote this code, which reads every char of my text file and puts it into my char array. My problem is that the end of the file is not detected, so after the end of the text fscanf() keeps giving me the last char until my array is filled. How can I prevent that? I am programming in C.
My Code:
int main() {
char array[50][50];
char buff;
FILE *cola = fopen("C:/Users/danie/Desktop/cola.txt", "r");
for (int i = 0; i < 50; i++) {
for (int k = 0; k < 50; k++) {
fscanf(cola, "%c", &buff);
array[i][k] = buff;
}
}
fclose(cola);
for (int i = 0; i < 50; i++) {
for (int k = 0; k < 50; k++) {
printf("%c", array[i][k]);
}
}
return 0;
}
Thank you for your help.

fscanf() returns the number of successful conversions. You should test the return value and also handle newline characters specifically:
#include <stdio.h>
int main(void) {
char array[50][50];
char buff;
FILE *cola = fopen("C:/Users/danie/Desktop/cola.txt", "r");
if (cola == NULL) {
return 1;
}
for (int i = 0; i < 50; i++) {
for (int k = 0; k < 50; k++) {
if (fscanf(cola, "%c", &buff) != 1 || buff == '\n') {
array[i][k] = '\0';
break;
}
array[i][k] = buff;
}
}
fclose(cola);
for (int i = 0; i < 50; i++) {
for (int k = 0; k < 50 && array[i][k] != '\0'; k++) {
printf("%c", array[i][k]);
}
printf("\n");
}
return 0;
}
The code can be simplified if you use getc() instead of fscanf() to read bytes from the file:
#include <stdio.h>
int main(void) {
char array[50][51];
int c, i, k, n;
FILE *cola = fopen("C:/Users/danie/Desktop/cola.txt", "r");
if (cola == NULL) {
return 1;
}
for (n = 0; n < 50; n++) {
for (k = 0; k < 50; k++) {
if ((c = getc(cola)) == EOF || c == '\n') {
break;
}
array[n][k] = c;
}
array[n][k] = '\0';
if (c == EOF && k == 0)
break;
}
fclose(cola);
for (i = 0; i < n; i++) {
puts(array[i]);
}
return 0;
}

Replace:
for (int i = 0; i < 50; i++) {
for (int k = 0; k < 50; k++) {
fscanf(cola, "%c", &buff);
array[i][k] = buff;
}
}
with:
for (int i = 0; i < 50; i++) {
for (int k = 0; k < 50; k++) {
int c = getc(cola);
if (c == EOF)
break;
array[i][k] = c;
}
}
Since buff is then unused, don't define it. Note that the return type of getc() is an int, not just a char. Always check the I/O function for success/failure. In your original code, you don't even check whether the I/O operation succeeds, which makes detecting EOF impossible.
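To illustrate why the result of getc() belongs in an int: a byte with value 0xFF is a legitimate character, and only an int keeps it distinct from EOF. A minimal sketch, where count_bytes() is a hypothetical helper that is not part of the original code:

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in fp.
 * c is an int so that EOF (a negative value) stays distinct from
 * every valid byte, including 0xFF. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

If c were a char, a 0xFF byte could compare equal to EOF on platforms where char is signed, and on platforms where char is unsigned the comparison with EOF would never succeed.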
Note that this code makes a number of assumptions that may or may not be justifiable. For example, you assume each line in the file consists of 49 characters plus a newline; you also assume you'll never need to print the information as a 'string' (your existing code does not; it prints character by character, so it is 'safe').
You might want to describe the input as:
Read up to 50 lines with up to 49 characters plus a newline in each line, storing the result in the variable array with each line being a null-terminated string.
This is more resilient to common problems (short lines, long lines, not enough lines). The code for that might be:
enum { LINE_LEN = 50, NUM_LINES = 50 };
char array[NUM_LINES][LINE_LEN];
int i;
for (i = 0; i < NUM_LINES; i++)
{
int c;
int k;
for (k = 0; k < LINE_LEN; k++)
{
c = getc(cola);
if (c == EOF || c == '\n')
break;
if (k == LINE_LEN - 1)
{
/* Too long - gobble excess */
while ((c = getc(cola)) != EOF && c != '\n')
;
break;
}
array[i][k] = c;
}
array[i][k] = '\0';
if (c == EOF)
break;
}
int num_lines = i; // You have num_lines lines of data in your array
I found one version of the Coca Cola™ ASCII art image at https://www.ascii-code.com/ascii-art/logos/coca-cola.php which looks similar to what you have in your images, but there are many other sources and variants:
__ ___ __ .ama ,
,d888a ,d88888888888ba. ,88"I) d
a88']8i a88".8"8) `"8888:88 " _a8'
.d8P' PP .d8P'.8 d) "8:88:baad8P'
,d8P' ,ama, .aa, .ama.g ,mmm d8P' 8 .8' 88):888P'
,d88' d8[ "8..a8"88 ,8I"88[ I88' d88 ]IaI" d8[
a88' dP "bm8mP8'(8'.8I 8[ d88' `" .88
,88I ]8' .d'.8 88' ,8' I[ ,88P ,ama ,ama, d8[ .ama.g
[88' I8, .d' ]8, ,88B ,d8 aI (88',88"8) d8[ "8. 88 ,8I"88[
]88 `888P' `8888" "88P"8m" I88 88[ 8[ dP "bm8m88[.8I 8[
]88, _,,aaaaaa,_ I88 8" 8 ]P' .d' 88 88' ,8' I[
`888a,. ,aadd88888888888bma. )88, ,]I I8, .d' )88a8B ,d8 aI
"888888PP"' `8""""""8 "888PP' `888P' `88P"88P"8m"
This file's longest line is the first at 67 characters plus newline; the shortest is 61 characters plus newline. The file only has 13 lines and 845 characters (LF line endings) in total. Thus, your program is ill-equipped to deal with this particular data file. It looks for 2,500 characters, and won't get them.
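Figures like these can be checked before choosing the array dimensions. A rough sketch, assuming a hypothetical helper named measure_lines() (not from the original answer) that receives an already opened stream:

```c
#include <stdio.h>

/* Hypothetical helper: counts the lines in fp and records the length of
 * the longest one (newline excluded). A final line without a trailing
 * newline is counted too. */
void measure_lines(FILE *fp, int *num_lines, int *longest)
{
    int c;
    int len = 0;

    *num_lines = 0;
    *longest = 0;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            if (len > *longest)
                *longest = len;
            (*num_lines)++;
            len = 0;
        } else {
            len++;
        }
    }
    if (len > 0) {              /* unterminated last line */
        if (len > *longest)
            *longest = len;
        (*num_lines)++;
    }
}
```

If either result exceeds the corresponding array dimension, fixed-size loops like the ones in the question cannot hold the whole file.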
My complete test code was rigged to read from standard input, rather than a fixed file name.
#include <stdio.h>
int main(void)
{
FILE *cola = stdin;
enum { LINE_LEN = 80, NUM_LINES = 50 };
char array[NUM_LINES][LINE_LEN];
int i; // Need value of i after loop
for (i = 0; i < NUM_LINES; i++)
{
int c; // Need value of c after loop
int k;
for (k = 0; k < LINE_LEN; k++)
{
c = getc(cola);
if (c == EOF || c == '\n')
break;
if (k == LINE_LEN - 1)
{
/* Too long - gobble excess */
while ((c = getc(cola)) != EOF && c != '\n')
;
break;
}
array[i][k] = c;
}
array[i][k] = '\0';
if (c == EOF)
break;
}
int num_lines = i; // You have num_lines lines of data in your array
for (i = 0; i < num_lines; i++)
puts(array[i]);
return 0;
}
I tested it on the data file shown, with an empty line at the end, and with a couple of lines containing more than 79 characters after the blank line. It handled all those special cases correctly. Note that handling user input is hard; handling perverse user input is harder. The code is less compact. You could change the rules and then change the code to match. I'm not sure this is the most minimal way to code this; it does work, however. It might be better to have a function to handle the inner input loop; the outer loop could test the return value from that function. This would cut down on the special case handling.
#include <assert.h>
#include <limits.h>
#include <stdio.h>
static int read_line(FILE *fp, size_t buflen, char *buffer)
{
assert(buflen < INT_MAX);
int c; // Need value of c after loop
size_t k; // Need value of k after loop
for (k = 0; k < buflen; k++)
{
if ((c = getc(fp)) == EOF || c == '\n')
break;
if (k == buflen - 1)
{
/* Too long - gobble excess */
while ((c = getc(fp)) != EOF && c != '\n')
;
break;
}
buffer[k] = c;
}
buffer[k] = '\0';
return (k == 0 && c == EOF) ? EOF : (int)k;
}
int main(void)
{
enum { LINE_LEN = 80, NUM_LINES = 50 };
char array[NUM_LINES][LINE_LEN];
int i;
for (i = 0; i < NUM_LINES; i++)
{
if (read_line(stdin, LINE_LEN, array[i]) == EOF)
break;
}
int num_lines = i;
for (i = 0; i < num_lines; i++)
puts(array[i]);
return 0;
}
This produces the same output from the same input as the previous version.
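The same contract can probably also be met with fgets(), which is a swapped-in technique and not what the code above uses. A sketch under that assumption; read_line_fgets() is a hypothetical name, and buflen is assumed to be at least 2:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fgets()-based variant of read_line().
 * Returns the stored line length, or EOF when nothing is left to read. */
int read_line_fgets(FILE *fp, size_t buflen, char *buffer)
{
    if (fgets(buffer, (int)buflen, fp) == NULL)
        return EOF;

    size_t len = strcspn(buffer, "\n");     /* index of newline or of the terminator */
    if (buffer[len] == '\n') {
        buffer[len] = '\0';                 /* whole line fitted: drop the newline */
    } else {
        int c;
        /* Too long (or last line without newline) - gobble excess */
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
    }
    return (int)len;
}
```

The outer loop in main() would stay the same, calling read_line_fgets(stdin, LINE_LEN, array[i]) and stopping when it returns EOF.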

#include <stdio.h>
#include <stdlib.h>
int main(void) {
//char array[50][50];
char buff;
int t;
FILE *cola = fopen("C:/Users/danie/Desktop/cola.txt", "r");
if (cola == NULL)
{
printf("Cannot open file \n");
exit(EXIT_FAILURE);
}
while (1) {
t = fgetc(cola);
if (t == EOF)
break;
buff = t;
printf("%c", buff);
}
fclose(cola);
return 0;
}

Related

Array and file I/O

I have the following data in a file:
Name
Surname
#include <stdio.h>
#define FILENAME "file.txt"
#define MAXSIZE 128
int main(void)
{
setvbuf(stdout, NULL, _IONBF, 0);
FILE *file = fopen(FILENAME, "r");
if (!file) {
perror(FILENAME);
return 1;
}
int ch;
size_t i = 0;
char array[3][MAXSIZE];
for(int a=0; a < 3; a++)
{
while (i < MAXSIZE - 1 && ((ch = getc(file)) != EOF))
{
if (ch == '\n')
break;
array[a][i++] = ch;
}
/* null-terminate the array to create a string */
array[a][i] = '\0';
}
fclose(file);
for(int a=0; a < 3; a++)
{
for(int i=0; i < 10; i++)
{
printf("%c", array[a][i]);
}
}
}
When I run this program, the output contains garbage characters.
How can I modify it so that it does not output garbage?
As I noted in a comment:
Your printing loop for (int i=0; i < 10; i++) { printf("%c", array[a][i]); } needs to stop when array[a][i] == '\0'; add if (array[a][i] == '\0') break; before the printf().
You also need to reset i to 0 before the while loop (but inside the outer for loop), so that each line is stored starting at index 0. If you declared i inside the first for loop, you'd not have the problems you do.
Note that you have two different variables called i (one is size_t i = 0; before the loops; the other is for (int i = 0; …) while printing) and one hides the other. That can lead to confusion.
Those changes might lead to this code:
#include <stdio.h>
#define FILENAME "file.txt"
#define MAXSIZE 128
int main(void)
{
setvbuf(stdout, NULL, _IONBF, 0);
FILE *file = fopen(FILENAME, "r");
if (!file) {
perror(FILENAME);
return 1;
}
char array[3][MAXSIZE];
for (int a = 0; a < 3; a++)
{
int ch;
size_t i = 0;
while (i < MAXSIZE - 1 && ((ch = getc(file)) != EOF))
{
if (ch == '\n')
break;
array[a][i++] = ch;
}
array[a][i] = '\0';
}
fclose(file);
for (int a = 0; a < 3; a++)
{
for (int i = 0; i < 10; i++)
{
if (array[a][i] == '\0')
break;
printf("%c", array[a][i]);
}
}
}
There's also no obvious reason not to print the data using:
for (int a = 0; a < 3; a++)
puts(array[a]);
or
for (int a = 0; a < 3; a++)
printf("%s\n", array[a]);

Get length of char array with null elements in C

Currently I am making a project that uses char arrays that have null elements. I want to be able to get the length of the array, in the sense of the number of elements that aren't null. This seemed reasonably trivial and I made this function:
int getWordLen(char word[]) {
int count = 0;
for (int i = 0; i < 512; i++) {
if (word[i] != '\0') {
count++;
}
}
printf("%d ", count);
return count;
}
However, every char array returns a length of 188. Any help would be appreciated.
This is the function I was calling it from:
void redact(Words * redactWords, char fileName[]) {
FILE * file = fopen(fileName, "r");
FILE * outputFile = fopen("outputFile.txt", "w+");
char word[512];
int i = 0;
char c;
while (c != EOF) {
c = getc(file);
if ((c > 96) && (c < 123)) {
word[i] = c;
i++;
continue;
}
else if ((c > 64) && (c < 91)) {
word[i] = c + 32;
i++;
continue;
}
i = 0;
if (isWordRedactWord(redactWords, word)) {
//write stars to file
char starStr[512];
for (int i = 0; i < getWordLen(word); i++) {
starStr[i] = '*';
}
fputs(starStr, outputFile);
}
else {
//write word to file
fputs(word, outputFile);
}
strcpy(word, emptyWord(word));
}
fclose(file);
fclose(outputFile);
}
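In the initial while loop, c is tested before it has been assigned, and it is declared as char, so the comparison with EOF is unreliable. I would declare c as int and use while ((c = getc(file)) != EOF).
Also, I believe you are using a lot more resources than necessary with the for loop inside the while, because getWordLen(word) is called again on every iteration and scans all 512 bytes each time:
char starStr[512];
for (int i = 0; i < getWordLen(word); i++) {
starStr[i] = '*';
}
I suggest computing the length once, before the loop, and seeing what happens. Note that starStr also needs a terminating '\0' before it is passed to fputs().
If it always gives you a length of 188, it is counting something that is constant. Most likely that is because word is never null-terminated, so getWordLen() counts whatever non-zero bytes happen to be in the 512-byte buffer.
Hope you can solve it!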
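One way to sidestep both the EOF test and the length problem is to read one word at a time into a null-terminated buffer, so that strlen() or the returned length can be used directly. A minimal sketch, not the asker's code: next_word() is a hypothetical helper, cap is assumed to be at least 1, and letters are detected with isalpha() rather than the numeric ranges used in the question:

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads the next run of letters from fp into word,
 * lower-cased and null-terminated. Returns the stored length, or EOF
 * when no further word exists. Letters beyond cap - 1 are dropped. */
int next_word(FILE *fp, char *word, size_t cap)
{
    size_t len = 0;
    int c;                                  /* int, so EOF is representable */

    /* Skip anything that is not a letter */
    while ((c = getc(fp)) != EOF && !isalpha(c))
        ;
    if (c == EOF) {
        word[0] = '\0';
        return EOF;
    }
    do {
        if (len < cap - 1)
            word[len++] = (char)tolower(c);
    } while ((c = getc(fp)) != EOF && isalpha(c));
    word[len] = '\0';                       /* terminate so string functions work */
    return (int)len;
}
```

This sketch discards the separators between words, so a redaction routine that must preserve punctuation and spacing would need to copy those characters to the output as well.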

Uninitialised values in dynamic array in C

I've been given a task that requires a dynamic 2D array in C, but we haven't even covered pointers yet, so I'm kind of at a loss here. I have to read some text input and store it in a 2D array, without limiting its size.
Unfortunately, Valgrind keeps reporting an uninitialised value when the puts() function executes, and sometimes the program prints some random characters. I understand that I must have missed some indexes, but I just can't find where the issue stems from. Additionally, any advice regarding the quality of my code is very much appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <assert.h>
#define MULT 3
#define DIV 2
char **read(int *row, int *col) {
char **input = NULL;
int row_size = 0;
int col_size = 0;
int i = 0;
int c;
while ((c = getchar()) != EOF) {
if (c != '\n') { // skip empty lines
assert(i < INT_MAX);
if (i == row_size) { // if not enough row memory, allocate more
row_size = 1 + row_size * MULT / DIV;
input = realloc(input, row_size * sizeof *input);
assert(input != NULL);
}
char *line = NULL;
int j = 0;
// I need all the rows to be of the same size (see last loop)
line = malloc(col_size * sizeof *line);
// do while, so as to not skip the first character
do {
assert(j < INT_MAX-1);
if (j == col_size) {
col_size = 1 + col_size * MULT / DIV;
line = realloc(line, col_size * sizeof *line);
assert(line != NULL);
}
line[j++] = c;
} while(((c = getchar()) != '\n') && (c != EOF));
// zero-terminate the string
if (j == col_size) {
++col_size;
line = realloc(line, col_size * sizeof *line);
line[j] = '\0';
}
input[i++] = line;
}
}
// Here I give all the lines the same length
for (int j = 0; j < i; ++j)
input[j] = realloc(input[j], col_size * sizeof *(input+j));
*row = i;
*col = col_size;
return input;
}
int main(void) {
int row_size, col_size, i, j;
char **board = read(&row_size, &col_size);
// Initialize the remaining elements of each array
for (i = 0; i < row_size; ++i) {
j = 0;
while (board[i][j] != '\0')
++j;
while (j < col_size-1)
board[i][++j] = ' ';
}
for (i = 0; i < row_size; ++i) {
puts(board[i]);
}
for (i = 0; i < row_size; ++i)
free(board[i]);
free(board);
return 0;
}

Most common character in a file in C

I'm doing my C programming course homework and I need to find the most common character in a given file.
My testing with a test file, an empty file and other small text files works great (or at least I think so), but with the last, long test file something goes wrong and the error message is: "Should have returned 'e' (101) for file rfc791.txt. You returned 'b' (98)".
So what I'm asking is: what might be wrong with my code, when suddenly the most common letter is not what it should be?
int most_common_character(char *filename) {
FILE *f;
if ((f = fopen(filename, "r")) == NULL) {
fprintf(stderr, "Not opened: %s\n", strerror(errno));
return -1;
}
char frequency[26];
int ch = fgetc(f);
if (ch == EOF) {
return 0;
}
for (ch = 0; ch < 26; ch++) {
frequency[ch] = 0;
}
while (1) {
ch = fgetc(f);
if (ch == EOF) {
break;
}
if ('a' <= ch && ch <= 'z') {
frequency[ch - 'a']++;
}
else if ('A' <= ch && ch <= 'Z') {
frequency[ch - 'A']++;
}
}
int maxCount = 0;
int maxChar = 0;
for (int i = 0; i <= 26; ++i) {
if (frequency[i] > maxCount) {
maxCount = frequency[i];
maxChar = i;
}
}
fclose(f);
return maxChar + 'a';
}
I would be very grateful if someone has any hints to fix my code :) I've tried to search the solution to this problem from many other related topics but nothing seems to work.
You should use the < operator in the second for loop. With <=, the check frequency[i] > maxCount also reads frequency[26], which is out of bounds and therefore undefined behaviour: the value at that index may be lower or higher than the compared value.
Your code does have some problems. However, they are small enough that the code still works with small tests.
int ch = fgetc(f); drops the first char in the file
for (int i = 0; i <= 26; ++i) goes out of the array's range (valid indices are 0 to 25)
Besides these small mistakes, your code is fine. Well done.
Loop runs out-of-bounds. @Weather Vane
// for (int i = 0; i <= 26; ++i) {
for (int i = 0; i < 26; ++i) {
Code throws away result of the first character. @BLUEPIXY
int ch = fgetc(f);
if (ch == EOF) {
return 0;
}
// This value of ch is not subsequently used.
Other fixes as below
int most_common_character(char *filename) {
...
// Use a more generous count @Weather Vane
// char frequency[26];
// Consider there may be more than 26 different letters
// fgetc return EOF and value in the unsigned char range
int frequency[UCHAR_MAX + 1] = { 0 };
// Not needed as array was initialize above
// for (ch = 0; ch < 26; ch++) { frequency[ch] = 0; }
// BTW correct type declaration of int, avoided rookie mistake of using char
int ch;
// Codes use tolower(), islower() as that is the portable way to
// handle type-of-character detection
while ((ch = fgetc(f)) != EOF) {
frequency[tolower(ch)]++; // could add check to ensure frequency[] does not overflow
}
int maxCount = 0;
int maxChar = -1;
for (int i = 0; i <= UCHAR_MAX; ++i) {
if (islower(i) && frequency[i] > maxCount) {
maxCount = frequency[i];
maxChar = i;
}
}
fclose(f);
return maxChar;
}
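For reference, the counting idea can be sketched as a self-contained function. The assumptions are labeled: most_common_letter() is a hypothetical name, it takes an already opened FILE * instead of a file name, and it relies on an ASCII-compatible character set where the letters are contiguous:

```c
#include <stdio.h>

/* Hypothetical variant that takes an open stream instead of a file name.
 * Returns the most frequent letter in lower case, or -1 if the stream
 * contains no letters. On a tie the alphabetically first letter wins. */
int most_common_letter(FILE *f)
{
    long frequency[26] = { 0 };     /* long counters: a char would overflow on big files */
    int ch;                         /* int, so EOF is distinguishable */

    while ((ch = fgetc(f)) != EOF) {
        if ('a' <= ch && ch <= 'z')
            frequency[ch - 'a']++;
        else if ('A' <= ch && ch <= 'Z')
            frequency[ch - 'A']++;
    }

    long maxCount = 0;
    int maxChar = -1;
    for (int i = 0; i < 26; ++i) {  /* strictly less than 26: indices 0 to 25 */
        if (frequency[i] > maxCount) {
            maxCount = frequency[i];
            maxChar = 'a' + i;
        }
    }
    return maxChar;
}
```

The counter type matters for long inputs: with char counters, as in the question, a count can overflow once a letter appears more often than the type can hold, which would explain a wrong answer that only shows up with a large file.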

Sorting strings doesn't work properly

I have rjecnik.txt file that looks like this
mate sime, jure
stipica gujo, prvi
ante mirkec
goran maja, majica
avion kuca, brod, seoce
amerika, neka, zemlja, krcma
brodarica, zgrada, zagreb
zagreb split
zadar rijeka
andaluzija azija
I need to order lines alphabetically (not words) and my program produces this result which is not correct:
andaluzija azijamate sime, jure
amerika, neka, zemlja, krcma
brodarica, zgrada, zagreb
ante mirkec
avion kuca, brod, seoce
goran maja, majica
stipica gujo, prvi
zadar rijeka
zagreb split
Press [Enter] to close the terminal ...
When I use non ascii character like kuća for kuca or krčma for krcma it produces this result (all wrong)
andaluzija azijamate sime, jure
amerika, neka, zemlja, krŔma
brodarica, zgrada, zagreb
ante mirkec
avion kuŠa, brod, seoce
goran maja, majica
stipica gujo, prvi
zadar rijeka
zagreb split
Press [Enter] to close the terminal ...
This is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int ch, nl = 1, min, lenght1, lenght2, lenght;//ch will hold characters, min is for selection sort, lenght holds value of strlen for determine wthat line is longer
FILE * fp;// FILE pointer
char * lines[1000];//that will dynamically hold strings for lines
char * temp;//for lines swaping
if((fp = fopen("C:\\Users\\don\\Documents\\NetBeansProjects\\proba2\\dist\\Debug\\MinGW-Windows\\rjecnik.txt", "r")) == NULL)//I had to temporarily put full path to rjecnik.txt
{
printf("Can't open file...");
exit(1);
}
while((ch = getc(fp)) != EOF)//count lines
{
if(ch == '\n')
nl++;
}
int i, j;
for (i = 0; i < nl; i++)
lines[i] = malloc(1000);//create array of string size value of nl
fseek(fp, 0L, SEEK_SET);//go to start of file
i = 0;
j = 0;
while((ch = getc(fp)) != EOF)//fill arrays of string
{
lines[i][j] = ch;
j++;
if(ch == '\n')
{
j = 0;
i++;
}
}
for(i = 0; i < nl - 1; i++)//selection sort doesn't work properly
{
min = i;//min is i
for(j = i + 1; j < nl; j++)//for number of lines(nl) times
{
lenght1 = strlen(lines[i]);//find what string is longer and lenght is smaller one
lenght2 = strlen(lines[j]);
if(lenght1 < lenght2)
lenght = lenght1;
else
lenght = lenght2;
if(strncmp(lines[i], lines[j], lenght) > 0 )//compare two strings
min = j;//if second string is alphabetically smaller min is j
}
temp = lines[i];// swapping
lines[i] = lines[min];
lines[min] = temp;
}
for(i = 0; i < nl; i++ )//printing to console
{
lenght1 = strlen(lines[i]);
for(j = 0; j < lenght1; j++ )
{
putchar(lines[i][j]);
}
}
return 0;
}
Now the program crashes at the end when I add the code that frees the lines:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int ch, nl = 1, min, lenght1, lenght2, lenght;//ch will hold characters, min is for selection sort, lenght holds value of strlen for determine wthat line is longer
FILE * fp;// FILE pointer
char * lines[1000];//that will dynamically hold strings for lines
char * temp;//for lines swaping
if((fp = fopen("C:\\Users\\don\\Documents\\NetBeansProjects\\proba2\\dist\\Debug\\MinGW-Windows\\rjecnik.txt", "r")) == NULL)//I had to temporarily put full path to rjecnik.txt
{
printf("Can't open file...");
exit(1);
}
while((ch = getc(fp)) != EOF)//count lines
{
if(ch == '\n')
nl++;
}
int i, j;
for (i = 0; i < nl; i++)
lines[i] = malloc(1000);//create array of string size value of nl
fseek(fp, 0L, SEEK_SET);//go to start of file
i = 0;
j = 0;
while((ch = getc(fp)) != EOF)//fill arrays of string
{
lines[i][j] = ch;
j++;
if(ch == '\n')
{
j = 0;
i++;
}
}
for(i = 0; i < nl - 1; i++)//selection sort doesn't work properly
{
min = i;//min is i
for(j = i + 1; j < nl; j++)//for number of lines(nl) times
{
lenght1 = strlen(lines[i]);//find what string is longer and lenght is smaller one
lenght2 = strlen(lines[j]);
if(lenght1 < lenght2)
lenght = lenght1;
else
lenght = lenght2;
if(strncmp(lines[min], lines[j], lenght ) > 0 )//compare two strings
min = j;//if second string is alphabetically smaller min is j
}
temp = lines[i];// swapping
lines[i] = lines[min];
lines[min] = temp;
}
for(i = 0; i < nl; i++ )//printing to console
{
lenght1 = strlen(lines[i]);
for(j = 0; j < lenght1; j++ )
{
putchar(lines[i][j]);
}
}
for (i = 0; i < 100; i++)//Program crashes here
free(lines[i]);
return 0;
}
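1.- You must initialize each line buffer to 0 after malloc (or allocate it with calloc) so strlen works properly.
2.- Compare lines[j] with lines[min].
3.- Don't forget to free the lines, but only the nl pointers you actually allocated; the loop up to 100 also frees uninitialised pointers, which is why it crashes.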
You're always comparing lines[j] to lines[i], but you should be comparing it to lines[min].
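A selection sort that tracks the running minimum might look like the sketch below. sort_lines() is a hypothetical helper name, and the lines are assumed to be null-terminated strings:

```c
#include <string.h>

/* Hypothetical helper: sorts n null-terminated strings in place using
 * selection sort. Each candidate is compared with the smallest string
 * found so far (lines[min]), not with lines[i]. */
void sort_lines(char *lines[], size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        size_t min = i;
        for (size_t j = i + 1; j < n; j++) {
            if (strcmp(lines[j], lines[min]) < 0)
                min = j;
        }
        char *temp = lines[i];      /* swap only the pointers */
        lines[i] = lines[min];
        lines[min] = temp;
    }
}
```

Called as sort_lines(lines, nl), this would replace the nested loops in the question, provided every lines[i] has been terminated with '\0'.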
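If this isn't you learning about how to sort and get input, C provides qsort() and fgets(), so you could write:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Each element handed to qsort() is a char[1000] row, so a and b point at the strings themselves */
static int strsort(const void *a, const void *b)
{
return strcmp((const char *)a, (const char *)b);
}
int main(void)
{
FILE *f = fopen(...); /* file name and mode elided */
if (f == NULL)
return 1;
char (*arr)[1000] = malloc(1000 * sizeof *arr);
if (arr == NULL)
return 1;
int x;
for (x = 0; x < 1000 && fgets(arr[x], sizeof arr[x], f); x++)
arr[x][strcspn(arr[x], "\r\n")] = '\0'; //strip newlines
qsort(arr, x, sizeof *arr, strsort);
for (int i = 0; i < x; i++)
printf("%s\n", arr[i]);
free(arr);
fclose(f);
return 0;
}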
It's much clearer what you're doing this way.
Minor nitpick:
lenght1 = strlen(lines[i]);
lenght2 = strlen(lines[j]);
if(lenght1 < lenght2)
lenght = lenght1;
else
lenght = lenght2;
if(strncmp(lines[i], lines[j], lenght) > 0 )
... ;
You don't need this: strcmp() stops when either of the strings terminates, whichever comes first. In your case, you need to compare one more character (the NUL), like
strncmp( lines[i], lines[j], lenght+1)
, otherwise "apple" and "apples" would compare equal (because only the first five characters would be compared). But the "normal" form:
strcmp(lines[i], lines[j])
does exactly what you want.
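The difference can be seen with a small sketch. prefix_compare() is a hypothetical helper that reproduces the length-limited comparison from the question:

```c
#include <string.h>

/* Hypothetical helper: compares only the first min(strlen(a), strlen(b))
 * characters, as the code in the question does. */
int prefix_compare(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    size_t n = la < lb ? la : lb;

    return strncmp(a, b, n);
}
```

With this helper, prefix_compare("apple", "apples") returns 0, so the two strings look equal, while strcmp("apple", "apples") returns a negative value because the terminating NUL of "apple" compares lower than the 's'.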
