getc(fp) : strange character : ÿ ( at the very bottom ) - c

Create a new, empty file:
touch /file.txt
Then read it and print it:
fp = fopen("/file.txt", "r");
char text[1000];
int i=0;
while(!feof(fp)){
    text[i++] = getc(fp);
}
text[i]='\0';
printf("%s\n", text);
result:
ÿ
EXTRA INFO: if file.txt had many lines, the strange character would be appended at the very bottom of the output, so it doesn't seem to happen on every iteration of the while loop.

If you're using the ISO 8859-15 or 8859-1 code set, ÿ (LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF in Unicode) has code 255 (decimal), or 0xFF. When you store EOF in the array, it gets converted to that byte, which is displayed as ÿ.
Don't store EOF in a char. And remember that getc() (like getchar()) returns an int, not a char. It has to be able to return every value that can be stored in an unsigned char, plus EOF, which is negative (usually, but not necessarily, -1).
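And, as noted in the comments, while (!feof(fp)) is always wrong, and this is just another reason why: feof() only reports end-of-file after a read has already failed, so the loop body runs one extra time and stores the EOF returned by that final getc().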
This code is fixed, more or less. It really should report an error if it fails to open the file. Note that it also ensures you don't overflow the buffer.
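FILE *fp = fopen("/file.txt", "r");
if (fp != 0)
{
    char text[1000];
    size_t i = 0;   /* size_t, to match sizeof(text) in the comparison below */
    int c;          /* int, not char, so EOF is distinct from byte 0xFF */
    while ((c = getc(fp)) != EOF && i < sizeof(text)-1)
        text[i++] = c;
    text[i] = '\0';
    printf("%s\n", text);
    fclose(fp);
}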
See also while ((c = getc(file)) != EOF) loop won't stop executing.

The ÿ is byte 255 in your code page: it is the constant EOF coerced into a char. Instead of using feof(), store the return value of getc() in an int and compare it against EOF. Here's an easy-to-read example (note that you'd still need bounds checking too):
while (1) {
    int c = getc(fp);
    if (c == EOF) {
        break;
    }
    text[i++] = c;
}

Related

Printing char of a file with fgetc() ending with a "?" symbol

I'm trying to print the chars in a file one by one (with fgetc) in a while loop.
I'm using the latest Atom editor to write the code, and I compile with the GPP Compiler by pressing F5; the output is displayed in the xterm terminal.
int main(int argc, char const *argv[])
{
    FILE* file = NULL;
    file = fopen("text.txt", "r+");
    int letter = 0;
    if (file != NULL)
    {
        while(letter != EOF)
        {
            letter = fgetc(file);
            printf("%c", letter);
        }
I expected the output to be the text in my file, which it is, but at the end there's a question mark symbol.
What I understood after doing some research is that my fgetc function reads the EOF like a normal character and prints it, resulting in a question mark symbol at the end.
Thanks for your help !
Why the output ends with a “?” symbol: when doing
while(letter != EOF)
{
    letter = fgetc(file);
    printf("%c", letter);
}
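you print letter before checking whether it is EOF, so you (try to) print EOF, which is not a character. printf("%c", EOF) writes EOF converted to an unsigned char (byte 0xFF when EOF is -1), and since that byte is never valid in UTF-8, the terminal shows it as a replacement symbol such as "?".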
Example of valid code:
while ((letter = fgetc(file)) != EOF)
    putchar(letter); /* or printf("%c", letter); if you prefer */
I had the same issue, and I found out that the return type of fgetc is int, and that at end of file it returns EOF, which is usually -1.

using char, cast to an int as array index to store frequency of "chars" in a textfile

I am writing a simple program to store the number of occurrences of the various symbols in a text file. I am reading from this file one char at a time, using fgetc() and a file pointer. I set up my array outside my function like so:
int frequency[MAX_SYMBOLS] = {0};
MAX_SYMBOLS is defined as 255. I then read over the file and count every time a particular character appears. Below is my function set_frequency():
void set_frequency()
{
    int count = 0;
    char c;
    FILE *fp = fopen("file.txt","r");
    while((c = fgetc(fp)) != EOF)
    {
        if(c != ' ' && c != '\n')
        {
            frequency[(int) c]++;
            count++;
        }
    }
    fclose(fp);
}
I am currently getting a segmentation fault and I'm not entirely sure why.
I think it's an issue with the array index, or possibly the size of my file, as it is rather large. If anyone can help, that would be great; I'm not great with C, to be honest.
EDIT
The c variable needs to be an int, not a char, as that is what fgetc() returns. Then I won't have to cast the index value!
In addition to the fact that EOF cannot fit in a char, you have two potential problems:
MAX_SYMBOLS (255) is too small: fgetc() can return any byte value from 0 to 255, so the array needs 256 entries (UCHAR_MAX + 1), and frequency[255] is out of bounds.
On many platforms char is a signed type. If you read a byte greater than 0x7f, it is converted into a negative array index.
Use an int for reading, to satisfy the requirement for EOF. You also get the guarantee that any value other than EOF is never negative: fgetc() returns each byte as an unsigned char converted to int, so it lies in the range 0-255 with 8-bit bytes.
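void set_frequency()
{
    int count = 0;
    int c;
    FILE *fp = fopen("file.txt","r");
    if (fp == NULL)
        return;   /* could not open the file */
    while((c = fgetc(fp)) != EOF)
    {
        if(c != ' ' && c != '\n')
        {
            frequency[c]++;   /* needs MAX_SYMBOLS >= 256 so that c == 255 fits */
            count++;
        }
    }
    fclose(fp);
}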
If you do have to use a char for something similar, cast it through unsigned char to force a non-negative value:
frequency[(int)(unsigned char) c]++;

read text from file C string terminator

I want to read text from a file that also contains '\n' characters.
This is my function:
void readFromFile (FILE * fid, unsigned char * mesage) {
    unsigned char c;
    int mesage_length = 0;
    while((c = fgetc(fid)) != EOF) {
        mesage[mesage_length] = c;
        mesage_length++;
    }
}
I have absolutely no idea why, when it gets to the '\n', my program crashes: it enters an infinite loop and mesage_length grows to 13992 or so, but I read only 13 characters, like "Why not working?\n".
How can I read the whole text until EOF and put it into a string (char *)?
If I put the condition if (c == '\n') break; inside the while loop, it works fine, but then it only reads until the first \n.

Unsure about three simple functions with C programming

/* stringlength
* input: str, pointer to a string
* output: integer representing the length of string str,
* not counting the terminating character.
*
* You may NOT call ANY functions within this function.
*/
int stringlength(char *str)
{
// count the number of characters in str
int count=0,k;
for (k=0; str[k] != '\0';k++)
count++;
return count;
}
/* countchars
* inputs: character c, string str
* output: The number of instances of c in the string str
* You may not call ANY function calls within this function.
*/
int countchars(char c, char *str)
{
// count the number of times c is found in str
int k,count=0;
for (k=0;str[k]=='\0';k++)
{
if (str[k] == c)
count++;
else;
}
return count;
}
/* countlines
* input: char *filename - string containing the filename
* output: integer representing the number of lines in the file
*/
int countlines(char *filename)
{
// count the number of lines in the file called filename
FILE *f = fopen(filename,"r");
char ch;
int lines=0;
f = fopen(filename,"r");
do{
ch = fgetc(f);
if( ch == '\n')
lines++;
}while( ch != EOF );
return lines;
}
I need help with these three functions that I am implementing in my program. I am a beginner, so go easy on me; the countlines function is giving me the most trouble. If anyone could explain why these functions will or won't work, it would be greatly appreciated.
There are a number of problems in countlines():
You open the file twice, but overwrite the first FILE * value with the second, so there's no way you can close it. This is a minor problem.
The major problem is that the function fgetc() returns an int, not a char. In particular, EOF is a value different from every char.
The code does not close the file before returning. Generally, if you open a file in a function, then you should close it. If you don't, you have to pass the file pointer back to the calling code so that it can close it.
The do ... while loop is seldom correct for an input loop (a while loop testing at the top is almost always much cleaner and clearer) but at least you weren't using feof().
int countlines(char *filename)
{
    FILE *fp = fopen(filename,"r");
    int ch;
    int lines = 0;
    if (fp == 0)
        return lines;
    while ((ch = fgetc(fp)) != EOF)
    {
        if (ch == '\n')
            lines++;
    }
    fclose(fp);
    return lines;
}
When you use char instead, one of two things happens:
If your char type is signed, then a real character (often ÿ — y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS) also matches EOF so you can stop reading before you reach end of file.
If your char type is unsigned, no value will ever match EOF so the loop will never stop.
In stringlength(), you have two variables count and k that are carefully kept at the same value; you only need one of the two.
Apart from the raggedy indentation (endemic in the code shown, and definitely something to be avoided) and the unnecessary, pointless else; which does absolutely nothing, the code for countchars() has one real bug (a late addition to this answer): the condition in the for loop is inverted; it should be str[k] != '\0', of course.

Understanding type casting in c

I'm following a book on C, and I came to some code that reads a file with 3 lines of text.
#include <stdio.h>
int main (int argc, const char * argv[]) {
    FILE *fp;
    int c;
    fp = fopen( "../../My Data File", "r" );
    if ( NULL == fp ) {
        printf( "Error opening ../My Data File" );
    } else {
        while ( (c = fgetc( fp )) != EOF )
            putchar ( c );
        fclose( fp );
    }
    return 0;
}
I tried to modify it to detect each line and print the current line number, by making these modifications:
int line = 1;
while ( (c = fgetc( fp )) != EOF ){
    if (c == '\n'){
        printf(" LINE %d", line);
        putchar( c );
        line++;
    }
    else {
        putchar ( c );
    }
}
But it failed to print the line #, till I changed the type of the variable c to a char. Is there a way to check for a newline while still using c as an int?
What is the proper way to check for a newline?
Your code prints line numbers at the end of a line, right before printing the '\n', because of the way you have written the loop. Otherwise, your code should work.
If you want your code to print the line numbers at the beginning, you can do something like (untested):
int line_num_printed = 0; /* indicating if we printed a line number */
int line = 1;
while ((c = fgetc(fp)) != EOF) {
    if (!line_num_printed) {
        printf("LINE %d: ", line);
        line_num_printed = 1;
    }
    putchar(c);
    if (c == '\n'){
        line++;
        line_num_printed = 0;
    }
}
If there is something else that "doesn't work", you should post complete code and tell us what doesn't work.
Edit: The proper way to check a character for a newline is to test against '\n'. If the character came from a file, you should also make sure you open the file in text mode, i.e., without a b in the second argument to fopen().
Also, you want c to be of type int, not char. In C, EOF is a small negative number. If char is unsigned, storing EOF in c converts it to a positive value (most likely EOF + UCHAR_MAX + 1, i.e. 255), which never compares equal to EOF, so c != EOF stays true even after fgetc() has returned EOF and the loop never ends. If char is signed, a genuine character (byte 0xFF) compares equal to EOF and ends the loop early. Therefore, you should not change c to char type.
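In ASCII, the integer code for a newline ('\n') is 10, so you could write if (c == 10), but comparing against '\n' is clearer and does not depend on the character set. Also note that Windows files use two characters, '\r' (13) followed by '\n' (10), to end a line; when such a file is read in text mode on Windows, the C library translates the pair into a single '\n'. So yes, you can keep c as an int and just compare it against '\n'.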

Resources