Reading a file in C, theory questions - c

I was given an assignment in C about reading from and writing to a file.
I have read code on several websites, along with the explanations, but one question remains unanswered. The following is the general code I found on different sites:
#include <stdio.h>
int main(void)
{
    FILE *fp;
    int c;
    fp = fopen("E:\\maham work\\CAA\\TENLINES.TXT", "r");
    if (fp == NULL)   /* the file could not be opened */
        return 1;
    c = getc(fp);
    while (c != EOF)
    {
        putchar(c);
        c = getc(fp);
    }
    fclose(fp);
    return 0;
}
My questions are simple and straightforward.
In the line c = getc(fp); what does c receive? An address? A character? An ASCII code?
And in
while (c!= EOF)
{
putchar(c);
c = getc(fp);
}
how is c able to read the file character by character? There is no increment operator. Does the pointer "fp" help in reading the file?
Lastly, why is putchar(c); used for printing? Why not use printf("%c", c); ?

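getc() returns, as an int, the value of the byte at the current position in the stream, then advances that position by one byte. The stream that fp points to keeps track of the position itself, which is why no increment operator is needed.
putchar() is simpler than printf() when all you want is to print a single character.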
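A minimal sketch of that behaviour, assuming a scratch file called getc_demo.tmp (a made-up name) can be created in the current directory. The helper position_after_reads() is hypothetical and only for illustration: it writes three bytes, reads up to n of them with getc(), and reports where the stream position ended up according to ftell().

```c
#include <stdio.h>

/* Hypothetical helper: writes "abc" to a scratch file, reads up to `n` bytes
   with getc(), and returns the stream position afterwards (-1 on failure). */
long position_after_reads(int n)
{
    const char *path = "getc_demo.tmp";   /* assumed scratch file name */
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    fputs("abc", fp);
    fclose(fp);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    for (int i = 0; i < n; i++) {
        int c = getc(fp);   /* int, so EOF stays distinguishable */
        if (c == EOF)
            break;
        putchar(c);         /* c holds the character code of the byte read */
    }

    long pos = ftell(fp);   /* advanced by one for each successful getc() */
    fclose(fp);
    remove(path);
    return pos;
}
```

Calling position_after_reads(2) should report position 2: nothing in the loop touches fp's position explicitly, each getc() call moves it along.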

One minute of googling got me these: the C++ reference, Tutorials Point, and Wikipedia.
Quoting reference documentation (C++ here, but probably very similar in C).
int getc ( FILE * stream );
Get character from stream. Returns the character currently pointed to by the internal file position indicator of the specified stream. The internal file position indicator is then advanced to the next character.
If the stream is at the end-of-file when called, the function returns EOF and sets the end-of-file indicator for the stream (feof).
If a read error occurs, the function returns EOF and sets the error indicator for the stream (ferror).
getc and fgetc are equivalent, except that getc may be implemented as a macro in some libraries. See getchar for a similar function that reads directly from stdin.
Further reading gives us:
On success, the character read is returned (promoted to an int value).
The return type is int to accommodate for the special value EOF, which indicates failure: If the position indicator was at the end-of-file, the function returns EOF and sets the eof indicator (feof) of stream.
If some other reading error happens, the function also returns EOF, but sets its error indicator (ferror) instead.
Here we read
This function returns the character read as an unsigned char cast to an int or EOF on end of file or error.
And on wikipedia
Reads a byte/wchar_t from a file stream

Related

How does Visual Studio Code handle the fputs statement for File I/O in "r+" mode?

I have written a code
#include <stdio.h>
#include <stdlib.h>
int main()
{
FILE *fp;
fp=fopen("lets.txt","r+");
if(fp==NULL)
{
printf("ERROR");
exit(1);
}
else
{
char ch,ch1;
while(!feof(fp))
{
ch= fgetc(fp);
printf("%c",ch);
}
printf("\n\nYou want to write something? (1/0)");
int n;
scanf("%d",&n);
if(n==1)
{
fputs("Jenny",fp);
ch1 = fgetc(fp);
printf("%c\n", ch1);
while(ch1 != EOF)
{
ch1=fgetc(fp);
printf("%c",ch1);
}
fclose(fp);
}
else{
printf("File Closed ");
fclose(fp);
}
}
}
I have tried to insert a string inside an already existing file "lets.txt"
but when I run this code, this is shown in the Terminal
I was expecting this to just put Jenny into the final file but it's also adding other text which was present before it and lots of NULL.
Is this because of something like temporary memory storage or something like that or just some mistake in the code?
First of all, the lines
char ch,ch1;
while(!feof(fp))
{
ch= fgetc(fp);
printf("%c",ch);
}
are wrong.
If you want ch to be guaranteed to be able to represent the value EOF and also want to be able to distinguish it from every possible character code, then you must store the return value of fgetc in an int, not a char. Please note that fgetc returns an int, not a char. See this other answer for more information on this issue.
Also, the function feof will only return a non-zero value (i.e. true) if a previous read operation has already failed due to end-of-file. It does not provide any indication of whether the next read operation will fail. This means that if fgetc returns EOF, you will print that value as if fgetc were successful, which is wrong. See the following question for further information on this issue:
Why is “while( !feof(file) )” always wrong?
For the reasons stated above, I suggest that you change these lines to the following:
int ch, ch1;
while ( ( ch = fgetc(fp) ) != EOF )
{
printf( "%c", ch );
}
Another issue is that when a file is opened in update mode (i.e. it is opened with a + in the mode string, for example "r+" as you are doing), you cannot freely change between reading and writing. According to §7.21.5.3 ¶7 of the ISO C11 standard,
output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and
input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file.
If you break any of these rules, then your program will be invoking undefined behavior, which means that anything can happen, including invalid output.
For this reason, I suggest that you change the lines
fputs("Jenny",fp);
ch1 = fgetc(fp);
to:
fseek( fp, 0, SEEK_CUR );
fputs("Jenny",fp);
fflush( fp );
ch1 = fgetc(fp);
In contrast to the line fflush( fp );, which is absolutely necessary, the line fseek( fp, 0, SEEK_CUR ); actually isn't necessary according to the rules stated above, because you encountered end-of-file. But it probably is a good idea to keep that line anyway, for example in case you later change your program to stop reading for some other reason besides end-of-file. In that case, that line would be required.
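Here is a small self-contained sketch of those switching rules, not the asker's exact program. It assumes a scratch file named update_demo.tmp (a made-up name), and the helper append_after_reading() is hypothetical. Note that it uses rewind() rather than fflush() for the switch from output back to input, because it reads the whole file back from the start; rewind() is one of the file positioning functions the standard allows there.

```c
#include <stdio.h>

/* Hypothetical helper: creates a scratch file containing "Hello", reads it
   to end-of-file, appends "Jenny", then reads the whole file back into `out`.
   Returns the number of characters stored in `out`, or -1 on failure.
   `outsize` must be at least 1. */
int append_after_reading(char *out, size_t outsize)
{
    const char *path = "update_demo.tmp";  /* assumed scratch file name */
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fputs("Hello", fp);
    fclose(fp);

    fp = fopen(path, "r+");
    if (fp == NULL)
        return -1;

    int ch;
    while ((ch = fgetc(fp)) != EOF)
        ;                                  /* input: consume the existing text */

    fseek(fp, 0, SEEK_CUR);                /* input -> output: positioning call */
    fputs("Jenny", fp);
    rewind(fp);                            /* output -> input: positioning call */

    size_t n = 0;
    while (n + 1 < outsize && (ch = fgetc(fp)) != EOF)
        out[n++] = (char)ch;
    out[n] = '\0';

    fclose(fp);
    remove(path);
    return (int)n;
}
```

With a buffer of 32 bytes, I would expect out to end up holding "HelloJenny", with no stray characters in between.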
Re: "I changed the condition for the while loop to this simple form: ch = fgetc(fp); while(ch != EOF) But it is still showing the same result."
The value returned by fgetc() must be stored in an int:
ch= fgetc(fp);
ch has been declared as a char. Storing the value in a char makes testing for EOF unreliable. C17 states that EOF has a negative int value. On some implementations, char is unsigned, hence it can't represent negative values.
On implementations where the type char is signed, assuming EOF is defined as -1 (which is the case on most implementations), it's impossible to distinguish EOF from the character code 255 (which would be stored as the value -1 in a char, but as 255 in an int).
From the man page:
fgetc(), getc(), and getchar() return the character read as an unsigned char cast to an int or EOF on end of file or error.
It further states:
If the integer value returned by getchar() is stored into a variable of type char and then compared against the integer constant EOF, the comparison may never succeed, because sign-extension of a variable of type char on widening to integer is implementation-defined.
which is relevant to fgetc as well.
Possible fix:
Declare ch as an int.
You haven't set the file pointer when switching between read and write. The MSVC man page says about fopen
However, when you switch from reading to writing, the input operation must encounter an EOF marker. If there's no EOF, you must use an intervening call to a file positioning function. The file positioning functions are fsetpos, fseek, and rewind. When you switch from writing to reading, you must use an intervening call to either fflush or to a file positioning function.

What are all the reasons `fgetc()` might return `EOF`?

Certainly fgetc() returns EOF when end-of-file or an input error occurs.
Is that all and does that mean no more data is available?
FILE *inf = ...;
int ch;
while ((ch = fgetc(inf)) != EOF) {
;
}
if (feof(inf)) puts("End-of-file");
else if (ferror(inf)) puts("Error");
else puts("???");
Is testing with feof(), ferror() sufficient?
Note: EOF here is a macro that evaluates to some negative int, often -1. It is not a synonym for end-of-file.
I have found some questions and more that are close to this issue, yet none that enumerate all possibilities.
Is that all and does that mean no more data available?
No, there are more ways to get EOF.
An EOF return does not necessarily mean that no more data is available; it depends.
The C standard lists three cases where fgetc() returns EOF.
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF. C17dr § 7.21.7.1 3
Recall each stream, like stdin, has an end-of-file indicator and error indicator.
stream just encountered the end-of-file
(Most common) An attempt has been made to get more data, but there was none.
end-of-file indicator for the stream is set
The stream first examines its end-of-file indicator. If it sees that the indicator is set, it returns EOF. No attempt is made to see if more data exists. With some types of streams, data may arrive after an earlier EOF report. Until the end-of-file indicator is cleared, as with clearerr(), the return remains EOF.
Input error
The stream's error indicator is not examined beforehand; the function simply failed to read data for some reason other than end-of-file. A common way to end up with a set error indicator is to attempt output on an input stream, as in fputc('?', stdin). Input errors are often persistent, but some are not, so more data may be available. The common strategy is to end the input.
// Example where ferror() is true, yet fgetc() does not return EOF
FILE *inf = stdin;
printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
printf("fputc():%d\n", fputc('?', inf)); // EOF reported
printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
printf("fgetc():%d\n", fgetc(inf)); // User typed in `A`, 'A' reported
printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
Output
end-of-file:0 error:0
fputc():-1
end-of-file:0 error:1
fgetc():65
end-of-file:0 error:1
When ferror() is true, it does not mean the error just occurred, just sometime in the past.
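The second case above (an end-of-file indicator that is already set) can be sketched with an ordinary file, assuming a scratch file named sticky_eof_demo.tmp (a made-up name) and a hypothetical helper read_after_clearerr(). This is only an illustration; what a stream does with late-arriving data without clearerr() differs between C libraries, so the sketch always calls clearerr() before retrying.

```c
#include <stdio.h>

/* Hypothetical helper: shows that data appended after an end-of-file report
   becomes readable once clearerr() has reset the indicator.
   Returns the character read after clearerr(), or EOF on failure. */
int read_after_clearerr(void)
{
    const char *path = "sticky_eof_demo.tmp";  /* assumed scratch file name */
    FILE *out = fopen(path, "wb");
    if (out == NULL)
        return EOF;
    fputc('A', out);
    fclose(out);

    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return EOF;

    int ch = fgetc(in);          /* reads 'A' */
    ch = fgetc(in);              /* EOF: the end-of-file indicator is now set */
    if (ch != EOF || !feof(in)) {
        fclose(in);
        return EOF;
    }

    out = fopen(path, "ab");     /* more data arrives after the EOF report */
    if (out != NULL) {
        fputc('B', out);
        fclose(out);
    }

    clearerr(in);                /* reset end-of-file and error indicators */
    ch = fgetc(in);              /* a fresh read attempt can see the new byte */

    fclose(in);
    remove(path);
    return ch;
}
```

On the common implementations I am aware of, the final fgetc() should return 'B', even though the same stream reported EOF a moment earlier.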
Other cases
Apparent EOF due to improperly saving as char
fgetc() returns an int with a value in the unsigned char range and EOF - a negative value.
When fgetc() reads character code 255, yet saves that as a char on a system where char is signed, that commonly results in the char having the same value as EOF, yet end-of-file did not occur.
FILE *f = fopen("t", "w");
fputc(EOF & 255, f);
fclose(f);
f = fopen("t", "r");
char ch = fgetc(f); // Should be int ch
printf ("%d %d\n", ch == EOF, ch);
printf("end-of-file:%d error:%d\n", feof(f), ferror(f));
fclose(f);
Output
1 -1 // ch == EOF !
end-of-file:0 error:0
Systems where UCHAR_MAX == UINT_MAX. Rare.
(I have only come across this in some older graphics processors, but it is still something C allows.) In that case, fgetc() may read an unsigned char outside the int range and so convert it to EOF on the function return. Thus fgetc() is returning a character code that happens to equal EOF. This is mostly an oddity of C history. One way to mostly handle it is:
while ((ch = fgetc(inf)) != EOF && !feof(inf) && !ferror(inf)) {
;
}
Such pedantic code is rarely needed.
Undefined behavior
Of course when UB occurs, anything is possible.
FILE * f = fopen("Some_non_existent_file", "r");
// Should have tested f == NULL here
printf("%d\n", fgetc(f) == EOF); // Result may be 1
A robust way to handle the return from fgetc().
FILE *inf = ...;
if (inf) { // Add test
    int ch; // USE int !
    // Pedantic considerations, usually can be ignored
    // (UCHAR_MAX and INT_MAX come from <limits.h>)
    #if UCHAR_MAX > INT_MAX
    clearerr(inf); // Clear history of prior flags
    while ((ch = fgetc(inf)) != EOF && !feof(inf) && !ferror(inf)) {
        ;
    }
    #else
    while ((ch = fgetc(inf)) != EOF) {
        ;
    }
    #endif
    if (feof(inf)) puts("End-of-file");
    else puts("Error");
}
If code needs to look for data after end-of-file or error, call clearerr() and repeat the if() block.
Another case where EOF doesn't necessarily mean 'no more data' was (rather than 'is') reading magnetic tapes. You could have multiple files on a single tape, with the end of each marked with EOF. When you encountered EOF, you used clearerr(fp) to reset the EOF and error states on the file stream, and you could then continue reading the next file on the tape. However, magnetic tapes have (for the most part) gone the way of the dodo, so this barely counts any more.
Here's one obscure reason:
On Windows, reading byte 0x1A in text mode causes EOF.
By "Windows" I mean both MSVC and MinGW (so it's probably a quirk of Microsoft's CRT). This doesn't happen on Cygwin.
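A rough way to check this on a given system, assuming two hypothetical helpers: write_ctrl_z_file() creates a three-byte file containing 0x1A in the middle, and count_until_eof() counts how many characters fgetc() delivers for a given fopen() mode. The file name is whatever the caller passes in.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters fgetc() delivers before EOF
   when `path` is opened with the given mode ("r" or "rb").
   Returns -1 if the file cannot be opened. */
long count_until_eof(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;

    long n = 0;
    int ch;
    while ((ch = fgetc(fp)) != EOF)
        n++;

    fclose(fp);
    return n;
}

/* Hypothetical helper: writes the three bytes 'A', 0x1A, 'B' in binary mode.
   Returns 0 on success. */
int write_ctrl_z_file(const char *path)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    fputc('A', fp);
    fputc(0x1A, fp);
    fputc('B', fp);
    return fclose(fp);
}
```

Opened with "rb", all three bytes should be counted on any system. Going by the description above, opening the same file with "r" under Microsoft's CRT would presumably stop after the first byte, which is a good reason to use "rb" for anything that is not plain text.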

*Might* an unsigned char be equal to EOF? [duplicate]

This question already has answers here:
Can sizeof(int) ever be 1 on a hosted implementation?
When using fgetc to read the next character of a stream, you usually check that the end-of-file was not attained by
if ((c = fgetc (stream)) != EOF)
where c is of int type. Then, either the end-of-file has been attained and the condition will fail, or c shall be an unsigned char converted to int, which is expected to be different from EOF —for EOF is ensured to be negative. Fine... apparently.
But there is a small problem... Usually the char type has no more than 8 bits, while int must have at least 16 bits, so every unsigned char will be representable as an int. Nevertheless, in the case char would have 16 or 32 bits (I know, this is never the case in practice...), there is no reason why one could not have sizeof(int) == 1, so that it would be (theoretically!) possible that fgetc (stream) returns EOF (or another negative value) but that end-of-file has not been attained...
Am I mistaken? Is it something in the C standard that prevents fgetc to return EOF if end-of-file has not been attained? (If yes, I could not find it!). Or is the if ((c = fgetc (stream)) != EOF) syntax not fully portable?...
EDIT: Indeed, this was a duplicate of Question #3860943. I did not find that question on my first search. Thanks for your help! :-)
You asked:
Is it something in the C standard that prevents fgetc to return EOF if end-of-file has not been attained?
On the contrary, the standard explicitly allows EOF to be returned when an error occurs.
If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
In the footnotes, I see:
An end-of-file and a read error can be distinguished by use of the feof and ferror functions.
You also asked:
Or is the if ((c = fgetc (stream)) != EOF) syntax not fully portable?
On the theoretical platform where CHAR_BIT is more than 8 and sizeof(int) == 1, that won't be a valid way to check that end-of-file has been reached. For that, you'll have to resort to feof and ferror.
c = fgetc (stream);
if ( !feof(stream) && !ferror(stream) )
{
// Got valid input in c.
}
I think you need to rely on stream error.
ch = fgetc(stream);
if (ferror(stream) && (ch == EOF)) /* end of file */;
From the standard
If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
Edit for better version
ch = fgetc(stream);
if (ch == EOF) {
if (ferror(stream)) /* error reading */;
else if (feof(stream)) /* end of file */;
else /* read valid character with value equal to EOF */;
}
If you are reading a stream that is standard ASCII only, there's no risk of receiving the char equivalent to EOF before the real end-of-file, because valid ASCII char codes go up to 127 only. But it could happen when reading a binary file. The byte would need to be 255(unsigned) to correspond to a -1 signed char, and nothing prevents it from appearing in a binary file.
But about your specific question (whether there is something in the standard): not exactly. Notice, though, that fgetc returns the character as an unsigned char converted to int, so it won't ever be negative in this case anyway. The only risk would be if you explicitly or implicitly converted the return value down to signed char (for instance, if your c variable were a signed char).
NOTE: as @Ulfalizer mentioned in the comments, there's one rare case in which you may need to worry: if sizeof(int)==1, and you're reading a file that contains non-ASCII characters, then you may get a -1 return value that is not the real EOF. Notice that environments in which this happens are quite rare (to my knowledge, compilers for low-end 8-bit microcontrollers, like the 8051). In such a case, the safe option would be to test feof() as @pmg suggested.
I agree with your reading.
C Standard says (C11, 7.21.7.1 The fgetc function p3):
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
There is nothing in the Standard (assuming UCHAR_MAX > INT_MAX) that disallows fgetc in a hosted implementation to return a value equal to EOF that is neither an end-of-file nor an error condition indicator.

fgetc returns an unknown character

I have the following code:
FILE *f = fopen('/path/to/some/file', 'rb');
char c;
while((c = fgetc(f)) != EOF)
{
printf("next char: '%c', '%d'", c, c);
}
For some reason, when printing out the characters, at the end of the file, an un-renderable character gets printed out, along with the ASCII ordinal -1.
next char: '?', '-1'
What character is this supposed to be? I know it's not EOF because there's a check for that, and shortly after the character is printed, the program segfaults.
The trouble is that fgetc() and its relatives return an int, not a char:
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF.
It has to return every possible valid character value and a distinct value, EOF (which is negative, and usually but not necessarily -1).
When you read the value into a char instead of an int, one of two undesirable things happens:
If plain char is unsigned, then you never get a value equal to EOF, so the loop never terminates.
If plain char is signed, then a legitimate character, 0xFF (often ÿ, y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS), is treated the same as EOF, so you detect EOF prematurely.
Either way, it is not good.
The Fix
The fix is to use int c; instead of char c;.
Incidentally, the fopen() call should not compile:
FILE *f = fopen('/path/to/some/file', 'rb');
should be:
FILE *f = fopen("/path/to/some/file", "rb");
Always check the result of fopen(); of all the I/O functions, it is more prone to failure than almost any other (not through its own fault, but because the user or programmer makes a mistake with the file name).
This is the culprit:
char c;
Please change it to:
int c;
The return type of fgetc is int, not char. On some platforms you get strange behavior when you convert that int to char.
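Putting the fixes from these answers together, a corrected version of the loop might look like the sketch below. The function name dump_bytes() is made up for illustration; the path is supplied by the caller rather than hard-coded.

```c
#include <stdio.h>

/* Hypothetical helper: prints every byte of `path` the way the question's
   loop does, but with an int variable and a checked fopen().
   Returns the number of bytes read, or -1 if the file cannot be opened. */
long dump_bytes(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        perror(path);                    /* report why the open failed */
        return -1;
    }

    long count = 0;
    int c;                               /* int, not char */
    while ((c = fgetc(f)) != EOF) {
        printf("next char: '%c', '%d'\n", c, c);
        count++;
    }

    fclose(f);
    return count;
}
```

With int c, a byte of 0xFF in the file is printed as 255 and the loop carries on; only the real end of the file (or a read error) ends it.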

Reading general file

I'm making a program that reads in a file from stdin, does something to it and sends it to stdout.
As it stands, I have a line in my program:
while((c = getchar()) != EOF){
where c is an int.
However the problem is I want to use this program on ELF executables. And it appears that there must be the byte that represents EOF for ascii files inside the executable, which results in it being truncated (correct me if I'm wrong here - this is just my hypothesis).
What is an effective general way to go about doing this? I could dig up documents on the ELF format and then just check for whatever comes at the end. That would be useful, but I think it would be better if I could still apply this program to any kind of file.
You'll be fine - the EOF constant doesn't contain a valid ASCII value (it's typically -1).
For example, below is an excerpt from stdio.h on my system:
/* End of file character.
Some things throughout the library rely on this being -1. */
#ifndef EOF
# define EOF (-1)
#endif
You might want to go a bit lower level and use the system functions like open(), close() and read(), this way you can do what you like with the input as it will get stored in your own buffer.
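A rough sketch of that lower-level approach, assuming a POSIX system. The helper count_bytes_posix() is a made-up name, and it only counts bytes; a real program would process buf inside the loop instead.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: counts the bytes of `path` using POSIX open(),
   read() and close(). Returns -1 on failure. */
long count_bytes_posix(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    unsigned char buf[4096];
    long total = 0;
    ssize_t n;
    /* read() returns the number of bytes stored in buf, 0 at end of file,
       and -1 on error; no byte value is ever mistaken for end of file. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    close(fd);
    return n < 0 ? -1 : total;
}
```

To read standard input this way, the same loop could be run on STDIN_FILENO instead of a descriptor returned by open().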
You are doing it correctly.
EOF is not a character. There is no way c will hold EOF as the representation of some byte in the stream. If and when c does contain EOF, that particular value did not originate from the file itself, but from the underlying library / OS. EOF is a signal that no character could be read: either the end of the input was reached or a read error occurred.
Make sure c is an int though
Oh ... and you might want to read from a stream under your control. In the absence of code to do otherwise, stdin is subject to "text translation" which might not be desirable when reading binary data.
FILE *mystream = fopen(filename, "rb");
if (mystream) {
/* use fgetc() instead of getchar() */
while((c = fgetc(mystream)) != EOF) {
/* ... */
}
fclose(mystream);
} else {
/* error */
}
From the getchar(3) man page:
Character values are returned as an
unsigned char converted to an int.
This means that a character value read via getchar() can never be equal to the signed integer -1 (on ordinary platforms, where int is wider than char). This little program demonstrates it:
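#include <stdio.h>
int main(void)
{
    int a;
    unsigned char c = EOF;   /* -1 converted to unsigned char: 0xff */
    a = (int)c;              /* widened back to int: 255, not -1 */
    //output: 000000ff - 000000ff - ffffffff
    printf("%08x - %08x - %08x\n", a, c, -1);
    return 0;
}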

Resources