fgets implementation from the std library of K&R 2nd edition

I'm aware that code snippets from textbooks are just for demonstration purposes and shouldn't be held to production standards, but K&R 2nd edition, on pages 164-165, says:
fgets and fputs, here they are, copied from the standard library
on our system:
char *fgets(char *s, int n, FILE *iop)
{
    register int c;
    register char *cs;

    cs = s;
    while (--n > 0 && (c = getc(iop)) != EOF)
        if ((*cs++ = c) == '\n')
            break;
    *cs = '\0';
    return (c == EOF && cs == s) ? NULL : s;
}
Why is the return statement not return (ferror(iop) || (c == EOF && cs == s)) ? NULL : s; given that the ANSI C89 standard says:
If a read error occurs during the operation, the array contents are
indeterminate and a null pointer is returned.
Even the standard library illustrated in Appendix B of the book says so. From Page 247:
fgets returns s, or NULL if end of file or error occurs.
K&R uses ferror(iop) in fputs implementation given just below this fgets implementation on the same page.
With the above implementation, fgets will return s even if there is a read error after reading some characters. Maybe this is an oversight or am I missing something?

You are correct that the behavior of the posted implementation of the function fgets does not comply with the C89 standard. For the same reasons, it also does not comply with the modern C11/C18 standard.
The posted implementation of fgets handles end-of-file correctly, by only returning NULL if not a single character has been read. However, it does not handle a stream error correctly. According to your quote of the C89 standard (which is identical to the C11/C18 standard in this respect), the function should always return NULL if an error occurred, regardless of the number of characters read. The posted implementation is therefore wrong to treat a stream error exactly like an end-of-file.
It is worth noting that the second edition of K&R is from 1988, before the ANSI C89 standard was published. The exact wording of the standard may therefore not yet have been finalized when the second edition of the book was written.
The posted implementation of fgets also does not comply with the quoted specification of Appendix B. Assuming that the function fgets is supposed to behave as specified in Appendix B, then the posted implementation of fgets handles errors correctly, but it does not handle end-of-file correctly. According to the quote from Appendix B, the function should always return NULL when an end-of-file occurs (even if characters have been successfully read, which is not meaningful).
It is also worth noting that using the statement
return (ferror(iop) || (c == EOF && cs == s)) ? NULL : s;
as suggested in the question will not make the implementation of the function fgets fully comply with the C89/C11/C18 standards. When a stream error occurs "during the operation", the function is supposed to return NULL. However, when ferror returns nonzero, it may be impossible to tell whether the error occurred "during the operation", i.e. whether the stream's error indicator was already set before fgets was called.
It is possible that the stream's error indicator was already set due to an error that occurred before fgets was called, but that all subsequent stream operations succeeded or failed only due to end-of-file (i.e. not due to a stream error). The function fgets is also not allowed to simply call clearerr at the start of the function in order to distinguish these cases, because it would then have to restore the state of the stream's error indicator before returning, and setting the stream's error indicator is not possible through the C standard library; it would require an implementation-specific function. Looking at the return value of getc will not always resolve this ambiguity either, because a return value of EOF can mean either end-of-file or error.

Why is the return statement not return (ferror(iop) || (c == EOF && cs == s)) ? NULL : s;
Because this is not the best option either.
return (c == EOF && cs == s) ? NULL : s; handles the case of EOF due to end-of-file well, but not EOF due to an input error.
ferror(iop) is true when an input error has just occurred or when an input error occurred earlier. It is possible for fgetc() to return non-EOF after returning EOF due to an input error. This differs from feof(), which is sticky: once EOF occurs due to end-of-file, feof() continues to return nonzero unless the end-of-file flag is cleared.
A better alternative would be to ensure that the EOF just occurred rather than use an unqualified ferror().
if (c == EOF) {
    if (feof(iop)) {
        if (cs == s)
            return NULL;
    } else {
        return NULL; // Input error just occurred.
    }
}
return s;
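Assembled into a complete routine (a sketch only, not the K&R original; the name my_fgets and the c = 0 initialization are mine, and n is assumed to be >= 1 as in the original), it might look like:

```c
#include <stdio.h>

/* Sketch: K&R's fgets with the feof()-qualified EOF check folded in. */
char *my_fgets(char *s, int n, FILE *iop)
{
    int c = 0;  /* initialized so the test below is defined when n == 1 */
    char *cs = s;

    while (--n > 0 && (c = getc(iop)) != EOF)
        if ((*cs++ = c) == '\n')
            break;
    *cs = '\0';
    if (c == EOF) {
        if (!feof(iop))
            return NULL;   /* input error just occurred */
        if (cs == s)
            return NULL;   /* end-of-file before any character was read */
    }
    return s;
}
```

Note that this returns s when end-of-file follows some successfully read characters, matching the standard's behavior rather than the Appendix B wording.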
Pedantic n
Pathological cases:
The below suffers when n <= 0, as *cs = '\0' writes into cs[] outside its legal range, and --n overflows when n == INT_MIN.
while (--n > 0 && (c = getc(iop)) != EOF)  // Is --n OK?
    if ((*cs++ = c) == '\n')
        break;
*cs = '\0';  // Is n big enough?

// Alternative
while (n > 1 && (c = getc(iop)) != EOF) {
    n--;
    *cs++ = c;
    if (c == '\n')
        break;
}
if (n > 0)
    *cs = '\0';
Pedantic UCHAR_MAX > INT_MAX
To note: on rare machines (some old graphics processors) where a character value does not fit in an int, getc() may return EOF as a valid character value. Alternative code not presented.
Uninitialized c
Testing c == EOF is UB: with a small n (n == 1), the loop body never runs, so c is never assigned before the test.
Better to initialize c: register int c = 0;
More:
What are all the reasons fgetc() might return EOF?
Is fgets() returning NULL with a short buffer compliant?


Why `gets_s()` still isn't implemented in GCC (9.3.0)?

I know fgets() is a more common and widespread option for string input, but C11 has been around for 9 years. Why is gets_s() still unavailable?
Even when I add -std=c11, it still doesn't work, even though gets_s() should be in stdio.h.
Because it's optional, and the people behind gcc seem to think it is a bad idea to include it. I don't know how they reasoned, but hints can be found in the C standard:
Recommended practice
The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.
https://port70.net/~nsz/c/c11/n1570.html#K.3.5.4.1
If you want to use gets_s, then use another compiler. Or write your own wrapper, but don't call it gets_s because it's quite tricky to get it completely identical to the specs.
The C standard says this:
Runtime-constraints
s shall not be a null pointer. n shall neither be equal to zero nor be greater than RSIZE_MAX. A new-line character, end-of-file, or read error shall occur within reading n-1 characters from stdin.
If there is a runtime-constraint violation, s[0] is set to the null character, and characters are read and discarded from stdin until a new-line character is read, or end-of-file or a read error occurs.
Description
The gets_s function reads at most one less than the number of characters specified by n from the stream pointed to by stdin, into the array pointed to by s. No additional characters are read after a new-line character (which is discarded) or after end-of-file. The discarded new-line character does not count towards number of characters read. A null character is written immediately after the last character read into the array.
If end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then s[0] is set to the null character, and the other elements of s take unspecified values.
There is one thing here that does not make sense at all. A runtime constraint is that s shall not be a null pointer. On runtime-constraint violations, s[0] should be set to the null character. But the operation s[0] = '\0' has undefined behavior if s is a null pointer.
Here is my take on trying to implement it, but IMO the specs are a mess, and I would not trust this. It was tricky to get it right.
char *my_gets_s(char *s, size_t n)
{
    if (!s)
        return NULL;
    size_t i = 0;
    int ch;
    for (i = 0; i < n - 1; i++) {
        ch = fgetc(stdin);
        // If end-of-file is encountered and no characters have been read
        // into the array, or if a read error occurs during the operation,
        // then s[0] is set to the null character
        if (ferror(stdin) || (ch == EOF && i == 0)) {
            s[0] = '\0';
            return NULL;
        }
        // If EOF and we have read at least one character:
        // terminate immediately after the last character read
        if (ch == EOF) {
            s[i] = '\0';
            return s;
        }
        s[i] = ch;
        if (ch == '\n') {
            s[i] = '\0';
            return s;
        }
    }
    // Runtime-constraint violation: no newline within n-1 characters.
    // Discard the rest of the line.
    while ((ch = getchar()) != '\n' && ch != EOF)
        ;
    s[0] = '\0';
    return NULL;
}
As others have pointed out, gets_s() is:
optional (and many compilers actually don't implement it)
since C11 (so previous standards definitely don't have it)
If you really need to have something instead of fgets(), then you can implement wrapper yourself, e.g.:
char* myGets(char* str, int count)
{
    if (fgets(str, count, stdin)) {
        // Stop at the terminator so we never read past what fgets wrote
        for (int i = 0; str[i] != '\0'; ++i) {
            if (str[i] == '\n') {
                str[i] = '\0';
                break;
            }
        }
        return str;
    } else {
        return NULL;
    }
}
While it would be useful to have an alternative to fgets() which will always read an entire line, discarding excess input if need be, and report how many characters were read, gets_s is not such a function. The gets_s function would only be appropriate in scenarios where any over-length input line should be completely discarded.
The only good ways of performing line-based I/O are either to build one's own line-input routine based upon fgetc() or getchar(), to use fgets() with corner-case logic that's as big as a character-based get-line routine, or--if one wants to maximize performance and the stream doesn't have to be shared with anything else--to use fread() and memchr(), persisting read data in a private buffer between calls to the get-line routine.
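A sketch of the first approach (an fgetc()-based get-line routine; the name read_line and its exact contract are my own invention): it stores what fits, discards the rest of an over-long line so the stream is left at the next line, and reports how many characters it kept.

```c
#include <stdio.h>

/* Hypothetical helper: read one line from fp into buf (capacity size >= 1).
   Characters beyond the capacity are read and discarded. Returns the number
   of characters stored (the newline is not stored), or -1 if end-of-file
   was reached before anything was read. */
long read_line(FILE *fp, char *buf, size_t size)
{
    size_t len = 0;
    int c;

    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (len + 1 < size)      /* keep room for the terminator */
            buf[len++] = (char)c;
        /* else: discard the excess */
    }
    buf[len] = '\0';
    if (c == EOF && len == 0)
        return -1;
    return (long)len;
}
```

Unlike gets_s, an over-length line is truncated rather than thrown away entirely, and the caller learns how much was kept.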

The strange behavior of 'read' system function

The program I tried writing should be able to read a string no longer than 8 characters and check whether such a string is present in a file. I decided to use the 'read' system function for it, but I've come across some strange behavior. As written in the manual, it must return 0 when the end of file is reached, but in my case, when there were no more characters to read, it still read a '\n' and returned 1 (the number of bytes read); I've checked the ASCII code of the character read and it is actually 10, which is '\n'. Considering this fact I changed my code and it worked, but I still can't understand why it behaves this way. Here is the code of my function:
int is_present(int fd, char *string)
{
    int i;
    char ch, buf[9];

    if (!read(fd, &ch, 1)) // file is empty
        return 0;
    while (1) {
        i = 0;
        while (ch != '\n') {
            buf[i++] = ch;
            read(fd, &ch, 1);
        }
        buf[i] = '\0';
        if (!strncmp(string, buf, strlen(buf))) {
            close(fd);
            return 1;
        }
        if (!read(fd, &ch, 1)) // EOF reached
            break;
    }
    close(fd);
    return 0;
}
I think that your problem is in the inner read() call. There you are not checking the return value of the function.
while (ch != '\n') {
    buf[i++] = ch;
    read(fd, &ch, 1);
}
If the file is at EOF when this loop is entered and ch is not '\n' (for example, when the file does not end with a newline), it will loop forever, because read() will not modify ch once it starts returning 0. BTW, you are not checking the bounds of buf.
I'm assuming the question is 'why does read() work this way' and not 'what is wrong with my program?'.
This is not an error. From the manual page:
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
If you think about it read must work this way. If it returned 0 to indicate an end of file was reached when some data had been read, you would have no idea how much data had been read. Therefore read returns 0 only when no data is read because of an end-of-file condition.
Therefore in this case, where there is only a \n available, read() will succeed and return 1. The next read will return a zero to indicate end of file.
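The returned counts can be seen with a short POSIX-only sketch (the helper name count_reads and the use of a pipe are mine): every byte, including a trailing '\n', yields a successful 1-byte read, and only then does read() return 0.

```c
#include <unistd.h>

/* POSIX-only sketch: read() returns the bytes it got, and returns 0
   only once no data at all is left. */
int count_reads(const char *data, size_t len)
{
    int fds[2];
    char ch;
    int reads = 0;

    if (pipe(fds) != 0)
        return -1;
    write(fds[1], data, len);
    close(fds[1]);                      /* writer closed: EOF after data */
    while (read(fds[0], &ch, 1) == 1)
        reads++;                        /* each call returned exactly 1 byte */
    close(fds[0]);
    return reads;                       /* the final read returned 0: EOF */
}
```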
The read() function keeps reading characters and placing them in the buffer until it hits EOF. Here, '\n' is also considered a character, so it reads that too. Your code would have stopped right after reading the '\n', as there was nothing left but EOF. Only EOF is a delimiter for read(); every other character is treated as ordinary data. Cheers!

debug assertion failed. C

Trying to get the number of lines in a text file, and I get this error that I haven't seen before.
The error says: debug assertion failed, expression: c >= -1 && c <= 255
void get_lines(FILE* fp, int* plines){
    int i = 0;
    char c;
    int number_of_conversions;

    number_of_conversions = fscanf(fp, "%c", &c);
    while (number_of_conversions != EOF && number_of_conversions != 0) {
        number_of_conversions = fscanf(fp, "%c", &c);
        if (c == '\n') {
            i++;
        }
    }
    *plines = i;
}
The code you presented does not correspond to the error message you presented. The error message is related to an assertion somewhere else in your source code, having this form:
assert(c >= -1 && c <= 255);
The problem it signals is probably related to variable c in the scope where that assertion appears having a signed character type (signed char, or char on a system where default char is signed). In all likelihood, the essentials of the code involved boil down to something like this:
char c = getc(fp); /* DO NOT DO THIS */
assert(c >= -1 && c <= 255);
That is a common error: getc() and getchar() return type int in order to be able to represent all possible values of type unsigned char, plus EOF (-1). If you assign the result to a variable of character type, then:
You may invoke undefined behavior in the event that the result is outside the range representable by type char (e.g. 128 to 255 on a system having 8-bit, signed default chars).
If the program happens to behave consistently (on which you cannot rely unless the character type in question is unsigned), you lose the ability to distinguish an error condition from valid data.
If the target character type is signed, then although the result behavior is undefined for some inputs, a reasonably likely actual behavior would be for c to take values less than -1 in some cases. In that event you could get an assertion failure. ("could" because nothing is certain when UB is involved.)
To avoid those issues, make sure to assign function results to variables of appropriate type, and in particular, assign the results of getc() and getchar() to a signed integer type at least as wide as int.
The problem is basically two things:
1) a char cannot hold a negative int (EOF is -1, i.e. a negative int)
2) fscanf() stops when it hits white space, and a '\n' is white space.
Suggest using:
int c;
c = fgetc(fp);
an example program would be:
void get_lines(FILE* fp, int* plines){
    int i = 0;
    int c = 0;

    while (EOF != (c = fgetc(fp))) // returns EOF on end of file or error
    {
        if (c == '\n')
        {
            i++;
        }
    }
    *plines = i;
} // end function: get_lines
However, this still has a bug when the text file does not end in a newline.
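One way to patch that bug (a sketch; the name get_lines2 and the prev trick are mine) is to also count a non-empty final line that lacks a trailing newline:

```c
#include <stdio.h>

/* Variant of get_lines that also counts a last line with no newline. */
void get_lines2(FILE* fp, int* plines)
{
    int lines = 0;
    int c;
    int prev = '\n';   /* pretend the file starts just after a newline */

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        prev = c;
    }
    if (prev != '\n')  /* final line had no newline: count it anyway */
        lines++;
    *plines = lines;
}
```

An empty file still yields 0, since prev keeps its initial '\n'.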

Is it a bad idea to clear the buffer using a macro in C?

I was using a macro I created to clear the buffer after a scanf, but I was told it is not a good idea for "many reasons". Could you explain why, and how should I clean it? I know that using fflush(stdin) is a very bad idea since its behavior is not defined.
This is the macro i was using:
#define CLEAR_BUFFER do { c = getchar(); } while (c != '\n' && c != EOF);
And also, another question: in the "real world", is scanf used? And if yes, how do people clean the buffer?
Thanks
The idea is good although the execution could be improved:
#define CLEAR_BUFFER() do { int ch; while ( (ch = getchar()) != EOF && ch != '\n' ) {} } while (0)
Your version didn't declare c and can be used incorrectly.
If you're not familiar with do...while(0), see here.
Even better than both of these would be to write a function:
void clear_buffer(void)
{
    int ch;
    while ((ch = getchar()) != EOF && ch != '\n') {}
}
You could make this return bool if you're interested in distinguishing whether EOL occurred or there was an error (but the calling code could check feof(stdin) || ferror(stdin) to find that out anyway).
In C99 this could be an inline function, although it's not a huge problem if you are in C90 and make it non-inline.
For the second part of the question: I never use scanf, and I clean the buffer in the way I just described. Others may do it differently of course, this is more a question of personal preference.
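The bool-returning variant mentioned above might look like this (a sketch; I generalized it to take a FILE * rather than hard-coding stdin so it is easy to exercise, and the name clear_line is mine):

```c
#include <stdbool.h>
#include <stdio.h>

/* Discard the rest of the current input line.
   Returns true if a newline was consumed, false on EOF or error. */
bool clear_line(FILE *fp)
{
    int ch;
    while ((ch = fgetc(fp)) != EOF && ch != '\n') {}
    return ch == '\n';
}
```

Calling clear_line(stdin) recovers the original behavior.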

understanding ungetc use in a simple getword

I've come across this example of getword.
I understand all the checks etc., but I have a problem with ungetc.
When c satisfies if ((!isalpha(c)) || c == EOF) and also fails while (isalnum(c)) -> it isn't a letter nor a number - ungetc pushes that char back.
Let's suppose it is '\n'.
Then it gets to return word; however that char can't be part of the result since it is not saved in any array. What happens then?
while (isalnum(c)) {
    if (cur >= size) {
        size += buf;
        word = realloc(word, sizeof(char) * size);
    }
    word[cur] = c;
    cur++;
    c = fgetc(fp);
}
if ((!isalpha(c)) || c == EOF) {
    ungetc(c, fp);
}
return word;
EDIT
@Mark Byers - thanks, but that c was rejected for a purpose; won't it satisfy the condition again and again in an infinite loop?
The terminal condition, just before the line you don't understand, is not good. It should probably be:
int c;
...
if (!isalpha(c) && c != EOF)
    ungetc(c, fp);
This means that if the last character read was a real character (not EOF) and wasn't an alphabetic character, push it back for reprocessing by whatever next uses the input stream fp. That is, suppose you read a blank; the blank will terminate the loop and the blank will be pushed back so that the next getc(fp) will read the blank again (as would fscanf() or fread() or any other read operation on the file stream fp). If, instead of blank, you got EOF, then there is no attempt to push back the EOF in my revised code; in the original code, the EOF would be pushed back.
Note that c must be an int rather than a char.
ungetc pushes the characters onto the stream so that the next read will return that character again.
ungetc(c, fp); /* Push the character c onto the stream. */
/* ...etc... */
c = fgetc(fp); /* Reads the same value again. */
This can sometimes be convenient if you are reading characters to find out when the current token is complete, but aren't yet ready to read the next token.
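A minimal demonstration of that round-trip (the helper name peek_char is mine; the C standard guarantees one character of pushback):

```c
#include <stdio.h>

/* Look at the next character without consuming it: read it, then
   push it back so the next read returns it again. */
int peek_char(FILE *fp)
{
    int c = fgetc(fp);
    if (c != EOF)
        ungetc(c, fp);   /* push it back for the next reader */
    return c;
}
```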
OK, now I understand why that case with e.g. '\n' was troubling me. I'm just dumb and forgot about the section in main() referring to getword. Of course, before calling getword there are a couple of tests (another ungetc there) and it fputs the characters not satisfying isalnum.
It follows that the while loop in getword always starts with at least one isalnum-positive character, and the check at the end is just for the following characters.
