C99: Is it standard that fscanf() sets eof earlier than fgetc()?


I tried with VS2017 (32-bit version) on a 64-bit Windows PC, and it seems to me that fscanf() sets the eof flag immediately after successfully reading the last item in a file. This loop terminates immediately after fscanf() has read the last item in the file associated with stream:
while(!feof(stream))
{
fscanf(stream,"%s",buffer);
printf("%s",buffer);
}
I know this is insecure code... I just want to understand the behaviour. Please forgive me ;-)
Here, stream is associated with an ordinary text file containing strings like "Hello World!". The last character in that file is not a newline character.
However, fgetc(), having processed the last character, tries to read yet another one in this loop; that call returns EOF, which printf("%c", c) prints as the byte 0xff:
while (!feof(stream))
{
c = fgetc(stream);
printf("%c", c);
}
Is this behaviour of fscanf() and fgetc() standardized, implementation dependent or something else? I am not asking why the loop terminates or why it does not terminate. I am interested in the question if this is standard behaviour.

In my experience, when working with <stdio.h> the precise semantics of the "eof" and "error" bits are very, very subtle, so much so that it's not usually worth it (it may not even be possible) to try to understand exactly how they work. (The first question I ever asked on SO was about this, although it involved C++, not C.)
I think you know this, but the first thing to understand is that the intent of feof() is very much not to predict whether the next attempt at input will reach the end of the file. The intent is not even to say that the input stream is "at" the end of the file. The right way to think about feof() (and the related ferror()) is that they're for error recovery, to tell you a bit more about why a previous input call failed.
And that's why writing a loop involving while(!feof(fp)) is always wrong.
But you're asking about precisely when fscanf hits end-of-file and sets the eof bit, versus getc/fgetc. With getc and fgetc, it's easy: they try to read one character, and they either get one or they don't (and if they don't, it's either because they hit end-of-file or encountered an i/o error).
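To see the fgetc() rule in isolation, here is a small sketch. The helper name fgetc_eof_states is hypothetical, and tmpfile() merely stands in for a real input file. It records the end-of-file indicator right after the last real byte is delivered and again right after fgetc() returns EOF:

```c
#include <stdio.h>

/* Hypothetical demo helper: store `text` in a temporary file, read it back
 * with fgetc(), and record the end-of-file indicator
 *   *after_last_byte: just after fgetc() delivered the final byte
 *   *after_eof:       just after fgetc() returned EOF
 * Returns 0 on success, -1 if tmpfile() fails. */
int fgetc_eof_states(const char *text, int *after_last_byte, int *after_eof)
{
    int c;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);                             /* back to the start, indicators cleared */

    *after_last_byte = 0;
    while ((c = fgetc(fp)) != EOF)
        *after_last_byte = feof(fp) != 0;   /* a byte was delivered */
    *after_eof = feof(fp) != 0;             /* fgetc() just failed at end of file */

    fclose(fp);
    return 0;
}
```

With a conforming library, the first value stays 0 and the second is 1: fgetc() only sets the indicator when it actually fails to get a byte.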
But with fscanf it's trickier, because depending on the input specifier being parsed, characters are accepted only as long as they're appropriate for the input specifier. The %s specifier, for example, stops not only if it hits end-of-file or gets an error, but also when it hits a whitespace character. (And that's why people were asking in the comments whether your input file ended with a newline or not.)
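A minimal sketch of the %s case, assuming the input is small enough to stage in a tmpfile(); the helper name eof_set_after_last_word is hypothetical. It reads words with a properly checked loop and reports whether the end-of-file indicator was already set after the last successful conversion:

```c
#include <stdio.h>

/* Hypothetical helper: write `text` to a temporary file, read it back with
 * fscanf("%99s"), and report whether the end-of-file indicator was already
 * set right after the last *successful* conversion.
 * Returns 1 or 0, or -1 if tmpfile() fails. */
int eof_set_after_last_word(const char *text)
{
    char word[100];
    int eof_after_success = 0;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);                              /* back to the start */
    while (fscanf(fp, "%99s", word) == 1)
        eof_after_success = feof(fp) != 0;   /* state after this successful call */
    fclose(fp);
    return eof_after_success;
}
```

For "Hello World!" the result is 1, because %s had to look for one more character after the "!". For "Hello World!\n" it is 0, because %s stopped at the newline and pushed it back.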
I've experimented with the program
#include <stdio.h>
int main()
{
char buffer[100];
FILE *stream = stdin;
while(!feof(stream)) {
fscanf(stream,"%s",buffer);
printf("%s\n",buffer);
}
}
which is pretty close to what you posted. (I added a \n in the printf so that the output was easier to see, and better matched the input.) I then ran the program on the input
This
is
a
test.
and, specifically, where all four of those lines ended in a newline. And the output was, not surprisingly,
This
is
a
test.
test.
The last line is repeated because that's what (usually) happens when you write while(!feof(stream)).
But then I tried it on the input
This\n
is\n
a\n
test.
where the last line did not have a newline. This time, the output was
This
is
a
test.
This time, the last line was not repeated. (The output was still not identical to the input, because the output contained four newlines while the input contained three.)
I think the difference between these two cases is that in the first case, when the input contains a newline, fscanf reads the last line, reads the last \n, notices that it's whitespace, and returns, but it has not hit EOF and so does not set the EOF bit. In the second case, without the trailing newline, fscanf hits end-of-file while reading the last line, and so does set the eof bit, so feof() in the while() condition is satisfied, and the code does not make an extra trip through the loop, and the last line is not repeated.
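Here is a sketch that counts loop trips instead of printing, so the two cases can be compared directly. It assumes tmpfile() is available; the names stage, feof_loop_trips and checked_loop_trips are hypothetical. The first loop deliberately reproduces the question's pattern:

```c
#include <stdio.h>

/* Stage `text` in a temporary file (hypothetical helper); NULL on failure. */
static FILE *stage(const char *text)
{
    FILE *fp = tmpfile();
    if (fp != NULL) {
        fputs(text, fp);
        rewind(fp);
    }
    return fp;
}

/* The loop from the question: counts how many times its body runs. */
int feof_loop_trips(const char *text)
{
    char buffer[100];
    int trips = 0;
    FILE *fp = stage(text);
    if (fp == NULL)
        return -1;
    while (!feof(fp)) {
        fscanf(fp, "%99s", buffer);   /* return value deliberately ignored */
        trips++;
    }
    fclose(fp);
    return trips;
}

/* The checked loop: the body runs once per word actually read. */
int checked_loop_trips(const char *text)
{
    char buffer[100];
    int trips = 0;
    FILE *fp = stage(text);
    if (fp == NULL)
        return -1;
    while (fscanf(fp, "%99s", buffer) == 1)
        trips++;
    fclose(fp);
    return trips;
}
```

With "This\nis\na\ntest.\n" the feof() loop makes five trips and without the final newline it makes four, while the checked loop makes four trips either way.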
We can see a bit more clearly what's going on if we look at fscanf's return value. I modified the loop like this:
while(!feof(stream)) {
int r = fscanf(stream,"%s",buffer);
printf("fscanf returned %2d: %5s (eof: %d)\n", r, buffer, feof(stream));
}
Now, when I run it on a file that ends with a newline, the output is:
fscanf returned 1: This (eof: 0)
fscanf returned 1: is (eof: 0)
fscanf returned 1: a (eof: 0)
fscanf returned 1: test. (eof: 0)
fscanf returned -1: test. (eof: 1)
We can clearly see that after the fourth call, feof(stream) is not true yet, meaning that we'll make that last, extra, unnecessary, fifth trip through the loop. But we can see that during the fifth trip, fscanf returns -1, indicating (a) that it did not read a string as expected and (b) it reached EOF.
If I run it on input not containing the trailing newline, on the other hand, the output is like this:
fscanf returned 1: This (eof: 0)
fscanf returned 1: is (eof: 0)
fscanf returned 1: a (eof: 0)
fscanf returned 1: test. (eof: 1)
Now, feof is true immediately after the fourth call to fscanf, and the extra trip is not made.
Bottom line: the moral is (the morals are):
Don't write while(!feof(stream)).
Do use feof() and ferror() only to test why a previous input call failed.
Do check the return value of scanf and fscanf.
And we might also note: Do beware of files not ending in newline! They can behave surprisingly differently.
Addendum: Here's a better way to write the loop:
while((r = fscanf(stream,"%s",buffer)) == 1) {
printf("%s\n", buffer);
}
When you run this, it always prints exactly the strings it sees in the input. It doesn't repeat anything; it doesn't do anything significantly differently depending on whether the last line does or doesn't end in a newline. And -- significantly -- it doesn't (need to) call feof() at all!
Footnote: In all of this I've ignored the fact that %s with *scanf reads whitespace-delimited words, not lines. Also, a bare %s has no length limit, so a word longer than the buffer overflows it (undefined behavior); a field width such as %99s for a 100-byte buffer prevents that.
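A short sketch of how a field width limits %s: a word longer than the width is split across several conversions instead of overflowing the buffer. The helper read_chunks and the deliberately tiny 6-byte buffer are hypothetical, chosen only to make the effect visible:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read `text` with "%5s" (5 characters plus '\0' fit
 * in a 6-byte buffer) and copy each conversion into out[i].
 * Returns the number of conversions, or -1 if tmpfile() fails. */
int read_chunks(const char *text, char out[][6], int max)
{
    char buf[6];
    int n = 0;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);
    while (n < max && fscanf(fp, "%5s", buf) == 1)
        strcpy(out[n++], buf);      /* each chunk is at most 5 characters */
    fclose(fp);
    return n;
}
```

Reading "abcdefghijkl" this way yields "abcde", "fghij" and "kl".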

Both of your loops are incorrect: feof(f) is only set after an unsuccessful attempt to read past the end of file. In your code, you do not test whether fgetc() returns EOF, nor whether fscanf() returns 0 or EOF.
Indeed fscanf() can set the end of file condition of a stream if it reaches the end of file, which it does for %s if the file does not contain a trailing newline, whereas fgets() would not set this condition if the file ends with a newline. fgetc() sets the condition only when it returns EOF.
Here is a modified version of your code that illustrates this behavior:
#include <stdio.h>
int main() {
FILE *fp = stdin;
char buf[100];
char *p;
int c, n, eof;
for (;;) {
c = fgetc(fp);
eof = feof(fp);
if (c == EOF) {
printf("c=EOF, feof()=%d\n", eof);
break;
} else {
printf("c=%d, feof()=%d\n", c, eof);
}
}
rewind(fp); /* clears end-of-file and error indicators */
for (;;) {
n = fscanf(fp, "%99s", buf);
eof = feof(fp);
if (n == 1) {
printf("fscanf() returned 1, buf=\"%s\", feof()=%d\n", buf, eof);
} else {
printf("fscanf() returned %d, feof()=%d\n", n, eof);
break;
}
}
rewind(fp); /* clears end-of-file and error indicators */
for (;;) {
p = fgets(buf, sizeof buf, fp);
eof = feof(fp);
if (p == buf) {
printf("fgets() returned buf, buf=\"%s\", feof()=%d\n", buf, eof);
} else
if (p == NULL) {
printf("fgets() returned NULL, feof()=%d\n", eof);
break;
} else {
printf("fgets() returned %p, buf=%p, feof()=%d\n", (void*)p, (void*)buf, eof);
break;
}
}
return 0;
}
When run with standard input redirected from a file containing Hello world without a trailing newline, here is the output:
c=72, feof()=0
c=101, feof()=0
c=108, feof()=0
c=108, feof()=0
c=111, feof()=0
c=32, feof()=0
c=119, feof()=0
c=111, feof()=0
c=114, feof()=0
c=108, feof()=0
c=100, feof()=0
c=EOF, feof()=1
fscanf() returned 1, buf="Hello", feof()=0
fscanf() returned 1, buf="world", feof()=1
fscanf() returned -1, feof()=1
fgets() returned buf, buf="Hello world", feof()=1
fgets() returned NULL, feof()=1
The C Standard specifies the behavior of the stream functions in terms of individual calls to fgetc, and fgetc sets the end-of-file condition when it cannot read a byte from the stream because the stream is at end of file.
The behavior illustrated above conforms to the Standard and shows why testing feof() is not a good way to validate input operations. feof() can return non-zero after successful operations and can return 0 before unsuccessful operations. feof() should only be used to distinguish end of file from an input error after an unsuccessful input operation. Very few programs make this distinction, hence feof() is almost never used on purpose and almost always indicates a programming error. For extra explanations, read this: Why is “while ( !feof (file) )” always wrong?
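A sketch of that intended use: the loop is controlled only by the fscanf() return value, and feof()/ferror() are consulted afterwards to explain the failure. The enum and the helper why_stopped are hypothetical, and tmpfile() stands in for real input:

```c
#include <stdio.h>

/* Why the read loop stopped; names are hypothetical. */
enum stop_reason { STOP_AT_EOF, STOP_BAD_INPUT, STOP_READ_ERROR, STOP_NO_FILE };

/* Reads integers from `text`, stores how many were converted in *count,
 * and only after the loop has failed asks why it failed. */
enum stop_reason why_stopped(const char *text, int *count)
{
    int value;
    enum stop_reason reason;
    FILE *fp = tmpfile();

    *count = 0;
    if (fp == NULL)
        return STOP_NO_FILE;
    fputs(text, fp);
    rewind(fp);

    while (fscanf(fp, "%d", &value) == 1)
        ++*count;

    if (ferror(fp))
        reason = STOP_READ_ERROR;
    else if (feof(fp))
        reason = STOP_AT_EOF;
    else
        reason = STOP_BAD_INPUT;   /* matching failure: next item is not an int */
    fclose(fp);
    return reason;
}
```

"1 2 3" ends with STOP_AT_EOF after three values, and "1 2 x 4" ends with STOP_BAD_INPUT after two. One subtlety: a malformed item at the very end (say a lone "-") also reaches end of file, so this simple check would report it as STOP_AT_EOF.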

If I might offer a tl;dr to both the comprehensive answers here, formatted input reads characters until it has reason to stop. Since you say
The last character in that file is not a newline character
and the %s directive reads a string of non-whitespace characters, after it reads the ! in World! it has to read another character to see whether the string continues. There isn't one, so the end-of-file indicator gets set.
Put whitespace (a space, a newline, whatever) at the end of the phrase, and your printf will print the last word twice: once because it read it, and again because scanf hit end of file before finding a string to read, so the %s conversion never happened, leaving the buffer untouched.

Related

C - Enter dead loop after trying to read line by line of a file and lines might contain space character?

I'm trying to read a file line by line and count the characters of each line. Those line might contains space characters and I need also to count them. I'm only allowed to use feof and scanf functions.
Sample Code
...
while(!feof(stdin)){
char inputLineArray[1000];
scanf("%[^\n]s", inputLineArray);
printf(inputLineArray);
}
...
My sample file is a txt file which contains the following content:
hello world
abcdsdsdsdsd
But after it prints:
hello world
My program gets stuck in an infinite loop that does nothing.

From man 3 scanf:
The scanf() family of functions scans input according to format as described below.
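This means that your pattern %[^\n]s (match any characters except a newline, then a literal s) stops matching after world because the next character is a newline. That newline stays in the stream, so every later call fails immediately on it without consuming anything, and since feof() never becomes true, the loop never ends. You need to consume the newline before the next call.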
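A sketch that makes the stuck state visible, assuming tmpfile() can stand in for the input file; the helper stuck_on_newline is hypothetical. It performs three successive conversions and then looks at the next pending character:

```c
#include <stdio.h>

/* Hypothetical helper: fills results[0..2] with three successive return
 * values of fscanf("%99[^\n]") on `text` and returns the character that is
 * still waiting in the stream afterwards (EOF if tmpfile() fails). */
int stuck_on_newline(const char *text, int results[3])
{
    char line[100];
    int i, next;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return EOF;
    fputs(text, fp);
    rewind(fp);
    for (i = 0; i < 3; i++)
        results[i] = fscanf(fp, "%99[^\n]", line);  /* 1, then 0, then 0 */
    next = fgetc(fp);                               /* the newline never moved */
    fclose(fp);
    return next;
}
```

On "hello world\nabcdsdsdsdsd\n" the first call returns 1, the next two return 0 without reading anything, and the pending character is still '\n'.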
There are many questions like yours on Stackoverflow, search for scanf infinite loop.

scanf("%[^\n]s", inputLineArray); is incorrect and inappropriate:
the conversion specifier does not have a trailing s, it is just %[^\n] ;
scanf reads the stream and stores any characters before the newline into inputLineArray and leaves the newline pending in the stream ;
scanf should be given the maximum number of characters to store to avoid undefined behavior on long lines: scanf("%999[^\n]", inputLineArray) ;
you should test the return value of scanf() to determine if the conversion was successful. The test while (!feof(stdin)) is pathologically inappropriate: Why is “while ( !feof (file) )” always wrong? ;
you would then see another problem: this conversion fails on empty lines because there are no characters to store into the destination array, and since scanf() leaves the newline pending, the second call fails and all successive ones too.
Note also that it is highly risky to call printf with user-supplied data as the format string: the behavior is undefined if the line contains non-trivial format specifications such as %s or %n. Use printf("%s", inputLineArray) instead.
Here is a better way to read the file line by line:
#include <stdio.h>
#include <string.h>
...
char inputLineArray[1001];
while (fgets(inputLineArray, sizeof inputLineArray, stdin)) {
inputLineArray[strcspn(inputLineArray, "\n")] = '\0'; // strip the trailing newline if present
printf("%s\n", inputLineArray);
}
...
Note however that input lines with 1000 bytes or more will be broken into multiple output lines.
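The original task was to count the characters of each line, spaces included. A minimal sketch of that on top of the fgets() loop, assuming every line fits in the buffer; the helper line_lengths is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: stores the length of each line (without its '\n')
 * in lens[], up to `max` lines; returns the number of lines read. */
int line_lengths(FILE *fp, size_t lens[], int max)
{
    char line[1001];
    int n = 0;

    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* drop the newline, if any */
        lens[n++] = strlen(line);           /* spaces are counted too */
    }
    return n;
}
```

For the sample file, the lengths are 11 for "hello world" and 12 for "abcdsdsdsdsd", whether or not the last line ends in a newline.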
scanf() is not the right tool for your purpose, indeed it is full of quirks and shortcomings, but if you are required to use scanf(), here is a corrected version:
char inputLineArray[1000];
while (scanf("%c", &inputLineArray[0]) == 1) {
/* one byte was read, check if it is a newline */
if (inputLineArray[0] == '\n') {
/* empty line must be special cased */
inputLineArray[0] = '\0';
} else {
/* set the null terminator in case the next `scanf` fails */
inputLineArray[1] = '\0';
/* attempt to read the rest of the line */
scanf("%998[^\n]", inputLineArray + 1);
/* consume the pending newline, if any */
scanf("%*1[\n]");
}
printf("%s\n", inputLineArray);
}
if (feof(stdin)) {
/* scanf() failed at end of file, OK */
} else {
printf("read error\n");
}
Note that feof() is not used in the loop condition: scanf("%c", ...) returns EOF at end of file, so the while() loop stops as expected.
feof() is only used to distinguish end of file from read error conditions in stream I/O. Most C programs do not need to distinguish between these, as read errors can be handled the same way as truncated input files. This function is almost always used incorrectly. In short, you should never use feof(), nor other error-prone or deprecated functions such as gets() and strncpy(). Also be very careful with sprintf(), strcpy(), strcat()...

Why multiple EOF enters to end program?

Trying to understand the behavior of my code. I'm expecting Ctrl-D to lead to the program printing the array and exiting, however it takes 3 presses, and it enters the while loop after the second press.
#include <stdio.h>
#include <stdlib.h>
void unyon(int p, int q);
int connected(int p, int q);
int main(int argc, char *argv[]) {
int c, p, q, i, size, *ptr;
scanf("%d", &size);
ptr = malloc(size * sizeof(int));
while((c = getchar()) != EOF){
scanf("%d", &p);
scanf("%d", &q);
printf("p = %d, q = %d\n", p, q);
}
for(i = 0; i < size; ++i)
printf("%d\n", *ptr + i);
free(ptr);
return 0;
}
I read the post here, but I don't quite understand it.
How to end scanf by entering only one EOF
After reading that, I'm expecting the first Ctrl-D to clear the buffer, and then I'm expecting c = getchar() to pick up the second Ctrl-D and jump out. Instead the second Ctrl-D enters the loop and prints p and q, and it takes a third Ctrl-D to drop out.
This is made more confusing by the fact that the code below drops out on the first Ctrl-D:
#include <stdio.h>
main() {
int c, nl;
nl = 0;
while((c = getchar()) != EOF)
if (c == '\n')
++nl;
printf("%d\n", nl);
}

Let's just strip the program down to the calls which do input:
scanf("%d", &size); // Statement 1
while((c = getchar()) != EOF){ // 2
scanf("%d", &p); // 3
scanf("%d", &q); // 4
}
That is definitely not the way to go; we'll get to the correct usage in a bit. For now, let's just analyze what happens. It's important to understand precisely how scanf works. The %d format code causes it to first skip over any whitespace characters, and then read characters as long as the characters can be made into a decimal integer. Eventually some character will be read which is not part of a decimal integer; most likely a newline character. Since that character is not part of the number, scanf leaves it unread: it is pushed back into the stream.
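A sketch of that push-back, using tmpfile() in place of the terminal; the helper char_after_number is hypothetical. It converts one number and then returns whatever the next read sees:

```c
#include <stdio.h>

/* Hypothetical helper: convert one "%d" from `text` into *value and return
 * the character the next fgetc() sees (EOF if nothing follows or on error). */
int char_after_number(const char *text, int *value)
{
    int next = EOF;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return EOF;
    fputs(text, fp);
    rewind(fp);
    if (fscanf(fp, "%d", value) == 1)
        next = fgetc(fp);          /* the character that ended the number */
    fclose(fp);
    return next;
}
```

For "42\n" the number is 42 and the next character is '\n', which is exactly what the getchar in statement 2 ends up reading.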
So when the call to getchar is made, getchar will read and return the newline character which terminated the integer. Inside the loop, there are then two calls to scanf("%d"), each of which will behave as indicated above: skip whitespace if any, read a decimal integer, and reinsert the unused character back into the input stream.
Now, let's suppose that you run the program, and enter the number 42 followed by the enter key, and then Ctrl-D to close the input stream.
The 42 will be read by statement 1, and (as mentioned above) the newline will be read by statement 2. So when statement 3 is executed, there is no more data to be read. Because end-of-file is signaled before any digit is read, scanf will return EOF. However, the code does not test the return value of scanf; it goes on to statement 4.
What should happen at this point is that the scanf in statement 4 should immediately return EOF without attempting to read more input. That's what the C standard says should happen, and it is what Posix says should happen. Once end-of-file has been signaled on a stream, any input request should immediately return EOF until the end-of-file indicator is manually cleared. (See below for standards quotes.)
But glibc, for reasons we won't go into just yet, does not conform to the standard. It attempts another read. And so the user must enter another Ctrl-D, which will cause the scanf at statement 4 to return EOF. Again, the code does not check the return code, so it continues with the while loop and calls getchar again at statement 2. Because of the same bug, getchar does not immediately return EOF, but instead attempts to read a character from the terminal. So the user must now type a third Ctrl-D to cause getchar to return EOF. Finally, the code checks a return code, and the while loop terminates.
So that is the explanation of what is happening. Now, it is easy to see at least one mistake in the code: the return value of scanf is never checked. Not only does this mean that EOF is missed, it also means that input errors are ignored. (scanf would have returned 0 if the input could not be parsed as an integer.) That's serious, because if scanf cannot successfully match the format code, the corresponding argument is not assigned at all: the variable keeps whatever value it had (here, possibly an indeterminate one) and must not be used.
In short: Always check return values from *scanf. (And other I/O library functions.)
But there is a more subtle mistake as well, which makes little difference in this case but could, in general, be serious. The character read by getchar in statement 2 is simply discarded, regardless of what it was. Normally it will be whitespace, so it doesn't matter that it is discarded, but you don't actually know that because the character is discarded. Maybe it was a comma. Maybe it was a letter. Maybe it matters what it was.
It is bad style to rely on the assumption that whatever character is read by the getchar at statement 2 is unimportant. If you really need to peek at the next character, you should reinsert it into the input stream, just as scanf does:
while ((c = getchar()) != EOF) {
ungetc(c, stdin); /* Put c back into the input stream */
...
}
But actually, that test is not what you want at all. As we have already seen, it is extremely unlikely that getchar will return EOF at this point. (It's possible, but very unlikely.) Much more probable is that getchar will read a newline character, even though the next scanf will encounter the end-of-file. So there was absolutely no point peeking at the next character; the correct solution is to check the return code of scanf, as indicated above.
Putting that together, what you really want here is something more like:
/* No reason to use two scanf calls to read two consecutive numbers */
int count;
while ((count = scanf("%d%d", &p, &q)) == 2) {
/* Do something with p and q */
}
if (count != EOF) {
/* Invalid format. Issue an error message, at least */
}
/* Do whatever needs to be done at the end of input. */
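A runnable version of that loop, as a sketch: it reads from a FILE * instead of stdin so it can be exercised with tmpfile(), and the name read_pairs and the bad_input flag are hypothetical:

```c
#include <stdio.h>

/* Returns the number of complete (p, q) pairs read from fp; *bad_input is
 * set to 1 if the loop stopped on something other than end of input. */
int read_pairs(FILE *fp, int *bad_input)
{
    int p, q, count, pairs = 0;

    while ((count = fscanf(fp, "%d%d", &p, &q)) == 2) {
        pairs++;                    /* do something with p and q here */
    }
    *bad_input = (count != EOF);    /* 0 or 1 conversions: invalid format */
    return pairs;
}
```

On "1 2\n3 4\n" it reads two pairs and reports clean end of input; on "1 2\n3 x\n" it reads one pair and flags bad input, because the second fscanf() converted only the 3.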
Finally, let's examine glibc's behaviour. There is a very long-standing bug report linked to by an answer to the question cited in the OP. If you take the trouble to read through to the most recent post in the bugzilla thread, you'll find a link to a discussion on the glibc developer mailing list.
Let me give the TL;DR version, and save you the trouble of digital archaeology. Since C99, the standard has been clear that EOF is "sticky". (The section numbers below are from C11; C99 numbers the <stdio.h> clause 7.19.) §7.21.3/11 states that all input is performed as though successive bytes were read by fgetc:
...The byte input functions read characters from the stream as if by successive calls to the fgetc function.
And §7.21.7.1/3 states that fgetc returns EOF immediately if the stream's end-of-file indicator is set:
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
So once the end-of-file indicator is set, subsequent input operations must immediately return EOF without attempting to read from the stream. (A read error sets the separate error indicator instead.) Various things can clear the end-of-file indicator, including clearerr, a successful fseek, fsetpos or rewind, and a successful ungetc; once the end-of-file indicator has been cleared, the next input function call will again attempt to read from the stream.
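The set and clear rules can be checked with a short sketch. It uses a temporary file rather than a terminal, so it exercises how the indicator is set and cleared, not the terminal re-read behaviour discussed here; the helper name eof_indicator_rules is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if every check behaves as the Standard
 * text quoted above describes, 0 otherwise. */
int eof_indicator_rules(void)
{
    int ok = 1;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return 0;
    fputs("A", fp);
    rewind(fp);

    ok &= fgetc(fp) == 'A';
    ok &= fgetc(fp) == EOF && feof(fp);          /* end of file: indicator set */

    ok &= ungetc('B', fp) == 'B' && !feof(fp);   /* successful ungetc() clears it */
    ok &= fgetc(fp) == 'B';

    ok &= fgetc(fp) == EOF && feof(fp);          /* set again */
    clearerr(fp);                                /* clearerr() clears it too */
    ok &= !feof(fp);

    fclose(fp);
    return ok;
}
```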
However, it wasn't always like that. Before C99, the result of reading from a stream which had already returned EOF was unspecified. And different standard libraries chose to handle it in different ways.
So a decision was made to not change glibc to conform to the (then) new standard, but rather to maintain compatibility with certain other C libraries, notably Solaris. (A comment in the glibc source is quoted in the bug report.)
Although there is a compelling argument (at least, compelling to me) that fixing the bug is not likely to break anything important, there is still a certain reluctance to do anything about it. And so, here we are, ten years later, with a still-open bug report and a non-conforming implementation. (glibc 2.28, released in 2018, finally made end-of-file sticky in all stdio functions, so newer versions conform.)

If you run it through the debugger you will get a clearer picture. Here is the sequence of events.
scanf("%d", &size); is called.
A number is input followed by ENTER. The key here is that scanf does not consume the \n that results from the ENTER.
getchar is called. This consumes the \n.
scanf("%d", &p); is called. It returns at the first Ctrl-D, which signals end of file. If the return value were checked, it would be apparent that the conversion failed.
scanf("%d", &q); is called. This returns at the second Ctrl-D.
The loop goes back to the top and calls getchar. The third Ctrl-D then causes getchar to return EOF, and the loop exits at that point.
I'll leave it as an exercise for you to explain why the second program functions as expected.

Several different things are interacting here.
First of all, when you type into a terminal, the tty driver processes your input, adding each character to a buffer and handling special characters. One of these special characters, Ctrl-D, means "take everything typed up to this point and make it available to the process". This causes two things to happen: first, the Ctrl-D character is removed from the data stream; second, all the characters typed so far become available to the process through the read(2) system call. getchar() is a buffered library call that avoids making one read per character by keeping previously read characters in a buffer.
The second thing is the way end of file is signalled on POSIX systems (and all Unix systems). A read(2) system call returns the number of bytes actually read (or -1 on failure, which is unrelated to EOF). The system signals the end-of-file condition by returning 0 bytes. (If you only hit the return key, a \n appears in the data stream, so that read returns 1, not 0.)
The third thing is the return type of the getchar(3) function. It doesn't return a char. Since every possible byte value can be returned by getchar(3), no byte value can be reserved to signal EOF. The solution adopted a long, long time ago (when getchar(3) was designed, in the first version of C; see The C Programming Language by Brian Kernighan and Dennis Ritchie, first edition) was to return an int, which can hold every possible byte value (0..255) plus one extra value, called EOF. The C standard only requires EOF to be a negative integer constant; in practice it is almost always defined as -1.
Putting it all together: EOF is an int constant defined so that programmers can write while ((c = getchar()) != EOF). You will never get EOF as a data value from the terminal. The system always signals end of file by making read(2) return 0. And the terminal driver, on receiving Ctrl-D, just removes it from the stream and makes the data typed before it available. This differs from Ctrl-J and Ctrl-M (line feed and carriage return, respectively), which are also interpreted but are delivered as \n in the data stream.
So the next question is: why are two (or more) Ctrl-D characters normally needed to signal end of file?
As I've said, one Ctrl-D only makes the characters typed before it (not the Ctrl-D itself) available, so the first read(2) may return a nonzero count. But if you enter Ctrl-D twice in a row, there are no characters between the two, which guarantees a read(2) that returns zero. Normally, programs loop, doing multiple reads:
while ((n_read = read(fd, buffer, sizeof buffer)) > 0) {
/* NORMAL INPUT PROCESSING GOES HERE, for up to n_read bytes
* stored in buffer */
} /* while */
if (n_read < 0) {
/* ERROR PROCESSING GOES HERE */
} else {
/* EOF PROCESSING GOES HERE */
} /* if */
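A POSIX-only sketch of that loop as a function, so it can be tried on a pipe instead of a terminal; the name count_until_eof is hypothetical and EINTR handling is left out for brevity:

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Count the bytes available on `fd` until read(2) reports end of file by
 * returning 0. Returns -1 on a read error. */
long count_until_eof(int fd)
{
    char buffer[512];
    long total = 0;
    ssize_t n_read;

    while ((n_read = read(fd, buffer, sizeof buffer)) > 0)
        total += n_read;          /* normal input processing would go here */
    if (n_read < 0)
        return -1;                /* read error */
    return total;                 /* n_read == 0: end of file */
}
```

Writing "abc\n" into a pipe and closing the write end gives a count of 4: the data arrives first, and then read(2) returns 0.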
In the case of files, the behaviour is different: Ctrl-D is not interpreted by any driver (it's stored in the disk file), so you'll get Ctrl-D as a normal character (its value is \004).
When you read a file, you normally read a number of complete buffers, then make a partial read (fewer bytes than the buffer size), and finally a read of zero bytes, signalling that the file has ended.
Note
Depending on how the tty driver is configured, the eof character can be changed or given a different meaning. The same applies to the return and linefeed characters. See the termios(3) manual page for detailed documentation.

How to read input in C

I'm trying to read a line with scanf("%[^\n]"); right before it I'm reading an integer with "%d". I was told that scanf doesn't remove the '\n' after reading, so I have to call fflush() to avoid it, but even doing that I still have the same problems. Here is my code:
scanf("%d", &n);
fflush(stdin);
lines = (char**)malloc(sizeof(char*)*n);
for(i = 0; i < n; i++){
lines[i] = (char*)malloc(sizeof(char)*1001);
}
for(i = 0;i < n;i++){
scanf("%[^\n]", lines[i]);
}
I read an integer and then scanf doesn't wait for input: no matter what the integer is, whether 5 or 10, every string is read as empty. I already tried fgets and the result is almost the same, except that it reads some of the strings and skips others.

Let us look at this step by step:
"... read a line with scanf("%[^\n]");".
scanf("%[^\n]", buf) does not read a line. It almost does - sometimes. "%[^\n]" directs scanf() to read any number of non-'\n' char until one is encountered (that '\n' is then put back into stdin) or EOF occurs.
This approach has some problems:
If the first char is '\n', scanf() puts it back into stdin without changing buf in any way! buf is left as is, perhaps uninitialized. scanf() then returns 0.
If at least one non-'\n' is read, it is saved into buf and more char until a '\n' occurs. A '\0' is appended to buf and the '\n' is put back into stdin and scanf() returns 1. This unlimited-ness can easily overfill buf. If no char was saved and EOF or input error occurs, scanf() returns EOF.
Always check the return value of scanf()/fgets(), etc. functions. If your code does not check it, the state of buf is unknown.
In any case, a '\n' is still usually left in stdin, thus the line was not fully read. This '\n' often is an issue for the next input function.
... scanf doesn't erase the '\n' after reading
Another common misconception. scanf() reads a '\n', or not, depending on the supplied format. Some formats consume '\n', others do not.
... call fflush() to avoid it
fflush(stdin) is defined by some implementations, but not by the C standard, where calling fflush on an input stream is undefined behavior. The usual problem is that code wants to eliminate any remaining data in stdin. A common alternative, when the end of the line has not yet occurred, is to read and discard characters until '\n' is found:
int ch; // Use int
while ((ch = fgetc(stdin)) != '\n' && ch != EOF);
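The same loop wrapped as a function, as a sketch (the name discard_line is hypothetical), together with the scenario from the question: a number on one line, then text on the next:

```c
#include <stdio.h>

/* Discard input up to and including the next '\n'.
 * Returns '\n', or EOF if the stream ended (or failed) first. */
int discard_line(FILE *fp)
{
    int ch;                                  /* int, so EOF is representable */
    while ((ch = fgetc(fp)) != '\n' && ch != EOF)
        ;                                    /* throw the character away */
    return ch;
}
```

After fscanf(fp, "%d", &n) has read the 5 in "5\nabc\n", calling discard_line(fp) eats the leftover newline, so the next fgets() returns "abc\n" instead of an empty line.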
I still have the same problems
The best solution, IMO, is to read a line of user input and then scan it.
char buf[1001]; // same size as each lines[i] allocation; sizeof lines[i] would only be sizeof(char *)
if (fgets(buf, sizeof buf, stdin) == NULL) return NoMoreInput();
// If desired, remove a _potential_ trailing \n
buf[strcspn(buf, "\n")] = 0;
strcpy(lines[i], buf);
I recommend that a buffer should be about 2x the size of expected input for typical code. Robust code, not this snippet, would detect if more of the line needs to be read. IMO, such excessively long lines are more often a sign of hackers and not legitimate use.
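A sketch of "read a line, then scan it" applied to the whole question: the count line and the text lines all go through fgets(), so no stray '\n' is ever left behind for the next call. The helper read_counted_lines and its fixed 1000-character line limit are assumptions for illustration:

```c
#include <stdio.h>
#include <string.h>

/* Reads a count line, then up to that many (and at most `max`) lines of up
 * to 1000 characters into lines[][1001]. Returns the number of lines read,
 * or -1 if the count line is missing or malformed. */
int read_counted_lines(FILE *fp, char lines[][1001], int max)
{
    char buf[1001];
    int n, i;

    if (fgets(buf, sizeof buf, fp) == NULL || sscanf(buf, "%d", &n) != 1)
        return -1;
    for (i = 0; i < n && i < max; i++) {
        if (fgets(buf, sizeof buf, fp) == NULL)
            break;                          /* fewer lines than promised */
        buf[strcspn(buf, "\n")] = '\0';     /* strip a potential newline */
        strcpy(lines[i], buf);
    }
    return i;
}
```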

BLUEPIXY in the comment answered my question:
try "%[^\n]" change to " %[^\n]"

C while loop feof

This is part of a much bigger program to make a filmgenie, and for some reason the program crashes when it reaches this while loop. I don't understand what my problem is.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <ctype.h>
FILE *fp, *fopen();
char *arrayoffilms[45];
int main()
{
char line[100],junk[100];
fp=fopen("filmtitles.txt","r");
int i=0;
while(!feof(fp)) {
fscanf(fp,"%[^\n]s",line);
strcpy(arrayoffilms[i],line);
fscanf(fp,"%[\n]s",junk);
i++;
}
fclose(fp);
printf("%s\n\n\n",arrayoffilms[i]);
return 0;
}

feof will never return true until an actual read attempt is made, and EOF has been reached. The read attempts usually have return values that indicate failure. Why not use those, instead?
Don't confuse the %[ and %s format specifiers; %[ doesn't provide a scanset for %s; %[^\n]s tells scanf to read "one or more non-'\n' characters, followed by an 's' character". Does that make sense? Think about it, carefully. What is the purpose of this format specifier? What happens if the user merely presses enter, and scanf doesn't get its "one or more non-'\n' characters"? Before we look for non-'\n' characters, it's important to get rid of any '\n' characters. Any whitespace bytes in the format string will cause scanf to consume as much whitespace as possible, up until the first non-whitespace character. I'm going to presume you wanted %[^\n], or perhaps even %99[^\n], which would prevent overflows of line.
Perhaps you'd also want to count the number of bytes processed by scanf, so you can malloc the correct length and copy into arrayoffilms, for some reason I can't imagine. You can use the %n format specifier, which will tell you how many bytes scanf processed.
I noticed that you want to read and discard the remainder of a line. In my example, the remainder of a line will only ever be discarded if 99 characters are read before a newline is encountered. I'll use the assignment suppression '*': %*[^\n].
Combining these format specifiers results in a format string of " %99[^\n]%n%*[^\n]", two arguments (a char * for %[ and an int * for %n), and an expected return value of 1 (because 1 input is being assigned). The loop will end when the return value isn't 1, which will likely be caused by an error such as "reading beyond eof".
int length;
while (i < 45 && fscanf(fp, " %99[^\n]%n%*[^\n]", line, &length) == 1) { /* 45 = number of slots in arrayoffilms */
arrayoffilms[i] = malloc(length + 1);
strcpy(arrayoffilms[i], line);
i++;
}
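One detail worth checking about %n here: it reports every character this call has consumed so far, including the whitespace skipped by the leading space in the format. So malloc(length + 1) is always large enough, though it may over-allocate slightly. A minimal sketch (the helper consumed_count is hypothetical, and tmpfile() stands in for the film file):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan one line of `text` with " %99[^\n]%n" into
 * line[] and return the %n count, or -1 on failure. */
int consumed_count(const char *text, char line[100])
{
    int length = -1;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);
    if (fscanf(fp, " %99[^\n]%n", line, &length) != 1)
        length = -1;
    fclose(fp);
    return length;   /* leading whitespace + characters stored in line */
}
```

For "  abc\n" the count is 5 while strlen(line) is 3; strlen(line) + 1 is the exact allocation size.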

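The problem might be the feof. You want your while loop to terminate when you reach the end of the file, or in other words, when fscanf cannot convert anything any more. Note that fscanf returns EOF (a nonzero value) at end of file, so compare its result with 1 instead of just testing it for truth.
You can go for the code below:
while (i < 45 && fscanf(fp, "%99[^\n]", line) == 1) {
arrayoffilms[i] = malloc(strlen(line) + 1); /* the pointers start out NULL */
strcpy(arrayoffilms[i], line);
fscanf(fp, "%*[\n]"); /* skip the newline(s) */
i++;
}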
Also, error checking associated with file pointers is absolutely necessary and is a good habit. You would definitely want to use it:
fp=fopen("filmtitles.txt","r");
if(fp == NULL) /* error handling */
printf("Could not open file: filename\n");
else{
/* do stuff */
}

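A similar thing happens with fgets(), so some people say never to use feof() as a loop condition. Look at it this way: if you write
while (!feof(ipf)) {
then by the time feof() is true you've already tried to read past the end of the file. The value you just read is not data: fgetc() returned EOF, or fgets() returned NULL and left the buffer unchanged, so you'd process the last line twice. Don't use it. Testing again after the read, before using the value, works:
while (!feof(ipf)) {
    ch = fgetc(ipf);      /* ch must be an int so it can hold EOF */
    if (!feof(ipf)) {     /* second test: only use ch if this read succeeded */
        /* use ch (on a read error feof() never turns true, so check ferror() too) */
    }
}
And it works for fgets() too; I've used it this way for years. If this were Pascal (or maybe Perl) and you read "until" feof, that would work; it's a pre-test vs. post-test issue. So test twice.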

Reading multiple lines of input with scanf()

Relevant code snippet:
char input [1024];
printf("Enter text. Press enter on blank line to exit.\n");
scanf("%[^\n]", input);
That will read the whole line up until the user hits [enter], preventing the user from entering a second line (if they wish).
To exit, they hit [enter] and then [enter] again. So I tried all sorts of while loops, for loops, and if statements around the scanf() involving the new line escape sequence but nothing seems to work.
Any ideas?

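Try this:
while (1 == scanf("%1023[^\n]%*c", input)) { /* process input */ }
The width 1023 keeps scanf inside the 1024-byte input array, and %*c eats the newline. On a blank line %[^\n] matches nothing, scanf returns 0, and the loop ends.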

As has already been pointed out, fgets() is better here than scanf().
You can read an entire line with fgets(input, 1024, stdin);
where stdin is the file associated with standard input (the keyboard).
The function fgets() reads characters up to and including the first newline character '\n' (produced by pressing the ENTER key, of course), or until the array is full.
Important: The character '\n' will be part of the array input.
Your next step is to check whether all the characters in input, from the first one up to the '\n', are blanks.
You don't need to check anything after the first '\n': fgets() terminates the string right after it, and whatever follows is leftover garbage.
Your program could be as follows:
char input[1024];
printf("Enter text. Press enter on blank line to exit.\n");
while (1) {
    if (fgets(input, sizeof input, stdin) == NULL) { printf("Input Error...\n"); break; } /* EOF or read error: stop rather than loop forever */
    else {
        /* Here we suppose the fgets() has reached a '\n' character... */
        char *s; /* declared outside the for so it is still in scope at the if below */
        for (s = input; (*s != '\n') && isspace((unsigned char)*s); s++)
            ; /* skipping blanks */
        if (*s == '\n')
            break; /* Blank line */
        else
            printf("%s", input); /* The input was not a blank line; it already ends with '\n' */
    }
}
This code must go inside your main() function and, more importantly, you must include the header <ctype.h> (as well as <stdio.h>), because it uses isspace().
The code is simple: the while loop runs forever, the user enters one line per iteration, and the if statement checks whether an error has occurred.
If everything went fine, the for loop walks through input to see whether it contains only blanks.
The for loop continues until it finds the first '\n' or a non-blank character.
When the for loop terminates, the last character examined, *s, is either a newline (meaning all earlier characters were blanks) or not (meaning input contains at least one non-blank character, so it is normal text).
The "eternal" while(1) loop is broken when a blank line is read (see the break statement in the 11th line).

OP says "To exit, they hit [enter] and then [enter] again"
unsigned ConsecutiveEnterCount = 0;
for (;;) {
char buffer[1024];
if (fgets(buffer, sizeof(buffer), stdin) == NULL) {
break; // handle error or EOF
}
if (buffer[0] == '\n') {
ConsecutiveEnterCount++;
if (ConsecutiveEnterCount >= 2 /* or 1, not clear on OP intent */) {
break;
}
}
else ConsecutiveEnterCount = 0;
// Do stuff with buffer;
}

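#include <stdio.h>
int main(void) {
    char arr[40];
    size_t i = 0;
    /* Read one char at a time (reading in pairs can straddle the two '\n's and miss them);
       stop on two consecutive '\n', end of input, or a full buffer. */
    while (i < sizeof arr - 1 && scanf("%c", &arr[i]) == 1) {
        i++;
        if (i >= 2 && arr[i - 1] == '\n' && arr[i - 2] == '\n')
            break;
    }
    arr[i] = '\0'; /* printf("%s") needs a null-terminated string */
    printf("%s", arr);
    return 0;
}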

... I tried all sorts of while loops, for loops, and if statements around the scanf() involving the new line escape sequence but nothing seems to work.
It seems you tried everything you shouldn't have tried before reading! A C programmer is expected to read manuals unless they want to run into undefined behaviour, which causes headaches like the one you've experienced. To elaborate, you can't learn C by guessing the way you can Java.
Consider this your lesson. Stop guessing and start reading (the fscanf manual)!
According to that manual:
[ Matches a non-empty sequence of bytes from a set of expected bytes (the scanset).
The emphasis is mine. What you seem to be describing is an empty sequence of bytes, which means that the match fails. What does the manual say about matching failures?
Upon successful completion, these functions shall return the number of successfully matched and assigned input items; this number can be zero in the event of an early matching failure. If the input ends before the first conversion (if any) has completed, and without a matching failure having occurred, EOF shall be returned. If an error occurs before the first conversion (if any) has completed, and without a matching failure having occurred, EOF shall be returned...
Again, the emphasis is mine... This is telling you that, as with most other C standard functions, you need to check the return value! For example, when you call fopen you then write some idiom along the lines of if (fp == NULL) { /* handle error */ }.
Where's your error handling? Note that the return value isn't merely a binary selection: for a format with n assigning conversions, there are n + 2 possible return values, namely EOF and 0 through n. You should understand what each of those means before you try to use fscanf.
