#include <stdio.h>
int main(void) {
    long nc;

    nc = 0;
    while (getchar() != -1)
        ++nc;
    printf("%ld\n", nc);
}
Above is my code from the K&R C book, example 1.5.2, except that I changed it so getchar() is compared against -1. However, when I type negative one (-1), the code does not stop reading. How do I get the code to stop, and why is it not stopping?
The getchar() function reads a single byte from stdin and returns the value of that byte. For example, on an ASCII-based system, the byte value 32 represents a space, the value 33 represents an exclamation point (!), the value 34 represents a double quote ("), and so on. In particular, the characters - and 1 (which make up the string "-1") have the byte values 45 and 49 respectively.
The number -1 does not correspond to any actual character, but rather to the special value EOF (an acronym for end of file) that getchar() will return when there are no more bytes to be read from stdin. (Actually, the EOF value is not guaranteed by the C standard to be equal to -1, although on most systems it is. It is guaranteed to be less than zero, though.)
So your loop, as written, will continue to run until there's no more input to be read. If you're running your code from a terminal, that basically means it will keep running until you type Ctrl+D (on Unixish systems) or Ctrl+Z (on Windows). Alternatively, you could run your program with its input coming from a file (e.g. with my_program < some_file.txt), which would cause the loop to run until it has read the entire file byte by byte.
If you instead want to read a number from stdin, and loop until the number equals -1, you should use scanf() instead.
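For instance, a minimal sketch of that approach (here counting how many numbers appear before the -1; the names and the output format are just an illustration):

#include <stdio.h>

int main(void) {
    long n;
    long count = 0;

    /* scanf returns 1 as long as it successfully parses a number */
    while (scanf("%ld", &n) == 1 && n != -1)
        ++count;

    printf("read %ld numbers before -1\n", count);
    return 0;
}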
#include <stdio.h>
#include <unistd.h>

char buff[1];

int main() {
    int c;
    c = getchar();
    printf("%d\n", c); // output: -1
    c = getchar();
    printf("%d\n", c); // output: -1

    int res;
    // here I get a prompt for input. What happened to EOF?
    while ((res = read(0, buff, 1)) > 0) {
        printf("Hello\n");
    }
    while ((res = read(0, buff, 1)) > 0) {
        printf("Hello\n");
    }
    return 0;
}
The output shown in the comments is the result of simply typing Ctrl-D (EOF on macOS).
I'm a bit confused about the behaviour of getchar(), especially when compared to read.
Shouldn't the read system calls inside the while loop also return EOF? Why do they prompt the user? Has some sort of stdin clear occurred?
Considering that getchar() uses the read system call under the hood, how come they behave differently? Shouldn't stdin be "unique" and the EOF condition shared?
How come in the following code the two read system calls return both EOF when a Ctrl-D input is given?
int res;
while ((res = read(0, buff, 1)) > 0) {
    printf("Hello\n");
}
while ((res = read(0, buff, 1)) > 0) {
    printf("Hello\n");
}
I'm trying to find the logic behind all this. I hope someone can make it clear what EOF really is and how it really behaves.
P.S. I'm using a macOS machine.
Once the end-of-file indicator is set for stdin, getchar() does not attempt to read.
Clear the end-of-file indicator (e.g. clearerr() or others) to re-try reading.
The getchar function is equivalent to getc with the argument stdin.
The getc function is equivalent to fgetc ...
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
read() still tries to read each time.
Note: Reading via a FILE *, like stdin, does not attempt to read if the end-of-file indicator is set. Yet even if the error indicator is set, a read attempt still occurs.
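A minimal sketch of that fix, reproducing the questioner's situation (assuming you type Ctrl-D on an empty line at each prompt):

#include <stdio.h>

int main(void) {
    int c = getchar();      /* returns EOF once Ctrl-D is typed */
    printf("%d\n", c);

    /* without this, the next getchar() would not even attempt a read */
    clearerr(stdin);        /* clears the end-of-file and error indicators */

    c = getchar();          /* issues a fresh read and prompts again */
    printf("%d\n", c);
    return 0;
}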
macOS is a derivative of the BSD Unix systems. Its stdio implementation does not come from GNU software, so it is a different implementation. On EOF, the stream is marked as having reached end of file when a read(2) system call returns 0 as the number of characters read, and the stream does not issue read(2) again until that condition is reset; this produces the behaviour you observe. Call clearerr(stream); on the FILE * before issuing the next getchar(3), and everything will be fine. You can do that with glibc also, and then your program will run the same on either implementation of stdio (glibc vs. BSD).
I'm trying to find the logic behind all this. I hope someone can make it clear what EOF really is and how it really behaves.
EOF is simply a constant (normally valued -1) that is distinct from any possible character value returned by getchar(3). (For this purpose getchar() returns an int, not a char: real characters occupy the interval 0..255, and the range is extended by one more value to represent the EOF condition. EOF is not a char.) The end-of-file condition is indicated by the getchar family of functions (getchar, fgetc, etc.) when the underlying read(2) returns 0 (the number of characters read is zero), which does not map to any character. For that reason the return type is widened to int, and a new value, EOF, is defined to be returned when the end-of-file condition is reached. This is compatible with files that contain Ctrl-D characters (ASCII EOT, decimal value 4) without those bytes representing an end-of-file condition: when you read an ASCII EOT from a file, it appears as a normal character of decimal value 4.
The Unix tty implementation, on the other side, allows line-input mode to use a special character (Ctrl-D, ASCII EOT/END OF TRANSMISSION, decimal value 4) to indicate an end of stream to the driver. This is a special character, like ASCII CR or ASCII DEL (which produce line editing of the input before it is fed to the program): the terminal gathers all the characters typed so far and lets the application read them, and if there are none, none is read and you get the end-of-file indication. So bear in mind that Ctrl-D is only special in the Unix tty driver, and only when it is working in canonical mode (line-input mode). So, finally, there are only two ways to feed input to the program in line mode:
pressing the RETURN key (this is mapped by the terminal into ASCII CR, which is translated into ASCII LF, the famous '\n' character), and the ASCII LF character is input to the program
pressing the Ctrl-D key. This makes the terminal grab everything keyed in up to that moment and send it to the program (without adding the Ctrl-D itself), and no character is added to the input buffer. That means that, if the input buffer was empty, nothing is sent to the program and the read(2) call effectively reads zero characters from the buffer.
To unify every scenario: the read(2) system call normally blocks in the kernel until one or more characters are available; only at end of file does it unblock and return zero characters to the program. THIS SHOULD BE YOUR END-OF-FILE INDICATION. Many programs receive an incomplete buffer (fewer characters than the count passed as a parameter) before a true end of file is signalled, so almost every program does another read to check whether that was an incomplete read or indeed an end-of-file indication.
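As a sketch of that convention, a typical read(2) loop treats a return value of 0 as end of file and a negative value as an error:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[64];
    ssize_t n;

    /* read(2) blocks until data is available; 0 means end of file,
       a negative return means an error */
    while ((n = read(0, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    if (n < 0)
        perror("read");
    return 0;
}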
Finally, what if I want to input a Ctrl-D character as itself into a file? There's another special character in the tty implementation that escapes the special behaviour of the character that follows it. On today's systems that character is by default Ctrl-V, so to enter a special character (even Ctrl-V itself) you precede it with Ctrl-V; entering a literal Ctrl-D into the file therefore requires typing Ctrl-V followed by Ctrl-D.
I know this has been discussed before, but I want to make sure I understand correctly what is happening in this program, and why. On page 20 of Kernighan and Ritchie's textbook, The C Programming Language, we see this program:
#include <stdio.h>

int main()
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
    return 0;
}
When executed, the program reads each character keyed in and prints them out in the same order after the user hits Enter. This process is repeated indefinitely unless the user manually exits the console. The sequence of events is as follows:
The getchar() function reads the first character keyed in and assigns its value to c.
Because c is an integer type, the character value that getchar() passed to c is promoted to its corresponding ASCII integer value.
Now that c has been initialized to some integer value, the while loop can test whether that value equals the End-Of-File character. Because the EOF character has a macro value of -1, and because none of the characters that can be keyed in have a negative decimal ASCII value, the condition of the while loop will always be true.
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again, so it reads the next input character and passes its value back to the start of the while loop. If the user keys in only one character, the program then reads the <return> as the next character, prints a newline, and waits for the next input to be keyed in.
Is any of this remotely correct?
Yes, you've basically got it. But it's even simpler: getchar and putchar return and accept int types respectively already. So there's no type promotion happening. You're just taking in characters and sending them out in a loop until you see EOF.
Your intuition about why those should be int and not some char form is likely correct: the int type allows for a sentinel EOF value that is outside the value range of any possible character value.
(The K&R stdio functions are very old at this point; they don't know about Unicode etc., and some of the underlying design rationales are, if not murky, just not relevant. Not a lot of practical code these days would use these functions. That book is excellent for a lot of things, but the code examples are fairly archaic.)
(Also, fwiw, your question title refers to "copying a file", which you still can do this way, but there are more canonical ways.)
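For instance, one of those more canonical ways is block-wise copying with fread()/fwrite(); a minimal sketch, copying stdin to stdout:

#include <stdio.h>

int main(void) {
    char buf[4096];
    size_t n;

    /* fread returns the number of bytes actually read; 0 on EOF or error */
    while ((n = fread(buf, 1, sizeof buf, stdin)) > 0)
        fwrite(buf, 1, n, stdout);
    return 0;
}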
Well, it is correct in idea, but not in the details, and that's where the devil is.
The getchar() function reads the first character from standard input and returns it as an unsigned char promoted to int (or the special EOF value if no character was read)
The return value is assigned into c, which is of type int (as it should, as if it were a char strange things could happen)
Now that c has been assigned some integer value, the while loop can test to see if that value equals the value of the EOF macro.
Because the EOF macro has an implementation-specified negative value, and because the characters were converted to unsigned char and promoted to int, none of them have a negative value (at least not on any system that you'd meet as a novice), so the condition of the while loop will always be true until the end-of-file condition happens or an error happens when reading standard input.
Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.
The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop.
The standard input, if it is connected to a terminal device, is usually line-buffered, meaning that the program does not receive any of the characters on the line until the user has completed the line and hit the Enter key.
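A tiny sketch to observe that buffering: run it, type a few characters, and note that no output appears until you press Enter:

#include <stdio.h>

int main(void) {
    int c;

    /* with a line-buffered terminal, all characters of a line,
       including the '\n', arrive only after Enter is pressed */
    while ((c = getchar()) != EOF)
        printf("got %d\n", c);
    return 0;
}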
Instead of ASCII, we speak of the execution character set, which nowadays might often be individual bytes of UTF-8 encoded Unicode characters. EOF is negative in binary too, we do not need to think about "its decimal value". The char and unsigned char types are numbers too, and the character constants are of type int - i.e. on systems where the execution character set is compatible with ASCII, writing ' ' will be the same thing as writing 32, though of course clearer to those who don't remember ASCII codes.
Finally, C is very strict about the meaning of initialization. It is the setting of the initial value into a variable when it is declared.
int c = getchar();
has an initialization.
int c;
c = getchar();
has c uninitialized, and then assigned a value. Knowing the distinction makes it easier to understand compiler error messages when they refer to initialization or assignment.
As the title says, can EOF be on the same line as "\n"? I am currently recreating getline in C, and I am at the part of implementing what happens at EOF. I was trying to figure out whether I will get the last line of a text file as:
"blahblahblahblah\n\0"
or
"blahblahblah\n
\0"
Can EOF be on the same line as \n
Yes, but rarely.
EOF is a constant usually used to signal 1) end-of-file or 2) a rare input error. It is not usually encoded as a character in a text stream.
A return value of EOF from fgetc() and friends can occur at any time - even in the middle of a line due to an error.
Otherwise, no:
An end-of-file does not occur at the end of a line with a '\n'. Typically the last line in a file that contains at least 1 character (and no '\n') will also be a line.
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2
EOF is a signal to notify that there is no more input. There is no actual data called EOF at the end of the file. EOF is not the same as the null character \0.
I think you want to know whether you can get an EOF without getting a newline first. The answer is yes. It is possible if there is no newline at the end of the file, i.e. when the last line finished without a newline at the end. You can generate such a file easily by opening a text editor, typing some text, and saving and quitting without hitting ENTER. That file won't have any newline at the end.
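You can also produce such a file programmatically; a small sketch (the file name is just a hypothetical example):

#include <stdio.h>

int main(void) {
    FILE *f = fopen("no_newline.txt", "w");   /* hypothetical name */
    if (f == NULL)
        return 1;
    fputs("blahblahblah", f);   /* note: no trailing '\n' */
    fclose(f);
    return 0;
}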
Expanding on @taskinoor's and @chux's answers, and after looking at your examples, I think you might be confusing EOF with the '\0' byte that terminates strings.
EOF is a value that functions like getc return to tell the user that it was not able to read any more from the stream. That's why you often see this:
// fp is a FILE *; it may be stdin,
// a file on the disk, or something else
int c;
while ((c = getc(fp)) != EOF)
{
    // do something with the character
}
Note here that c is declared as int, because EOF is a signed int and that is what getc returns. I previously made a statement here1 that wasn't correct; chux pointed out my mistake, and I'd like to quote his/her comment here instead.
User chux's comment on my answer:
EOF, as a negative constant, is often -1 and can fit in a char when char is a signed char. It is not so much that it does not fit; it is that storing EOF in a char can be indistinguishable from storing, say, a character with the value 255 in a char.
'\0', on the other hand, is the byte that is used to terminate a string in C. In C, a string is nothing more than a sequence of characters that ends with that byte. We use char arrays to store strings. So EOF and \0 are different things and even have different values.
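A small sketch making the distinction visible (EOF is typically -1; '\0' is always 0):

#include <stdio.h>
#include <string.h>

int main(void) {
    char s[] = "hi";                      /* stored as 'h', 'i', '\0' */
    printf("strlen = %zu\n", strlen(s));  /* 2: the '\0' terminates it */
    printf("EOF = %d, '\\0' = %d\n", EOF, '\0');
    return 0;
}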
Footnote
1Note here that c is declared as int, because EOF is a signed int and actually doesn't fit in a char, so it will never be part of a C string.
I encrypted a text file using an offset cipher in C. For this, I simply added 128 to each character, and the file size decreased by 3 bytes. I tried the same on some other files and got the same result, i.e. a decrease in file size of 3 bytes. I got the original size back after decryption.
Could you please tell me why this happens?
Code for the main logic is given below:
while ((ch = fgetc(fs)) != EOF) {
    fputc(ch + 128, ft);
}
Could you please tell me why this happens?
Your ch probably has the wrong declaration. The fgetc() function returns an int, not a char, and if you cast to char you will lose the distinction between (char) 0xff and EOF.
// WRONG WRONG WRONG
// char ch = fgetc(fs);
The right declaration:
int ch = fgetc(fs);
Otherwise, it shouldn't happen. Is your process exiting cleanly? If you abort(), then there might be data still in FILE * buffers. Show more code. Run with Valgrind. Check the exit status of your process.
I think the file size should have doubled, as two bytes would be taken for one character after encryption, since something greater than 127 cannot be stored in 1 byte.
No, fputc() does not work that way. From the fputc() man page (run man fputc in a terminal, unless you are on Windows):
fputc() writes the character c, cast to an unsigned char, to stream.
Conversion to unsigned char is done by taking the value modulo 256*. So fputc() always writes exactly one byte of data (unless it fails).
* This is true on all but exceedingly rare systems.
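A tiny sketch of that wrap-around (assuming an ASCII system, where byte 44 is the comma):

#include <stdio.h>

int main(void) {
    /* fputc converts its argument to unsigned char, i.e. modulo 256,
       so 300 and 44 write the same single byte */
    fputc(300, stdout);   /* 300 % 256 == 44, i.e. ',' */
    fputc(44, stdout);    /* ',' again */
    fputc('\n', stdout);
    return 0;
}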
If you are talking about Windows, I could imagine that you have opened the file in text mode, not in binary mode.
That leads to the following:
Writing \n leads to a \r\n written to the file.
Reading \r\n from the file gives only \n to the user.
Reading stops at the first \x1A, which is treated as an EOF character.
If you add 128 to each byte, the data to be written wraps around at 256. Calling fputc() with a value > 255 is well-defined (the value is converted to unsigned char, i.e. wrapped modulo 256), though writing (ch+128) % 256 or (ch+128) & 0xFF makes the intent explicit. Either way, because the written byte wraps at 256, you may produce \n or \x1A by accident.
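Putting the fixes together, a hedged sketch of the cipher loop (the file names are just examples; int ch keeps EOF distinguishable, binary mode avoids the text-mode translations described above, and the wrap at 256 is made explicit):

#include <stdio.h>

int main(void) {
    FILE *fs = fopen("in.bin", "rb");    /* example input name */
    FILE *ft = fopen("out.bin", "wb");   /* example output name */
    int ch;                              /* int, so EOF stays distinct */

    if (fs == NULL || ft == NULL)
        return 1;

    while ((ch = fgetc(fs)) != EOF)
        fputc((ch + 128) & 0xFF, ft);    /* explicit wrap at 256 */

    fclose(fs);
    fclose(ft);
    return 0;
}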
I was reading the K&R book and wanted to test out the printf() and putchar() functions in ways that I never tried. I encountered several unexpected events and would like to hear from more experienced programmers why they happen.
char c;
while ((c = getchar()) != EOF) {
    //putchar(c);
    printf("%d your character was.\n", c);
}
How would one get EOF (end of file) in the input stream (with the getchar() or scanf() functions)? Could an unexpected key that is not recognized by getchar()/scanf() produce it?
In my book, it says that c has to be an integer, because it needs to store EOF, and the variable has to be big enough to hold any possible char in addition to EOF. This doesn't make sense to me, because EOF is a constant integer with a value of -1, which even a char can store. Can anyone clarify what was meant by this?
What happens when I send "hello" or 'hello' to the putchar() function? It expects to get an integer, but produces weird output, such as EE or oo, when I send it the string or the character sequence.
Why, when I use the printf() call written above, do I get two outputs? One is the character I entered, and the other is an integer which in ASCII is end of line. Does it produce the second output because I press Enter, which it takes as a second character?
Thanks.
on Linux, you can send it with Ctrl+D
you need an int, otherwise you can't tell the difference between EOF and the last possible character (0xFFFF is not the same as 0x00FF)
putchar wants a character, not a string; if you try to give it a string, it'll print a part of the string's address
you only get one output: the ASCII value of the character you entered; the other "output" is what you typed echoed back by the terminal
edit - more details about 2
You need an int, because getchar can return both a character of value -1 (0x00FF) and an integer of value -1 (0xFFFF), and they don't have the same meaning: the character of value -1 is a valid character (for instance, it's ÿ in latin-1) while the integer of value -1 is EOF in this context.
Here's a simple program that shows the difference:
#include <stdio.h>

int main(int argc, char **argv) {
    {
        char c = 0xFF; /* this is a valid char */
        if (c == EOF) printf("wrong end of file detection\n");
    }
    {
        int c = 0xFF; /* this is a valid char */
        if (c == EOF) printf("wrong end of file detection\n");
    }
}
The first test fires (and wrongly detects end of file) because 0xFF == -1 for a signed char, while the second test correctly fails because 0x00FF != -1 for an int.
I hope that makes it a bit clearer.
you must close the input stream to get EOF, usually with Ctrl-D in Unix, but see your tty config (stty -a)
already answered
same
your tty echoes what you type by default. If you don't want this, set it to noecho mode (stty -echo). Be careful, as some shells set it back to echo; try with sh. You must also be aware that the tty buffers your input until RETURN is typed (see the stty manual for raw mode if you need it).